The right questions every data scientist should ask at the beginning of a data science project

June 17th, 2021

If you’ve worked on a data science or analytics project before, you might be familiar with the saying that a data scientist is someone who is very successful at solving the wrong problem.

It’s meant in jest, of course. But when you’re working with multiple data sources and stakeholders, it’s easy to see how starting out even slightly off track, for example by diving in before fully understanding the business requirements, can cause problems later.

As a data scientist, I want to start working with the data as soon as possible. And while I’m always excited to reach that stage, I also know how important it is to first understand the problem statement. It’s the first and most critical step in any data science project.

Get the problem statement right and you have clear and measurable objectives, a project that’s more likely to stay on track in terms of timelines and budget, and happier clients. Get it wrong and you risk misunderstanding the client’s needs, failing to deliver business value, wasting time on invalid work or faulty interpretation, and having to adjust your approach midway as the real business problem becomes clear.

We always say that data is only as good as the questions you ask. So, here are my tips for asking better questions to help you start your next data science project on the right foot.

Why asking the right questions matters

Clients usually contact us when they want to solve a problem. The issue with data science projects is that the exact problem might be unclear.

For example, the client might have a data set that they want to get more value from, or a data process they want to optimise. In both scenarios, the problem statement is too vague to address effectively.

Every successful data science project has a defined problem statement that details measurable goals and objectives the client wants to achieve. The role of the data scientist is to ask the right questions to understand the client’s problems and translate them into data science problems.

What to ask before beginning a data science project 

Unfortunately, there’s no standard template of questions that will elicit all the information you need. The right questions to ask will vary depending on a business’s strategy, goals, budget, and target customers.

What doesn’t change, however, is the need to ensure that the questions you ask align with business needs and are understood by all stakeholders (i.e. not filled with technical jargon). They should also be answered by both business and technical staff, which is best facilitated in a cross-department discovery workshop.

Here are the questions I like to ask before beginning a data science project. I divide them into three categories: business problems, possible risks, and technical problems.

Business problem

  • What is the opportunity?
  • What are the pain points?
  • What is driving the need for this project?
  • How will the results impact the business?
  • What assumptions do we need to make about the project/data etc.?
  • What are the scenarios in which this project will be helpful or valuable?
  • What analysis have you already completed?
  • What are the success criteria of this project (e.g. A/B testing, comparison to KPIs)?
  • What are the deliverables (e.g. data visualisation, reports, values, apps, frameworks)?
  • Who are the final users of the analysed results? What are their needs, technical skill levels, and domain knowledge?
  • What are the top priorities for completing the project if there are multiple goals?

Possible risks 

  • What are the cost limitations of this project?
  • What are the risks associated with this project?
  • What privacy and security concerns should we be aware of about the data?

Technical problems

  • How will this project be resourced from a technical perspective (e.g. computer systems, platforms, tools, infrastructure limitations, data feed limitations)?
  • What architecture/platform is in use, or will be used?
  • What are the available data sources, and what is the data feed frequency?
  • What are the data quality, format, volume and refresh frequency?
  • Is there a data dictionary for the database?
  • Who will be the contact and support person/team if anything technical is needed?
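
Answers to the data quality and volume questions above are easier to discuss when they come with a quick profile of a sample extract. The following is a minimal sketch only, assuming the sample arrives as a list of dictionaries; the function name `profile_rows` and the sample records are hypothetical and for illustration.

```python
from collections import Counter


def profile_rows(rows):
    """Summarise row count and missing values per column.

    Assumes each record is a dict; a value counts as missing when the
    key is absent, or the value is None or an empty string.
    """
    # Collect every column name seen in any record.
    columns = set()
    for row in rows:
        columns.update(row)

    # Count missing values per column.
    missing = Counter()
    for row in rows:
        for col in columns:
            if row.get(col) in (None, ""):
                missing[col] += 1

    return {
        "row_count": len(rows),
        "columns": sorted(columns),
        "missing": {col: missing[col] for col in sorted(columns)},
    }


# Hypothetical sample extract, for illustration only.
sample = [
    {"customer_id": 1, "region": "NSW", "spend": 120.0},
    {"customer_id": 2, "region": "", "spend": None},
    {"customer_id": 3, "spend": 75.5},
]

report = profile_rows(sample)
print(report)
```

A summary like this is not a substitute for the client's own description of the data, but it gives both sides something concrete to compare against the data dictionary.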

Tips for success

  • Ask lots of questions: Your job is to get an accurate picture of the problem that needs to be solved, figure out what approach you can apply to it, and estimate what the ultimate result will look like. That might mean asking more questions than you or the client think are necessary.
  • Get agreement from the client: Once you understand the problem statement, ensure the client is up to speed too. When you both agree on the output and delivery, and are aware of the cost or budget limitations, you’ll save time going back and forth and adjusting the goals as the project moves forward.
  • Ask specific questions that begin a dialogue: Avoid asking yes/no questions. Try to open a dialogue to get more context and information.
  • Turn the business problem (qualitative concepts) into a data science problem (quantitative measures): A successful data science project is one that can be measured. Be clear about how the client’s business problem can be solved using data.
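
As a rough illustration of the last tip, a business goal such as "reduce customer churn" can be written down as a metric with a baseline and an agreed target. This is a sketch under stated assumptions, not a prescribed method: the class `SuccessCriterion`, its fields, and the churn figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SuccessCriterion:
    """A measurable version of a business goal (hypothetical structure)."""
    metric: str                    # what is measured
    baseline: float                # value before the project starts
    target: float                  # value the client agrees counts as success
    higher_is_better: bool = True  # direction of improvement

    def is_met(self, observed: float) -> bool:
        """Return True when the observed value reaches the agreed target."""
        if self.higher_is_better:
            return observed >= self.target
        return observed <= self.target


# Hypothetical figures, for illustration only.
churn = SuccessCriterion(
    metric="monthly churn rate",
    baseline=0.08,
    target=0.06,
    higher_is_better=False,
)

print(churn.is_met(0.055))
print(churn.is_met(0.07))
```

Writing the criterion down in this form makes it obvious when a goal has no baseline or no agreed target yet, which is usually a sign that more questions are needed.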

Where to from here?

It’s true that the complex nature of data science projects makes them difficult to get right. Laying a strong foundation by articulating a clear and well-defined business problem is the first step towards completing a project that delivers tangible business value.

If you’d like to learn more about Antares and our approach to data science projects, get in touch. Our team is always ready to have a chat.

By Yang Li, Data Scientist, Antares Solutions