Skilled data scientists share something in common. They can build product solutions… with data.
It is no longer enough to be a data scientist who can solve math and statistics problems in Python, R, or Julia.
Modern data scientists require a new mindset: design thinking.
The data science field is transforming in 2020 at the speed that software engineering changed in 2010.
Products, frameworks, and programming languages will fade out of popularity; design thinking is always relevant.
What is design thinking?
At Stanford University’s Design School, programs focus on developing a deep understanding of the people for whom we design products.
Data scientists and students know me for the Data Science Standards¹, a framework I created to launch data science products in businesses.
Here are the 5 Steps of Design Thinking, with actions and questions to guide you on your data science journey.
Step 1: Data Collection
Your ability to ask actionable questions to aggregate, browse, and collect data can mean the difference between a successful product and research that is never implemented.
Product success requires thorough data navigation skills and a checklist that focuses on a repeatable process.
Ask yourself these questions when collecting data:
- Where is my data stored?
- How large is the data?
- What quantity and quality of data will I need to launch this product or service?
- Who manages the data that I need to access?
- When is the data updated?
- Why is this data relevant for my product?
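To make this checklist repeatable, you can encode it directly in your collection code. Here is a minimal sketch in Python with pandas; the file path and the updated_at column are illustrative assumptions, not part of any prescribed standard.

```python
import pandas as pd

# Hypothetical extract; the path and column names are assumptions for illustration.
DATA_PATH = "data/raw_sales.csv"

def collect(path: str) -> pd.DataFrame:
    """Load a raw extract and log answers to the collection checklist."""
    df = pd.read_csv(path, parse_dates=["updated_at"])
    print(f"Where: {path}")                                        # storage location
    print(f"How large: {len(df):,} rows x {df.shape[1]} columns")  # data size
    print(f"When updated: {df['updated_at'].max()}")               # data freshness
    return df

# Usage: raw = collect(DATA_PATH)
```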
Step 2: Data Refinement
Large quantities of data are good; high-quality data is better. World-class Kaggle Grandmasters win competitions, and data scientists earn promotions at work, when they invest their time in refining data.
Product managers and software engineers do not take responsibility for data refinement, so skilled data scientists must make the difficult decisions about what makes data reliable and responsible.
Start with these questions when refining data:
- Who has insight into data dictionaries for data features?
- What data requires querying, feature engineering, and pre-processing? By what techniques?
- When will the required data be ready, in a high-quality and high-quantity state, to move to the next stage of the Data Science Workflow?
- Where will the refined data be stored?
- Why does the data need to be refined?
- How will the refined data be tested and validated for consistent performance?
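One minimal sketch of refinement with validation baked in, assuming hypothetical customer_id and amount columns:

```python
import pandas as pd

def refine(raw: pd.DataFrame) -> pd.DataFrame:
    """Refine a raw extract into a validated, model-ready table."""
    df = raw.drop_duplicates()
    df = df.dropna(subset=["customer_id"])   # assumed required key
    df["amount"] = df["amount"].fillna(0.0)  # assumed numeric feature

    # Validate before handing off to the next stage of the workflow.
    assert df["customer_id"].notna().all(), "missing keys need investigation"
    assert (df["amount"] >= 0).all(), "negative amounts need investigation"
    return df
```

Cheap assertions like these are one way to make "tested and validated for consistent performance" a property of the pipeline rather than a one-time manual check.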
Step 3: Data Expansion
Even with the best available data, a problem may not be solvable. Frequently, more data is the difference between a dead-end product and a product that leads the market with unique insights.
Successful products in 2020 require both data refinement and data expansion. Integrations with APIs, similar datasets, and alternative data give data science teams the confidence to discover important insights in their data. Data expansion enables feature enrichment and improves the success rate of the data science workflow.
Apply these questions when expanding data:
- Who controls data access?
- What budget is available to acquire or generate more data?
- When do you stop expanding the data, and when do you continue to iterate with machine learning?
- Where can you acquire high-quality data sources?
- Why are more data features needed to improve your product or solution?
- How will you decide which data is most relevant for expansion?
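As a sketch of what expansion can look like in practice, here is a hypothetical join with an alternative dataset; the zip_code key and median_income feature are assumptions for illustration:

```python
import pandas as pd

def expand(df: pd.DataFrame, external: pd.DataFrame) -> pd.DataFrame:
    """Enrich internal records with an alternative dataset on a shared key."""
    # A left join keeps every internal record even when enrichment is missing.
    enriched = df.merge(external, on="zip_code", how="left")
    coverage = enriched["median_income"].notna().mean()
    print(f"Enrichment coverage: {coverage:.1%}")  # decide whether to keep expanding
    return enriched
```

Tracking coverage on each join is one way to answer the "when do you stop expanding" question with a number rather than a feeling.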
Step 4: Data Learning
Analytics and business intelligence test which data variables may be important; data learning runs models on those features to predict insights for a product.
Data Learning considers how compute, storage, and machine learning frameworks can accelerate your workflow.
Ask yourself these questions during the Data Learning stage:
- Who determines what benchmarks are needed for a successful model?
- Which machine learning frameworks and algorithms will you choose for what you want to predict?
- When do you decide that your modeling results are significant or ready for production?
- Where will you process data learning: locally or on which cloud systems?
- Why does your feature request or product need machine learning?
- How much compute time and how many compute resources are available to model the data?
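To ground these questions, here is a minimal benchmark-driven training sketch with scikit-learn; the random forest, the 0.80 AUC threshold, and the feature matrix X and labels y are illustrative assumptions rather than a prescribed setup:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

BENCHMARK_AUC = 0.80  # assumed threshold agreed with the product team

def learn(X, y):
    """Train a candidate model and compare it against the agreed benchmark."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    model = RandomForestClassifier(n_estimators=200, random_state=42)
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"AUC {auc:.3f} vs benchmark {BENCHMARK_AUC}")
    return model if auc >= BENCHMARK_AUC else None
```

Deciding the benchmark with your product team before training keeps "ready for production" an explicit, testable criterion.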
Step 5: Data Maintenance
Your machine learning model has exceeded its benchmarks, and you have deployed your solution to production with your data engineers and software engineers.
But now what?
All data and machine learning models degrade in quality over time. Skilled data scientists monitor their machine learning in production to verify results and maintain quality.
Apply these questions to better monitor your data:
- Who is responsible for making changes to data models when performance changes?
- What triggers, pipelines, or data jobs do you implement to monitor the quality of your data in production?
- When performance falls below required benchmarks, what data governance processes do you put into action?
- Where in your schedule will you commit recurring time to monitor your data pipeline for quality control?
- Why are your data modeling results degrading in production?
- How do you communicate data modeling results to your product managers, data engineers, and software engineers, and how often?
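A minimal monitoring sketch, reusing the assumed benchmark from the Data Learning stage; in a real pipeline this check would run on a schedule against fresh labeled data:

```python
from sklearn.metrics import roc_auc_score

BENCHMARK_AUC = 0.80  # same assumed threshold agreed at launch

def monitor(model, X_recent, y_recent) -> bool:
    """Score the production model on recent labeled data and flag degradation."""
    auc = roc_auc_score(y_recent, model.predict_proba(X_recent)[:, 1])
    degraded = auc < BENCHMARK_AUC
    if degraded:
        # In practice: page the owner or trigger a retraining pipeline.
        print(f"ALERT: production AUC {auc:.3f} fell below {BENCHMARK_AUC}")
    return degraded
```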
In Summary:
For your current and next data science product features, think about all 5 Steps of Design Thinking in your Data Science workflow: (1) Data Collection, (2) Data Refinement, (3) Data Expansion, (4) Data Learning, and (5) Data Maintenance.
And remember — design thinking is an iterative process!
With Design Thinking applied to your data science workflow, you will be a better data scientist starting today.
If you are interested in exploring Design Thinking with AI, check out a course called the IBM AI Enterprise Workflow, now available on Coursera.
Works Cited:
¹ Data Science Standards
An audio version of this Medium article is available on Spotify and Apple Podcasts.