It’s probably safe to assume that most people view data science as being a strictly scientific discipline—it’s right there in the name after all. Yet, many data scientists describe their discipline as being as much an art as a science. Why is that?
Data science is creative problem solving
There are very few disciplines that do not require some degree of creativity; the use of creativity is what enables professionals to overcome obstacles. What distinguishes data science from a non-artistic discipline is how it relies upon creativity to accomplish its core tasks. As there are multiple ways to analyse and model datasets, data scientists must frequently work within vaguely defined parameters and decide for themselves how best to define their analytics projects.
Consider a situation in which a company asks a data scientist to discover which customer demographics are most valuable to the business. At first blush, this is a question with an objective answer; however, it actually leaves much to the discretion of the data scientist charged with answering it. For instance, is the most valuable demographic:
● The one with the highest number of customers?
● The one that spends the most money in total?
● The one that spends the most money per capita?
● Some combination of the above?
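To see how much the answer depends on the definition, here is a minimal sketch in Python. The purchase records, age bands and variable names are all invented for illustration; the point is only that the same data can name a different "most valuable" demographic under each of the first three definitions above.

```python
from collections import defaultdict

# Hypothetical purchase records: (customer_id, demographic, amount_spent).
# These values are made up purely to illustrate the point.
purchases = [
    ("c1", "18-29", 20.0),
    ("c2", "18-29", 30.0),
    ("c3", "18-29", 25.0),
    ("c4", "18-29", 15.0),
    ("c5", "30-49", 120.0),
    ("c6", "30-49", 100.0),
    ("c7", "50+", 200.0),
]

customers = defaultdict(set)   # demographic -> distinct customer ids
revenue = defaultdict(float)   # demographic -> total spend

for customer_id, group, amount in purchases:
    customers[group].add(customer_id)
    revenue[group] += amount

head_count = {group: len(ids) for group, ids in customers.items()}
per_capita = {group: revenue[group] / head_count[group] for group in revenue}


def top(metric):
    """Return the demographic with the largest value for a metric."""
    return max(metric, key=metric.get)


most_customers = top(head_count)      # definition 1: most customers
highest_total = top(revenue)          # definition 2: most money in total
highest_per_capita = top(per_capita)  # definition 3: most money per capita

print("Most customers:    ", most_customers)
print("Highest total:     ", highest_total)
print("Highest per capita:", highest_per_capita)
```

With this made-up data, each definition points to a different group, so the "objective" answer depends on which definition the data scientist chooses.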
The term “demographic” must also be considered. “Men” is a demographic grouping, but so is “Women between the ages of 34 and 49 who live in Melbourne”. To decide how precise their demographics should be, a data scientist needs to decide which groupings will yield the most useful information while also producing statistically significant results.
To make these decisions, data scientists need to identify what parts of the data are valuable, a practice called data curation. The word “curation” is a giveaway that this process is driven by individual expertise: the relevance or value of data is at least partially determined by the specific context in which the project is taking place. Data scientists also need to decide how to design an algorithmic approach to their analysis, and that decision can be significantly shaped by their individual approach to working with code.
Because notions of value and usefulness are context-dependent, data science projects do not end when a “correct” answer is found; they end when the data scientist is satisfied with their results. This is similar to the process of writing an article: while the contents of the article must be correct, its final form and content are primarily determined by the writer’s discretion.
Data science relies on an exploratory workflow
To further understand why data science is viewed as art, it’s important to see how the data science workflow differs from those used in pure sciences, such as chemistry. Whereas data science relies on an exploratory workflow, one that frequently begins without a clear initial hypothesis, pure sciences rely on a strict adherence to procedure to test predefined hypotheses. The quality of any given pure science experiment can only be validated through the repetition and/or analysis of the exact procedure used.
Data science’s exploratory workflow patterns do not require this strict adherence to procedure. Traditional scientific methods rely on testing whether a hypothesis is false; data science, however, is typically concerned with analysing data to find insights that are useful. Two data scientists can use different approaches to analyse the same data, and both of their conclusions may be valid even if they differ significantly. This ability to produce multiple different-but-valid models of a single dataset illustrates how data science is more often concerned with discovering information that is useful rather than true.
With a focus on usefulness over procedure, many decisions involved in a data analysis are left to the discretion of the individual conducting it. Decisions such as the number of variables to be included in a model, or what modelling technique to use (e.g., linear regression? LOESS? logistic regression?), often have no objectively correct answer. Data scientists must decide for themselves which method will produce the most useful results.
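As a rough illustration of that discretion, the sketch below fits two simple models to the same small dataset. Everything here is an assumption made for the example: the numbers are invented, and the second model is a straight line fitted to the logarithm of the input (a simple stand-in for "diminishing returns"), not LOESS or logistic regression.

```python
import math

# Hypothetical observations: advertising spend (xs) and resulting sales (ys).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.1, 3.9, 5.2, 5.8, 6.9, 7.2, 7.9, 8.1]


def fit_line(features, targets):
    """Ordinary least squares for y = a + b * feature; returns (a, b)."""
    n = len(features)
    mean_f = sum(features) / n
    mean_t = sum(targets) / n
    sxy = sum((f - mean_f) * (t - mean_t) for f, t in zip(features, targets))
    sxx = sum((f - mean_f) ** 2 for f in features)
    slope = sxy / sxx
    return mean_t - slope * mean_f, slope


def rmse(predictions, targets):
    """Root-mean-square error between predictions and observations."""
    squared = sum((p - t) ** 2 for p, t in zip(predictions, targets))
    return math.sqrt(squared / len(targets))


# Model A: sales grow linearly with spend.
a_lin, b_lin = fit_line(xs, ys)
pred_lin = [a_lin + b_lin * x for x in xs]

# Model B: sales grow with the logarithm of spend.
log_xs = [math.log(x) for x in xs]
a_log, b_log = fit_line(log_xs, ys)
pred_log = [a_log + b_log * lx for lx in log_xs]

rmse_lin = rmse(pred_lin, ys)
rmse_log = rmse(pred_log, ys)

# Forecast well outside the observed range, where the two models can diverge.
forecast_lin = a_lin + b_lin * 20.0
forecast_log = a_log + b_log * math.log(20.0)

print(f"linear model: rmse={rmse_lin:.2f}, forecast at x=20: {forecast_lin:.1f}")
print(f"log model:    rmse={rmse_log:.2f}, forecast at x=20: {forecast_log:.1f}")
```

Both models are fitted with the same least-squares procedure, yet they give different forecasts once they extrapolate beyond the observed data. Which one is more useful depends on what the analyst believes about the business, not on the arithmetic alone.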
But... it’s still a science
The things which make data science an art do not make it a non-science. Key scientific elements, such as a focus on empirical evidence and reproducibility, are still important parts of the discipline. They are simply contextualised by the exploratory workflow, creative problem-solving requirements, and undefined project endpoints which define an art.
Consider machine learning and its use in powering image search engines. The creation of an image search engine requires extensive use of complex mathematical concepts which must be correctly deployed for the engine to function properly. This need for mathematical rigour is undoubtedly a scientific need. At the same time, although different image search engines produce different results, the results produced by one engine may not necessarily be more or less “correct” than the results produced by another.
The above screenshots are the top results of a “related image” search for a single dog photo using the Bing and Google search engines. Each engine returns different top results, but both return valid images of dogs. When two models return results of relatively equal usefulness, those models can be considered to be of relatively equal quality. The results provided by these engines are reflections of the differing design priorities of the teams that created them.
The art of useful information
The principles that make data science an art concern the idea that the quality of a data science project stems from its usefulness, rather than from it being objectively correct. This paradigm is aptly described by the statistician George Box’s famous aphorism that “all models are wrong, but some are useful”. The effective use of data requires scientific principles to be applied creatively, guided by the expert judgment of the data scientist using them.
Data science is a practical combination of both art and science, making it an ideal career for anyone who wants to practise a discipline which blends technical skills with professional judgment and creativity. This blend of technical skills and creative intuition requires significant practical learning.
The University of New South Wales’ flexible and affordable online Master of Data Science program offers the full range of instruction necessary to begin a career as a professional data scientist. The program is designed to get students career-ready, and is taught by experts with real-world experience working in industry. Students learn more than just technical skills; they also receive the guidance necessary to learn how to use those technical skills to maximise the impact and usefulness of their work.
If you’re looking to enter a career in data science but also want to maintain your current job while you learn, UNSW’s online Master of Data Science offers an ideal path forward. Get in touch with our enrolment team on 1300 974 990.