Thursday, 1 August 2019

What maths does a data scientist need to know?

Data science is an attractive career option for anyone looking to enter a profession with a strong career outlook that sits at the forefront of many of today's technological innovations. There are many ways to gain the skills to become a data scientist, from self-directed courses to more formal education such as UNSW's Master of Data Science. Unfortunately, one issue that discourages many potential data scientists from pursuing a career in the field is uncertainty over whether they can learn the maths skills that the field requires.

A career in data science should not be ruled out by those without advanced maths skills. While data science is not a viable career for those who simply do not want to do maths, for everyone else, it remains an accessible option. The core mathematics skills that are necessary to enter the field can be effectively learned as part of a general data science education.

How much mathematics is actually required to do data science?

Many articles which discuss the maths skills that are required to become a data scientist provide readers with an extensive list of prerequisites—long enough to dishearten even those who are confident in their learning ability.

These lists are accurate representations of maths concepts that are used by data scientists; however, they are not actually prerequisites. A solid foundation in statistics and maths is necessary to begin a professional data science career, but expertise is not. This is why many top data scientists do not focus on maths as the core of their entry-level training programs. For instance, the influential statistician Hadley Wickham organises most of his introductory data science textbook around data analytics—when he does turn to maths, it is to explain how to understand one of the analytics tasks that make up the chief focus of his book.


A data scientist’s focus is on “useful” maths

A data scientist’s core competency is their ability to analyse and interpret data. Most data scientists will at some point use a tool that leverages maths which they don’t understand—for instance, a deep learning algorithm—because they do understand how to interpret the results that the algorithm produces. The use of a statistical model to analyse data does not necessarily require complete knowledge of the maths the model relies upon, but it does require enough knowledge to know how to apply and interpret the model properly.

Whereas data scientists do not need to have a strong understanding of the maths that underlie deep learning algorithms, they do need to have a firm grip on core statistical techniques such as linear regression, logistic regression, and various population sampling methods. These techniques are used frequently in basic data analytics tasks, and entry-level data scientists should expect to develop an intuitive understanding of how they work.
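
As an illustration of what an intuitive understanding of logistic regression looks like in practice, the following is a minimal sketch of interpreting a fitted model. The intercept and coefficient are hypothetical values chosen for the example, not results from any real dataset, and the helper names are made up for illustration. It shows how a coefficient translates into a predicted probability, and why the exponential of the coefficient can be read as the change in odds for each one-unit increase in the predictor.

```python
import math


def predicted_probability(intercept, coefficient, x):
    """Convert a logistic regression's linear score (log-odds) into a probability."""
    log_odds = intercept + coefficient * x
    return 1.0 / (1.0 + math.exp(-log_odds))


def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1.0 - p)


# Hypothetical fitted values (assumed for illustration only).
intercept = -3.0
coefficient = 0.5

# exp(coefficient) is the multiplicative change in odds per one-unit increase in x.
odds_ratio = math.exp(coefficient)

p_at_4 = predicted_probability(intercept, coefficient, 4)
p_at_5 = predicted_probability(intercept, coefficient, 5)

print(f"P(outcome | x=4) = {p_at_4:.3f}")
print(f"P(outcome | x=5) = {p_at_5:.3f}")
print(f"odds ratio per unit of x = {odds_ratio:.3f}")
print(f"odds(x=5) / odds(x=4)    = {odds(p_at_5) / odds(p_at_4):.3f}")
```

The last two printed values agree, which is the practical interpretation a data scientist needs: the model is linear in the log-odds, so a fixed change in the predictor multiplies the odds by a fixed factor, even though the change in probability varies.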

The majority of entry-level data scientists and machine learning practitioners will spend most of their time on tasks such as exploratory data analysis and basic predictive analysis. A data scientist does not need to understand the maths principles that underlie many of the tools used for these tasks in order to use them properly, which is why many introductions to predictive analytics, such as An Introduction to Statistical Learning and Applied Predictive Modeling, contain very little information on linear algebra and calculus, despite being rooted in both subjects.

Predictive analytics can be performed without extensive knowledge of mathematics because predictive modelling tools do most of the maths involved on the data scientist’s behalf. The manner in which computing tools can reduce the need to learn certain maths concepts is expressed by the statistician Andrew Gelman in his well-regarded book on the use of regression techniques for data analysis:

“Most books define regression in terms of matrix operations. We avoid much of this matrix algebra for the simple reason that it is now done automatically by computers .... [the computations] are important but can be done out of sight of the user.”

The concept of ‘usefulness’ is key to deciding which maths skills are necessary to learn. This article has noted that entry-level data scientists must understand certain regression techniques as a core competency; however, Gelman’s quote underlines that this understanding does not need to go beyond the point at which it stops being useful to their work.

When data scientists need to be maths experts

The level of maths knowledge that a data scientist must have depends on their role and is very high within certain professional niches. For instance, a sophisticated knowledge of mathematics is required for academic positions, bleeding-edge work in top tech companies, and many senior data science roles in general industry.

These positions require superior maths skills because they require individuals who are able to innovate within their field. In academia, careers progress through the production of novel research: an academic data scientist specialising in biostatistics would need a strong understanding of statistics to contribute new insights to their field.

Tech giants such as Google and Facebook also focus heavily on the creation of novel research. Research and development is a key part of how these companies seek to maintain a competitive edge. Initiatives from these firms, such as Google’s attempts to use machine learning to create a self-driving car, rely on data scientists being able to use complex maths to solve problems that no one else has been able to solve.

Data scientists entering general industry/business—where the majority of jobs in the field are—do not require the same level of maths skills as academics or high-tech workers. The role of data scientists working in industry is to create value for the business they work for, which usually involves applying pre-existing tools (e.g. regression algorithms) to conduct data analytics. Entry-level data scientists working in this field will primarily focus on tasks such as data preparation, cleaning, data visualisation, and exploratory data analysis—tasks which do not require high-level maths knowledge.

The maths skills required of data scientists working in general industry are higher for senior-level positions. Entry-level data scientists without advanced maths qualifications should expect to continue developing their maths skills as they progress through their career. Although expert maths skills are not necessary to work in industry, a strong foundation is necessary for a data scientist to show employers that they have the potential to advance within the company.

Exploring the data science maths skills gradient

The maths qualifications discussed in this article reflect the fact that in data science, the mathematics learning process typically takes place alongside—and as part of—the overall data science learning process. The University of New South Wales’ online data science program is designed to reflect the progressive learning process that this article has outlined.

UNSW’s Master of Data Science program may be entered by those with a formal maths background or by those who have developed their maths skills through UNSW’s certificate and diploma programs. The data science certificate course teaches the basic skills necessary to move on to the diploma program (e.g., statistical inference), which in turn provides instruction in the concepts that are necessary to complete the master’s program (e.g., regression analysis).

By the time students move on to UNSW’s master’s program, they may have completed the full set of maths courses necessary for their degree; however, since different subfields require different levels of entry-level maths skills, students may also take additional courses in subjects such as Bayesian Inference and Multivariate Analysis. The progressive learning that students experience as they move from the certificate program to the master’s program reflects the type of ongoing learning that data scientists engage in throughout their careers.

Where to learn the maths skills necessary to become a data scientist

A professional data scientist’s work requires a strong understanding of core concepts in maths and statistics. The University of New South Wales’ flexible online data science programs not only provide the ideal platform to gain the full range of maths skills needed to succeed in the discipline, but also provide a strong foundation for students to build on as they engage in independent learning throughout their careers.

UNSW is internationally ranked as a Top 150 school in both mathematics and statistics. Whether you enter the Master of Data Science program directly or move through the certificate and diploma programs on the way to the master’s program, your data science education will be delivered by internationally recognised experts. Furthermore, UNSW courses are focused on teaching the skills that employers value, which is why UNSW is one of Australia’s top three universities for graduate employability.

To learn more about what UNSW’s data science programs have to offer, get in touch with our enrolment team at 1300 974 990.