Mathematics as the Starting Point of a Data Scientist’s Journey

Mathematics is vital to data science because mathematical concepts help in identifying patterns and creating algorithms. An understanding of statistics and probability theory is key to implementing algorithms in data science. Machine learning, on the other hand, focuses on giving computers the ability to perform tasks without being explicitly programmed to do so. This is made possible through the use of mathematical concepts.

Understanding mathematics is an important step on the exciting journey toward becoming a professional data scientist. To succeed in the field, you need to understand the mathematical equations and relations behind the methods you use. Various topics in mathematics relate to the different phases of the data science life cycle. In this article, we discuss two major areas of mathematics, calculus and discrete mathematics, along with the topics under them that we consider helpful to a data scientist’s journey.

Chris Orwa, a Renowned Data Scientist, Training Predictive Analytics Lab Students on How Much Mathematics is Needed in Data Science.


Calculus

The word “calculus” comes from Latin, where it means a small pebble of the kind used for counting on an abacus. Much later, in the 17th century, Isaac Newton and Gottfried Wilhelm Leibniz developed modern calculus independently of each other. Calculus is the branch of mathematics that deals with finding derivatives and integrals of functions and studying their properties, using methods originally based on the summation of infinitesimal differences. Both differential calculus and integral calculus cover many concepts. Below are some of the topics that are important to a data scientist.

Topic 1: Limit and continuity

Limits and continuity are among the most crucial ideas to understand in preparation for calculus. A limit expresses the behavior of a function as its input approaches a particular point or grows toward infinity. That is, limits describe how a function behaves near a point, rather than at that point. This simple yet powerful idea is the basis of all of calculus, as it is used to define both derivatives and continuity. A function, in turn, can be either continuous or discontinuous, and one informal way to test for continuity is to see whether the graph of the function can be traced with a pen without lifting the pen from the paper.
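To make the idea of "near a point, not at the point" concrete, here is a minimal Python sketch. The function names `approach` and `sin_over_x` are ours, chosen for illustration. The function sin(x)/x is undefined at x = 0, yet its values settle toward 1 as x gets closer to 0.

```python
import math


def approach(f, point, steps=6):
    """Evaluate f at inputs that move closer and closer to `point` from the right."""
    samples = []
    for k in range(1, steps + 1):
        x = point + 10.0 ** -k  # 0.1, 0.01, 0.001, ... away from the point
        samples.append((x, f(x)))
    return samples


def sin_over_x(x):
    # Undefined at x = 0 (division by zero), but the limit as x -> 0 is 1.
    return math.sin(x) / x


for x, y in approach(sin_over_x, 0.0):
    print(f"x = {x:.6f}   f(x) = {y:.10f}")
```

The function is never evaluated at 0 itself. The printed values only suggest the limit numerically; they do not prove it.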

Topic 2: Integral and methods

In calculus, the integral of a function generalizes the calculation of quantities such as area, volume, and mass. The integral of a continuous positive function f over the interval [a, b] is the area of the region bounded by the lines x = a and x = b, the x-axis, and the curve of f. So the integral of f between a and b is basically the area under the graph. Integrals can be estimated using numerical integration methods, with the rectangle method among the most common. Other popular methods include the trapezoidal rule and Simpson’s rule. Although numerical methods do not give the exact value of an integral, they help when an integral is difficult to evaluate or has no closed-form solution.
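The three numerical methods named above can be sketched in a few lines of Python. This is an illustrative sketch, not a production implementation: the function names are ours, the rectangle method is shown in its midpoint variant, and the interval is split into n equal strips.

```python
import math


def midpoint_rule(f, a, b, n):
    """Rectangle method (midpoint variant): one rectangle per strip."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def trapezoidal_rule(f, a, b, n):
    """Trapezoidal rule: join neighbouring points with straight lines."""
    h = (b - a) / n
    interior = sum(f(a + i * h) for i in range(1, n))
    return h * (0.5 * f(a) + interior + 0.5 * f(b))


def simpson_rule(f, a, b, n):
    """Simpson's rule: fit parabolas over pairs of strips (n must be even)."""
    if n % 2 != 0:
        raise ValueError("n must be even for Simpson's rule")
    h = (b - a) / n
    odd = sum(f(a + i * h) for i in range(1, n, 2))
    even = sum(f(a + i * h) for i in range(2, n, 2))
    return (h / 3) * (f(a) + 4 * odd + 2 * even + f(b))


# The exact area under sin(x) between 0 and pi is 2.
for rule in (midpoint_rule, trapezoidal_rule, simpson_rule):
    print(rule.__name__, rule(math.sin, 0.0, math.pi, 100))
```

With the same number of strips, Simpson’s rule typically lands much closer to the exact value for smooth functions, which is why it is often preferred when the function is cheap to evaluate.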

Topic 3: Function

In calculus, a function is a binary relation between two sets that associates every element of the first set with exactly one element of the second set. It can be pictured as a machine that produces a single output for each given input. Note that the converse is not required: a function may produce the same output for several distinct inputs. Expressed as a set of ordered pairs, a function pairs each input (the first element) with its output (the second element). Typical examples are functions from the integers to the integers or from the real numbers to the real numbers.
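The definition can be checked mechanically for small finite relations. The sketch below assumes a relation is given as a list of (input, output) pairs together with its first set (the domain). The helper `is_function` and the sample data are ours, for illustration only.

```python
def is_function(pairs, domain):
    """Return True if every element of `domain` is paired with exactly one output."""
    outputs = {}
    for x, y in pairs:
        if x not in domain:
            return False  # the pair uses an input outside the first set
        if x in outputs and outputs[x] != y:
            return False  # one input with two different outputs
        outputs[x] = y
    return set(outputs) == set(domain)  # no input may be left without an output


# Squaring: several inputs share an output, which is allowed.
square = [(-2, 4), (-1, 1), (0, 0), (1, 1), (2, 4)]

# "Square roots of 4": one input with two outputs, so not a function.
roots_of_four = [(4, 2), (4, -2)]

print(is_function(square, {-2, -1, 0, 1, 2}))  # True
print(is_function(roots_of_four, {4}))         # False
```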

Timothy Oriedo, a Data Scientist at Predictive Analytics Lab, and Saikumar Allakar, a Data Scientist at Equity Bank, When They Hosted the Data Science Kaggle Community in July 2019.

Topic 4: Derivative

The concept of the derivative has roots in the work of the French mathematician Pierre de Fermat, who wanted to determine the extremes (maxima and minima) of certain functions. The derivative measures the steepness of the graph of a function at a particular point. More precisely, it is the limit of the ratio of the change in the value of the function to the change in the independent variable, as that change shrinks toward zero. If the independent variable happens to be time, we often think of this ratio as a rate of change, for example velocity. If we zoom in on the graph of the function at some point until it looks almost like a straight line, the derivative at that point is the slope of that line. This is the same as saying that the derivative is the slope of the tangent line to the graph of the function at the given point. The derivative is itself a function, as it varies from place to place. For example, the velocity of a car may change from moment to moment as the driver speeds up or slows down.
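As a rough numerical illustration of "ratio of change as the change shrinks", the sketch below estimates a derivative with a central difference. The names `derivative` and `position`, and the car whose position is 3t² metres after t seconds, are hypothetical choices for this example.

```python
def derivative(f, x, h=1e-5):
    """Estimate f'(x) with a central difference: slope of a short chord around x."""
    return (f(x + h) - f(x - h)) / (2 * h)


def position(t):
    # Hypothetical car: position in metres after t seconds.
    return 3.0 * t ** 2


# Ratio of change over shrinking time steps, starting at t = 2 seconds.
for step in (1.0, 0.1, 0.01, 0.001):
    ratio = (position(2.0 + step) - position(2.0)) / step
    print(f"step = {step:<6} average velocity = {ratio:.4f}")

# The exact derivative of 3t^2 is 6t, so the velocity at t = 2 is 12 m/s.
velocity_at_2 = derivative(position, 2.0)
print("estimated velocity at t = 2:", velocity_at_2)
```

The average velocities approach 12 as the step shrinks, which is the slope of the tangent line at t = 2.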

Topic 5: Sequences and series

In mathematics, a sequence is a function whose domain is the set of natural numbers or a subset of it. Sequences are widely used in differential and integral calculus and in other branches of mathematics. Types of sequences include real, finite, infinite, bounded, monotone, arithmetic, geometric, convergent, divergent, Fibonacci, Cauchy, and triangular number sequences. Sometimes the name changes with the field. For instance, sequences are called arithmetic functions in analytic number theory. A series, on the other hand, is the sum of the terms of a sequence. It can also be described as a list of numbers with addition operations between them. Types of series include finite, infinite, geometric, power, convergent, divergent, alternating, and telescoping series, as well as series with positive terms.
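The difference between a sequence and a series is easy to see in code. The following sketch uses a hypothetical generator `geometric` to produce the geometric sequence 1, 1/2, 1/4, ... and then forms the partial sums of the corresponding series, which converge toward 2.

```python
from itertools import accumulate, islice


def geometric(first, ratio):
    """Yield the terms of a geometric sequence: first, first*ratio, first*ratio^2, ..."""
    term = first
    while True:
        yield term
        term *= ratio


# The sequence: its first 20 terms.
terms = list(islice(geometric(1.0, 0.5), 20))

# The series: running totals of those terms (partial sums).
partial_sums = list(accumulate(terms))

print(terms[:5])
print(partial_sums[:5])
print(partial_sums[-1])  # close to 2, the sum of the infinite series
```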

Topic 6: Hyperbolic functions

Hyperbolic functions occur in the solutions of many linear differential equations, of some cubic equations, and of Laplace’s equation in Cartesian coordinates, as well as in calculations of angles and distances in hyperbolic geometry. They are analogs of the ordinary trigonometric functions, defined for the hyperbola rather than the circle: just as the points (cos t, sin t) form a circle with unit radius, the points (cosh t, sinh t) form the right half of the equilateral hyperbola x² − y² = 1. Laplace’s equation is important in heat transfer, fluid dynamics, electromagnetic theory, and special relativity.
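The circle and hyperbola analogy can be verified numerically with Python’s standard `math` module. In this small sketch the helper names are ours, and the check is limited to moderate values of t, because floating-point cancellation makes it unreliable for large t.

```python
import math


def on_unit_circle(t, tol=1e-9):
    """Check that (cos t, sin t) satisfies x^2 + y^2 = 1."""
    x, y = math.cos(t), math.sin(t)
    return abs(x * x + y * y - 1.0) < tol


def on_unit_hyperbola(t, tol=1e-9):
    """Check that (cosh t, sinh t) satisfies x^2 - y^2 = 1 with x > 0 (right half)."""
    x, y = math.cosh(t), math.sinh(t)
    return x > 0 and abs(x * x - y * y - 1.0) < tol


for t in (-2.0, -1.0, 0.0, 0.5, 2.0):
    print(t, on_unit_circle(t), on_unit_hyperbola(t))
```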

Topic 7: Vector functions

A vector is an element of a vector space and has n components in n-dimensional space. Geometrically, a vector is characterized by two properties: its length (magnitude) and its direction. Vectors can be added together, subtracted, and multiplied by scalars. A vector can also be pictured as an arrow with an initial point and a terminal point. Vectors of length one are called unit vectors, and a vector whose length is zero is called the zero or null vector. In a normed vector space, a unit vector is often denoted by a lowercase letter with a circumflex, or “hat”. The term direction vector describes a unit vector used to represent spatial direction, and such quantities are commonly denoted as d. Vectors have many applications in sciences such as physics, where quantities such as acceleration cannot be defined and interpreted without them.
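A short sketch of these operations in plain Python follows, with vectors stored as lists of numbers. The function names are ours. In practice a library such as NumPy would normally be used, and this version only illustrates the definitions.

```python
import math


def add(u, v):
    """Component-wise sum of two vectors of the same dimension."""
    return [a + b for a, b in zip(u, v)]


def scale(c, v):
    """Multiply every component of v by the scalar c."""
    return [c * a for a in v]


def norm(v):
    """Euclidean length of v."""
    return math.sqrt(sum(a * a for a in v))


def unit(v):
    """Unit vector pointing in the same direction as v."""
    length = norm(v)
    if length == 0:
        raise ValueError("the zero vector has no direction")
    return [a / length for a in v]


v = [3.0, 4.0]
print(norm(v))            # length of v
print(unit(v))            # same direction, length one
print(add(v, scale(-1.0, v)))  # v minus itself gives the zero vector
```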

Discrete Mathematics

Discrete mathematics has grown in importance in recent years with the rise of big data. This branch of mathematics deals with objects that can assume only distinct, separated values. The term discrete mathematics is used in contrast with continuous mathematics, which deals with objects that can vary smoothly, such as those studied in calculus. Whereas discrete objects can often be characterized by integers, continuous objects require real numbers. Below are the most important topics in discrete mathematics.

Saikumar Allakar, a Prominent Data Scientist, Training Predictive Analytics Lab Students on the Data Science Life Cycle at Intercontinental Hotel Nairobi.

Topic 1: Combinatorics

A data science student should start with an introduction to combinatorics, since its basics are critical in the field. It is an area of discrete mathematics concerned with counting, both as a means and as an end in obtaining results, and with certain properties of finite structures. Combinatorics is closely related to several other areas of mathematics and has many applications, in fields ranging from logic to statistical physics and evolutionary biology.
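Basic counting questions can be explored directly with Python’s standard library. The sketch below assumes Python 3.8 or later, where `math.comb` and `math.perm` are available. The five-item feature list is a made-up example.

```python
import math
from itertools import combinations

# Hypothetical example: five candidate features for a model.
features = ["age", "income", "region", "tenure", "usage"]

# How many ways to choose 2 features when order does not matter?
n_pairs = math.comb(len(features), 2)

# How many ways when order matters (first choice, then second choice)?
n_ordered_pairs = math.perm(len(features), 2)

# Counting by formula should agree with counting by listing.
listed_pairs = list(combinations(features, 2))

print(n_pairs, n_ordered_pairs, len(listed_pairs))
```

Counting by formula matters once listing becomes impractical: choosing 10 items out of 50 already gives more than ten billion combinations.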

Topic 2: Number theory

Number theory is the study of the set of positive whole numbers 1, 2, 3, 4, 5, 6, 7, . . ., frequently referred to as the set of natural numbers. It is one of the few branches of mathematics whose history dates back to ancient times, to a period when people may have had little knowledge of other sciences but still had to use numbers to make a living. Although many problems in number theory are extremely difficult to solve, their statements can be so simple that they arouse the interest and curiosity even of people who have never studied mathematics or who disliked it as a subject, and tempt them to look for a solution.

Topic 3: Mathematical logic and reasoning

Familiarizing oneself with the methods of reasoning and proof in mathematics helps in understanding and developing concepts more thoroughly. The role of reasoning in human life is undeniable, as we all rely on it in our daily lives. The ability to reason has made it possible for people to interact with each other and to develop the various sciences, which in turn enable progress in many fields. Argument and proof likewise have a special place in mathematics: it is not possible to understand mathematics without following its arguments, and ignoring them reduces mathematics instruction to memorizing algorithms.

Topic 4: Graph theory

Graphs are mathematical structures used to study pairwise relationships between objects and entities. This branch of discrete mathematics has found multiple applications in computer science, chemistry, operations research, and other fields. The data science and analytics field also uses graphs to model various structures and problems. As a data scientist, you should be able to solve problems efficiently, and graphs provide a mechanism for doing so when the data consists of entities and the connections between them. One of the best-known types of graphs in data science is the social network graph.
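As a small illustration, a social network can be stored as an adjacency list and explored with breadth-first search. Everything in this sketch is made up for the example: the names, the friendships, and the helper `degrees_of_separation`.

```python
from collections import deque

# Hypothetical friendship graph: each person maps to a list of direct friends.
friends = {
    "Amina": ["Brian", "Chege"],
    "Brian": ["Amina", "Dana"],
    "Chege": ["Amina", "Dana"],
    "Dana": ["Brian", "Chege", "Eli"],
    "Eli": ["Dana"],
}


def degrees_of_separation(graph, start):
    """Breadth-first search: fewest friendship links from `start` to each person."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for friend in graph[person]:
            if friend not in distance:  # first visit is via a shortest path
                distance[friend] = distance[person] + 1
                queue.append(friend)
    return distance


print(degrees_of_separation(friends, "Amina"))
```

Breadth-first search visits each person and each friendship once, so it scales linearly with the size of the network.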

Chris Orwa, a Senior Data Scientist, Training Predictive Analytics Lab Participants on Mathematical Thinking in Data Science at Intercontinental Hotel Nairobi, in November 2019.

Topic 5: Set theory

Set theory is a common language among mathematicians around the world. It is the mathematical theory of well-determined collections, called sets, of objects that are called members, or elements, of the set. Pure set theory deals exclusively with sets, so the only sets under consideration are those whose members are also sets. Set theory serves as a foundation of modern mathematics: mathematical inference is formalized using a combination of logical reasoning and set theory, and the definitions of most mathematical concepts can be expressed in its terms. To progress in data science, then, you should become familiar with the basic concepts and results of set theory and with the language in which they are expressed.
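Python’s built-in `set` type mirrors the basic language of set theory, which makes it a convenient way to practise. The two groups of people below are invented for this sketch.

```python
# Hypothetical survey: which analysts use which language.
python_users = {"amina", "brian", "chege"}
r_users = {"chege", "dana"}

either = python_users | r_users       # union: uses at least one language
both = python_users & r_users         # intersection: uses both languages
python_only = python_users - r_users  # difference: uses Python but not R

print(either)
print(both)
print(python_only)
print("chege" in python_users)        # membership test
print(both <= python_users)           # subset test
```

The same operations appear in everyday data work, for example when comparing the keys of two tables before joining them.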

Final Thoughts

Mathematics is needed to see the mathematical structures in the objects you work with as a data scientist and to understand their properties. It is not by chance that math puzzles are often used in job interviews in many professions. While some data science algorithms feel like magic at times, we can understand the fine points of many of them without needing much more than calculus and discrete mathematics.

Nevertheless, as a data analyst, if you don’t want to learn mathematics, you can rely on machine learning libraries like scikit-learn to handle much of it for you. But it is very helpful for a data scientist or analyst to have a solid understanding of the mathematics behind those algorithms, so that they can choose the best algorithm for their problems and datasets and thus make more accurate predictions. Therefore, if you want to thrive in the data science journey, embrace the pain and dive into the math. It’s not as tough as you think, and we’ve even got trainers to walk you through the journey.

