## Machine Learning Prerequisites: Probability Theory

##### Why is probability essential to understanding how machine learning algorithms work? And which resources will strengthen your grasp of the subject? Read on.

Probability is the branch of mathematics concerning numerical descriptions of how likely an event is to occur or how likely it is that a proposition is true.

Probability is a measure of the likelihood of an event to occur.

Such definitions of probability are ubiquitous on the net. In this article, we will not dwell on the details of what probability is or how it works. Rather, we will look at why probability is an essential element in understanding machine learning, and then dive into what we call "A beginner's guide to probability."

**Correlation between Probability and Machine Learning**

If we take the analogy of human anatomy for machine learning, then Linear Algebra and Calculus form the brain, while Probability & Statistics is the heart. It is the bedrock of machine learning. One cannot develop a deep understanding and application of machine learning without it. But why?

A few points, listed below, explain how machine learning and probability are directly linked:

*1. Classification based on probability*

Classification is a machine learning predictive modeling problem in which an example is assigned a given label. E.g., given a set of 5 features, we have to decide whether an animal is a dog, a cat, or a horse.

We can tackle this problem in two ways:

- Assign the animal to one of the classes definitively, based on the features.
- Compute the probability of the animal belonging to each class.

In the second case, we classify based on the highest probability. So if the probabilities are 0.7, 0.2, and 0.1 respectively for the animal to be either a dog, a cat, or a horse, we classify the animal as a dog.
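The second approach amounts to taking the class with the highest predicted probability. A minimal sketch, where the class names and probabilities are illustrative stand-ins for a real model's output:

```python
# Illustrative predicted probabilities for each class (not from a real model).
probs = {"dog": 0.7, "cat": 0.2, "horse": 0.1}

# Classify by picking the class with the highest probability.
predicted = max(probs, key=probs.get)
print(predicted)  # dog
```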

*2. Algorithms predominantly based on probability*

A few algorithms in machine learning are specifically designed to harness the tools and methods of probability. One such algorithm is Naive Bayes, constructed using Bayes' theorem. Linear regression can be viewed as a probabilistic model whose maximum-likelihood solution minimizes the MSE of the predictions, while logistic regression can be considered a probabilistic model that minimizes the negative log-likelihood of the observed class labels.
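To make the logistic regression connection concrete, here is a small sketch of the binary negative log-likelihood that the algorithm minimizes (the labels and probabilities are made-up illustrative values):

```python
import math

def negative_log_likelihood(y_true, y_prob):
    """Negative log-likelihood of binary labels under predicted probabilities."""
    return -sum(
        y * math.log(p) + (1 - y) * math.log(1 - p)
        for y, p in zip(y_true, y_prob)
    )

# Confident, correct predictions give a low loss...
good = negative_log_likelihood([1, 0, 1], [0.9, 0.1, 0.8])
# ...while confident, wrong predictions are penalized heavily.
bad = negative_log_likelihood([1, 0, 1], [0.1, 0.9, 0.2])
print(good < bad)  # True
```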

*3. Training with probabilistic frameworks*

Many machine learning models are trained using an iterative algorithm that has, at its core, a probabilistic framework. Some examples include:

- Maximum Likelihood Estimation (Frequentist).
- Maximum a Posteriori Estimation (Bayesian).
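The two frameworks above can be contrasted with a tiny coin-flipping sketch: MLE uses only the observed data, while MAP also folds in a prior. The data, the Beta(2, 2) prior, and the closed-form posterior mode below are illustrative assumptions, not from the article:

```python
# Estimating a coin's heads probability from 10 flips.
heads, flips = 7, 10

# Maximum Likelihood Estimate: just the observed frequency.
mle = heads / flips

# Maximum a Posteriori: assume a Beta(a, b) prior over the bias,
# then take the mode of the resulting posterior distribution.
a, b = 2, 2
map_est = (heads + a - 1) / (flips + a + b - 2)

print(mle, map_est)  # the prior pulls the MAP estimate toward 0.5
```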

*4. Tuning the models with probabilistic frameworks*

In machine learning, it is often necessary to optimize your model's hyperparameters, such as k for the kNN algorithm or the learning rate in a neural network. Typical approaches include grid searching over ranges of hyperparameters or randomly sampling hyperparameter combinations; Bayesian optimization goes a step further and places a probabilistic model over the objective function itself.
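Random search can be sketched in a few lines. The hyperparameter ranges and the scoring function below are hypothetical placeholders; in practice the score would come from evaluating a trained model on validation data:

```python
import random

random.seed(0)  # for reproducibility

# Hypothetical hyperparameter ranges: k for kNN, learning rate for a network.
def sample_config():
    return {
        "k": random.randint(1, 20),
        "learning_rate": 10 ** random.uniform(-4, -1),  # log-uniform sampling
    }

# Stand-in for a real validation score; replace with actual model evaluation.
def evaluate(config):
    return -abs(config["k"] - 5) - abs(config["learning_rate"] - 0.01)

# Keep the best of 50 randomly sampled configurations.
best = max((sample_config() for _ in range(50)), key=evaluate)
print(best)
```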

*5. Evaluating models with probabilistic measures*

Much of machine learning revolves around improving the performance of the model. There are many metrics that summarize a model's performance based on predicted probabilities, e.g., log loss (also called cross-entropy) or the Brier score. For simple yes-or-no classification (called binary classification), we also have ROC curves and AUC, to name a few metrics.
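Both metrics mentioned above are short formulas over predicted probabilities. A minimal sketch, using made-up labels and predictions for illustration:

```python
import math

def log_loss(y_true, y_prob):
    """Average cross-entropy between binary labels and predicted probabilities."""
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(y_true, y_prob)) / len(y_true)

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probabilities and outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

y_true = [1, 0, 1, 1]           # illustrative labels
y_prob = [0.9, 0.2, 0.7, 0.6]   # illustrative predicted probabilities
print(round(log_loss(y_true, y_prob), 3), brier_score(y_true, y_prob))
```

Lower is better for both: a perfectly calibrated, perfectly confident model scores zero.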

*6. Feature selection for models*

As we go deeper into machine learning, we understand that merely fitting data to a model is not enough. We need proper feature selection to improve the performance of our models. But how do we decide which features to select? Probability theory plays a vital role in answering such questions.
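One common probabilistic tool for this is mutual information: it measures how much knowing a feature reduces uncertainty about the label. A self-contained sketch for discrete features, with toy data invented for illustration:

```python
import math
from collections import Counter

def mutual_information(feature, labels):
    """I(X; Y) between two discrete sequences, in nats."""
    n = len(feature)
    px = Counter(feature)
    py = Counter(labels)
    pxy = Counter(zip(feature, labels))
    return sum(
        (c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

# Toy data: "fur" perfectly predicts the label, "color" is uninformative.
labels = ["dog", "dog", "cat", "cat"]
fur    = ["short", "short", "long", "long"]
color  = ["brown", "black", "brown", "black"]

print(mutual_information(fur, labels) > mutual_information(color, labels))  # True
```

Ranking features by such a score, and keeping only the most informative ones, is one simple probability-based selection strategy.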


**Where to begin?**

In the previous section, we have established that probability is an essential subject for building a base in machine learning. However, this leads to the next question, where do we begin learning probability theory?

There are numerous resources available, both online and offline, for probability theory. Below, we list some of our favorite resources to kick-start your probability journey.

**I. Online Courses**

1. Introduction to Probability - HarvardX: STAT110x on edX

"A comprehensive introduction to probability as a language and toolbox for understanding statistics, science, risk, and randomness" is what the course promises and precisely what it delivers. This course by Harvard takes a storytelling-based approach to probability theory that makes it very easy for the novice learner. Unlike other courses, it does not have a lecturer explaining each topic. Instead, it has short animated videos followed by extremely concise reads to give a complete grasp of each topic. Though this course is an excellent starter for probability, it is not explicitly taught to cater to data science or ML enthusiasts. Nevertheless, if you are a beginner and want to learn the basics of probability, this is a great place to start.

2. Introduction to Probability - MIT OpenCourseWare

This is a no-nonsense probability course you ought to take if you want a deep understanding of the subject. It dives right into the topic, starting with "Sample Spaces" in the first 2 minutes. Prof. John Tsitsiklis is the instructor for this course, and he explains each concept with what I think is the perfect balance of speed and depth. This MIT course is far broader and deeper than your usual online course. It has a lot of really great intuitive exercises with solutions. Be warned, this course isn't going to be easy, but it is worth the time and effort. At the end of this course, I assure you that you will be ready to take on any domain built on probabilistic frameworks, including machine learning and data science.

3. Statistics and Probability - Khan Academy

The advantage of Khan Academy is that it is perfect for clearing your doubts. While the other resources and courses are great for learning probability end to end, Khan Academy helps you improve your understanding of particular topics. The platform is easy to navigate and offers high granularity in terms of content. With precise and clear explanations, it is a reliable platform for quick, topic-by-topic reference.

**II. Books and Blogs**

1. Probability! - Matt DosSantos DiSorbo, Professor Joe Blitzstein

The writing style of this book can be described as 'wordy.' It indeed reads quite differently than conventional textbooks, which rarely waste words. For those who are new to probability, an excessive explanation can often be a blessing - that is precisely what this book offers. It over-explains to ensure that everything is covered to the point of no doubt. In general, this book subverts many themes of conventional textbooks to produce a streamlined, engaging narrative for those who are new to the 'language' of rigorous probability. It is an excellent resource for beginners and a must-have weapon in your probability arsenal.

2. Machine Learning: A Probabilistic Perspective - Kevin P. Murphy

This book offers a comprehensive and self-contained introduction to machine learning, based on a unified, probabilistic approach. The level of coverage combines breadth and depth: it offers all the necessary background material on topics such as linear algebra, probability, and optimization, and it discusses recent developments in the field. The style of writing is informal yet accessible. It provides complete pseudo-code for important algorithms, making it perfect for anyone who wants to jump right into machine learning and learn just enough probability to understand it.

Regardless of the name, this is not a strictly Bayesian blog. The posts here explore a plethora of issues in both Bayesian and frequentist statistics. It also deals with topics usually left for a rigorous, measure-theoretic study of probability. The blog consists of extremely focused articles that give deep insight into the question at hand. It is not a high-volume publishing channel, so the posts are limited in number. Through real-life examples and datasets, this blog brings probability concepts closer to beginners and experts alike.

**III. Engaging and Intuitive Resources**

1. Seeing Theory

If you are looking for some relief after a text-heavy probability read, this is your go-to place. Seeing Theory is a project designed and created by Daniel Kunin with support from Brown University's Royce Fellowship Program. It lets you explore probability and statistics concepts interactively and visually, which helps develop intuition. Through beautiful visualizations, the site provides a unique look at various notions of probability using well-chosen datasets. It is aptly designed for beginners.

2. Brilliant

Brilliant helps you see concepts visually, interact with them, and poses questions that get you to think. Though not very specific to probability, Brilliant builds quantitative skills in math, science, and computer science with fun and challenging interactive explorations. All the courses on this site are crafted by award-winning teachers, researchers, and professionals from MIT, Caltech, Duke, Microsoft, Google, and more. This site caters to people at all levels: beginner, intermediate, advanced, and expert. You won't find 20 courses on probability theory; instead, you'll find one or two restrained and pointed tracks investigating the subject in a fun and intuitive way. So, when in the mood to learn probability in a unique and quirky manner, head to Brilliant.

3. Data Skeptic

This last resource is a podcast. So if you dislike poring over books or are looking to utilize your commute time, this is your answer. The Data Skeptic podcast does not deal predominantly with probability, but there is plenty of content related to it. The podcast is casual and funny, yet extremely informative. It has both long discussions and short ones, blending theoretical concepts, good for refreshing your memory or sparking curiosity, with real-life applications of the science and technology in business and research.

Like this article? Learn about other machine learning prerequisites - linear algebra, calculus, and Git and Anaconda.