{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Probability \n", "\n", "Probability allows us to talk about uncertainty in certain terms. Once we are able to quantify uncertainties, we can deterministically make deductions about the future. The language of statistics also allows us to talk about uncertainty in uncertain but tractable terms that we can reason about." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Random Variable" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "A **random variable** is a mathematical formalization of an abstract quantity that has some degree of uncertainty associated with the values it may take on. \n", "The set of all possible values that a random variable can take on is called its **range**. \n", "\n", "A random variable is very similar to a variable in computer programming. **In the context of `pandas`, a random variable is a column or feature in a `DataFrame`.**\n", "\n", "Just as numerical features in a `DataFrame` can be either discrete or continuous, random variables can also be either discrete or continuous. The two types require different mathematical formalizations, as we will see later.\n", "\n", "Random variables are usually denoted by capital letters, such as $X$ or $Y$. The values that a random variable can take on are denoted by lowercase letters, such as $x$ or $y$.\n", "\n", "It is important to note that in the real world, it is _often impossible to obtain the range_ of a random variable. Since most real-world datasets are **samples**, **`df['X'].unique()` does not necessarily give us the range of $X$**.\n", "\n", "It is also important to remember that **$x$ is a single value** but **$X$ is a collection of values** (i.e. `pd.Series`). " ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "In the example below, $C$ (coin) and $D$ (die) are two random variables. 
" ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " C D\n", "0 T 1\n", "1 T 3\n", "2 T 3\n", "3 H 2\n", "4 T 1" ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd \n", "\n", "data = pd.read_csv('../data/experiment.csv')\n", "data.head()" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "The ranges of $C$ and $D$ are $\\{H, T\\}$ and $\\{1, 2, 3, 4, 5, 6\\}$ respectively. It is worth repeating for emphasis that the ranges of the two variables is independent of observed data, since the observed data is a limited sample." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Experiment, Outcome $\\omega$ and Sample Space $\\Omega$\n", "\n", "An **outcome**, denoted by $\\omega$, is the set of values that one or more random variables take on as a result of an **experiment**.\n", "\n", "An **experiment** is a process that yields outcomes out of set of all possible outcomes. \n", "\n", "The **sample space**, denoted by $\\Omega$, is the set of all possible outcomes. \n", "\n", "The important operative word here is _\"possible\"_. The sample space is _not_ the set of all _observed_ outcomes, the set of all possible outcomes.\n", "\n", "If an experiment involves two random variables say $X$ and $Y$ which _can_ take on $n$ possible values (i.e. $~\\text{range}_X = \\{x_1, x_2, \\ldots, x_n\\})$ and $m$ possible values (i.e. $~\\text{range}_Y = \\{y_1, y_2, \\ldots, y_m\\}$) respectively, then the sample space $\\Omega$ is the set of all possible combinations of $x_i$ and $y_j$ and is of size $n \\times m$. \n", "\n", "
\n", "
\n", "\n", "|**$\\omega_i$** | **$X$** | **$Y$** | \n", "|:----:|:----:|:----:|\n", "|$\\omega_1$ | $x_1$ | $y_1$ | \n", "|$\\omega_2$ | $x_1$ | $y_2$ | \n", "|: | : | : | \n", "| $\\omega_{m}$ | $x_1$ | $y_m$ |\n", "| $\\omega_{m+1}$ | $x_2$ | $y_1$ |\n", "| $\\omega_{m+2}$ | $x_2$ | $y_2$ |\n", "|: | : | : | \n", "| $\\omega_{n \\times m}$ | $x_n$ | $y_m$ |\n", "\n", "
\n", "
\n", "\n", "In other words, the sample space is the **cross product of the ranges of all random variables** involved in the experiment.\n", "\n", "In our example, the experiment is the act of tossing a coin and rolling a die. \n", "\n", "Each row in the data is an outcome $\\omega_i$ from the set of all possible outcomes $\\Omega$. \n", "\n", "The variable $C$ can take on two ($n=2$) values: $\\{H, T\\}$ and the variable $D$ can take on six ($m=6$) values: $\\{1, 2, 3, 4, 5, 6\\}$. This means that the sample space $\\Omega$ is of size $n \\times m = 2 \\times 6 = 12$.\n", "\n", "However, only 11 distinct outcomes are observed, as shown below. " ] }, { "cell_type": "code", "execution_count": 73, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " C D\n", "0 H 1\n", "1 H 2\n", "2 H 3\n", "3 H 4\n", "4 H 5\n", "5 H 6\n", "6 T 1\n", "7 T 2\n", "8 T 3\n", "9 T 4\n", "10 T 5" ] }, "execution_count": 73, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.groupby(['C', 'D']).count().reset_index()" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "This means that the sample space $\\Omega$ is not the set of all observed outcomes. This is despite the fact that many observed outcomes are observed more than once. The missing outcome, that is never observed, is $w_{12} = (T, 6)$." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Probability Model $P(X)$" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Probability model is a function that assigns a probability score $P(\\omega_i)$ to each possible outcome $\\omega_i$ for every $\\omega_i \\in \\Omega$ such that \n", "\n", "\n", "\n", "$$ 0 \\lt P(\\omega_i) \\lt 1 ~~~\\text{and}~~~ \\sum_{\\omega \\in \\Omega} P(\\omega_i) = 1$$\n", "\n", "For example, if we have a random variable $D$ for rolling a die, the probability model assigns a probability to each number that we can roll. The probability model is usually denoted by $P(\\omega_i)$ or $P(D=d)$\n", "\n", "\n", "$\\omega$ | $D$ | $P(D=d)$ |\n", ":-------:|:----:|:-----:|\n", "$\\omega_1$ | $1$ | $P(D=1)$ |\n", "$\\omega_2$ | $2$ | $P(D=2)$ |\n", "$\\omega_3$ | $3$ | $P(D=3)$ |\n", "$\\omega_4$ | $4$ | $P(D=4)$ |\n", "$\\omega_5$ | $5$ | $P(D=5)$ |\n", "$\\omega_6$ | $6$ | $P(D=6)$ |\n", "\n", "such that $0 \\leq P(D=d) \\leq 1$ and and $\\sum_{d \\in D} P(d=D) = 1$." ] }, { "cell_type": "code", "execution_count": 80, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " D P(D)\n", "0 1 0.166667\n", "1 2 0.166667\n", "2 3 0.166667\n", "3 4 0.166667\n", "4 5 0.166667\n", "5 6 0.166667" ] }, "execution_count": 80, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fair_die = pd.read_csv('../data/fair_die.csv')\n", "fair_die" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "The code cell above shows the probability model for the random variable $D$ for a fair die in our examples, where each number has a probability of $\\frac{1}{6}$." ] }, { "cell_type": "code", "execution_count": 350, "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "import seaborn as sns \n", "from matplotlib import pyplot as plt\n", "\n", "axs = sns.catplot(data=fair_die, kind='bar', x=\"D\", y=\"P(D)\", color=\"lightblue\");\n", "axs.set(title=\"Probability distribution of rolling a fair die\");" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "``` {admonition} A word of caution on mathematical notation and dimensionality: \n", "\n", "Uppercase letters ($X, Y ...$) often refer to a random variable. Lowercase letters ($x, y ...$) often refer to a particular outcome of a random variable.\n", "\n", "The following refer to a probability value (`int`, `float` etc.):\n", "* $P(X = x)$ \n", " * also written in shorthand as $P(x)$\n", "* $P(X = x ∧ Y = y)$ \n", " * also written in shorthand as $P(x, y)$\n", "\n", "The following refer to a collection of values (`pd.Series`, `pd.DataFrame` etc.):\n", "\n", "* $P(X)$\n", "* $P(X ∧ Y)$\n", " * also written as P(X, Y)\n", "* $P(X = x, Y)$\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Probability of an Event $P(\\phi)$\n", "\n", "An event $\\phi$ is a set of possible worlds $\\{\\omega_i, \\omega_j, ... \\omega_n\\}$. In other words, an event $\\phi$ is a subset of $\\Omega$ i.e. $\\phi \\subset \\Omega$\n", "\n", "If we continue with the example of rolling a die, we can define an event $\\phi$ as the set of all possible worlds where the die rolls an even number. From the table above, we can see that there are three possible worlds where the die rolls an even number. 
\n", "\n", "Therefore, the event $\\phi$ is the set $\\{\\omega_2, \\omega_4, \\omega_6\\}$ or $\\{D=2, D=4, D=6\\}$.\n", "\n", "\n", "\n", "$P (\\phi) = \\sum_{\\omega \\in \\phi} P(\\omega)$ is the sum of the probabilities of the possible worlds that make up $\\phi$.\n", "\n", "$P (\\phi) = P(\\text{Die rolls an even number}) = P(\\omega_2) + P(\\omega_4) + P(\\omega_6) = \\frac{1}{6} + \\frac{1}{6} + \\frac{1}{6} = 0.5 $\n" ] }, { "cell_type": "code", "execution_count": 97, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.5" ] }, "execution_count": 97, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Boolean mask selecting the rows where the die shows an even number\n", "event_condition = fair_die['D'] % 2 == 0\n", "\n", "event = fair_die[event_condition]\n", "\n", "# The probability of the event is the sum of the probabilities of its outcomes\n", "P_event = event['P(D)'].sum()\n", "\n", "round(P_event, 2)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Joint Probability $P(A, B)$" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Joint probability is the probability of two events occurring together. The joint probability is usually denoted by $P(A, B)$, which is shorthand for $P(A \\wedge B)$, read as _Probability of $A$ AND $B$._\n", "\n", "Note that $P(A, B) = P(B, A)$ since $A \\wedge B = B \\wedge A$.\n", "\n", "For example, if we are rolling two dice, one joint probability is the probability of rolling a 1 on the first die and a 2 on the second die. \n", "\n", "In Data Science, we rarely know the true joint probability. Instead, we estimate the joint probability from data. We will talk more about this when we talk about Statistics. \n" ] }, { "cell_type": "code", "execution_count": 355, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " C D P(C, D)\n", "0 H 1 0.24\n", "1 H 2 0.13\n", "2 H 3 0.09\n", "3 H 4 0.01\n", "4 H 5 0.03\n", "5 H 6 0.01\n", "6 T 1 0.19\n", "7 T 2 0.09\n", "8 T 3 0.13\n", "9 T 4 0.04\n", "10 T 5 0.00\n", "11 T 6 0.04" ] }, "execution_count": 355, "metadata": {}, "output_type": "execute_result" } ], "source": [ "joint_probs = pd.read_csv('../data/experiment_probs.csv')\n", "joint_probs" ] }, { "cell_type": "code", "execution_count": 267, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1.0" ] }, "execution_count": 267, "metadata": {}, "output_type": "execute_result" } ], "source": [ "joint_probs['P(C, D)'].sum()" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Note that sum of joint probabilities is 1 i.e. $\\sum P(C, D) = 1$ at the end of the day, since the sum of all probabilities is 1. \n", "\n", "The following three are all true at the same time: \n", "\n", "1. $\\sum_{C, D} P(C, D) = 1$ where $P(C, D)$ is a probability table with 12 rows and 3 columns: $C, D, P(C, D)$.\n", "\n", "\n", "2. $\\sum_{C} P(C) = 1$ where $P(C)$ is a probability table with 2 rows (${H, T}$) and 2 columns: $C, P(C)$.\n", "\n", "\n", "3. $\\sum_{D} P(D) = 1$ where $P(D)$ is a probability table with 6 rows (${1, 2, 3, 4, 5, 6}$) and 2 columns: $D, P(D)$.\n" ] }, { "cell_type": "code", "execution_count": 268, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " C D P(C, D)\n", "0 H 1 0.24\n", "1 H 2 0.13\n", "2 H 3 0.09\n", "3 H 4 0.01\n", "4 H 5 0.03\n", "5 H 6 0.01\n", "6 T 1 0.19\n", "7 T 2 0.09\n", "8 T 3 0.13\n", "9 T 4 0.04\n", "10 T 5 0.00\n", "11 T 6 0.04" ] }, "execution_count": 268, "metadata": {}, "output_type": "execute_result" } ], "source": [ "joint_probs" ] }, { "cell_type": "code", "execution_count": 377, "metadata": {}, "outputs": [], "source": [ "joint_probs[\"CD_vals\"] = joint_probs.apply(lambda x: \"C=%s and D=%s\" % (x['C'], x['D']), axis=1)" ] }, { "cell_type": "code", "execution_count": 378, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " C D P(C, D) CD_vals\n", "0 H 1 0.24 C=H and D=1\n", "1 H 2 0.13 C=H and D=2\n", "2 H 3 0.09 C=H and D=3\n", "3 H 4 0.01 C=H and D=4\n", "4 H 5 0.03 C=H and D=5\n", "5 H 6 0.01 C=H and D=6\n", "6 T 1 0.19 C=T and D=1\n", "7 T 2 0.09 C=T and D=2\n", "8 T 3 0.13 C=T and D=3\n", "9 T 4 0.04 C=T and D=4\n", "10 T 5 0.00 C=T and D=5\n", "11 T 6 0.04 C=T and D=6" ] }, "execution_count": 378, "metadata": {}, "output_type": "execute_result" } ], "source": [ "joint_probs" ] }, { "cell_type": "code", "execution_count": 351, "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "axs = sns.barplot(data=joint_probs, x=\"CD_vals\", y=\"P(C, D)\", color=\"lightblue\");\n", "axs.set(title=\"Joint probability distribution of C and D\\n Note that the joint probabilities sum to 1\", \\\n", " xlabel=\"C and D values\", \\\n", " ylabel=\"P(C, D)\");\n", "plt.xticks(rotation=90);" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "## Marginal Probability $P(A)$\n", "\n", "Because most data sets are multi-dimensional i.e. involving multiple random variables, we can sometimes find ourselves in a situation where we want to know the joint probability $P(A, B)$ of two random variables $A$ and $B$ but we don't know $P(A)$ or $P(B)$. In such cases, we compute the **marginal probability** of one variable from joint probability over multiple random variables. \n", "\n", "Marginalizing is the process of summing over one or more variables (say B) to get the probability of another variable (say A). 
This summing takes place over the joint probability table.\n", "\n", "$$ P(A) = \\sum_{b \\in \\text{range}_B} P(A, B=b) $$\n" ] }, { "cell_type": "code", "execution_count": 212, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "C\n", "H 0.51\n", "T 0.49\n", "Name: P(C), dtype: float64" ] }, "execution_count": 212, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Select the probability column before summing so that non-numeric columns are not aggregated\n", "# Summing the joint probabilities over D gives the marginal P(C)\n", "P_C = joint_probs.groupby('C')['P(C, D)'].sum()\n", "P_C.name = 'P(C)'\n", "# Summing the joint probabilities over C gives the marginal P(D)\n", "P_D = joint_probs.groupby('D')['P(C, D)'].sum()\n", "P_D.name = 'P(D)'\n", "P_C" ] }, { "cell_type": "code", "execution_count": 224, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(D\n", " 1 0.43\n", " 2 0.22\n", " 3 0.22\n", " 4 0.05\n", " 5 0.03\n", " 6 0.05\n", " Name: P(D), dtype: float64,\n", " 1.0)" ] }, "execution_count": 224, "metadata": {}, "output_type": "execute_result" } ], "source": [ "P_D, P_D.sum()" ] }, { "cell_type": "code", "execution_count": 376, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " C D P(C, D)\n", "0 H 1 0.24\n", "1 H 2 0.13\n", "2 H 3 0.09\n", "3 H 4 0.01\n", "4 H 5 0.03\n", "5 H 6 0.01\n", "6 T 1 0.19\n", "7 T 2 0.09\n", "8 T 3 0.13\n", "9 T 4 0.04\n", "10 T 5 0.00\n", "11 T 6 0.04" ] }, "execution_count": 376, "metadata": {}, "output_type": "execute_result" } ], "source": [ "joint_probs" ] }, { "cell_type": "code", "execution_count": 368, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " C D P(C, D)\n", "0 H 1 0.24\n", "1 H 2 0.13\n", "2 H 3 0.09\n", "3 H 4 0.01\n", "4 H 5 0.03\n", "5 H 6 0.01\n", "6 T 1 0.19\n", "7 T 2 0.09\n", "8 T 3 0.13\n", "9 T 4 0.04\n", "10 T 5 0.00\n", "11 T 6 0.04" ] }, "execution_count": 368, "metadata": {}, "output_type": "execute_result" } ], "source": [ "joint_probs" ] }, { "cell_type": "code", "execution_count": 403, "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "fig, axs = plt.subplots(1, 3, figsize=(15, 5), sharey=True)\n", "\n", "axs[0].bar(joint_probs[\"CD_vals\"], joint_probs[\"P(C, D)\"], color=\"lightblue\");\n", "axs[1].bar(P_C.index, P_C);\n", "axs[2].bar(P_D.index, P_D, color=\"navy\");\n", "\n", "axs[0].tick_params('x', labelrotation=90)\n", "\n", "axs[0].set_title(\"P(C, D)\");\n", "axs[1].set_title(\"P(C)\");\n", "axs[2].set_title(\"P(D)\");\n", "\n", "fig.suptitle(\"P(C) and P(D) (center and right) marginalized from joint probability P(C, D) left\");" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "As we look at new concepts in probability, it is important to stay mindful of i) what the probability sums to ii) what are the dimensions of the table that represents the probability.\n", "\n", "You can see from the cell below that the dimensions of marginal probability table is the length of the range of the variable.\n", "\n", "You can see from the code below that both the computed marginal probabilities in add up to 1. " ] }, { "cell_type": "code", "execution_count": 230, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(1.0, 1.0)" ] }, "execution_count": 230, "metadata": {}, "output_type": "execute_result" } ], "source": [ "P_C.sum().round(3), P_D.sum().round(3)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Independent Random Variables\n", "\n", "Random variables can be either independent or dependent. If two random variables are independent, then the value of one random variable does not affect the value of the other random variable. \n", "\n", "For example, if we are rolling two dice, we can use two random variables to represent the numbers that we roll. The two random variables are independent because the value of one die does not affect the value of the other die. 
If two random variables are dependent, then the value of one random variable does affect the value of the other random variable. For example, if we are measuring the temperature and the humidity, we can use two random variables to represent the temperature and the humidity. The two random variables are dependent because the temperature affects the humidity and the humidity affects the temperature.\n", "\n", "More formally, **two random variables $X$ and $Y$ are independent if and only if $P(X, Y) = P(X) \\cdot P(Y)$**, i.e. $P(X=x, Y=y) = P(X=x) \\cdot P(Y=y)$ for every pair of values $x$ and $y$." ] }, { "cell_type": "code", "execution_count": 214, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " C D P(C, D) P(C) P(D)\n", "0 H 1 0.24 0.51 0.43\n", "1 T 1 0.19 0.49 0.43\n", "2 H 2 0.13 0.51 0.22\n", "3 T 2 0.09 0.49 0.22\n", "4 H 3 0.09 0.51 0.22\n", "5 T 3 0.13 0.49 0.22\n", "6 H 4 0.01 0.51 0.05\n", "7 T 4 0.04 0.49 0.05\n", "8 H 5 0.03 0.51 0.03\n", "9 T 5 0.00 0.49 0.03\n", "10 H 6 0.01 0.51 0.05\n", "11 T 6 0.04 0.49 0.05" ] }, "execution_count": 214, "metadata": {}, "output_type": "execute_result" } ], "source": [ "P_C.name = \"P(C)\"\n", "P_D.name = \"P(D)\"\n", "merged = pd.merge(joint_probs, P_C, on='C')\n", "merged = pd.merge(merged, P_D, on='D')\n", "merged" ] }, { "cell_type": "code", "execution_count": 217, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " C D P(C, D) P(C) P(D) P(C) P(D)\n", "0 H 1 0.24 0.51 0.43 0.2193\n", "1 T 1 0.19 0.49 0.43 0.2107\n", "2 H 2 0.13 0.51 0.22 0.1122\n", "3 T 2 0.09 0.49 0.22 0.1078\n", "4 H 3 0.09 0.51 0.22 0.1122\n", "5 T 3 0.13 0.49 0.22 0.1078\n", "6 H 4 0.01 0.51 0.05 0.0255\n", "7 T 4 0.04 0.49 0.05 0.0245\n", "8 H 5 0.03 0.51 0.03 0.0153\n", "9 T 5 0.00 0.49 0.03 0.0147\n", "10 H 6 0.01 0.51 0.05 0.0255\n", "11 T 6 0.04 0.49 0.05 0.0245" ] }, "execution_count": 217, "metadata": {}, "output_type": "execute_result" } ], "source": [ "merged['P(C) P(D)'] = merged['P(C)'] * merged['P(D)']\n", "merged" ] }, { "cell_type": "code", "execution_count": 218, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " P(C, D) P(C) P(D)\n", "0 0.24 0.22\n", "1 0.19 0.21\n", "2 0.13 0.11\n", "3 0.09 0.11\n", "4 0.09 0.11\n", "5 0.13 0.11\n", "6 0.01 0.03\n", "7 0.04 0.02\n", "8 0.03 0.02\n", "9 0.00 0.01\n", "10 0.01 0.03\n", "11 0.04 0.02" ] }, "execution_count": 218, "metadata": {}, "output_type": "execute_result" } ], "source": [ "merged[['P(C, D)', 'P(C) P(D)']].round(2)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "The two random variables $C$ and $D$ therefore are NOT independent because $P(C, D) \\neq P(C) \\cdot P(D)$." ] }, { "cell_type": "code", "execution_count": 408, "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "sns.barplot(data=merged, x=\"CD_vals\", y=\"P(C, D)\", color=\"navy\", alpha=0.5, label=\"$P(C, D)$\");\n", "sns.barplot(data=merged, x=\"CD_vals\", y=\"P(C) P(D)\", color=\"orange\", alpha=0.5, label=\"$P(C)\\cdot P(D)$\");\n", "plt.xticks(rotation=90);\n", "plt.ylabel(\"Probability\");\n", "plt.legend();\n", "plt.title(\"$C$ and $D$ are NOT independent since $P(C, D) \\\\neq P(C) \\cdot P(D)$\");" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Conditional Probability $P(A | B)$\n", "\n", "Conditional probability is the probability of one event occurring given that another event has occurred. \n", "\n", "The conditional probability is usually denoted by $P(A | B)$ and is defined as:\n", "\n", "$$ P(A | B) = \\frac{P(A, B)}{P(B)} $$\n", "\n", "The denominator is the marginal probability of $B$.\n", "\n", "\n", "\n", "
\n", "\n", "For example, if we are flipping two coins, the conditional probability of flipping heads in the second toss, knowing the first toss was tails is: \n", "\n", "| Possible world | $\\text{Coin}_1$ | $\\text{Coin}_2$ | $P(\\omega)$ |\n", "|:----------------:|:-------------:|:-------------:|:-------------:|\n", "| $\\omega_1$ | H | H | 0.25 |\n", "| $\\omega_2$ | H | T | 0.25 |\n", "| $\\omega_3$ | T | H | 0.25 |\n", "| $\\omega_4$ | T | T | 0.25 |\n", "\n", "$$ P(\\text{Coin}_2 = H | \\text{Coin}_1 = T) = \\frac{P(\\text{Coin}_2 = H, \\text{Coin}_1 = T)}{P(\\text{Coin}_1 = T)} = \\frac{0.25}{0.5} = 0.5 $$" ] }, { "cell_type": "code", "execution_count": 219, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " C D P(C, D) P(C) P(D) P(C) P(D)\n", "0 H 1 0.24 0.51 0.43 0.2193\n", "1 T 1 0.19 0.49 0.43 0.2107\n", "2 H 2 0.13 0.51 0.22 0.1122\n", "3 T 2 0.09 0.49 0.22 0.1078\n", "4 H 3 0.09 0.51 0.22 0.1122\n", "5 T 3 0.13 0.49 0.22 0.1078\n", "6 H 4 0.01 0.51 0.05 0.0255\n", "7 T 4 0.04 0.49 0.05 0.0245\n", "8 H 5 0.03 0.51 0.03 0.0153\n", "9 T 5 0.00 0.49 0.03 0.0147\n", "10 H 6 0.01 0.51 0.05 0.0255\n", "11 T 6 0.04 0.49 0.05 0.0245" ] }, "execution_count": 219, "metadata": {}, "output_type": "execute_result" } ], "source": [ "merged" ] }, { "cell_type": "code", "execution_count": 308, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 0.470588\n", "1 0.387755\n", "2 0.254902\n", "3 0.183673\n", "4 0.176471\n", "5 0.265306\n", "6 0.019608\n", "7 0.081633\n", "8 0.058824\n", "9 0.000000\n", "10 0.019608\n", "11 0.081633\n", "Name: P(D | C), dtype: float64" ] }, "execution_count": 308, "metadata": {}, "output_type": "execute_result" } ], "source": [ "merged['P(D | C)'] = merged['P(C, D)'] / merged['P(C)']\n", "merged['P(D | C)']" ] }, { "cell_type": "code", "execution_count": 312, "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "merged[['C', 'D', 'P(D | C)']]\n", "\n", "axs = sns.catplot(data=merged, x=\"D\", y=\"P(D | C)\", hue=\"C\", kind=\"bar\");\n", "axs.set(title=\"Conditional probability distribution of D given C\\nNote that the blue bars add up to 1\");" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Note that the sum of conditional probabilites, unlike joint probability, is not 1. " ] }, { "cell_type": "code", "execution_count": 410, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2.0" ] }, "execution_count": 410, "metadata": {}, "output_type": "execute_result" } ], "source": [ "merged[\"P(D | C)\"].sum()" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "This is because \n", "\n", "$$ \\sum_C \\sum_D P(D|C) = \\sum_D P(D|C=\\text{Heads}) + \\sum_D P(D|C=\\text{Tails}) $$\n", "\n", "And $\\sum_D P(D|C=\\text{Heads})$ and $\\sum_D P(D|C=\\text{Tails})$ are individually probability distributions that each sum to 1, over different values of $D$. \n", "\n", "In other words, in the plot above, the blue bars add up to 1 and the orange bars add up to 1. 
" ] }, { "cell_type": "code", "execution_count": 415, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(1.0, 1.0)" ] }, "execution_count": 415, "metadata": {}, "output_type": "execute_result" } ], "source": [ "heads = merged[merged[\"C\"] == \"H\"]\n", "tails = merged[merged[\"C\"] == \"T\"]\n", "\n", "heads[\"P(D | C)\"].sum(), tails[\"P(D | C)\"].sum()" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Product Rule $P(A, B)$\n", "\n", "Rearranging the definition of conditional probability, we get the product rule:\n", "\n", "$$ P(A, B) = P(A | B) \\cdot P(B) $$\n", "\n", "Similarly, we can also write:\n", "\n", "$$ P(A, B) = P(B | A) \\cdot P(A)$$\n", "\n", "In summary, \n", "\n", "$$ P(A, B) = P(A | B) \\cdot P(B) = P(B | A) \\cdot P(A)$$\n", "\n", "## Chain Rule $P(A, B, C)$\n", "\n", "The chain rule is a generalization of the product rule to more than two events.\n", "\n", "$ P(A, B, C) = P(A | B, C) \\cdot P(B, C) $\n", "\n", "$P(A, B, C) = P(A | B, C) \\cdot P(B | C) \\cdot P(C)$\n", "\n", "since $P(B, C) = P(B | C) \\cdot P(C)$ as per the product rule.\n", "\n", "**Chain rule essentially allows expressing the joint probability of multiple random variables as a product of conditional probabilities.** This is useful because conditional probabilities are often easier to estimate from data than joint probabilities.\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Inclusion-Exclusion Principle $P(A \\vee B)$\n", "\n", "Inclusion-Exclusion Principle is a way of calculating the probability of two events occurring i.e. 
$ P(A=a ~\\text{OR}~ B=b) $, denoted generally as $P(A = a \\vee B = b)$.\n", "\n", "It states that:\n", "\n", "$$ P(A = a \\vee B = b) = P(A = a) + P(B = b) - P(A = a \\wedge B = b) $$\n", "\n", "\n", "\n", "For example, if we are tossing two fair, independent coins, the Inclusion-Exclusion Principle can be used to calculate the probability of getting heads on the first coin or tails on the second coin.\n", "\n", "$P(\\text{Coin}_1=H \\vee \\text{Coin}_2=T) $\n", "\n", "$ = P(\\text{Coin}_1=H) + P(\\text{Coin}_2=T) - P(\\text{Coin}_1=H \\wedge \\text{Coin}_2=T)$\n", "\n", "$ = 0.5 + 0.5 - 0.25 $\n", "\n", "$ = 0.75$ \n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Bayes Theorem $P(A|B)$" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Bayes theorem is a way of calculating a conditional probability $P(A|B)$ from the reverse conditional probability $P(B|A)$. For example, if we are rolling two dice, Bayes theorem can be used to calculate the probability of rolling a 1 on the first die given that we rolled a 2 on the second die.\n", "\n", "$$ P(A | B) = \\frac{P(B | A) \\cdot P(A)}{P(B)} $$\n", "\n", "$P(A|B)$ in the context of Bayes theorem is called the **Posterior** probability. \n", "\n", "$P(B|A)$ is called the **Likelihood**. \n", "\n", "$P(A)$ is called the **Prior** probability. \n", "\n", "$P(B)$ is called the **Evidence**, also known as the _Marginal Likelihood_." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "$$ \\text{Posterior} = \\frac{\\text{Likelihood} \\cdot \\text{Prior}}{\\text{Evidence}}$$\n", "\n", "\n", "\n", "
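\n", "Bayes theorem follows directly from the product rule. As a short derivation, using only the two forms of the product rule stated earlier and assuming $P(B) > 0$:\n", "\n", "$$ P(A | B) \\cdot P(B) = P(A, B) = P(B | A) \\cdot P(A) $$\n", "\n", "Dividing both sides by $P(B)$ gives:\n", "\n", "$$ P(A | B) = \\frac{P(B | A) \\cdot P(A)}{P(B)} $$\n", "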
" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Bayes Theorem allows a formal method of updating prior beliefs with new evidence and is the foundation of Bayesian Statistics. We will talk more about this when we talk about Statistics. " ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "In machine learning, the task is often to find $P(Y | X_1 = x_1, X_2 = x_2, \\ldots X_D = x_D)$ i.e. the probability of an unknown Y, given some values for $D$ features ($X_1, X_2 \\ldots X_D$). Bayes theorem allows us to calculate this probability from the data. \n", "\n", "Let's assume we are interested in predicting if a person is a football player ($Y_F=1$) or not ($Y_F=0$), given their height ($X_H$) and weight ($X_W$).\n", "\n", "Say, we observe a person who is 7 feet tall and weighs 200 pounds. We can use Bayes theorem to calculate the probability of this person being a football player using the following equation:\n", "\n", "$P(Y | X_H = 7, X_W = 200) = \\frac{P(X_H = 7, X_W = 200 | Y_F) \\cdot P(Y_F)}{P(X_H = 7, X_W = 200)}$" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Note that here $P(X_H = 7, X_W = 200 | Y_F)$ is the **Likelihood** probability of observing someone who is 7 feet tall and weighs 200 pounds, knowing if they are a football player. \n", "\n", "$P(Y_F)$ is the **Prior** probability of a person being a football player out of the entire population. \n", "\n", "$P(X_H = 7, X_W = 200)$ is the probability of the **Evidence** i.e. probability of observing _anyone_ who is 7 feet tall and weighs 200 pounds in the entire population." 
] } ], "metadata": { "kernelspec": { "display_name": "base", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.12" }, "orig_nbformat": 4 }, "nbformat": 4, "nbformat_minor": 2 }