The median is the middle value in a data set when the values are ordered from smallest to largest. If there is an even number of values, the median is the average of the two middle values.
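As a quick illustration of the odd and even cases, the definition can be sketched in plain Python. The helper `median` and the two sample lists are hypothetical and are not part of the elections data.

```python
def median(values):
    """Return the median of a non-empty sequence of numbers."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value
        return ordered[mid]
    # Even count: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_sample = [7, 1, 5]       # hypothetical data
even_sample = [7, 1, 5, 3]   # hypothetical data

median_odd = median(odd_sample)    # middle of [1, 5, 7]
median_even = median(even_sample)  # average of 3 and 5 in [1, 3, 5, 7]
```

In pandas, the .median() method of a Series or DataFrame should give the same result without a hand-written helper.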
The variance is a measure of how spread out the values in a data set are. It is calculated by taking the average of the squared differences between each value and the mean.
\[\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2\]
where \(x_i\) is the \(i\)-th value in the data set, \(\mu\) is the mean of the data set, \(n\) is the number of values, and \(\sigma\) is the standard deviation of the data set.
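The definition of variance given here can be checked numerically. The sketch below uses a small hypothetical Series (not the elections data) and assumes the population form, which divides by the number of values. Note that pandas divides by one less than the number of values by default (ddof=1), so ddof=0 is passed to match.

```python
import pandas as pd

values = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])  # hypothetical data

mu = values.mean()

# Population variance: average squared deviation from the mean
variance = ((values - mu) ** 2).sum() / len(values)

# Standard deviation is the square root of the variance
sigma = variance ** 0.5

# pandas uses ddof=1 unless told otherwise
pandas_population_variance = values.var(ddof=0)
pandas_default_variance = values.var()
```

The default result from .var() is slightly larger than the hand-computed value because of the smaller divisor.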
Similarly, we can use the .max() and .min() methods to compute the maximum and minimum values of a Series or DataFrame.
elections['%'].max(), elections['%'].min()
(61.34470329, 0.098088334)
The .sum() method computes the sum of all the values in a Series or DataFrame.
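To make the Series versus DataFrame behavior concrete, here is a minimal sketch on a hypothetical two-column DataFrame. The column names `a` and `b` are made up for illustration.

```python
import pandas as pd

# Hypothetical data, not the elections table
df = pd.DataFrame({"a": [1, 2, 3], "b": [10.0, 20.0, 30.0]})

# On a Series, each method returns a single value
series_total = df["a"].sum()
series_max = df["a"].max()
series_min = df["a"].min()

# On a DataFrame, each method returns one value per column
column_totals = df.sum()

# axis=1 aggregates across each row instead
row_totals = df.sum(axis=1)
```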
The .describe() method computes summary statistics for a Series or DataFrame. For numeric data it reports the count, mean, standard deviation, minimum, maximum, and the 25th, 50th, and 75th percentiles.
elections['%'].describe()
count 182.000000
mean 27.470350
std 22.968034
min 0.098088
25% 1.219996
50% 37.677893
75% 48.354977
max 61.344703
Name: %, dtype: float64
elections.describe()
              Year  Popular vote           %
count   182.000000  1.820000e+02  182.000000
mean   1934.087912  1.235364e+07   27.470350
std      57.048908  1.907715e+07   22.968034
min    1824.000000  1.007150e+05    0.098088
25%    1889.000000  3.876395e+05    1.219996
50%    1936.000000  1.709375e+06   37.677893
75%    1988.000000  1.897775e+07   48.354977
max    2020.000000  8.126892e+07   61.344703
.describe()
If many statistics are required from a DataFrame (minimum value, maximum value, mean value, etc.), then .describe() can be used to compute all of them at once.
elections.describe()
              Year  Popular vote           %
count   182.000000  1.820000e+02  182.000000
mean   1934.087912  1.235364e+07   27.470350
std      57.048908  1.907715e+07   22.968034
min    1824.000000  1.007150e+05    0.098088
25%    1889.000000  3.876395e+05    1.219996
50%    1936.000000  1.709375e+06   37.677893
75%    1988.000000  1.897775e+07   48.354977
max    2020.000000  8.126892e+07   61.344703
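Calling .describe() on a Series reports statistics for that single column. A numeric column such as Popular vote yields the same eight statistics as above, while a non-numeric column yields a different set (count, unique, top, and freq).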
count 182
mean 12353635
std 19077149
min 100715
25% 387639
50% 1709375
75% 18977751
max 81268924
Name: Popular vote, dtype: int64
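The result of .describe() is itself a Series (or a DataFrame), so individual statistics can be looked up by label. The sketch below uses two small hypothetical Series, `scores` and `labels`, to show this and to show how the reported statistics depend on the kind of data.

```python
import pandas as pd

scores = pd.Series([1.0, 2.0, 3.0, 4.0], name="score")  # hypothetical data
labels = pd.Series(["a", "b", "a"], name="label")       # hypothetical data

# Numeric data: count, mean, std, min, quartiles, max
numeric_summary = scores.describe()
mean_score = numeric_summary["mean"]
median_score = numeric_summary["50%"]

# The reported std is the sample standard deviation (ddof=1)
std_matches = abs(numeric_summary["std"] - scores.std()) < 1e-12

# Non-numeric data: count, unique, top, freq
text_summary = labels.describe()
text_stats = list(text_summary.index)
```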
x = elections['%']
y = elections['Popular vote']
# Population covariance: mean product of the deviations from each mean
cov = sum((x - x.mean()) * (y - y.mean())) / len(x)
round(cov)
243614836
x.cov(y), round(x.cov(y, ddof=0))
(244960774.11030602, 243614836)
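The two numbers above differ because .cov() divides by one less than the number of values by default (ddof=1), whereas the manual calculation divides by the number of values. A minimal sketch on hypothetical data, with made-up names `a` and `b`, makes the relationship explicit.

```python
import pandas as pd

a = pd.Series([1.0, 2.0, 3.0, 4.0])  # hypothetical data
b = pd.Series([2.0, 4.0, 6.0, 9.0])  # hypothetical data

n = len(a)
# Products of the paired deviations from each mean
products = (a - a.mean()) * (b - b.mean())

population_cov = products.sum() / n        # divide by n
sample_cov = products.sum() / (n - 1)      # divide by n - 1

# pandas equivalents
pandas_sample_cov = a.cov(b)               # default ddof=1
pandas_population_cov = a.cov(b, ddof=0)
```

The two versions differ by a factor of n / (n - 1), which is consistent with the elections result, where n is 182.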
Percentile and Quantile
Percentiles and quantiles are measures of position in a data set. They divide the data set into equal parts.
Percentile
A percentile is a value below which a given percentage of the data falls. For example, the 25th percentile is the value below which 25% of the data falls.
Quantile
A quantile is a value below which a given fraction of the data falls. For example, the 0.25 quantile is the value below which 25% of the data falls, so it is the same value as the 25th percentile.
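The correspondence between the two scales (percentages from 0 to 100, fractions from 0 to 1) can be sketched on a small hypothetical Series. NumPy's `percentile` function is brought in here only for comparison; both calls use linear interpolation by default.

```python
import numpy as np
import pandas as pd

data = pd.Series([10, 20, 30, 40, 50])  # hypothetical data

# Quantiles take a fraction between 0 and 1
q25 = data.quantile(0.25)

# Percentiles take a percentage between 0 and 100
p25 = np.percentile(data, 25)

# Several quantiles can be requested at once by passing a list
quartiles = data.quantile([0.25, 0.5, 0.75])
```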
The .quantile() method can be used to compute the quantiles of a Series or DataFrame.
elections.quantile(0.25)
Year 1889.000000
Popular vote 387639.500000
% 1.219996
Name: 0.25, dtype: float64