import pandas as pd
= "https://raw.githubusercontent.com/fahadsultan/csc272/main/data/elections.csv"
url
= pd.read_csv(url) elections
Index and Columns
.columns
The .columns
attribute returns the column labels of the DataFrame. It is a pandas Index object.
elections.head()
Year | Candidate | Party | Popular vote | Result | % | |
---|---|---|---|---|---|---|
0 | 1824 | Andrew Jackson | Democratic-Republican | 151271 | loss | 57.210122 |
1 | 1824 | John Quincy Adams | Democratic-Republican | 113142 | win | 42.789878 |
2 | 1828 | Andrew Jackson | Democratic | 642806 | win | 56.203927 |
3 | 1828 | John Quincy Adams | National Republican | 500897 | loss | 43.796073 |
4 | 1832 | Andrew Jackson | Democratic | 702735 | win | 54.574789 |
elections.columns
Index(['Year', 'Candidate', 'Party', 'Popular vote', 'Result', '%'], dtype='object')
type(elections.columns)
pandas.core.indexes.base.Index
.index
.index
attribute returns the row labels of the DataFrame. It is also a pandas Index object.
elections.index
RangeIndex(start=0, stop=182, step=1)
type(elections.index)
pandas.core.indexes.range.RangeIndex
Converting column to index
The .set_index()
method is used to set the DataFrame index using existing columns.
"Candidate") elections.set_index(
Year | Party | Popular vote | Result | % | |
---|---|---|---|---|---|
Candidate | |||||
Andrew Jackson | 1824 | Democratic-Republican | 151271 | loss | 57.210122 |
John Quincy Adams | 1824 | Democratic-Republican | 113142 | win | 42.789878 |
Andrew Jackson | 1828 | Democratic | 642806 | win | 56.203927 |
John Quincy Adams | 1828 | National Republican | 500897 | loss | 43.796073 |
Andrew Jackson | 1832 | Democratic | 702735 | win | 54.574789 |
... | ... | ... | ... | ... | ... |
Jill Stein | 2016 | Green | 1457226 | loss | 1.073699 |
Joseph Biden | 2020 | Democratic | 81268924 | win | 51.311515 |
Donald Trump | 2020 | Republican | 74216154 | loss | 46.858542 |
Jo Jorgensen | 2020 | Libertarian | 1865724 | loss | 1.177979 |
Howard Hawkins | 2020 | Green | 405035 | loss | 0.255731 |
182 rows × 5 columns
Converting Index to Column
The .reset_index()
method is used to reset the index of a DataFrame. By default, the original index is stored in a new column called index
.
elections.reset_index()
index | Year | Candidate | Party | Popular vote | Result | % | |
---|---|---|---|---|---|---|---|
0 | 0 | 1824 | Andrew Jackson | Democratic-Republican | 151271 | loss | 57.210122 |
1 | 1 | 1824 | John Quincy Adams | Democratic-Republican | 113142 | win | 42.789878 |
2 | 2 | 1828 | Andrew Jackson | Democratic | 642806 | win | 56.203927 |
3 | 3 | 1828 | John Quincy Adams | National Republican | 500897 | loss | 43.796073 |
4 | 4 | 1832 | Andrew Jackson | Democratic | 702735 | win | 54.574789 |
... | ... | ... | ... | ... | ... | ... | ... |
177 | 177 | 2016 | Jill Stein | Green | 1457226 | loss | 1.073699 |
178 | 178 | 2020 | Joseph Biden | Democratic | 81268924 | win | 51.311515 |
179 | 179 | 2020 | Donald Trump | Republican | 74216154 | loss | 46.858542 |
180 | 180 | 2020 | Jo Jorgensen | Libertarian | 1865724 | loss | 1.177979 |
181 | 181 | 2020 | Howard Hawkins | Green | 405035 | loss | 0.255731 |
182 rows × 7 columns
Creating New Columns
Creating new columns in a DataFrame is a common task when working with data. In this notebook, we will see how to create new columns in a DataFrame based on existing columns or other values.
New columns can be created by assigning a value to a new column name. For example, to create a new column named new_column
with a constant value 10
, we can use the following code:
'constant'] = 10
elections[
elections.head()
Year | Candidate | Party | Popular vote | Result | % | constant | |
---|---|---|---|---|---|---|---|
0 | 1824 | Andrew Jackson | Democratic-Republican | 151271 | loss | 57.210122 | 10 |
1 | 1824 | John Quincy Adams | Democratic-Republican | 113142 | win | 42.789878 | 10 |
2 | 1828 | Andrew Jackson | Democratic | 642806 | win | 56.203927 | 10 |
3 | 1828 | John Quincy Adams | National Republican | 500897 | loss | 43.796073 | 10 |
4 | 1832 | Andrew Jackson | Democratic | 702735 | win | 54.574789 | 10 |
If we want to create a new column based on an existing column, we can refer to the existing column by its name, within the square brackets, on the right side of the assignment operator. For example, to create a new column named new_column
with the values of the existing column column1
, we can use the following code:
'total_voters'] = ((elections['Popular vote']* 100) / elections['%']).astype(int)
elections[
elections.head()
Year | Candidate | Party | Popular vote | Result | % | constant | total_voters | |
---|---|---|---|---|---|---|---|---|
0 | 1824 | Andrew Jackson | Democratic-Republican | 151271 | loss | 57.210122 | 10 | 264413 |
1 | 1824 | John Quincy Adams | Democratic-Republican | 113142 | win | 42.789878 | 10 | 264412 |
2 | 1828 | Andrew Jackson | Democratic | 642806 | win | 56.203927 | 10 | 1143702 |
3 | 1828 | John Quincy Adams | National Republican | 500897 | loss | 43.796073 | 10 | 1143703 |
4 | 1832 | Andrew Jackson | Democratic | 702735 | win | 54.574789 | 10 | 1287655 |
Note that the new column will have the same length as the DataFrame and the calculations are element-wise. That is, the value of the new column at row i
will be calculated based on the value of the existing column at row i
.
These element-wise operations are vectorized and are very efficient.
Renaming Columns
The .rename()
method is used to rename the columns or index labels of a DataFrame.
={"Candidate":"Name"}) elections.rename(columns
Year | Name | Party | Popular vote | Result | % | |
---|---|---|---|---|---|---|
0 | 1824 | Andrew Jackson | Democratic-Republican | 151271 | loss | 57.210122 |
1 | 1824 | John Quincy Adams | Democratic-Republican | 113142 | win | 42.789878 |
2 | 1828 | Andrew Jackson | Democratic | 642806 | win | 56.203927 |
3 | 1828 | John Quincy Adams | National Republican | 500897 | loss | 43.796073 |
4 | 1832 | Andrew Jackson | Democratic | 702735 | win | 54.574789 |
... | ... | ... | ... | ... | ... | ... |
177 | 2016 | Jill Stein | Green | 1457226 | loss | 1.073699 |
178 | 2020 | Joseph Biden | Democratic | 81268924 | win | 51.311515 |
179 | 2020 | Donald Trump | Republican | 74216154 | loss | 46.858542 |
180 | 2020 | Jo Jorgensen | Libertarian | 1865724 | loss | 1.177979 |
181 | 2020 | Howard Hawkins | Green | 405035 | loss | 0.255731 |
182 rows × 6 columns
Sorting Data by Column Value
Ordering a DataFrame
can be useful for isolating extreme values. For example, the first 5 entries of a row sorted in descending order (that is, from highest to lowest) are the largest 5 values. .sort_values
allows us to order a DataFrame
or Series
by a specified column. We can choose to either receive the rows in ascending
order (default) or descending
order.
# Sort the "Count" column from highest to lowest
= "%", ascending=False).head() elections.sort_values(by
Year | Candidate | Party | Popular vote | Result | % | |
---|---|---|---|---|---|---|
114 | 1964 | Lyndon Johnson | Democratic | 43127041 | win | 61.344703 |
91 | 1936 | Franklin Roosevelt | Democratic | 27752648 | win | 60.978107 |
120 | 1972 | Richard Nixon | Republican | 47168710 | win | 60.907806 |
79 | 1920 | Warren Harding | Republican | 16144093 | win | 60.574501 |
133 | 1984 | Ronald Reagan | Republican | 54455472 | win | 59.023326 |
We do not need to explicitly specify the column used for sorting when calling .value_counts()
on a Series
. We can still specify the ordering paradigm – that is, whether values are sorted in ascending or descending order.
# Sort the "Name" Series alphabetically
"Candidate"].sort_values(ascending=True).head() elections[
75 Aaron S. Watkins
27 Abraham Lincoln
23 Abraham Lincoln
108 Adlai Stevenson
105 Adlai Stevenson
Name: Candidate, dtype: object