Index and Columns

import pandas as pd 

url = "https://raw.githubusercontent.com/fahadsultan/csc272/main/data/elections.csv"

elections = pd.read_csv(url)

.columns

The .columns attribute returns the column labels of the DataFrame. It is a pandas Index object.

elections.head()
Year Candidate Party Popular vote Result %
0 1824 Andrew Jackson Democratic-Republican 151271 loss 57.210122
1 1824 John Quincy Adams Democratic-Republican 113142 win 42.789878
2 1828 Andrew Jackson Democratic 642806 win 56.203927
3 1828 John Quincy Adams National Republican 500897 loss 43.796073
4 1832 Andrew Jackson Democratic 702735 win 54.574789
elections.columns
Index(['Year', 'Candidate', 'Party', 'Popular vote', 'Result', '%'], dtype='object')
type(elections.columns)
pandas.core.indexes.base.Index

.index

.index attribute returns the row labels of the DataFrame. It is also a pandas Index object.

elections.index
RangeIndex(start=0, stop=182, step=1)
type(elections.index)
pandas.core.indexes.range.RangeIndex

Converting column to index

The .set_index() method is used to set the DataFrame index using existing columns.

elections.set_index("Candidate")
Year Party Popular vote Result %
Candidate
Andrew Jackson 1824 Democratic-Republican 151271 loss 57.210122
John Quincy Adams 1824 Democratic-Republican 113142 win 42.789878
Andrew Jackson 1828 Democratic 642806 win 56.203927
John Quincy Adams 1828 National Republican 500897 loss 43.796073
Andrew Jackson 1832 Democratic 702735 win 54.574789
... ... ... ... ... ...
Jill Stein 2016 Green 1457226 loss 1.073699
Joseph Biden 2020 Democratic 81268924 win 51.311515
Donald Trump 2020 Republican 74216154 loss 46.858542
Jo Jorgensen 2020 Libertarian 1865724 loss 1.177979
Howard Hawkins 2020 Green 405035 loss 0.255731

182 rows × 5 columns

Converting Index to Column

The .reset_index() method is used to reset the index of a DataFrame. By default, the original index is stored in a new column called index.

elections.reset_index()
index Year Candidate Party Popular vote Result %
0 0 1824 Andrew Jackson Democratic-Republican 151271 loss 57.210122
1 1 1824 John Quincy Adams Democratic-Republican 113142 win 42.789878
2 2 1828 Andrew Jackson Democratic 642806 win 56.203927
3 3 1828 John Quincy Adams National Republican 500897 loss 43.796073
4 4 1832 Andrew Jackson Democratic 702735 win 54.574789
... ... ... ... ... ... ... ...
177 177 2016 Jill Stein Green 1457226 loss 1.073699
178 178 2020 Joseph Biden Democratic 81268924 win 51.311515
179 179 2020 Donald Trump Republican 74216154 loss 46.858542
180 180 2020 Jo Jorgensen Libertarian 1865724 loss 1.177979
181 181 2020 Howard Hawkins Green 405035 loss 0.255731

182 rows × 7 columns


Creating New Columns

Creating new columns in a DataFrame is a common task when working with data. In this notebook, we will see how to create new columns in a DataFrame based on existing columns or other values.

New columns can be created by assigning a value to a new column name. For example, to create a new column named new_column with a constant value 10, we can use the following code:

elections['constant'] = 10

elections.head()
Year Candidate Party Popular vote Result % constant
0 1824 Andrew Jackson Democratic-Republican 151271 loss 57.210122 10
1 1824 John Quincy Adams Democratic-Republican 113142 win 42.789878 10
2 1828 Andrew Jackson Democratic 642806 win 56.203927 10
3 1828 John Quincy Adams National Republican 500897 loss 43.796073 10
4 1832 Andrew Jackson Democratic 702735 win 54.574789 10


If we want to create a new column based on an existing column, we can refer to the existing column by its name, within the square brackets, on the right side of the assignment operator. For example, to create a new column named new_column with the values of the existing column column1, we can use the following code:


elections['total_voters'] = ((elections['Popular vote']* 100) / elections['%']).astype(int)

elections.head()
Year Candidate Party Popular vote Result % constant total_voters
0 1824 Andrew Jackson Democratic-Republican 151271 loss 57.210122 10 264413
1 1824 John Quincy Adams Democratic-Republican 113142 win 42.789878 10 264412
2 1828 Andrew Jackson Democratic 642806 win 56.203927 10 1143702
3 1828 John Quincy Adams National Republican 500897 loss 43.796073 10 1143703
4 1832 Andrew Jackson Democratic 702735 win 54.574789 10 1287655

Note that the new column will have the same length as the DataFrame and the calculations are element-wise. That is, the value of the new column at row i will be calculated based on the value of the existing column at row i.

These element-wise operations are vectorized and are very efficient.

Renaming Columns

The .rename() method is used to rename the columns or index labels of a DataFrame.

elections.rename(columns={"Candidate":"Name"})
Year Name Party Popular vote Result %
0 1824 Andrew Jackson Democratic-Republican 151271 loss 57.210122
1 1824 John Quincy Adams Democratic-Republican 113142 win 42.789878
2 1828 Andrew Jackson Democratic 642806 win 56.203927
3 1828 John Quincy Adams National Republican 500897 loss 43.796073
4 1832 Andrew Jackson Democratic 702735 win 54.574789
... ... ... ... ... ... ...
177 2016 Jill Stein Green 1457226 loss 1.073699
178 2020 Joseph Biden Democratic 81268924 win 51.311515
179 2020 Donald Trump Republican 74216154 loss 46.858542
180 2020 Jo Jorgensen Libertarian 1865724 loss 1.177979
181 2020 Howard Hawkins Green 405035 loss 0.255731

182 rows × 6 columns

Sorting Data by Column Value

Ordering a DataFrame can be useful for isolating extreme values. For example, the first 5 entries of a row sorted in descending order (that is, from highest to lowest) are the largest 5 values. .sort_values allows us to order a DataFrame or Series by a specified column. We can choose to either receive the rows in ascending order (default) or descending order.

# Sort the "Count" column from highest to lowest
elections.sort_values(by = "%", ascending=False).head()
Year Candidate Party Popular vote Result %
114 1964 Lyndon Johnson Democratic 43127041 win 61.344703
91 1936 Franklin Roosevelt Democratic 27752648 win 60.978107
120 1972 Richard Nixon Republican 47168710 win 60.907806
79 1920 Warren Harding Republican 16144093 win 60.574501
133 1984 Ronald Reagan Republican 54455472 win 59.023326

We do not need to explicitly specify the column used for sorting when calling .value_counts() on a Series. We can still specify the ordering paradigm – that is, whether values are sorted in ascending or descending order.

# Sort the "Name" Series alphabetically
elections["Candidate"].sort_values(ascending=True).head()
75     Aaron S. Watkins
27      Abraham Lincoln
23      Abraham Lincoln
108     Adlai Stevenson
105     Adlai Stevenson
Name: Candidate, dtype: object