Selection: subset of columns

To select a column in a DataFrame, we can use bracket notation: the name of the DataFrame followed by the column name in square brackets, as in df['column_name'].

For example, to select a column named Candidate from the elections DataFrame, we can use the following code:

import pandas as pd 

url = "https://raw.githubusercontent.com/fahadsultan/csc272/main/data/elections.csv"

elections = pd.read_csv(url)
candidates = elections['Candidate']
print(candidates)
0         Andrew Jackson
1      John Quincy Adams
2         Andrew Jackson
3      John Quincy Adams
4         Andrew Jackson
             ...        
177           Jill Stein
178         Joseph Biden
179         Donald Trump
180         Jo Jorgensen
181       Howard Hawkins
Name: Candidate, Length: 182, dtype: object

This extracts a single column as a Series. We can confirm this by checking the type of the output.

type(candidates)
pandas.core.series.Series

To select multiple columns, we can pass a list of column names. For example, to select both the Candidate and Party columns from the elections DataFrame, we can use the following line of code:

elections[['Candidate', 'Party']]
             Candidate                  Party
0       Andrew Jackson  Democratic-Republican
1    John Quincy Adams  Democratic-Republican
2       Andrew Jackson             Democratic
3    John Quincy Adams    National Republican
4       Andrew Jackson             Democratic
...                ...                    ...
177         Jill Stein                  Green
178       Joseph Biden             Democratic
179       Donald Trump             Republican
180       Jo Jorgensen            Libertarian
181     Howard Hawkins                  Green

182 rows × 2 columns

This extracts multiple columns as a DataFrame. We can confirm this as well by checking the type of the output.

type(elections[['Candidate', 'Party']])
pandas.core.frame.DataFrame
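
The difference between single and double brackets can be seen side by side in a minimal sketch. The small `toy` DataFrame below is a hypothetical stand-in for the elections data (made up here so the example runs without a download), and the variable names are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for the elections data, so the example runs offline.
toy = pd.DataFrame({
    "Candidate": ["Andrew Jackson", "John Quincy Adams", "Joseph Biden"],
    "Party": ["Democratic", "National Republican", "Democratic"],
})

single = toy["Candidate"]           # one label -> Series
double = toy[["Candidate"]]         # one-item list -> DataFrame with one column
both = toy[["Candidate", "Party"]]  # list of labels -> DataFrame

print(type(single).__name__, single.shape)  # Series (3,)
print(type(double).__name__, double.shape)  # DataFrame (3, 1)
print(type(both).__name__, both.shape)      # DataFrame (3, 2)
```

Note that a list containing a single label still yields a DataFrame, which can be handy when later code expects two-dimensional input.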

This is how we can select columns in a DataFrame. Next, let’s learn how to filter rows.

[]

The [] selection operator is the most baffling of all, yet the most commonly used. It only takes a single argument, which may be one of the following:

  1. A slice of row numbers
  2. A list of column labels
  3. A single column label
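
One consequence of [] taking a single argument is worth seeing in a short sketch. With ordinary (non-hierarchical) column labels, writing two labels separated by a comma hands [] a single tuple, which pandas looks up as one column label and fails to find. The `toy` DataFrame and the `tuple_lookup_failed` flag are hypothetical names used only for this illustration.

```python
import pandas as pd

# Hypothetical two-row stand-in for the elections data.
toy = pd.DataFrame({
    "Candidate": ["Andrew Jackson", "John Quincy Adams"],
    "Party": ["Democratic", "National Republican"],
})

# Two labels separated by a comma reach [] as one tuple, which pandas
# treats as a single (nonexistent) column label.
try:
    toy["Candidate", "Party"]
    tuple_lookup_failed = False
except KeyError:
    tuple_lookup_failed = True

# Wrapping the labels in a list passes one list argument instead.
subset = toy[["Candidate", "Party"]]

print(tuple_lookup_failed)   # True
print(list(subset.columns))  # ['Candidate', 'Party']
```

This is why selecting several columns uses double brackets: the outer pair is the selection operator and the inner pair builds the list that serves as its one argument.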

Say we wanted the first four rows of our elections DataFrame.