import pandas as pd
url = "https://raw.githubusercontent.com/fahadsultan/csc272/main/data/elections.csv"
elections = pd.read_csv(url)
Selection: subset of columns
To select a column in a DataFrame, we can use bracket notation: the name of the DataFrame followed by the column name in square brackets, df['column_name'].
For example, to select the column named Candidate from the elections DataFrame, we can use the following code:
candidates = elections['Candidate']
print(candidates)
0 Andrew Jackson
1 John Quincy Adams
2 Andrew Jackson
3 John Quincy Adams
4 Andrew Jackson
...
177 Jill Stein
178 Joseph Biden
179 Donald Trump
180 Jo Jorgensen
181 Howard Hawkins
Name: Candidate, Length: 182, dtype: object
This extracts a single column as a Series. We can confirm this by checking the type of the output.
type(candidates)
pandas.core.series.Series
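A single label and a one-element list of labels look similar but return different types. The small sketch below shows both side by side. It does not download the data; instead it assumes a hypothetical three-row DataFrame df whose values are copied from the first rows of the elections output in this section.

```python
import pandas as pd

# Hypothetical three-row stand-in for `elections`, values copied from its first rows
df = pd.DataFrame({
    "Candidate": ["Andrew Jackson", "John Quincy Adams", "Andrew Jackson"],
    "Party": ["Democratic-Republican", "Democratic-Republican", "Democratic"],
})

# A single label inside [] returns a Series named after the column
as_series = df["Candidate"]
print(type(as_series).__name__)   # Series
print(as_series.name)             # Candidate

# A list containing one label returns a one-column DataFrame instead
as_frame = df[["Candidate"]]
print(type(as_frame).__name__)    # DataFrame
print(as_frame.shape)             # (3, 1)
```

The double-bracket form is handy when later code expects a DataFrame (for example, to keep the column label as a header) even though only one column is selected.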
To select multiple columns, we can pass a list of column names. For example, to select both the Candidate and Party columns from the elections DataFrame, we can use the following line of code:
elections[['Candidate', 'Party']]
| | Candidate | Party |
|---|---|---|
| 0 | Andrew Jackson | Democratic-Republican |
| 1 | John Quincy Adams | Democratic-Republican |
| 2 | Andrew Jackson | Democratic |
| 3 | John Quincy Adams | National Republican |
| 4 | Andrew Jackson | Democratic |
| ... | ... | ... |
| 177 | Jill Stein | Green |
| 178 | Joseph Biden | Democratic |
| 179 | Donald Trump | Republican |
| 180 | Jo Jorgensen | Libertarian |
| 181 | Howard Hawkins | Green |
182 rows × 2 columns
This extracts multiple columns as a DataFrame. We can confirm this as well by checking the type of the output.
type(elections[['Candidate', 'Party']])
pandas.core.frame.DataFrame
This is how we can select columns in a DataFrame. Next, let's learn how to filter rows.
[]
The [] selection operator is the most baffling of the pandas selection tools, yet the most commonly used. It takes a single argument, which may be one of the following:
- A slice of row numbers
- A list of column labels
- A single column label
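The sketch below exercises the [] operator with each kind of argument: a slice of row numbers, a list of column labels, and a single column label. It assumes a hypothetical five-row DataFrame df built by hand from the first rows of the elections table shown earlier, so it runs without downloading anything; the label "Votes" is a made-up column name used only to show what happens when a label is missing.

```python
import pandas as pd

# Hypothetical five-row stand-in for `elections`, values copied from its first rows
df = pd.DataFrame({
    "Candidate": ["Andrew Jackson", "John Quincy Adams", "Andrew Jackson",
                  "John Quincy Adams", "Andrew Jackson"],
    "Party": ["Democratic-Republican", "Democratic-Republican", "Democratic",
              "National Republican", "Democratic"],
})

# 1. A slice of row numbers selects rows; the end position is excluded
first_two_rows = df[0:2]
print(first_two_rows.shape)        # (2, 2)

# 2. A list of column labels selects columns, in the order they are listed
reordered = df[["Party", "Candidate"]]
print(list(reordered.columns))     # ['Party', 'Candidate']

# 3. A single column label selects one column as a Series
parties = df["Party"]
print(type(parties).__name__)      # Series

# A label that is not a column ("Votes" is hypothetical here) raises KeyError
try:
    df["Votes"]
    raised = False
except KeyError:
    raised = True
print(raised)                      # True
```

Note that the slice form works on row positions while the other two forms work on column labels, which is a large part of why [] is easy to misread.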
Say we wanted the first four rows of our elections DataFrame.