Data Wrangling and Manipulation

Pandas

Pandas is a powerful Python library that is widely used in data science and data analysis. It provides data structures and functions that make working with tabular data easy and intuitive.

It is widely regarded in the data science community as the standard tool, in both industry and academia, for manipulating tabular data.

Some of the core functionalities of Pandas covered in this course include:

  • Reading and writing data (read_csv, to_csv, etc.)
  • Extracting relevant rows and columns (filtering, slicing, selection, loc, iloc)
  • Merging multiple datasets (merge, concat, etc.)
  • Grouping data (groupby)
  • Aggregating data (sum, mean, count, etc.)
  • Reshaping data (pivot)
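
As a rough illustration of how the operations listed above fit together, here is a minimal sketch. The sample data, column names (order_id, store, product, units, price), and variable names are all made up for this example, and an in-memory string stands in for a CSV file on disk.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a CSV file on disk.
sales_csv = io.StringIO(
    "order_id,store,product,units\n"
    "1,north,apple,10\n"
    "2,north,pear,5\n"
    "3,south,apple,7\n"
    "4,south,pear,3\n"
)

# Reading: read_csv accepts a file path or a file-like object.
sales = pd.read_csv(sales_csv)

# Selection: loc is label/condition-based, iloc is position-based.
apple_rows = sales.loc[sales["product"] == "apple", ["store", "units"]]
first_two = sales.iloc[:2]

# Merging: attach a price to each row via the shared "product" column.
prices = pd.DataFrame({"product": ["apple", "pear"], "price": [0.5, 0.75]})
merged = sales.merge(prices, on="product", how="left")
merged["revenue"] = merged["units"] * merged["price"]

# Grouping and aggregating: total units and revenue per store.
per_store = merged.groupby("store")[["units", "revenue"]].sum()

# Reshaping: one row per store, one column per product.
wide = sales.pivot(index="store", columns="product", values="units")

# Writing: to_csv returns the CSV text when no path is given.
csv_text = per_store.to_csv()
```

Passing a path such as "per_store.csv" (a hypothetical filename) to to_csv would write the result to disk instead of returning it as a string.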

Jupyter Notebooks

Jupyter Notebook is an open-source web application that lets you create and share documents containing live code, equations, visualizations, and narrative text.

Notebooks are widely used in data science, scientific computing, and machine learning for tasks such as data cleaning and transformation, numerical simulation, statistical modeling, and data visualization.

Google Colab

Google Colab is a free, cloud-based platform that allows you to write and execute Python code in your web browser. It provides a Jupyter notebook environment with access to powerful computing resources, including GPUs and TPUs.

Note that code in Google Colab runs on a remote server, not on your local machine. This means that any files you want to use in your code must be uploaded to the Colab environment or accessed from a cloud storage service like Google Drive.
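
One common way to reach Google Drive files from a notebook is to mount the drive. The sketch below assumes it may also be run outside Colab, so it guards the import: the google.colab module is only available inside a Colab runtime. The helper name mount_drive_if_in_colab and the mount point are choices made for this example.

```python
def mount_drive_if_in_colab(mount_point="/content/drive"):
    """Mount Google Drive when running in Colab.

    Returns the mount point on success, or None when not in Colab.
    """
    try:
        # This module exists only inside a Colab runtime.
        from google.colab import drive
    except ImportError:
        return None
    # Prompts for authorization the first time it runs in a session.
    drive.mount(mount_point)
    return mount_point


drive_root = mount_drive_if_in_colab()
```

Inside Colab, your Drive files then typically appear under the MyDrive folder of the mount point, so a file could be read with a path like drive_root + "/MyDrive/data.csv" (a hypothetical filename).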