Unsupervised Learning

9. Unsupervised Learning

In contrast to supervised learning, unsupervised learning works with datasets that have no labels. The goal is to “infer” labels by learning the structure of the data, that is, by analyzing how different observations (rows) relate to one another.
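One simple way to make "how observations relate to one another" concrete is a table of pairwise distances between rows. The sketch below is only an illustration: the toy dataset `X` and the helper `pairwise_distances` are hypothetical, and Euclidean distance is assumed as the measure of how far apart two observations are.

```python
import math

# Hypothetical toy dataset: each row is one observation with two features.
X = [
    (1.0, 1.0),
    (1.0, 2.0),
    (8.0, 8.0),
    (9.0, 8.0),
]


def pairwise_distances(rows):
    """Return the matrix of Euclidean distances between every pair of rows."""
    return [[math.dist(a, b) for b in rows] for a in rows]


D = pairwise_distances(X)

# Print the distance matrix, rounded for readability.
for row in D:
    print([round(d, 2) for d in row])
```

Small entries mark observations that sit close together and large entries mark observations that sit far apart. No labels are used at any point.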

In the figure below, the top row shows the inputs to the supervised and unsupervised learning problems. The points in the top-right panel are colored according to the available labels, whereas the top-left panel shows the same data without labels.

The bottom row shows the outputs of the two problems. Supervised learning (right column) uses the labels to infer the underlying distributions, so that a new incoming point can be assigned to one of the two classes. Unsupervised learning (left column) instead infers the structure of the data and assigns labels based on that structure. The structure is captured by the relative proximity, or distance, of observations to one another, much as in nearest-neighbor methods.
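As a rough illustration of assigning labels from proximity alone, the sketch below uses k-means clustering. This is one possible technique, chosen here as an assumption; the text above does not name a specific algorithm. The function `kmeans`, the toy data `X`, the choice `k=2`, and the farthest-first initialization are all hypothetical details picked to keep the example deterministic.

```python
import math


def kmeans(rows, k, n_iter=20):
    """Cluster `rows` into `k` groups with Lloyd's algorithm.

    Initialization is deterministic (farthest-first): start from the first
    row, then repeatedly add the row farthest from the centroids chosen so far.
    """
    centroids = [rows[0]]
    while len(centroids) < k:
        centroids.append(
            max(rows, key=lambda r: min(math.dist(r, c) for c in centroids))
        )

    labels = []
    for _ in range(n_iter):
        # Assignment step: each observation takes the label of its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(r, centroids[j])) for r in rows
        ]

        # Update step: move each centroid to the mean of its assigned observations.
        new_centroids = []
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[j])

        if new_centroids == centroids:
            break  # Converged: centroids no longer move.
        centroids = new_centroids

    return labels, centroids


# Hypothetical unlabeled data: two visually separate groups of points.
X = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

labels, centroids = kmeans(X, k=2)
print(labels)
print(centroids)
```

The inferred labels are arbitrary integers. They say which observations belong together, not what each group means, and the number of groups `k` has to be chosen by the analyst.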

Note that in unsupervised learning, there is no need to split the data into training and test sets, because there are no labels to predict. Instead, the goal is to learn the structure of the data and then use that structure to infer labels for new data points.
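Using learned structure to label new points can be sketched as a nearest-centroid lookup. The example assumes the structure has already been summarized as cluster centroids, which is one possible representation and not the only one. The centroid values, the helper `infer_label`, and the new points are hypothetical.

```python
import math

# Hypothetical centroids, standing in for structure already learned from data.
centroids = [(1.0, 1.5), (8.5, 8.5)]


def infer_label(point, centroids):
    """Return the index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


# New observations that were not part of the original dataset.
new_points = [(0.5, 2.0), (9.0, 9.0), (4.0, 4.0)]

new_labels = [infer_label(p, centroids) for p in new_points]
print(new_labels)
```

Each new point simply inherits the label of the closest learned group, so no ground-truth labels are involved at any stage.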