2. Encoding and Representation
Data exists in many forms, formats and modalities. Even missing data is a form of data.
Regardless of its form, format or modality, all data must ultimately be transformed into a matrix (or matrices) of numbers before any machine learning or data mining algorithm can use it.
The process of transforming raw data into numeric matrices is often referred to as Encoding, and the resulting numeric matrices are referred to as Representations. A representation may emphasize certain aspects of the data (structure, content, semantics, etc.) over others, and may therefore be more or less suitable for a given task.

Fig. 2.1 All data is ultimately transformed into a matrix of numbers.
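
To make the idea of encoding concrete, here is a minimal sketch of turning a few toy records into a numeric matrix. The field names (`age`, `color`), the helper `encode`, and the specific choices (one-hot columns for the categorical field, a placeholder value plus an indicator column for the missing value) are illustrative assumptions, not the only or necessarily the best way to encode such data. Note that the missing value is itself encoded as information rather than discarded.

```python
# Toy records: one numeric field (which may be missing) and one categorical field.
# These names and values are hypothetical, chosen only for illustration.
records = [
    {"age": 34, "color": "red"},
    {"age": None, "color": "blue"},
    {"age": 51, "color": "red"},
]


def encode(records):
    """Encode records as a list of numeric rows, i.e. a matrix."""
    # Fix a sorted category order so the column layout is reproducible.
    categories = sorted({r["color"] for r in records})
    matrix = []
    for r in records:
        age_missing = r["age"] is None
        row = [
            0.0 if age_missing else float(r["age"]),  # placeholder 0.0 when missing
            1.0 if age_missing else 0.0,              # missingness indicator column
        ]
        # One-hot columns: 1.0 in the column matching the record's category.
        row += [1.0 if r["color"] == c else 0.0 for c in categories]
        matrix.append(row)
    return categories, matrix


categories, matrix = encode(records)
print(["age", "age_missing"] + ["color=" + c for c in categories])
for row in matrix:
    print(row)
```

Each row of the result is one record and each column is one numeric feature. A different encoding of the same records (for example, mapping colors to integer codes) would yield a different representation, which may suit some tasks better than others.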