from matplotlib import pyplot as plt
plt.style.use('dark_background')
import librosa
y, sr = librosa.load('../assets/StarWars3.wav')
plt.plot(y);
plt.xlabel('Time (samples)');
plt.ylabel('Amplitude');
plt.title('Star Wars Theme\nSampling rate: %s Hz\nLength: %s seconds' % (sr, len(y)/sr));
Audio
In this section, we will learn how to use representations of audio data in machine learning.
Audio files can be represented in a variety of ways. The most common is the waveform, which is a time series of the amplitude of the sound wave at each time point. The waveform is a one-dimensional array of numbers. The sampling rate is the number of samples per second.
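To make the relationship between the waveform, the sampling rate, and the duration concrete, here is a minimal sketch that builds a synthetic waveform with NumPy instead of loading a file. The tone frequency, amplitude, and duration are made-up values chosen for illustration only.

```python
import numpy as np

# Assumed values for illustration; they are not taken from the audio file in this section.
sr = 22050          # sampling rate in Hz (samples per second)
duration = 0.5      # length of the tone in seconds
freq = 440.0        # frequency of the tone in Hz

# Time stamp of every sample, then the amplitude of a sine wave at those times.
t = np.arange(int(sr * duration)) / sr
y = 0.5 * np.sin(2 * np.pi * freq * t)

print(y.ndim, y.shape)   # a one-dimensional array with sr * duration samples
print(len(y) / sr)       # duration in seconds, recovered from the sample count
```

Dividing the number of samples by the sampling rate always gives the duration in seconds, which is the same calculation used in the plot title of the loading example.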
To load an audio file, we can use the librosa library. The librosa.load function returns the waveform and the sampling rate.
You may have to install the librosa library using !pip install librosa in a new code cell for the code below to work.
The audio file can be downloaded from this link.
sr, len(y)
(22050, 66150)
S = librosa.stft(y)
S.shape
(1025, 130)
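The shape of S can be checked by hand. The sketch below assumes that librosa.stft was called with its default settings (n_fft=2048, hop_length=n_fft // 4, center=True), which is what the call above relies on. The variable names are illustrative.

```python
# Assumed librosa.stft defaults: n_fft=2048, hop_length=n_fft // 4, center=True.
n_fft = 2048
hop_length = n_fft // 4
n_samples = 66150  # len(y) reported above

n_freq_bins = 1 + n_fft // 2             # frequency bins from 0 Hz up to sr / 2
n_frames = 1 + n_samples // hop_length   # one frame every hop_length samples

print((n_freq_bins, n_frames))  # (1025, 130)
```

Each row of S is therefore one frequency bin and each column is one short time frame, which is why the spectrogram has frequency on one axis and time on the other.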
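Power Spectral Density (PSD) measures how the power of a signal is distributed across frequencies. It is estimated with the Fourier transform. The short-time Fourier transform (STFT) computed above applies the Fourier transform to short overlapping windows of the waveform, so the plot below shows the power at each frequency as it changes over time. This is a useful representation of audio data because different sounds are often easier to distinguish in the frequency domain than in the time domain.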
import numpy as np
import librosa.display  # explicit import, required by older librosa versions
fig, ax = plt.subplots()
img = librosa.display.specshow(librosa.amplitude_to_db(np.abs(S), ref=np.max),
                               y_axis='log', x_axis='time', ax=ax);
ax.set_title('Power spectrogram');
fig.colorbar(img, ax=ax, format="%+2.0f dB");
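The claim that sounds are easier to tell apart in the frequency domain can be illustrated on a synthetic signal. The following is a minimal sketch using NumPy only. The sampling rate and the two tone frequencies are made-up values, and the power estimate is a simple unnormalised periodogram rather than a calibrated PSD.

```python
import numpy as np

sr = 8000                   # assumed sampling rate in Hz
t = np.arange(sr) / sr      # one second of time stamps

# Test signal: a strong 440 Hz tone plus a weaker 1000 Hz tone (made-up values).
y = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)

spectrum = np.fft.rfft(y)                    # Fourier transform of a real signal
power = np.abs(spectrum) ** 2 / len(y)       # periodogram-style power estimate
freqs = np.fft.rfftfreq(len(y), d=1 / sr)    # frequency in Hz of each bin

# The two largest peaks sit at the two tone frequencies.
top_two = np.sort(freqs[np.argsort(power)[-2:]])
print(top_two)
```

In the time domain the two tones are summed into a single curve, but in the frequency domain they appear as two separate peaks. The spectrogram applies the same idea to many short windows of the audio.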