Before starting with this post, you may want to learn the basics of **Python**. If you’re an experienced programmer meeting Python for the first time, you will likely find it very easy to understand. One important thing about Python: *Python uses indentation to delimit code blocks, so it must be consistent (4 spaces per level is the convention)*. So if you get an error and your code seems perfect, check that each line is indented correctly. Python also has a particular way of dealing with arrays, closer to the **R** programming language than to **C**-like style.

Python’s core functionality is extended by thousands of freely available **libraries**, many of which are incredibly handy. Although Python is not a compiled language, many of its libraries are written in C, with Python acting as a wrapper around them.

The libraries used in this article are:

- *scipy* – Scientific library for Python.
- *numpy* – Numeric library for Python.

To **load** a **wav** file in Python:

    # Imports the wavfile module from the scipy.io package.
    from scipy.io import wavfile

    # Location of the wav file in the file system.
    fileName = '/Downloads/Music_Genres/genres/country/country.00053.wav'

    # Loads the sample rate (samples per second) and the signal data.
    # Notice that Python functions can return multiple values at the same time.
    sample_rate, data = wavfile.read(fileName)

    # Prints to stdout the sample rate, the number of items
    # and the duration in seconds of the wav file.
    print("Sample rate:{0}, data size:{1}, duration:{2} seconds"
          .format(sample_rate, data.shape, len(data) // sample_rate))

The output should look like this:

    Sample rate:22050, data size:(661794,), duration:30 seconds

The output shows that the *wav* file contains 661,794 **samples** in total (the *data* variable is an array with 661,794 elements). The sample **rate** is 22,050 samples per second. Dividing 661,794 samples by 22,050 samples per second, we obtain about 30 seconds, the **length** of the *wav* file.
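
As a quick check of that arithmetic, here is a minimal sketch. The helper name `duration_seconds` is made up for illustration; the two numbers are the ones reported for the example file.

```python
def duration_seconds(num_samples, sample_rate):
    """Return the length of a recording in seconds."""
    # float() keeps the division exact under both Python 2 and Python 3.
    return num_samples / float(sample_rate)

# Values reported for country.00053.wav.
duration = duration_seconds(661794, 22050)
print("Duration: {0:.2f} seconds".format(duration))
```

The exact result is slightly above 30 seconds, which is why integer division reports 30.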

### The Fourier Transform

The Fourier transform is the method we will use to extract the prevalent frequencies from the *wav* file. Each frequency corresponds to a musical tone; knowing the frequencies present in a particular time interval, we can tell which tones are the most prominent within that interval, which makes it possible to infer the *key* and *chords* played during that time.
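
To make the link between frequency and tone concrete, the sketch below maps a frequency to the nearest note name. It assumes twelve-tone equal temperament with A4 tuned to 440 Hz, which is a common convention but not something this post depends on; the function name `frequency_to_note` is hypothetical.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def frequency_to_note(freq_hz, a4_hz=440.0):
    """Return the name of the equal-tempered note closest to freq_hz."""
    # Number of semitones between freq_hz and A4, rounded to the nearest note.
    semitones_from_a4 = int(round(12 * math.log(freq_hz / a4_hz, 2)))
    # MIDI numbering places A4 at note 69.
    midi_note = 69 + semitones_from_a4
    octave = midi_note // 12 - 1
    return "{0}{1}".format(NOTE_NAMES[midi_note % 12], octave)

print(frequency_to_note(440.0))    # A4
print(frequency_to_note(261.63))   # C4 (middle C)
```

Real recordings are rarely tuned this cleanly, so treat the result as the closest label rather than an exact identification.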

This article does not go into the details of the *Fourier transform*, only into how to use it to extract information about the **frequency** **power** of the analyzed *wav* signal. The video below is an intuitive introduction to the Fourier transform in case the reader is interested in it. It also includes examples of how to implement it algorithmically. It is advisable to watch it once now and then come back to review it after finishing the Fourier transform material.

Basically, a signal (a *wav* file in this post) is composed of a number *n* of samples \(x[n]\). We can get the frequency power within the signal with the *FFT* (*Fast Fourier Transform*) function. The *FFT* is an algorithm that computes the discrete Fourier transform efficiently.

The *FFT* takes the signal \(x\) and produces one value per sample; of these we keep \(k\) items, \(k\leq n\). The commonly chosen *k* value is \(\frac{n}{2}\) because, for a real-valued signal such as audio, the *FFT* result \(fft[k]\) is symmetric around that point. This means that only the first half of the *FFT* output is needed to read off the frequencies present in the signal; the second half mirrors it. So, in plain words, if the original signal has 100 samples, the *FFT* is still computed from all 100 samples, but only the first 50 output values need to be examined.
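
The symmetry can be checked on a small example. This is only a sketch: it uses `numpy.fft.fft`, which is called the same way as `scipy.fftpack.fft` for this purpose, and the input values are arbitrary numbers, not data from the *wav* file.

```python
import numpy as np

# A short real-valued test signal (arbitrary values).
x = np.array([1.0, 2.0, 0.5, -1.0, 3.0, 0.0, -2.0, 1.5])
n = len(x)

X = np.fft.fft(x)            # n complex coefficients
magnitudes = np.abs(X)

# For real input, coefficient i and coefficient n - i are complex
# conjugates, so their magnitudes match.
mirrored = magnitudes[1:][::-1]   # |X[n-1]|, |X[n-2]|, ..., |X[1]|
is_symmetric = np.allclose(magnitudes[1:], mirrored)
print("Symmetric magnitudes:", is_symmetric)

# Keeping the first n/2 coefficients retains the non-mirrored part.
half = X[:n // 2]
print("Kept {0} of {1} coefficients".format(len(half), n))
```

Strictly speaking, the coefficient at index n/2 is also unique, so functions such as `numpy.fft.rfft` return n/2 + 1 values for an even-length input.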

The *scipy* library offers two useful functions for computing the *Fourier* transform of a sample array, such as the *data* variable loaded from the *wav* file:

- *fftfreq* – Returns the frequency that each element of the Fourier transform of the signal \(x[n]\) corresponds to.
- *fft* – Returns the *Fourier transform* data of the samples. Its elements are in the same order as those of *fftfreq*, so the *i*-th *fft* power element corresponds to the *i*-th *fftfreq* frequency.

For instance, if the *Fourier transform* function returns \(fft = \{0, 0.5, 1\}\) and \(fftfreq = \{100, 200, 300\}\), it means that the signal has a power of 0 at 100 Hz, a power of 0.5 at 200 Hz and a power of 1 at 300 Hz; 300 Hz is therefore the most prevalent frequency.
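
The same pairing by position can be seen on a synthetic signal. The sketch below is an illustration under stated assumptions: the signal is made up (a 50 Hz tone plus a weaker 120 Hz tone at a 1000 Hz sample rate), and it uses `numpy.fft` rather than `scipy.fftpack`, whose `fft` and `fftfreq` are called the same way here.

```python
import numpy as np

sample_rate = 1000                                  # samples per second (assumed)
t = np.arange(sample_rate) / float(sample_rate)     # one second of timestamps

# A 50 Hz tone at amplitude 1.0 plus a weaker 120 Hz tone at amplitude 0.5.
signal = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

N = len(signal)
F = np.fft.fft(signal)                      # spectrum, one value per sample
f = np.fft.fftfreq(N, 1.0 / sample_rate)    # frequency of each spectrum value

# Keep the positive frequencies only; the negative ones mirror them.
positive = f > 0
power = np.abs(F[positive])
freqs = f[positive]

# The position of the largest power gives the dominant frequency.
dominant = freqs[np.argmax(power)]
print("Dominant frequency: {0:.0f} Hz".format(dominant))
```

Because `power` and `freqs` share the same ordering, a single index from `argmax` is enough to look up the frequency.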

The following code extracts the first 10 seconds from a *wav* file, applies the Fourier transform, and computes the frequency associated with each item of the spectral data.

    # Module that reads wav files.
    from scipy.io import wavfile
    # Package that implements the fast
    # Fourier transform functions.
    from scipy import fftpack
    import numpy as np

    # Loads wav file as array.
    fileName = './country.00053.wav'
    sample_rate, data = wavfile.read(fileName)

    # Extracting 10 seconds. N is the number of samples to
    # extract, i.e. elements from the array.
    seconds_to_extract = 10
    N = seconds_to_extract * sample_rate

    # Knowing N and the sample rate, fftfreq gets the frequency
    # associated with each FFT unit.
    f = fftpack.fftfreq(N, 1.0/sample_rate)

    # Extracts Fourier transform data from the sample,
    # returning the spectrum analysis.
    subdata = data[:N]
    F = fftpack.fft(subdata)

*F* contains the complex spectrum values, whose magnitude is the power, and *f* the frequency each item within *F* is related to. The higher the power, the more prevalent that frequency is across the signal. Filtering the frequencies using the *f* array and extracting the power, we can get a graph like the following one:

On the y-axis, \(|F|\) is the absolute value of each item of *F*; on the x-axis, the values of *f* are the frequency (Hz). The green and orange lines can be ignored. To get the subset of frequencies [200-900] displayed on the chart, the following code was used:

    # Interval limits.
    Lower_freq = 200
    Upper_freq = 900

    # Selects the frequencies f that are at or above the lower limit
    # AND at or below the upper limit.
    filter_subset = (f >= Lower_freq) & (f <= Upper_freq)

    # Extracts filtered items from the frequency list.
    f_subset = f[filter_subset]
    # Extracts filtered items from the Fourier transform power list.
    F_subset = F[filter_subset]
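
Once a band has been filtered, a natural next step is to find the strongest frequency inside it. The sketch below is one possible way to do that, not the code behind the chart: `peak_in_band` is a hypothetical helper, the input is a synthetic 440 Hz tone standing in for the *wav* data, and `numpy.fft` replaces `scipy.fftpack` so the example depends on numpy alone.

```python
import numpy as np

def peak_in_band(f, F, lower_freq, upper_freq):
    """Return (frequency, magnitude) of the strongest component in a band."""
    band = (f >= lower_freq) & (f <= upper_freq)
    f_band = f[band]
    magnitude = np.abs(F[band])       # |F|, the values plotted on the y-axis
    strongest = np.argmax(magnitude)
    return f_band[strongest], magnitude[strongest]

# Synthetic stand-in for the wav data: one second of a 440 Hz tone
# sampled at 22050 Hz.
sample_rate = 22050
t = np.arange(sample_rate) / float(sample_rate)
data = np.sin(2 * np.pi * 440 * t)

F = np.fft.fft(data)
f = np.fft.fftfreq(len(data), 1.0 / sample_rate)

peak_freq, peak_mag = peak_in_band(f, F, 200, 900)
print("Strongest frequency in band: {0:.0f} Hz".format(peak_freq))
```

With real music the band usually contains several peaks of similar height, so inspecting the few largest magnitudes tends to be more informative than the single maximum.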