Two terms are commonly used in probability and, more generally, in statistics:

Exclusive or disjoint events.

Independent events.

Many theorems and applications in statistics depend on whether the events under study are mutually exclusive or not, and on whether they are independent or not.

Disjoint or mutually exclusive events

Two events are disjoint if they cannot occur at the same time. For instance, the age ranges of a customer are disjoint: a particular customer cannot be both more than twenty and less than twenty years old at the same time.

Another example is the status of an order. It may be in preparation, at the warehouse, en route or delivered to the consignee; those states are mutually exclusive as well.

On the other hand, non-disjoint events may coexist at the same point in time. A customer may live in a particular town and at the same time be more than twenty years old. Those two conditions are not disjoint or mutually exclusive. In the same way, an order may be both in preparation and being assembled, or both in preparation and ready for delivery.

The way to calculate the probabilities of two or more events differs depending on whether or not they are disjoint, and the result of the calculation varies accordingly.
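As a minimal sketch of that difference, assume a fair six-sided die (an illustrative example, not part of the original discussion). For disjoint events the probability of "A or B" is simply the sum of both probabilities, while for non-disjoint events the overlap has to be subtracted once so it is not counted twice. The helper name prob is hypothetical.

```python
from fractions import Fraction

# Sample space of a fair six-sided die (illustrative assumption).
outcomes = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event (a set of outcomes), all outcomes equally likely."""
    return Fraction(len(event & outcomes), len(outcomes))

# Disjoint events: "roll is 1" and "roll is 2" cannot happen together.
a, b = {1}, {2}
p_disjoint_union = prob(a) + prob(b)

# Non-disjoint events: "roll is even" and "roll is above 3" overlap in {4, 6}.
even, high = {2, 4, 6}, {4, 5, 6}
p_overlap_union = prob(even) + prob(high) - prob(even & high)

print(p_disjoint_union)  # 1/3
print(p_overlap_union)   # 2/3
```

Both results match what we get by counting the outcomes of the union directly, which is a quick way to check the formula.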

Independent events

Two events are independent when the outcome of one does not depend on the other. In terms of probability, two events are independent when the probability of one of them is not affected by whether the other event occurs.

This is the case in games of chance such as lotteries and casino games. Every time a die is rolled, the chances of obtaining a particular outcome do not change; at each roll the probability of obtaining any of the six possible values of a six-sided die is equal to \(\frac{1}{6}\).

Conversely, the probability of a dependent event is affected by whether the other event occurs. In this case we talk about conditional probability, expressed with the notation \(P(A|B)\), the probability of A given that B has occurred. An example may be the probability of selling a product online when the user has already opened an account on the site and returns: that probability differs depending on whether the second event (account opened previously) has occurred or not.
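The distinction can be checked numerically. The sketch below assumes two fair dice (an illustrative setup, not taken from the text) and uses the standard definition \(P(A|B) = P(A \cap B) / P(B)\); the helper names prob and cond_prob are hypothetical.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice (assumption).
space = set(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a set of (first, second) outcomes."""
    return Fraction(len(event), len(space))

def cond_prob(a, b):
    """Conditional probability P(A|B) = P(A and B) / P(B)."""
    return prob(a & b) / prob(b)

first_is_six = {o for o in space if o[0] == 6}
second_is_six = {o for o in space if o[1] == 6}
sum_is_twelve = {o for o in space if sum(o) == 12}

# Independent: knowing the first roll does not change the second one.
print(prob(second_is_six), cond_prob(second_is_six, first_is_six))  # 1/6 1/6

# Dependent: a total of 12 is more likely once the first die shows a six.
print(prob(sum_is_twelve), cond_prob(sum_is_twelve, first_is_six))  # 1/36 1/6
```

When P(A|B) equals P(A), as in the first case, the events are independent; when the two values differ, as in the second, they are dependent.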

Before starting with this post, you may want to learn the basics of Python. If you’re an experienced programmer approaching Python for the first time, you will likely find it very easy to understand. One important thing about Python: it uses indentation to delimit blocks of code, so the indentation must be consistent (4 spaces per level is the usual convention). If you get an error and your code seems perfect, check that each line is indented correctly. Python also has a particular way of dealing with arrays, closer to the R programming language than to C-like languages.

Python’s core functionality is extended by thousands of free libraries, many of which are incredibly handy. Even though Python is not a compiled language, many of its libraries are written in C, with Python acting as a wrapper for them.

The libraries used in this article are:

scipy – Scientific library for Python.

numpy – Numeric library for Python.

To load a wav file in Python:

# Loads the scipy.io.wavfile module, used to read wav files.
from scipy.io import wavfile

# Location of the wav file in the file system.
fileName = '/Downloads/Music_Genres/genres/country/country.00053.wav'

# Loads sample rate (samples per second) and signal data (wav). Notice that
# Python functions can return multiple values at the same time.
sample_rate, data = wavfile.read(fileName)

# Prints to stdout the sample rate, the number of items
# and the duration in seconds of the wav file.
print("Sample rate:{0}, data size:{1}, duration:{2} seconds"
      .format(sample_rate, data.shape, len(data) // sample_rate))

The output generated should look like:

Sample rate:22050, data size:(661794,), duration:30 seconds

The output shows that the wav file contains 661,794 samples in all (the data variable is an array with 661,794 elements). The sample rate is 22,050 samples per second. Dividing 661,794 samples by 22,050 samples per second, we obtain approximately 30 seconds, the length of the wav file.

The Fourier Transform

The Fourier transform is the method that we will use to extract the prevalent frequencies from the wav file. Each frequency corresponds to a musical tone; knowing the frequencies within a particular time interval, we can tell which tones are the most frequent within that interval, which makes it possible to infer the key and chords played during that time lapse.

This article is not going to go into the details of the Fourier transform, only into how to use it to extract information about the frequency power of the wav signal analyzed. The video below is an intuitive introduction to the Fourier transform in case the reader is interested in it. It also includes examples of how to implement it algorithmically. It is advisable to watch it once now and then come back to review it after learning more about the Fourier transform.

Basically, given a signal (a wav file in this post) composed of a number n of samples \(x[n]\), we can get the frequency power within the signal with the FFT (Fast Fourier Transform) function. The FFT is an efficient algorithm for computing the discrete Fourier transform.

The FFT function receives the signal \(x\) and returns as many values as the signal has samples, of which we only need to retrieve a number \(k, k\leq n\). The commonly chosen k value is \(\frac{n}{2}\) because, for a real-valued signal such as audio, the magnitude of the FFT result \(fft[k]\) is symmetric around that length. This means that only half of the FFT result is required to retrieve the occurrence of the different frequencies. So, in plain words, if the original signal has 100 samples, the first 50 values of its FFT are enough to describe the spectrum.
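The symmetry can be verified on a small example. The sketch below does not use the FFT itself; it assumes a naive discrete Fourier transform written in plain Python (the function dft and the sample values are hypothetical, for illustration only), which produces the same values the FFT computes more efficiently.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform, O(n^2); for illustration only."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

# Hypothetical real-valued signal with n = 8 samples.
x = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, -2.0, 1.5]
n = len(x)
magnitude = [abs(value) for value in dft(x)]

# For real input, |F[k]| equals |F[n - k]| for k = 1 .. n - 1.
symmetric = all(
    abs(magnitude[k] - magnitude[n - k]) < 1e-9 for k in range(1, n)
)
print(symmetric)  # True
```

Because the second half of the magnitudes mirrors the first, keeping only the first half loses no information about which frequencies are present.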

In Python there are two useful functions, both in the scipy fftpack module, to calculate the Fourier transform of a sample array like the data variable where the wav file is stored:

fftfreq – Returns the frequency corresponding to each element of the Fourier transform of the signal \(x[n]\). This is the frequency to which each fft element corresponds.

fft – Returns the Fourier transform of the sample data. The positions of the elements returned match the positions in the fftfreq result, so that each fft power element corresponds by position to an fftfreq frequency.

For instance, if the Fourier transform function returns fft = {0,0.5,1} and fftfreq = {100,200,300}, it means that the signal has a power of 0 at frequency 100Hz, a power of 0.5 at 200Hz and a power of 1 at 300Hz, 300Hz being the most prevalent frequency.
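Reading both arrays by position can be sketched in a few lines. The values are the ones from the example above; the variable names fft_power, fft_freq and dominant are hypothetical.

```python
# Example values from the text: power and frequency share positions.
fft_power = [0, 0.5, 1]
fft_freq = [100, 200, 300]

# Pair each frequency with its power and keep the pair with the most power.
dominant = max(zip(fft_freq, fft_power), key=lambda pair: pair[1])
print(dominant)  # (300, 1)
```

The same positional pairing is what allows the spectrum to be filtered by frequency later on.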

The following code extracts the first 10 seconds from a wav file, applies the Fourier transform and gets the frequency associated with each item of the spectral data.

# Module that reads wav files.
from scipy.io import wavfile
# Package that implements the fast Fourier transform functions.
from scipy import fftpack
import numpy as np

# Loads wav file as array.
fileName = './country.00053.wav'
sample_rate, data = wavfile.read(fileName)

# Extracting 10 seconds. N is the number of samples to
# extract or elements from the array.
seconds_to_extract = 10
N = seconds_to_extract * sample_rate

# Knowing N and sample rate, fftfreq gets the frequency
# associated with each FFT unit.
f = fftpack.fftfreq(N, 1.0 / sample_rate)

# Extracts Fourier transform data from the sample,
# returning the spectrum analysis.
subdata = data[:N]
F = fftpack.fft(subdata)

F contains the Fourier transform values, whose absolute value is the power, and f the frequency each item within F is related to. The higher the power, the higher the prevalence of that frequency across the signal. Filtering the frequencies using the f array and extracting the power, we could get a graph like the next one:

On the y-axis, \(|F|\) is the absolute value of each element of F, and the values of f are the frequency (Hz) on the x-axis. The green and orange lines can be ignored. To get the subset of frequencies [200-900] displayed on the chart, the following code was used:

# Interval limits
Lower_freq = 200
Upper_freq = 900

# f (frequencies) above the lower frequency AND
# f (frequencies) below the upper frequency.
filter_subset = (f >= Lower_freq) & (f <= Upper_freq)

# Extracts filtered items from the frequency list.
f_subset = f[filter_subset]

# Extracts filtered items from the Fourier transform power list.
F_subset = F[filter_subset]
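To try the whole procedure without a wav file, the sketch below builds a synthetic signal instead (an assumption made for illustration: one second of a 220Hz tone plus a louder 440Hz tone). It uses numpy.fft rather than the scipy fftpack module used in this article; both offer fft and fftfreq functions that are called the same way here.

```python
import numpy as np

# Synthetic stand-in for the wav data (assumption): one second of
# A3 (220 Hz) plus a louder A4 (440 Hz), sampled at 22050 Hz.
sample_rate = 22050
t = np.arange(sample_rate) / sample_rate
data = 0.5 * np.sin(2 * np.pi * 220 * t) + 1.0 * np.sin(2 * np.pi * 440 * t)

# Frequency of each FFT element and the transform itself.
N = len(data)
f = np.fft.fftfreq(N, 1.0 / sample_rate)
F = np.fft.fft(data)

# Keep only the 200-900 Hz band, as in the code above.
band = (f >= 200) & (f <= 900)
f_subset = f[band]
power = np.abs(F[band])

# Frequency with the highest magnitude inside the band.
dominant = f_subset[np.argmax(power)]
print(int(round(dominant)))  # 440
```

The louder tone wins, which is the behaviour we rely on when looking for the prevalent tones of a real recording.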

The previous post, Spectral Analysis and Harmony, gave an elementary introduction to harmony and digital signals. We are now going to study the range of tones between A3 and A5. Our central axis is tone A (or A4), whose frequency is equal to 440Hz.

The next table shows all the tones and frequencies within the chromatic scale belonging to the range between A3 and A5. The piano key number corresponding to each tone is also displayed.

| Scientific name | Key number | Helmholtz name | Frequency (Hz) |
|---|---|---|---|
| A3 | 37 | a | 220.000 |
| A♯3/B♭3 | 38 | a♯/b♭ | 233.082 |
| B3 | 39 | b | 246.942 |
| C4 Middle C | 40 | c′ 1-line octave | 261.626 |
| C♯4/D♭4 | 41 | c♯′/d♭′ | 277.183 |
| D4 | 42 | d′ | 293.665 |
| D♯4/E♭4 | 43 | d♯′/e♭′ | 311.127 |
| E4 | 44 | e′ | 329.628 |
| F4 | 45 | f′ | 349.228 |
| F♯4/G♭4 | 46 | f♯′/g♭′ | 369.994 |
| G4 | 47 | g′ | 391.995 |
| G♯4/A♭4 | 48 | g♯′/a♭′ | 415.305 |
| A4 – A440 | 49 | a′ | 440.000 |
| A♯4/B♭4 | 50 | a♯′/b♭′ | 466.164 |
| B4 | 51 | b′ | 493.883 |
| C5 Tenor C | 52 | c′′ 2-line octave | 523.251 |
| C♯5/D♭5 | 53 | c♯′′/d♭′′ | 554.365 |
| D5 | 54 | d′′ | 587.330 |
| D♯5/E♭5 | 55 | d♯′′/e♭′′ | 622.254 |
| E5 | 56 | e′′ | 659.255 |
| F5 | 57 | f′′ | 698.456 |
| F♯5/G♭5 | 58 | f♯′′/g♭′′ | 739.989 |
| G5 | 59 | g′′ | 783.991 |
| G♯5/A♭5 | 60 | g♯′′/a♭′′ | 830.609 |
| A5 | 61 | a′′ | 880.000 |

The difference or leap between two tones is called an interval. One interesting feature of the chromatic scale is that it is composed of constant intervals: the frequency ratio between any two adjacent tones is the same. For instance, tone A3 is equal to 220Hz, tone A4 to 440Hz and tone A5 to 880Hz. Each tone's frequency is double that of the same tone in the preceding octave.

The important idea is that we can treat tones as numbers and apply basic arithmetic to their frequencies. Who said emotions cannot be explained by Science? Do not be intimidated if you know neither music theory nor Optical Physics; these texts will lead you by the hand on a trip at the end of which you will know how to extract the waves, tones and emotions from digital music even without knowing any of those.
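The frequencies in the table can be reproduced from the key numbers. The sketch below assumes twelve-tone equal temperament, where each step of one key multiplies the frequency by \(2^{1/12}\), with key 49 (A4) fixed at 440Hz; the function name key_frequency is hypothetical.

```python
def key_frequency(key_number, a4_key=49, a4_hz=440.0):
    """Frequency in Hz of a piano key, assuming twelve-tone equal temperament."""
    return a4_hz * 2 ** ((key_number - a4_key) / 12)

print(round(key_frequency(37), 3))  # 220.0   (A3)
print(round(key_frequency(40), 3))  # 261.626 (C4)
print(round(key_frequency(61), 3))  # 880.0   (A5)
```

Twelve steps of \(2^{1/12}\) multiply the frequency by exactly 2, which is why each octave doubles the frequency of the previous one.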

Frequency analysis in a nutshell

In order to analyze the frequencies that compose a piece of music, we take a part of it and extract a subset of frequencies. As with an equalizer, we filter the sound between two specific frequencies or tones. For instance, we could read the first ten seconds of an mp3 music file and generate a table displaying how many times tone A appears within that sequence. Going further, we could analyze how many tones appear and how many times each tone is played within those first 10 seconds.

In order to extract the signal frequency occurrences, we can use a frequency spectrum graph. This graph displays how often a frequency appears in a signal and its power or prevalence over the rest. In this case, the signal is the first 10 seconds of music. Let’s see an example:

From the graph on the right we can see that the most used frequencies, those having the highest \(|F|\), are one near 200Hz, another between 300Hz and 400Hz, and a third one between 400Hz and 500Hz. The x-axis shows the frequency spectrum (or range) we are analyzing, and the y-axis the power of the signal. The higher the line at a certain point on the x-axis, the more power the signal has at that frequency.

To get an insight into the most used tones, the frequencies that have the most power can be extracted; in this case the dominant frequencies within the signal are 220Hz, 246.942Hz, 329.628Hz and 440Hz. Rounding those frequencies to the nearest integer and comparing them to the ones in the table above, we can extract some of the main tones within the first ten seconds of the song.

| Scientific name | Frequency (Hz) |
|---|---|
| A3 | 220.000 |
| B3 | 247 |
| E4 | 330 |
| A4 A440 | 440 |
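The lookup against the table can also be automated. The sketch below assumes equal temperament with A4 at 440Hz and rounds each frequency to the closest tone; the names NOTE_NAMES and nearest_tone are hypothetical, and sharps are written with # for simplicity.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def nearest_tone(freq_hz, a4_hz=440.0):
    """Scientific name of the equal-tempered tone closest to freq_hz."""
    # Distance from A4 in semitones, rounded to the closest tone.
    semitones = round(12 * math.log2(freq_hz / a4_hz))
    # A4 sits at position 9 (A) of octave 4, counting tones from C0.
    index = 4 * 12 + 9 + semitones
    return "{0}{1}".format(NOTE_NAMES[index % 12], index // 12)

dominant_frequencies = [220.0, 246.942, 329.628, 440.0]
tones = [nearest_tone(freq) for freq in dominant_frequencies]
print(tones)  # ['A3', 'B3', 'E4', 'A4']
```

Rounding in semitones rather than in hertz tolerates the small deviations that measured peaks usually show.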

From the data above it can be determined that the dominant key within the first seconds is composed of tones A, B and E. That key corresponds to the chord A2Sus (A 2nd suspended, commonly written Asus2). A chord is the name given to a sound composed of multiple tones, that is, multiple frequencies. The names of the different chords are not described in this article, since there are many of them.
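How a set of tones maps to a chord name can be sketched by measuring the distance in semitones from the root to every other tone. The pattern table below is a small illustrative subset, not a complete chord dictionary, and the root is assumed to be known in advance; NOTE_NAMES, CHORD_PATTERNS and name_chord are hypothetical names.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Semitones above the root for a few common chord types (illustrative subset).
CHORD_PATTERNS = {
    (0, 4, 7): "major",
    (0, 3, 7): "minor",
    (0, 2, 7): "sus2",
    (0, 5, 7): "sus4",
}

def name_chord(root, tones):
    """Chord type for the given tones, measured from an already known root."""
    root_index = NOTE_NAMES.index(root)
    intervals = tuple(sorted(
        {(NOTE_NAMES.index(tone) - root_index) % 12 for tone in tones}
    ))
    return CHORD_PATTERNS.get(intervals, "unknown")

print(name_chord("A", ["A", "B", "E"]))  # sus2
print(name_chord("C", ["C", "E", "G"]))  # major
```

Tones A, B and E sit 0, 2 and 7 semitones above A, which is the suspended 2nd pattern.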

In terms of musical harmony, A2Sus, or generically speaking 2nd suspended chords, create a sensation of waiting for something to be resolved. The listener is holding on until the song resolves into something. We could say that the first ten seconds of this song cause an emotion of expectation.

For more information on music and emotions, search Google for “emotions chords harmony”. For a good introduction to the matter I would recommend the paper Music and Emotions.

This article and the previous one, Emotions Within Digital Signals, lay the groundwork for tackling the problem of extracting emotions from music sequences. I will explain how to perform that task using Python in the post Python for Digital Signal Processing.

Music and artistic expression are conceived to provoke emotions in people. Music and visual arts travel in waves through the air across distances, from the transmitter to the receiver. Music is maybe the most influential form of art, capable of producing deep emotional effects, evoking feelings and awakening memories when one is exposed to it.

The sound perceived is nothing other than the effect of the vibration of the eardrum when hit by sound waves traveling through the air. Like a pendulum, a fast one, the eardrum oscillates and that oscillation is felt as an emotion by our brain.

The fundamental unit in music is the tone. When one sings a song, one is reproducing a sequence of tones in a certain order to produce a melody. In music, secondary tones usually follow the lead tone or principal melody. When multiple tones sound at the same time we call it a chord. Chords define the mood of the music, and are in large part responsible for the emotions that individuals will feel when hearing the music.

The tone is the basic musical unit. Western music uses twelve typical tones (C, C#, D, D#, E, F, F#, G, G#, A, A#, B). That range of tones is called an octave, and that tone structure is also commonly called the chromatic scale.

In western music each chord is always composed of two or more of those tones. For instance, the C Major Chord I is formed by tones C, E and G played at the same time.

In the same way that we call the sound of multiple tones at once a chord, we call the group of tones the music evolves through a key. A key is similar to a chord; the basic difference is that keys are tones across time within the same space or plane, whereas chords are tones at the same instant but across different planes. To summarize, we can assume that chords are multiple simultaneous tones and keys are multiple tones always belonging to the same space of tones.

For instance, the C Major chord would consist of tones C, E and G played at the same time for two seconds. The C Major key could consist instead of tone C played on second 1, tone E played on second 2, and finally tone G played alone on second 3.

Remember that we said that music is just waves; in fact tones are waves too, and each tone has a unique corresponding wave. If we examine the most common waves within each part of a musical piece, we can find out which notes are defining that music within each time interval. We can therefore extract the tones, chords and the key of that music just by analyzing the frequency of the waves it is composed of.

Frequency is the number of cycles a wave completes per unit of time. It is measured in hertz. One hertz is equal to one cycle per second. Each tone has a fixed frequency that never changes. For instance, tone A corresponds to a frequency of 440Hz. Instruments are usually tuned using that tone A as a basis, meaning that all instruments that we can hear and that produce notes will produce the same frequencies for the same tones.

In the next post, Spectral Analysis and Harmony, we will see how we can take advantage of wave analysis (Digital Signal Processing) and Music theory (Harmony) to programmatically identify feelings from music files.
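The relation between frequency and the duration of one cycle (the period) is a simple inverse. As a small sketch, taking the 440Hz of tone A and assuming a digital recording sampled 22,050 times per second (an illustrative value), we can see how long one cycle lasts and how many samples describe it; the function name period_seconds is hypothetical.

```python
def period_seconds(frequency_hz):
    """Time taken by one full cycle of a wave with the given frequency."""
    return 1.0 / frequency_hz

a4_hz = 440.0        # frequency of tone A used in the text
sample_rate = 22050  # assumed sampling rate, in samples per second

print(round(period_seconds(a4_hz) * 1000, 3))  # 2.273 milliseconds per cycle
print(round(sample_rate / a4_hz, 1))           # 50.1 samples per cycle
```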