## Deep Learning Nonlinear Regression

In this article we put to work a perceptron to predict a high difficulty level nonlinear regression problem. The data has been generated using an exponential function with this shape:

The graph above corresponds to the values of the dataset that can be downloaded from the Statistical Reference Dataset of the Information Technology Laboratory of the United States on this link: http://www.itl.nist.gov/div898/strd/nls/data/eckerle4.shtml

Neural networks are especially appropriate to learn patterns and remember shapes. Perceptrons are very basic but yet very powerful neural networks types. Their structure is basically an array of weighted values that is recalculated and balanced iteratively. They can implement activation layers or functions to modify the output within a certain range or list of values.

In order to create the neural network we are going to use Keras, one of the most popular Python libraries. The code is as follows:

The first thing to do is to import the elements that we will use. We will not use aliases for the purpose of clarity:

# Numeric Python Library.
import numpy
# Python Data Analysis Library.
import pandas
# Scikit-learn Machine Learning Python Library modules.
#   Preprocessing utilities.
from sklearn import preprocessing
#   Cross-validation utilities.
from sklearn import cross_validation
# Python graphical library
from matplotlib import pyplot

# Keras perceptron neuron layer implementation.
from keras.layers import Dense
# Keras Dropout layer implementation.
from keras.layers import Dropout
# Keras Activation Function layer implementation.
from keras.layers import Activation
# Keras Model object.
from keras.models import Sequential

In the previous code we have imported the numpy and pandas libraries to manage the data structures and perform operations with matrices. The two scikit-learn modules will be used to scale the data and to prepare the test and train data sets.

The matplotlib package will be used to render the graphs.

From Keras, the Sequential model is loaded, it is the structure the Artificial Neural Network model will be built upon. Three types of layers will be used:

1. Dense: Those are the basic layers made with weighted neurons that form the perceptron. An entire perceptron could be built with these type of layers.
2. Activation: Activation functions transform the output data from other layers.
3. Dropout: This is a special type of layer used to avoid over-fitting by leaving out of the learning process a number of neuron.

# Peraring dataset
# Imports csv into pandas DataFrame object.

# Converts dataframes into numpy objects.
Eckerle4_dataset = Eckerle4_df.values.astype("float32")
# Slicing all rows, second column...
X = Eckerle4_dataset[:,1]
# Slicing all rows, first column...
y = Eckerle4_dataset[:,0]

# Data Scaling from 0 to 1, X and y originally have very different scales.
X_scaler = preprocessing.MinMaxScaler(feature_range=(0, 1))
y_scaler = preprocessing.MinMaxScaler(feature_range=(0, 1))
X_scaled = ( X_scaler.fit_transform(X.reshape(-1,1)))
y_scaled = (y_scaler.fit_transform(y.reshape(-1,1)))

# Preparing test and train data: 60% training, 40% testing.
X_train, X_test, y_train, y_test = cross_validation.train_test_split( \
X_scaled, y_scaled, test_size=0.40, random_state=3)

The predictor variable is saved in variable X and the dependent variable in y. The two variables have values that differ several orders of magnitude; and the neural networks work better with values next to zero. For those two reasons the variables are scaled to remove their original magnitudes and put them within the same magnitude. Their values are proportionally transformed within 0 and 1.

The data is divided into two sets. One will be used to train the neural network, using 60% of all the samples; and the other will contain 40% of the data, that will be used to test if the model works well with out-of-the-sample data.

Now we are going to define the neural network. It will consist in an input layer to receive the data, several intermediate layers, to process the weights, and a final output layer to return the prediction (regression) results.

The objective is that the network learns from the train data and finally can reproduce the original function with only 60% of the data. It could be less, it could be more; I have chosen 60% randomly. In order to verify that the network has learnt the function, we will ask it to predict which response should return the test data that was not used to create the model.

Now let’s think about the neural network topology. If we study the chart, there are three areas that differ considerably. Those are the left tail, up to the 440 mark, a peak between the 440 and 465 marks approximately, and the second tail on the right, from the 465 mark on. For this reason we will use three neuron intermediate layers, so that the first one learns any of these areas, the second one other area, and the third one the final residuals that should correspond to the third area. We will have therefore 3 layers in our network plus one input and one output layer too. The basic layer structure of the neural network should be similar to this, a sequence of layers, from left to right with this topology:

INPUT LAYER(2) > [HIDDEN(i)] > [HIDDEN(j)] > [HIDDEN(k)] > OUTPUT(1)

An input layer that accepts two values X and y, a first intermediate layer that has i neurons, a second hidden layer that has j neurons, an intermediate layer that has k neurons, and finally, an output layer that returns the regression result for each sample X, y.

# New sequential network structure.
model = Sequential()

# Input layer with dimension 1 and hidden layer i with 128 neurons.
# Dropout of 20% of the neurons and activation layer.
# Hidden layer j with 64 neurons plus activation layer.
# Hidden layer k with 64 neurons.
# Output Layer.

# Model is derived and compiled using mean square error as loss
# function, accuracy as metric and gradient descent optimizer.

# Training model with train data. Fixed random seed:
numpy.random.seed(3)
model.fit(X_train, y_train, nb_epoch=256, batch_size=2, verbose=2)

Now the model is trained by iterating 256 times through all the train data, taking each time two sampless.

In order to graphically see the accuracy of the model, now we apply the regression model to new data that has not been used to create the model. We will also plot the predicted values versus the actual values.

~ Predict the response variable with new data
predicted = model.predict(X_test)

# Plot in blue color the predicted adata and in green color the
# actual data to verify visually the accuracy of the model.
pyplot.plot(y_scaler.inverse_transform(predicted), color="blue")
pyplot.plot(y_scaler.inverse_transform(y_test), color="green")
pyplot.show()

And the produced graph shows that the network has adopted the same shape as the function:

This demonstrates the exceptional power of neural networks to solve complex statistical problems, especially those in which causality is not crucial such as image processing or speech recognition.

## Python for Digital Signal Processing

Before starting with this post, you may want to learn the basics of Python. If you’re an experienced programmer and head Python for the first time, you will likely find it very easy to understand. One important thing about Python: Python requires perfect indentation (4 spaces) to validate the code. So if you get an error and you code seems perfect, review if you have indented correctly each line. Python has also a particular way to deal with arrays, more close the the R programming language than to C-like style.

Python’s core functionality is extended with thousands of available free libraries, many of them are incredibly handy. Even if Python is not a compiled language, many of its libraries are written in C, being Python a wrapper for them.

• scipy – Scientific library for Python.
• numpy – Numeric library for Python.

To load a wav file in Python:

# Loads the scipy.io package for later usage of io.wavefile module.
from scipy import io
#
# Location of the wav file in the file system.
#
# Loads sample rate (bps) and signal data (wav). Notice that Python
# functions can return multiple variables at the same time.
#
# Print in sdout the sample rate, number of items.
#and duration in seconds of the wav file.
print "Sample rate:{0}, data size:{1}, duration:{2} seconds" \
.format(sample_rate,data.shape,len(data)/sample_rate)

The output generated should seem like:

Sample rate:22050, data size:(661794L,), duration:30 seconds

The output shows that the wav file contains in all 661,794 samples (the data variable is an array with 661,794 elements). The sample rate is 22,050 samples per second. Dividing 661,794 elements by 22,050 samples per second, we obtain 30 seconds, the length in seconds of the wav file.

### The Fourier Transform

The Fourier transform is the method that we will use to extract the prevalent frequencies from the wav file. Each frequency corresponds to a musical tone; knowing the frequencies from a particular time interval we are able to know which are the most frequent tones within that interval, being possible to infer the key and chords played during that time lapse.

This article is not going to enter into the details of the Fourier transform, only on how to use it to extract information regarding the frequency power from the wav signal analyzed. The video below is an intuitive introduction to the Fourier transform in case the reader is interested on it. It also includes examples of how to implement it algorithmically. It is quite advisable to watch it once now and then come back again to review it after the training in Fourier transform is completed.

Basically, given a signal, a wav file on this post, which is composed by a number n of samples $$x[n]$$. We can get the frequency power within the signal with the FFT (Fast Fourier Transform) function. The FFT function is an improvement that optimizes the Fourier transform.

The FFT function receives two arguments, the signar $$x$$ and the number of items to retrieve $$k, k\leq n$$. The commonly choosen k value is $$\frac{n}{2}$$ because the FFT result, $$fft[k]$$ is usually symmetric around that length. This means that in order to calculate the FFT, only a half of the total signal length is required to retrieve the different frequencies occurrence. So, in plain words, if the original signal file has 100 samples, only 50 samples are needed to process the complete FFT transform.

In Python language there are two useful functions to calculate and get the Fourier transform from a sample array, like the one where the data variable from the wav file is stored:

• fftfreq – Returns the frequency corresponding to each $$x_i$$ sample from the signal data sample file $$x[n]$$ corresponding to the power of the fourier transform. This is the frequency to which each fft element corresponds to.
• fft – Returns the fourier transform data from the sample file. The position of the elements returned correspond to the position of the fftfreq, so that using both arrays the fft power elements correspond by position to the fftfreq frequencies.

For instance, if the fourier transform function returns fft = {0,0.5,1} and fftfreq = {100,200,300}, it means that the signal has a power of 0 for frequency 100Hz, a power of 0.5 for 200Hz and a power of 1 within 300Hz; being 300Hz the frequency most frequent.

The following code would extract from a wav file the first 10 second, apply the fourier transform  and the frequencies associated to each item within  the spectral data.

import scipy.io

# Package that implements the fast
# fourier transform functions.
from scipy import fftpack
import numpy as np

# Loads wav file as array.
fileName = './country.00053.wav'

# Extracting 10 seconds. N is the numbers of samples to
# extract or elements from the array.
seconds_to_extract = 10
N = seconds_to_extract * sample_rate

# Knowing N and sample rate, fftfreq gets the frequency
# Associated to each FFT unit.
f = fftpack.fftfreq(N, 1.0/sample_rate)

# Extracts fourier transform data from the sample
# returning the spectrum analysis
subdata = data[:N]
F = fftpack.fft(subdata)

F contains the power and f the frequency each item within F is related to. The higher the power, the higher the frequency prevalence across the signal. Filtering the frequencies using the f matrix and extracting the power we could get a graph like the next one:

On the y-axis, $$|F|$$ is the absolute value of each unit from F and the values of f are the Frequency (Hz) on the x-axis. The green and orange lines can be ignored. To get the subset of frequencies [200-900] displayed on the chart, the next code was used:

# Interval limits
Lower_freq = 200
Upper_freq = 900

# f (frequencies) between lower frequency AND
# f (frequencies) upper frequencies.
filter_subset = (f &gt;= Lower_freq) * (f &lt;= Upper_freq)

# Extracts filtered items from the frequency list.
f_subset = f[filter_subset]

# Extracts filtered items from the Fourier transform power list.
F_subset = F[filter_subset]