We compare a very basic Autoencoder to PCA.
In Part 1 we keep the example simple and focus on becoming familiar with the Keras syntax for setting up this kind of model, and on getting a feel for how neural-network Autoencoders differ from linear autoencoders such as PCA.
Remember that PCA simply means Eigendecomposition of the covariance:
>>> Cov(data) = V*D*V.T
If you keep only the top k eigenvectors (the first k columns of V), this gives a low-dimensional representation of the data:
>>> data_lowdim = data*V[:,:k]
The approximate data in 'original dimensions' is:
>>> data_approx = data*V[:,:k]*V[:,:k].T
An Autoencoder is a generalisation of the above where:
>>> data_lowdim = Encode( data )
>>> data_approx = Decode( data_lowdim )
where Encode(x) and Decode(x) are potentially nonlinear functions of x.
Of course in PCA, both encoder and decoder are stored in V:
>>> Encode == V
>>> Decode == V.T
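To make this concrete, here is a minimal numpy sketch (using a small random matrix rather than the data from this tutorial) showing that keeping all eigenvectors reconstructs the data exactly, while keeping only the top eigenvector gives a rank-1 approximation:
import numpy as np
X = np.random.randn(100, 2)
X = X - X.mean(axis=0)                    # demean so that Cov = X.T @ X
_, V = np.linalg.eigh(X.T @ X)            # eigendecomposition of the (symmetric) covariance
print(np.allclose(X @ V @ V.T, X))        # True: all eigenvectors -> exact reconstruction
X_approx = X @ V[:, -1:] @ V[:, -1:].T    # eigh sorts ascending, so the top eigenvector is the last column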
In this tutorial we will implement a simple PCA class, generate some 2D toy data, fit PCA to it, and then build and train a Keras Autoencoder on the same data so that we can compare the two reconstructions.
# Helper stuff
# a simple (too simple) PCA class
import numpy as np
class PCA(object):
    def __init__(self, data):
        # Assumes data matrix is NumberSamples X NumberDimensions
        self.mean = data.mean(axis=0)
        # Demean the data such that cov(data) = data.T @ data
        self.data = data - self.mean
    def fit(self):
        # Eigenvectors of the covariance of the demeaned data
        Cov = self.data.T @ self.data
        # The covariance is symmetric, so use eigh and sort the
        # eigenvectors by decreasing eigenvalue
        w, V = np.linalg.eigh(Cov)
        self.V = V[:, np.argsort(w)[::-1]]
    def encode(self, data, dim=1):
        # Project the demeaned data onto the top 'dim' eigenvectors
        return (data - self.mean) @ self.V[:, :dim]
    def decode(self, data_lowdim):
        # Map back to the original space and re-add the mean
        dim = data_lowdim.shape[1]
        return data_lowdim @ self.V[:, :dim].T + self.mean
    def predict(self, dim=1):
        return self.data @ self.V[:, :dim] @ self.V[:, :dim].T + self.mean
Below we generate 2D data points from two Gaussian distributions (to add a bit of "nonlinearity" to the data).
import numpy as np
n = 1000
cov = [[1,.9],[.9,1]]
data1 = np.random.multivariate_normal([-1.1,.1],cov,n)
data2 = np.random.multivariate_normal([1.1,-.1],cov,n)
data = np.concatenate((data1,data2),axis=0)
# Quickly plot the data
%matplotlib inline
import seaborn as sns
import matplotlib.pyplot as plt
sns.scatterplot(x=data[:,0], y=data[:,1])
plt.axis('equal')
Next we do a quick PCA fit to the data. We keep 1 dimension and plot the model's prediction (i.e. the reconstruction from the 1D projection).
# do a quick PCA fit
pca = PCA(data)
pca.fit()
pca_pred = pca.predict(dim=1)
# the same can be achieved with:
# pca_pred = pca.decode(pca.encode(data,dim=1))
# quick plot
sns.scatterplot(x=data[:,0], y=data[:,1])
sns.scatterplot(x=pca_pred[:,0], y=pca_pred[:,1])
plt.legend(('data','PCA'))
Now let us define the Autoencoder model.
We will use a simple feedforward neural network with two fully connected layers on each side of a low-dimensional bottleneck:
Input -> Layer -> Layer -> LowDim -> Layer -> Layer -> Output
from keras.layers import Input, Dense
from keras.models import Model
# creates the autoencoder and outputs the encoder and decoder separately as well
def create_AE(input_dim, latent_dim=1):
    # Encoder model
    inputs = Input(shape=(input_dim,), name='Input')
    x = Dense(20, activation='relu')(inputs)
    x = Dense(20, activation='relu')(x)
    lat = Dense(latent_dim, activation='linear')(x)
    encoder = Model(inputs, lat, name='encoder')
    # Decoder model
    lat_input = Input(shape=(latent_dim,))
    x = Dense(20, activation='relu')(lat_input)
    x = Dense(20, activation='relu')(x)
    outputs = Dense(input_dim, activation='linear')(x)
    decoder = Model(lat_input, outputs, name='decoder')
    # Autoencoder: chain the encoder and decoder
    outputs = decoder(encoder(inputs))
    ae = Model(inputs, outputs, name='ae')
    return ae, encoder, decoder
# Create Autoencoder
ae,enc,dec = create_AE(input_dim=data.shape[1],latent_dim=1)
ae.summary()
Does it make sense that it has this many parameters?
Let's see. Each Dense layer has (number of inputs x number of units) weights plus one bias per unit.
Adding this up layer by layer gives 60+420+21 parameters for the encoder and 40+420+42 for the decoder, i.e. 1003 in total.
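As a quick sanity check, the same count can be computed by hand (dense_params below is just a throwaway helper, not part of the tutorial code):
# each Dense layer has (n_inputs * n_units) weights plus n_units biases
def dense_params(n_in, n_units):
    return n_in * n_units + n_units
enc_params = dense_params(2, 20) + dense_params(20, 20) + dense_params(20, 1)  # 60 + 420 + 21
dec_params = dense_params(1, 20) + dense_params(20, 20) + dense_params(20, 2)  # 40 + 420 + 42
print(enc_params + dec_params)  # 1003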
Before we start training we need a loss function. The one defined below is the mean squared error between the input and the reconstructed output:
from keras.losses import mse
from keras import backend as K
def recon_loss(inputs, outputs):
    reconstruction_loss = mse(inputs, outputs)
    return K.mean(reconstruction_loss)
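For this small 2D example the loss amounts to nothing more than the following numpy expression (shown only for intuition, it is not used in training):
import numpy as np
def recon_loss_np(inputs, outputs):
    # mean over the feature dimension, then mean over the batch
    return np.mean(np.mean((inputs - outputs) ** 2, axis=-1))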
# Training
# define loss function
losses = {'decoder':recon_loss}
# choose optimisation routine
ae.compile(optimizer='adam',loss=losses)
# run the fitting
ae.fit(data,
       {'decoder': data},
       epochs=200,
       batch_size=128, shuffle=True)
plt.plot(ae.history.history['loss'])
Finally, let's look at the prediction of the Autoencoder and compare it to PCA:
# Look at predictions
out = ae.predict(data)
# equivalently:
# out = dec.predict(enc.predict(data))
import seaborn as sns; sns.set()
sns.scatterplot(x=data[:,0], y=data[:,1])
sns.scatterplot(x=pca_pred[:,0], y=pca_pred[:,1])
sns.scatterplot(x=out[:,0], y=out[:,1])
plt.legend(('data','PCA','AE'))
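To put a number on the visual comparison, we can also compute the mean squared reconstruction error of each model from the predictions above (a quick check, nothing more):
mse_pca = np.mean(np.sum((data - pca_pred) ** 2, axis=1))
mse_ae = np.mean(np.sum((data - out) ** 2, axis=1))
print('PCA reconstruction error:', mse_pca)
print('AE  reconstruction error:', mse_ae)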
In Part 2 we will apply this to the MNIST data.