# ONBI - GLM and PCA Practical

## Practical Overview

This practical requires Matlab. Go through the page and execute the listed commands in the Matlab command window (you can copy-paste). Don't click on the "answers" links until you have thought hard about the question. Raise your hand should you need any help.

### Contents:

General Linar Model
Fitting the General Linar Model to some data
Principal Component Analysis
Doing PCA on some data

## Simple GLM

Let's start simple. Open matlab, and generate noisy data y using a linear model with one regressor x and an intercept. I.e. `y=a+b*x`

```  x = (1:20)';
intercept   = -10;
slope       = 2.5;
y = intercept + slope*x;
y = y + 10*randn(size(y)); % add some noise
```

Now plot the data against x:

```  figure
plot(x,y,'o');
xlabel('x');
ylabel('y');
```

Let's compare fitting a linear model with and without the intercept. First, set up two design matrices:

```    M1 = [x];                % w/o intercept
M2 = [ones(size(x)) x];  % w/ intercept
```

Look at M1 and M2 if you are not sure what they should look like (e.g. type M1 in the Matlab command window and press enter).

Now solve the two GLMs:

```    beta1 = pinv(M1)*y;
beta2 = pinv(M2)*y;
```

Look at the values in beta1 and beta2. How should we interpret the meaning of these values? Answer

Plot the two models against the data:

```
figure;hold on;grid on
plot(x,y,'o');
plot(x,M1*beta1,'r-');
plot(x,M2*beta2,'b-');
```

Note how the red line must go through the origin `(0,0)`, as there is no intercept in the model (the intercept is always zero).

Note also how in this case the slope is higher when modelling the intercept. If we were to do statistics on this slope coefficient (to determine if it were significantly different from zero), would we find it more or less significant than if we hadn't modelled the intercept? Answer

Compare fitting the same M2 model (i.e. with a column of one's as a regressor), with and without demeaning x. What happens to the beta coefficients? How do we interpret the values in beta2 after demeaning x? Answer

Ok now let's have some fun with some real data. Download the Oxford weather data textfile from this link: OxfordWeather (right-click on the link and save on the disk).

Now load the text file in matlab and arrange the columns of the text file into different variables:

```  data = load('OxfordWeather.txt');

year       = data(:,1);
month      = data(:,2);
maxTemp    = data(:,3);
minTemp    = data(:,4);
hoursFrost = data(:,5);
rain       = data(:,6);

meanTemp   = .5*(minTemp+maxTemp);

```

Plot some of the data to familiarise yourself with it.

```
figure
subplot(2,3,2),boxplot(rain,month);                     xlabel('month');    ylabel('rain');
subplot(2,3,3),plot(minTemp,maxTemp,'.k');              xlabel('minTemp');  ylabel('maxTemp');
subplot(2,3,4),plot(meanTemp,hoursFrost,'.k');          xlabel('meanTemp'); ylabel('hoursFrost');
subplot(2,3,5),boxplot(rain,round(meanTemp/3)*3);       xlabel('meanTemp'); ylabel('rain');

```

You may notice from these plots that there is nothing much to model about the rain (at least with a GLM). It simply rains all the time... Instead, let's model the rest of the data in various ways:

• Fit a linear model with x=min and y=max monthly temperatures. Answer.
• Fit an exponential model with x=min temperature and y=hoursFrost. Hint. Answer.
• Write a generic function that can fit a polynomial of user-defined order (i.e. y=a+b*x+c*x^2+d*x^3+...) and fit this model to the hoursFrost data using meanTemp as a regressor. What polynomial order gives the best fit without over-fitting? Answer.
• Try to write down other models that can be fitted with the GLM (including via transformation of the data) and use simulated data to fit them. Some examples.

## Eigenvalues/Eigenvectors of data clouds

Here we will switch gears and turn our attention to eigenvalue decompositions. We will use eigenvalues in the context of data dimensionality reduction.

First, let's generate a 2D data cloud. The following code generates data that follows a Gaussian distribution with a covariance matrix that has a 60 degree rotation with respect to the x-axis. Read the code carefully and try to understand all the different steps.

```  mu=[5;2];                                     % mean position
ang=60;                                       % angle in degrees
R=[cosd(ang) -sind(ang);sind(ang) cosd(ang)]; % rotation matrix
sig=R*[1.3 0;0 .1]*R';                        % rotated covariance matrix
y=mvnrnd(mu,sig,100);                         % create points using multivariate gaussian random

% In case where mvnrnd is not installed (statistics toolbox), use
% Cholesky decomposition to generate the samples:
A=chol(sig);
y=(repmat(mu,1,100) + A'*randn(2,100))';

figure,hold on                                % plot the data
plot(y(:,1),y(:,2),'b.');
```

Now keep the previous plot open, demean the data, and plot the demeaned data on the same plot.

```  ydemean=y-repmat(mean(y),100,1);
plot(ydemean(:,1),ydemean(:,2),'k.');
```

Next, calculate the eigenvalue decomposition of the covariance matrix and plot the eigenvectors scaled by the square-root of their eigenvalues.

```  [v,d]=eig(cov(y));
d=sqrt(diag(d));

% Plot the two eigenvectors
axis equal,
quiver(0,0,v(1,2),v(2,2),3*d(2),'color','r'); % major eigenvector
quiver(0,0,v(1,1),v(2,1),3*d(1),'color','b'); % minor eigenvector
```

It should be clear that the major eigenvector (in red) is aligned with the direction of most "variance" in the data, and that the minor eigenvector is orthogonal to the major one, and is aligned with the direction of least variance. This is true because we have generated data using a Gaussian distribution.

Now project the data onto the first eigenvector and plot the projected data.

```  yy = (ydemean*v(:,2))*v(:,2)';
plot(yy(:,1),yy(:,2),'g.');
```

Now we do the same thing, but this time on a data set that does not have an interesting direction of maximum variance, but instead has information that is cast along a "radial" dimension.

Start by generating the doughnut data set (make sure you understand what each line is doing in the following code):

```  t = 2*pi*rand(1000,1); % random angle
r = .5+rand(1000,1);   % random radius between .5 and 1.5
x=r.*sin(t);
y=r.*cos(t);

figure,set(gcf,'color','w'),hold on
plot(x,y,'o')
axis equal
```

Now do the PCA on this data by calculating the eigenvalues of the covariance matrix:

```  Y=[x y];
[v,d]=eig(cov(Y));
d=sqrt(diag(d));

quiver(0,0,v(1,2),v(2,2),3*d(2),'color','r');
quiver(0,0,v(1,1),v(2,1),3*d(1),'color','b');

```

The resuting directions, as you can see, do not indicate any particularly useful feature of the data. In fact, they are completely random (driven by the noise, since the model is completely symmetric).

Furthermore, if we project the data onto the mean eigenvector, we loose all information on the doughnut:

```    ydemean=Y-repmat(mean(Y),size(Y,1),1);
yy = (ydemean*v(:,2))*v(:,2)';
plot(yy(:,1),yy(:,2),'g.');
```

Now one last data-cloud example. Consider the data generated with the following bit of code:

```  y1=mvnrnd([0;0],[1,0;0 20],1000);
y2=mvnrnd([5;0],[1,0;0 20],1000);
y=[y1;y2];
y=y-repmat(mean(y),2000,1); % demeand the data to keep things simple

figure;hold on
plot(y(:,1),y(:,2),'k.');
axis equal

```

As you can see, the data comes in two interesting clusters. However, the direction of maximum variance is orthogonal to the direction that separates the two clusters. Let's see how this affects the results of a data reduction that uses PCA.

First calculate the eigenspectrum of the covariance matrix and plot the major and minor eigenvectors:

```  [v,d]=eig(cov(y));
d=sqrt(diag(d));
quiver(0,0,v(1,2),v(2,2),3*d(2),'color','r'); % major eigenvector
quiver(0,0,v(1,1),v(2,1),3*d(1),'color','b'); % minor eigenvector
```

Now project the data onto the first eigenvector. You can see that the cluster information is completely lost. PCA is not a good way to cleanup this data.

```  yy = (y*v(:,2))*v(:,2)';
plot(yy(:,1),yy(:,2),'g.');
```

## PCA of the world

Let us finish with a fun example, PCA of the world map. I've chosen the world map because, although the data is multi-dimensional, which precludes from simple data-cloud style visualisation, information in the data can still be visualised as a matrix.

Now open the data in Matlab and visualise it with `imagesc`.

```  im = double(imread('world-map.gif'));  % read image
im=sum(im,3);                          % sum across the three colour channels
imagesc(im);                           % display the image

```

Start by calculating the eigenspectrum of the data covariance matrix:

```  [v,d]=eig(cov(im));
```

Now we are going to reduce the dimensionality of the data by reconstructing the data using a subset of the eigenvectors. To this end, we need to first demean the rows of the data matrix:

```  x=im;
m=mean(x,2);
for i=1:size(x,1)
x(i,:) = x(i,:) - m(i);
end
```

Next, reconstruct the data using only the top `p` eigenvectors. Look at what happens with and without adding the mean back to the data. Play with different values of `p`. When do we start seeing Great Britain?

```    p=4;
% reconstruct the data with top p eigenvectors
y = (x*v(:,end-p+1:end))*v(:,end-p+1:end)';
% plotting
figure
subplot(3,1,1);imagesc(im)
subplot(3,1,2);imagesc(y)
for i=1:size(x,1)
y(i,:) = y(i,:) + m(i);
end
subplot(3,1,3);imagesc(y)

```

Same as above but with Queen Victoria. When do you start to see her moustache?

```  im = double(imread('Queen_Victoria_by_Bassano.jpg'));
im = sum(im,3);

% reconstruct the data with top p eigenvectors
[v,d]=eig(cov(im));
p=4;
x=im;
m=mean(x,2);
for i=1:size(x,1)
x(i,:) = x(i,:) - m(i);
end
y = (x*v(:,end-p+1:end))*v(:,end-p+1:end)';
% plotting
figure
colormap(gray);
subplot(1,3,1); imagesc(im);
subplot(1,3,2); imagesc(y);