**First Play With PCA**

Click on the “Data set” button

Use the “Scatter” button with various “Caliber” and “Number of points” values; in addition, you may add several random points using the button “Random”.

Prepare
the cloud of data points

Go to “PCA” and select by two mouse clicks two points for the initial approximation of the first principal components.

Click
on the “Learn” button. The four-ray star will be placed at the mean point. This
is not yet a result.

To watch see the result and the transformation of the straight line from the initial approximation to the principal component, go to “Learning history”. The history of iterations will be demonstrated step by step:

In this example, the iterations converge in seven steps. The result is on the
last figure. The changes at the last three steps are tiny and we have omitted
here the sixth step.

__A task for exploration.__*Usually, the iterations converge in 5-10 steps. Can you find situations
(data sets and initial approximations) when the iterations converge in more
steps? What is the maximum you can achieve in your examples? If you can invent
a data set and initial approximation for which the PCA learning takes 20 or
more steps then you understand the nature of the PCA and learning algorithm.
Please play. You can also organize a competition: who can invent an example
with the maximal number of iterations?*

You
can find the accuracy of the approximation of the data set by the first
principal component. For this purpose, use the button “Show error” and find the
value of fraction of variance unexplained (FVU) in
the bottom right corner. This notion is illustrated below. For each data point,
the deviation is the distance from the mean point. The distance from the
projection on the principal component line to the mean point is the *explained* deviation. The distance from
the data point to the straight line of the first principal component is the *unexplained* deviation. The ratio of the
sum of squares of these unexplained deviations to the sum of squares of the
deviations from the mean point is the FVU*.* Usually, it is measured in %.

You can use the standardized data sets: click the button “Std. data”. Clean the screen and select one of the sets, for example “S horiz.”:

For smearing of this image, you can use the “Scatter” procedure. Choose the scattering radius and the number of points to add to every point in a circle with this radius:

:

After you click “OK”, the smeared image appears:

It may be interesting to use this operation several times with different radii and numbers of points and prepare a sort of “halo” around the image:

If you add randomly one point to each existent data point in a circle of radius
15, you get the following figure:

Initiate PCA with two mouse clicks, and learn:

In our example, the PCA algorithm converges in 6 steps. Go to the Learning history and look on this history, step by step. Below are the first, second and sixth steps. The fraction of variance unexplained (FUV) is 18.80%:

Play with the smeared standard images and explore the possibility of the PCA approximation. You can also use the Data set/Random button to add noise uniformly distributed on the work desk

Just for comparison, below are the self organizing map (SOM) approximations of the same dataset with 10 nodes (FVU=13.71%), with 15 nodes (FVU=5.50%) and with 20 nodes (FVU=2.62%):

Go to “First Play With SOM and GSOM” tutorial