First Play With PCA

Click on the “Data set” button. Use the “Scatter” button with various “Caliber” and “Number of points” values; you may also add several random points with the “Random” button.

Prepare the cloud of data points. Go to “PCA” and select, with two mouse clicks, two points that define the initial approximation of the first principal component.

Click on the “Learn” button. The four-ray star will be placed at the mean point. This is not yet the result. To see the result and the transformation of the straight line from the initial approximation to the principal component, go to “Learning history”. The history of iterations will be shown step by step. In this example, the iterations converge in seven steps. The result is in the last figure. The changes in the last three steps are tiny, and the sixth step is omitted here.
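To make the idea of “learning” a principal component from an initial line more concrete, here is a minimal sketch in Python. It is not the applet's code: the tutorial does not say which iteration rule the applet uses, so the sketch assumes a simple alternating scheme (project the centered points onto the current direction, then refit the direction to those projections). The function name first_pc_iterative and the parameters tol and max_steps are hypothetical, and the number of steps it reports depends on the chosen tolerance, so it need not match the applet's count.

```python
import math


def first_pc_iterative(points, p0, p1, tol=1e-9, max_steps=1000):
    """Refine a line through the mean toward the first principal component.

    points : list of (x, y) data points
    p0, p1 : two points (the two "mouse clicks") giving the initial direction
    Returns (mean, unit_direction, steps_used).
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centered = [(x - mx, y - my) for x, y in points]

    # Initial direction taken from the two clicked points (assumed distinct).
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    norm = math.hypot(dx, dy)
    dx, dy = dx / norm, dy / norm

    for step in range(1, max_steps + 1):
        # Projection step: coordinate t of each point along the current line.
        # Refit step: the new direction is the sum of t * point.
        nx = ny = 0.0
        for cx, cy in centered:
            t = cx * dx + cy * dy
            nx += t * cx
            ny += t * cy
        norm = math.hypot(nx, ny)
        if norm == 0.0:
            # The current direction is orthogonal to all the data; give up.
            return (mx, my), (dx, dy), step
        nx, ny = nx / norm, ny / norm

        change = math.hypot(nx - dx, ny - dy)
        dx, dy = nx, ny
        if change < tol:
            return (mx, my), (dx, dy), step

    return (mx, my), (dx, dy), max_steps


if __name__ == "__main__":
    cloud = [(0.0, 0.0), (1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]
    mean, direction, steps = first_pc_iterative(cloud, (0.0, 0.0), (1.0, 0.0))
    print("mean:", mean, "direction:", direction, "steps:", steps)
```

The line is always anchored at the mean point, as in the applet; only its direction changes from step to step.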

A task for exploration. Usually, the iterations converge in 5 to 10 steps. Can you find situations (data sets and initial approximations) in which the iterations take more steps to converge? What is the maximum you can achieve in your examples? If you can invent a data set and an initial approximation for which PCA learning takes 20 or more steps, then you understand the nature of PCA and of the learning algorithm. Please play. You can also organize a competition: who can invent an example with the maximal number of iterations?

You can find the accuracy of the approximation of the data set by the first principal component. For this purpose, use the button “Show error” and find the value of the fraction of variance unexplained (FVU) in the bottom right corner. This notion is illustrated below. For each data point, the deviation is its distance from the mean point. The distance from the projection of the point on the principal component line to the mean point is the explained deviation. The distance from the data point to the straight line of the first principal component is the unexplained deviation. The ratio of the sum of squares of the unexplained deviations to the sum of squares of the deviations from the mean point is the FVU. Usually, it is measured in %.

You can use the standardized data sets: click the button “Std. data”. Clear the screen and select one of the sets, for example “S horiz.”. To smear this image, you can use the “Scatter” procedure. Choose the scattering radius and the number of points to add around every point, within a circle of this radius.
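The FVU definition given above can be written down in a few lines. The following Python sketch is only an illustration of that definition, not the applet's implementation; the function name fvu and its arguments are hypothetical. It assumes 2D points and a line that passes through the mean point with a given direction.

```python
import math


def fvu(points, direction):
    """Fraction of variance unexplained by a line through the mean point.

    points    : list of (x, y) data points
    direction : (dx, dy) vector along the line (any non-zero length)
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n

    norm = math.hypot(direction[0], direction[1])
    ux, uy = direction[0] / norm, direction[1] / norm

    total = 0.0        # sum of squared deviations from the mean point
    unexplained = 0.0  # sum of squared distances from the points to the line
    for x, y in points:
        cx, cy = x - mx, y - my
        total += cx * cx + cy * cy
        # Perpendicular distance to the line via the 2D cross product.
        r = cx * uy - cy * ux
        unexplained += r * r

    if total == 0.0:
        raise ValueError("all points coincide; FVU is undefined")
    return unexplained / total


if __name__ == "__main__":
    cloud = [(-2.0, 0.0), (2.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
    print("FVU = {:.2%}".format(fvu(cloud, (1.0, 0.0))))
```

Multiplying the returned fraction by 100 gives the percentage shown in the applet. By the Pythagorean theorem, the explained and unexplained squared deviations of each point add up to its squared deviation from the mean, so the explained fraction is 1 minus the FVU.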

After you click “OK”, the smeared image appears. It may be interesting to apply this operation several times with different radii and numbers of points to prepare a sort of “halo” around the image. If you randomly add one point to each existing data point within a circle of radius 15, you get the following figure. Initiate PCA with two mouse clicks, and learn. In our example, the PCA algorithm converges in 6 steps. Go to “Learning history” and look at the history step by step. Below are the first, second and sixth steps. The fraction of variance unexplained (FVU) is 18.80%.

Play with the smeared standard images and explore the possibilities of PCA approximation. You can also use the Data set/Random button to add noise uniformly distributed over the work desk.
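The smearing operation described above can be imitated outside the applet. The sketch below is an assumption-laden illustration: the tutorial does not specify how the applet distributes the new points inside the circle, so the code assumes they are uniform over the disk. The function name scatter and its parameters are hypothetical.

```python
import math
import random


def scatter(points, radius, per_point, rng=None):
    """Return the original points plus `per_point` new points around each one.

    Each new point is drawn uniformly (an assumption) from the disk of the
    given radius centered at an existing point.
    """
    rng = rng or random.Random()
    result = list(points)
    for x, y in points:
        for _ in range(per_point):
            # sqrt makes the density uniform over the disk's area,
            # rather than concentrated near the center.
            rho = radius * math.sqrt(rng.random())
            phi = 2.0 * math.pi * rng.random()
            result.append((x + rho * math.cos(phi), y + rho * math.sin(phi)))
    return result


if __name__ == "__main__":
    shape = [(float(i), 0.0) for i in range(0, 100, 10)]
    # One added point per existing point, radius 15, as in the example above.
    smeared = scatter(shape, radius=15, per_point=1, rng=random.Random(1))
    # Applying the operation again with another radius builds up a "halo".
    halo = scatter(smeared, radius=5, per_point=2, rng=random.Random(2))
    print(len(shape), len(smeared), len(halo))
```

Passing a seeded random.Random instance makes a run reproducible, which is convenient when comparing the FVU of different smeared versions of the same image.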

Just for comparison, below are the self-organizing map (SOM) approximations of the same data set with 10 nodes (FVU = 13.71%), with 15 nodes (FVU = 5.50%) and with 20 nodes (FVU = 2.62%).

Go to the “First Play With SOM and GSOM” tutorial.