Lecture 15 - PCA and ICA | Stanford CS229: Machine Learning Andrew Ng - Autumn 2018

artesia · 26 February 2026 00:34

In this lecture, Andrew Ng explains Principal Component Analysis (PCA) as a method for reducing the dimensionality of data by projecting it onto directions of maximum variance, and discusses its mathematical foundations, practical uses, and limitations. He also introduces Independent Component Analysis (ICA), which aims to separate mixed signals into their independent sources, setting up a deeper exploration of ICA in the following lecture.

artesia · 26 February 2026 00:54

In this lecture, Andrew Ng continues the discussion on unsupervised learning, focusing on Principal Component Analysis (PCA) and introducing Independent Component Analysis (ICA). He begins by explaining the motivation behind PCA: reducing the dimensionality of high-dimensional data while preserving as much variability as possible. Ng uses simple examples, such as measuring children’s heights in both centimeters and inches, to illustrate how data that appears high-dimensional may actually lie on a lower-dimensional subspace. The goal of PCA is to identify this subspace and project the data onto it, thereby simplifying the data and potentially reducing noise.

Ng then turns to the mathematical formulation of PCA. PCA seeks the directions (principal axes) along which the data varies the most; these are the eigenvectors of the covariance matrix of the data, ordered by eigenvalue. The principal eigenvector, the one with the largest eigenvalue, is the direction of maximum variance, and projecting the data onto it also minimizes the sum of squared distances between the original data points and their projections. He emphasizes preprocessing the data before applying PCA: centering (subtracting the mean of each feature) and scaling (dividing each feature by its standard deviation), so that the principal components are not unduly influenced by differences in scale or offset.
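
The recipe in this paragraph (center, scale, eigendecompose the covariance matrix, project) can be sketched in a few lines of NumPy. This is a minimal illustration rather than code from the lecture: the function names pca_fit and pca_project are made up for this example, and the data is synthetic, loosely modeled on the heights-in-centimeters-and-inches example with an assumed small measurement noise.

```python
import numpy as np

def pca_fit(X, k):
    """Fit PCA on X of shape (n_samples, n_features).

    Returns the per-feature mean and standard deviation, the top-k
    eigenvectors as columns of U, and all eigenvalues in descending order.
    """
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0                  # leave constant features unscaled
    Z = (X - mu) / sigma                     # center and scale
    cov = Z.T @ Z / Z.shape[0]               # covariance of the standardized data
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # reorder to descending
    return mu, sigma, eigvecs[:, order[:k]], eigvals[order]

def pca_project(X, mu, sigma, U):
    """Apply the stored preprocessing, then project onto the columns of U."""
    return ((X - mu) / sigma) @ U

# Synthetic data (an assumption for illustration): the same height measured
# in centimeters and, with a little noise, in inches.
rng = np.random.default_rng(0)
height_cm = rng.normal(120.0, 10.0, size=200)
height_in = height_cm / 2.54 + rng.normal(0.0, 0.2, size=200)
X = np.column_stack([height_cm, height_in])

mu, sigma, U, eigvals = pca_fit(X, k=1)
Y = pca_project(X, mu, sigma, U)             # one coordinate per child
explained = eigvals[0] / eigvals.sum()       # share of variance on the first axis
```

Because the two features are nearly redundant, almost all of the variance should fall on the first principal axis, which is the sense in which the two-dimensional data lies close to a one-dimensional subspace.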

The lecture also covers practical aspects and applications of PCA. Ng highlights that PCA is especially useful for visualization of high-dimensional data (e.g., reducing neural recordings from 50 dimensions to 3 for visualization), and for compressing data to make subsequent learning algorithms more efficient. However, he cautions against using PCA indiscriminately, particularly for reducing overfitting or for tasks like outlier detection and face recognition, where its effectiveness is inconsistent. He notes that while PCA can sometimes help in these scenarios, regularization or other domain-specific methods are often preferable.

Ng addresses common questions about PCA, such as how to choose the number of principal components (K) to retain. He explains that a typical approach is to select K such that a desired proportion (e.g., 90% or 95%) of the total variance is retained. He also clarifies that the principal components should be computed using only the training data, and then applied to both training and test data for consistency. Additionally, he discusses the instability of individual eigenvectors, recommending that users focus on the subspace spanned by the top K eigenvectors rather than interpreting individual components.
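
Two points from this paragraph, choosing K by retained variance and fitting on the training set only, can be sketched as follows. This is an illustrative outline, not the lecture's code: choose_k is a hypothetical helper, the 95% threshold is one of the example values mentioned above, and the data is synthetic (six features built from two assumed latent factors plus small noise).

```python
import numpy as np

def choose_k(eigvals, retain=0.95):
    """Smallest k whose top-k eigenvalues hold at least `retain` of the total variance."""
    eigvals = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    frac = np.cumsum(eigvals) / eigvals.sum()    # cumulative share of variance
    k = int(np.searchsorted(frac, retain)) + 1   # first index reaching `retain`
    return min(k, len(eigvals))                  # guard against rounding at retain=1.0

# Synthetic data (an assumption for illustration): 6 features, 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 2))
mixing = np.array([[1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]])
X = latent @ mixing + 0.05 * rng.normal(size=(300, 6))
X_train, X_test = X[:200], X[200:]

# Estimate the mean, scale, and eigenvectors from the training data only.
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)
Z_train = (X_train - mu) / sigma
eigvals, eigvecs = np.linalg.eigh(Z_train.T @ Z_train / len(Z_train))
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # descending order

k = choose_k(eigvals, retain=0.95)
U = eigvecs[:, :k]

# Reuse the same mu, sigma, and U for both splits.
Y_train = Z_train @ U
Y_test = ((X_test - mu) / sigma) @ U
```

The test set is transformed with statistics and directions estimated on the training set, so nothing about the test data leaks into the fitted representation.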

Towards the end of the lecture, Ng introduces Independent Component Analysis (ICA) using the “cocktail party problem” as motivation. ICA aims to separate mixed signals (such as overlapping voices recorded by multiple microphones) into their independent source components. Unlike PCA, which finds orthogonal directions of maximum variance along which the projected data is uncorrelated, ICA seeks statistically independent components. Ng outlines the basic setup for ICA, where observed signals are linear mixtures of independent sources, and the goal is to recover the original sources using only the observed mixtures. He concludes by noting the ambiguities inherent in ICA (such as the order and sign of recovered sources) and sets the stage for a deeper dive into ICA in the next lecture.
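
The ICA setup and its ambiguities can be made concrete with a small simulation. This sketch only builds the mixing model and shows why order and sign cannot be determined; it does not implement an ICA algorithm. The symbols s, A, x, and W, the source distributions, and the matrix entries are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Two hypothetical independent sources ("speakers"), one per row.
s = np.vstack([rng.uniform(-1.0, 1.0, n),
               rng.laplace(0.0, 1.0, n)])

# Assumed mixing matrix: each microphone records a weighted sum of the sources.
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])
x = A @ s                       # observed microphone signals

# If A were known, the unmixing matrix W = A^{-1} would recover the sources.
# ICA has to estimate W from x alone; A is available here only because we built it.
W = np.linalg.inv(A)
s_rec = W @ x

# Ambiguity: swap the two sources and flip the sign of one of them.
P = np.array([[0.0, -1.0],
              [1.0,  0.0]])
s_alt = P @ s                   # reordered, sign-flipped sources
A_alt = A @ np.linalg.inv(P)    # mixing matrix adjusted to compensate
x_alt = A_alt @ s_alt           # produces the same observations as x
```

Since x and x_alt are identical, no method that sees only the recordings can tell the pair (A, s) from (A_alt, s_alt), which is why the recovered sources are defined only up to order and sign.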