Dimensionality reduction

Dimensionality reduction is any transformation that takes data with many features and produces a representation with fewer features while preserving as much useful structure as possible. It’s a standard response to the curse of dimensionality: the problems that arise when data has more dimensions than humans can visualize, so many dimensions that distances between points stop being meaningful, or more dimensions than the available training data can support.
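The claim that distances stop being meaningful can be checked numerically. The sketch below assumes points drawn uniformly from the unit hypercube (an illustrative choice, not something the text specifies) and uses a hypothetical helper, `relative_contrast`, which measures how much farther the farthest point is from a random query point than the nearest one. As the dimension grows, the contrast shrinks and every point ends up at roughly the same distance from the query.

```python
import numpy as np

def relative_contrast(n_points: int, dim: int, seed: int = 0) -> float:
    """(max - min) / min of the distances from one random query to n_points random points.

    Hypothetical helper for illustration; points are uniform in the unit hypercube.
    """
    rng = np.random.default_rng(seed)
    points = rng.uniform(size=(n_points, dim))
    query = rng.uniform(size=dim)
    dists = np.linalg.norm(points - query, axis=1)
    return float((dists.max() - dists.min()) / dists.min())

# The contrast drops sharply as the dimension increases.
for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(500, dim), 3))
```

When the nearest and farthest neighbors are almost equally far away, methods that rely on "closeness" have little signal to work with.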

The setup: the original data lives in $N$ dimensions. We want a new representation in $M$ dimensions, where $M \ll N$, ideally $M = 2$ or $M = 3$ so we can plot the result on a page.

Reducing dimensions necessarily loses information. Project a three-dimensional cone onto a flat plane along its axis and you get a circle. Project a three-dimensional cylinder onto the same plane along its axis and you also get a circle. The original objects were different, the projections are identical, and we’ve lost what distinguished them. The art of dimensionality reduction is keeping the useful information while compressing the dimensions.
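A linear projection makes the loss concrete. This minimal NumPy sketch uses the simplest possible projection from three dimensions to two, dropping the $z$ coordinate (our choice for illustration). Two points that differ only along $z$ land on exactly the same spot, so nothing computed from the projection can tell them apart.

```python
import numpy as np

# Orthogonal projection from 3D onto the xy-plane: keep x and y, drop z.
P = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])

a = np.array([1.0, 2.0, 3.0])
b = np.array([1.0, 2.0, -5.0])    # differs from a only along z

print(P @ a, P @ b)               # both map to [1. 2.]
print(np.allclose(P @ a, P @ b))  # True: the projection cannot distinguish them

# Any linear map from R^3 to R^2 has rank at most 2, so its null space has
# dimension at least 1: some nonzero direction is always collapsed to zero.
print(3 - np.linalg.matrix_rank(P))  # dimension of the null space (1 here)
```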

What counts as useful information? In statistics and machine learning, the variance of the data is often treated as a proxy for its information content. High variance means a lot of information: the points are spread out and their positions are meaningfully distinct. Low variance means little information: the points are clumped together and hard to tell apart.
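Variance as a proxy can be computed directly, feature by feature. The toy dataset below is made up for illustration: one widely spread feature, one moderately spread, and one nearly constant. Ranking features by variance and keeping the top ones is the crudest variance-based reduction; PCA refines the same idea by searching for high-variance directions rather than individual features.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Hypothetical toy dataset with three features of very different spread.
spread = rng.normal(0.0, 5.0, size=n)                  # widely spread values
moderate = rng.normal(0.0, 1.0, size=n)                # moderately spread values
nearly_constant = 3.0 + rng.normal(0.0, 0.01, size=n)  # clumped around 3.0
X = np.column_stack([spread, moderate, nearly_constant])

variances = X.var(axis=0)
order = np.argsort(variances)[::-1]  # feature indices, highest variance first
print(variances.round(4))
print(order)                         # [0 1 2]

# Keep the two highest-variance features and drop the nearly constant one.
X_reduced = X[:, order[:2]]
print(X_reduced.shape)               # (200, 2)
```

Raw variance depends on each feature's units, so features are often standardized first when their scales are not comparable.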

The two dimensionality-reduction methods this textbook covers:

Principal Component Analysis (PCA) is linear. It finds orthogonal directions of decreasing variance and projects the data onto the first few. PCA preserves global linear structure and is fast.
t-SNE (t-distributed Stochastic Neighbor Embedding) is non-linear. It preserves local neighborhood structure, so points that were close in the original space stay close in the embedding. It is slower than PCA but often produces more visually striking plots.
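The PCA description above can be sketched in a few lines of NumPy. This is a minimal version assuming the usual recipe: center the data, take its singular value decomposition, and project onto the leading right singular vectors. The function name `pca` and the synthetic dataset (points scattered along one hidden direction in 5D) are illustrative. The rows of `Vt` are orthonormal and ordered by decreasing singular value, which gives exactly the orthogonal directions of decreasing variance.

```python
import numpy as np

def pca(X: np.ndarray, n_components: int):
    """Minimal PCA via SVD of the centered data matrix.

    Returns (projected data, principal directions, variance along each direction).
    """
    Xc = X - X.mean(axis=0)  # PCA works on centered data
    # Rows of Vt are orthonormal directions, ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]                                 # (n_components, N)
    explained_variance = S[:n_components] ** 2 / (X.shape[0] - 1)  # covariance eigenvalues
    Z = Xc @ components.T                                          # (n_samples, n_components)
    return Z, components, explained_variance

rng = np.random.default_rng(0)
# Hypothetical data: 300 points in 5D varying mostly along one hidden direction.
direction = np.array([1.0, 2.0, 0.0, -1.0, 0.5])
direction /= np.linalg.norm(direction)
t = rng.normal(0.0, 3.0, size=300)
X = np.outer(t, direction) + rng.normal(0.0, 0.1, size=(300, 5))

Z, components, var = pca(X, n_components=2)
print(Z.shape)                         # (300, 2)
print(var)                             # first value much larger than the second
print(abs(components[0] @ direction))  # close to 1: PC1 recovers the hidden direction
```

In practice, scikit-learn's `sklearn.decomposition.PCA` implements the same idea with more options.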

Both are unsupervised — they don’t use class labels. Both produce 2D or 3D embeddings that can be plotted as scatter plots for visual inspection. They serve different purposes: PCA gives you the directions where data varies most; t-SNE gives you cluster structure.
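To make t-SNE's neighborhood-preserving behavior concrete, here is a compact, exact ($O(n^2)$) sketch of the standard recipe: Gaussian affinities in the input space with per-point bandwidths tuned by binary search to a target perplexity, Student-t affinities in the embedding, and gradient descent on the KL divergence between the two. The hyperparameters (early exaggeration of 12 for 250 iterations, momentum 0.5 then 0.8, adaptive gains, learning rate $\max(n/48, 50)$) are our assumptions, chosen to resemble common library defaults; the function names and the two-blob dataset are hypothetical. For real data you would use a library implementation such as scikit-learn's `sklearn.manifold.TSNE`.

```python
import numpy as np

def _sq_dists(A):
    """Pairwise squared Euclidean distances between the rows of A."""
    s = (A ** 2).sum(axis=1)
    return np.maximum(s[:, None] + s[None, :] - 2.0 * A @ A.T, 0.0)

def joint_probabilities(X, perplexity, tol=1e-5, max_steps=50):
    """Symmetric input-space affinities P; each point's Gaussian bandwidth is
    found by binary search so its conditional distribution hits the perplexity."""
    n = X.shape[0]
    D = _sq_dists(X)
    target = np.log(perplexity)              # target entropy, in nats
    P = np.zeros((n, n))
    for i in range(n):
        mask = np.arange(n) != i
        d = D[i, mask] - D[i, mask].min()    # shift for numerical stability
        beta, lo, hi = 1.0, 0.0, np.inf      # beta = 1 / (2 sigma_i^2)
        for _ in range(max_steps):
            p = np.exp(-d * beta)
            p /= p.sum()
            entropy = -np.sum(p * np.log(np.maximum(p, 1e-12)))
            if abs(entropy - target) < tol:
                break
            if entropy > target:             # too spread out: narrow the Gaussian
                lo = beta
                beta = beta * 2.0 if np.isinf(hi) else (beta + hi) / 2.0
            else:                            # too peaked: widen the Gaussian
                hi = beta
                beta = (beta + lo) / 2.0
        P[i, mask] = p
    P = (P + P.T) / (2.0 * n)                # symmetrize; entries sum to 1
    return np.maximum(P, 1e-12)

def tsne(X, n_components=2, perplexity=30.0, n_iter=1000, seed=0):
    n = X.shape[0]
    P = joint_probabilities(X, perplexity)
    rng = np.random.default_rng(seed)
    Y = rng.normal(0.0, 1e-4, size=(n, n_components))  # small random start
    update = np.zeros_like(Y)
    gains = np.ones_like(Y)
    exaggeration, lr = 12.0, max(n / 12.0 / 4.0, 50.0)
    for it in range(n_iter):
        if it == 250:
            exaggeration = 1.0               # end of the early-exaggeration phase
        momentum = 0.5 if it < 250 else 0.8
        num = 1.0 / (1.0 + _sq_dists(Y))     # Student-t kernel, one degree of freedom
        np.fill_diagonal(num, 0.0)
        Q = np.maximum(num / num.sum(), 1e-12)
        W = (exaggeration * P - Q) * num
        # Gradient of KL(P || Q): 4 * sum_j W_ij (y_i - y_j)
        grad = 4.0 * (W.sum(axis=1)[:, None] * Y - W @ Y)
        # Adaptive per-coordinate gains: grow while the direction is consistent.
        gains = np.maximum(np.where(update * grad < 0.0, gains + 0.2, gains * 0.8), 0.01)
        update = momentum * update - lr * gains * grad
        Y = Y + update
    return Y

rng = np.random.default_rng(1)
# Hypothetical data: two well-separated Gaussian blobs in 10D, 30 points each.
X = np.vstack([rng.normal(0.0, 1.0, size=(30, 10)),
               rng.normal(10.0, 1.0, size=(30, 10))])
labels = np.repeat([0, 1], 30)
Y = tsne(X, perplexity=10.0)
print(Y.shape)  # (60, 2)
```

Because the two blobs share essentially no input-space affinity, they end up as separate groups in the 2D embedding. The distance between the groups in t-SNE output, however, should not be read as a faithful measure of how far apart they were originally.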

There are many alternatives beyond these two: UMAP (a faster, often clearer alternative to t-SNE), autoencoders (neural networks that learn nonlinear reductions), Isomap, LLE, and factor analysis. The framework is the same in every case: high-dimensional data goes in, a lower-dimensional representation comes out, and some notion of useful structure is preserved.

Idriss Rami — Notes
