Dimensionality reduction is any transformation that takes data with many features and produces a representation with fewer features while preserving as much useful structure as possible. It’s the standard response to the curse of dimensionality: the problems that arise when data has more dimensions than humans can visualize, more dimensions than distance measures stay meaningful in, or more dimensions than the available training data can support.

The setup: original data lives in $d$ dimensions. We want a new representation in $k$ dimensions where $k < d$, ideally $k = 2$ or $k = 3$ so we can plot the result on a page.

Reducing dimensions necessarily loses information. If we project a three-dimensional cone along its axis onto a flat plane, we get a circle. If we project a three-dimensional cylinder along its axis onto the same plane, we also get a circle. The original objects were different; the projections are identical; we’ve lost what distinguished them. The art of dimensionality reduction is keeping the useful information while compressing the dimensions.
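
The loss of information under projection can be illustrated with a small sketch. It assumes NumPy is available and uses two hypothetical point sets (a flat ring and a rising spiral) as stand-ins for the two solids; the helper name project_to_plane is made up for this illustration.

```python
import numpy as np

# Two hypothetical 3D point sets that share (x, y) but differ in height z.
theta = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
x, y = np.cos(theta), np.sin(theta)

flat_ring = np.column_stack([x, y, np.zeros_like(theta)])
rising_spiral = np.column_stack([x, y, np.linspace(0.0, 1.0, 8)])


def project_to_plane(points):
    """Orthogonal projection onto the xy-plane: keep x and y, drop z."""
    return points[:, :2]


# The objects differ in 3D, but their 2D projections coincide.
same_in_3d = np.allclose(flat_ring, rising_spiral)
same_in_2d = np.allclose(project_to_plane(flat_ring),
                         project_to_plane(rising_spiral))
print(same_in_3d, same_in_2d)
```

Everything that distinguished the two shapes lived in the dropped coordinate, so no amount of processing on the 2D result can recover it.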

What counts as useful information? In statistics and machine learning, the variance of the data is often treated as a proxy for its information content. A dataset with high variance carries a lot of information — the points are spread out, their positions are meaningfully distinct. A dataset with low variance carries little — the points are clumped, indistinguishable from one another.
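
As a rough illustration of variance as a proxy for information, the sketch below compares a spread-out feature with a clumped one. It assumes NumPy; the data is synthetic and the scales (5.0 and 0.1) are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic data: one widely spread feature, one tightly clumped feature.
spread = rng.normal(loc=0.0, scale=5.0, size=n)
clumped = rng.normal(loc=0.0, scale=0.1, size=n)
data = np.column_stack([spread, clumped])

# Sample variance of each feature (column).
variances = data.var(axis=0, ddof=1)

# Fraction of the total variance carried by each feature.
share = variances / variances.sum()
print(variances, share)
```

Under the variance-as-information view, dropping the clumped feature would discard very little, because almost all of the total variance sits in the spread-out one.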

The two dimensionality-reduction methods this textbook covers:

  • Principal Component Analysis (PCA) is linear. It finds orthogonal directions of decreasing variance and projects the data onto the first few. PCA preserves global linear structure and is fast.
  • t-SNE (t-distributed Stochastic Neighbor Embedding) is non-linear. It preserves local neighborhood structure — points that were close in the original space stay close in the embedding. Slower than PCA, often more visually striking.

Both are unsupervised: they don’t use class labels. Both can produce 2D or 3D embeddings that can be plotted as scatter plots for visual inspection. They serve different purposes: PCA gives you the directions where data varies most; t-SNE gives you local cluster structure, though distances between clusters and cluster sizes in a t-SNE plot should not be read literally.
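
The PCA description above (orthogonal directions of decreasing variance, then projection onto the first few) can be sketched with a singular value decomposition. This is a minimal illustration assuming NumPy, not a reference implementation; the function name pca_project, the parameter k (number of kept dimensions), and the synthetic data are all assumptions made for the example.

```python
import numpy as np


def pca_project(data, k):
    """Project data (n_samples x n_features) onto its first k principal directions."""
    # PCA works on mean-centered data.
    centered = data - data.mean(axis=0)
    # Rows of vt are orthonormal directions, ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:k]
    # Variance of the data along each direction.
    explained_variance = singular_values ** 2 / (len(data) - 1)
    embedding = centered @ components.T
    return embedding, components, explained_variance


rng = np.random.default_rng(0)
# Synthetic data: 200 points in 5 dimensions, most variance in the first two axes.
scales = np.array([10.0, 5.0, 0.5, 0.2, 0.1])
data = rng.normal(size=(200, 5)) * scales

embedding, components, explained_variance = pca_project(data, k=2)
kept_fraction = explained_variance[:2].sum() / explained_variance.sum()
print(embedding.shape, kept_fraction)
```

No labels appear anywhere in the computation, which is what makes the method unsupervised. The kept_fraction value reports how much of the total variance survives the reduction to two dimensions.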

Beyond these two, the field has many alternatives — UMAP (a faster, often clearer alternative to t-SNE), autoencoders (neural-network-based, learn arbitrary nonlinear reductions), Isomap, LLE, factor analysis — but the conceptual framework is the same: high-dimensional data goes in, lower-dimensional representation comes out, with some notion of useful structure preserved.
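
The shared framework shows up in code as a shared shape contract: an array with many columns goes in, an array with few columns comes out. The sketch below assumes scikit-learn is installed and uses its PCA and TSNE estimators on synthetic data; the sample count, perplexity, and random seeds are arbitrary choices for the illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Synthetic data: 60 points in 10 dimensions.
data = rng.normal(size=(60, 10))

# Linear reduction to 2 dimensions.
pca_embedding = PCA(n_components=2).fit_transform(data)

# Non-linear reduction to 2 dimensions; perplexity must be below the sample count.
tsne_embedding = TSNE(
    n_components=2, perplexity=10, init="pca", random_state=0
).fit_transform(data)

print(data.shape, pca_embedding.shape, tsne_embedding.shape)
```

Both calls return one low-dimensional row per input point, so the same scatter-plot code can display either result. What differs is which structure of the original data each embedding preserves.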