The curse of dimensionality is the bundle of difficulties that arise when working with high-dimensional data. Three of them recur often enough to be worth naming:
- We can’t visualize past three dimensions. Humans can intuitively comprehend $x$, $y$, $z$, maybe time as a fourth, and that’s the end. Anything beyond that has to be projected down to a 2D or 3D representation before it can be looked at on a page or screen.
- Distances mean less. In a high-dimensional space, almost all pairs of randomly-chosen points are roughly the same distance apart. The familiar notion that close points are similar breaks down: there are no close points; everything is moderately far.
- Models need exponentially more data to fill the space. A 1D model can be characterized from 10 well-spaced samples. A 10D model would need $10^{10}$ samples to be spaced equally densely. Real data is never anywhere near that dense, so models in high dimensions are always working with sparse coverage and have to extrapolate aggressively.
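The last two bullets can be checked numerically. The sketch below is an illustration, not part of the original text: it draws points uniformly from the unit hypercube and reports the spread of pairwise distances relative to their mean, which shrinks as the dimension grows. The function names `relative_spread` and `grid_samples`, the point count, and the seed are all choices made here for the example.

```python
import math
import random


def relative_spread(dim, n_points=200, seed=0):
    """Return std/mean of the pairwise Euclidean distances between
    n_points drawn uniformly from the unit hypercube [0, 1]^dim."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    dists = [
        math.dist(points[i], points[j])
        for i in range(n_points)
        for j in range(i + 1, n_points)
    ]
    mean = sum(dists) / len(dists)
    var = sum((d - mean) ** 2 for d in dists) / len(dists)
    return math.sqrt(var) / mean


def grid_samples(dim, per_axis=10):
    """Number of grid points when every axis gets per_axis samples."""
    return per_axis ** dim


if __name__ == "__main__":
    for dim in (1, 2, 10, 100):
        print(
            f"dim={dim:4d}  std/mean of distances={relative_spread(dim):.3f}  "
            f"grid samples={grid_samples(dim):.1e}"
        )
```

A small ratio means every pair of points sits at nearly the same distance, which is the sense in which "nearest" stops being informative. The grid count is the same exponential growth described in the third bullet.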
Real engineering data is often much higher-dimensional than we’d hoped. An IMU has 9 channels; instrumenting 12 sensors on a person gives 108 dimensions per time sample. An EEG headset with 64 channels gives 64 dimensions per sample. Even a simple table with columns for height, weight, hair color, eye color, and gender already has 5 dimensions before any encoding.
A concrete geometric flavour for the distance-concentration point: the volume of a hypersphere relative to the hypercube it is inscribed in shrinks to zero as dimension grows. In 2D the disk fills $\pi/4$ (about 79%) of the square; in 3D the ball fills $\pi/6$ (about 52%) of the cube; in 10D the hypersphere fills about 0.25% of the hypercube; by 20D it’s essentially zero. Random points in a high-dimensional cube almost always land in the “corners”, far from the center, and the pairwise distances between them concentrate around a single value.
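These fractions follow from the standard volume formula for a $d$-dimensional ball of radius $r$, $V = \pi^{d/2} r^d / \Gamma(d/2 + 1)$, divided by the volume $(2r)^d$ of the enclosing cube. A minimal sketch that evaluates the ratio, with `ball_to_cube_ratio` being a name chosen here for illustration:

```python
import math


def ball_to_cube_ratio(dim):
    """Fraction of a hypercube's volume occupied by its inscribed ball.

    Ball of radius r:        pi^(d/2) / Gamma(d/2 + 1) * r^d
    Enclosing cube, side 2r: (2r)^d
    The r^d factors cancel, so the ratio depends only on the dimension.
    """
    return math.pi ** (dim / 2) / (math.gamma(dim / 2 + 1) * 2 ** dim)


if __name__ == "__main__":
    for dim in (2, 3, 10, 20):
        print(f"{dim:2d}D: {ball_to_cube_ratio(dim):.3e}")
```

Because the ratio is independent of the radius, the same numbers hold for any ball and its bounding cube, not only for a particular choice of unit size.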
The standard response is dimensionality reduction: take data living in $n$ dimensions and produce a lower-dimensional representation in $k$ dimensions, where $k$ is much less than $n$. Ideally $k = 2$ or $k = 3$ so the result can be plotted. The two methods this textbook covers are Principal Component Analysis (linear, preserves global variance) and t-SNE (non-linear, preserves local neighborhoods).
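To make the linear case concrete, here is a minimal pure-Python sketch of PCA. It is an illustration under stated assumptions rather than the method as this textbook develops it: it finds the leading directions of the covariance matrix by power iteration with deflation, which is only one of several ways to compute them, and in practice a library routine such as scikit-learn's `PCA` would normally be used. All function names, the parameter `k` (the number of output dimensions), and the synthetic data are hypothetical.

```python
import math
import random


def center(rows):
    """Subtract the per-column mean from every row."""
    n, dim = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dim)]
    return [[r[j] - means[j] for j in range(dim)] for r in rows]


def covariance(centered):
    """Sample covariance matrix (dim x dim) of already-centered rows."""
    n, dim = len(centered), len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dim)]
        for i in range(dim)
    ]


def matvec(matrix, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(c * x for c, x in zip(row, v)) for row in matrix]


def top_component(cov, iters=500, seed=0):
    """Leading eigenvector and eigenvalue of cov by power iteration."""
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in cov]
    for _ in range(iters):
        w = matvec(cov, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance left in any direction
            return v, 0.0
        v = [x / norm for x in w]
    eigval = sum(a * b for a, b in zip(v, matvec(cov, v)))
    return v, eigval


def pca(rows, k=2):
    """Project rows onto their k leading principal components."""
    centered = center(rows)
    cov = covariance(centered)
    dim = len(cov)
    components = []
    for _ in range(k):
        v, lam = top_component(cov)
        components.append(v)
        # Deflate: remove the direction just found so that the next
        # pass converges to the next-largest one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(dim)]
               for i in range(dim)]
    projected = [[sum(a * b for a, b in zip(r, v)) for v in components]
                 for r in centered]
    return projected, components


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic 5-D data driven by two latent factors plus a little noise.
    data = []
    for _ in range(300):
        a, b = rng.gauss(0, 3), rng.gauss(0, 1)
        row = [a, b, a + b, a - b, 0.5 * a]
        data.append([x + rng.gauss(0, 0.05) for x in row])
    projected, components = pca(data, k=2)
    print(len(projected), "points reduced to", len(projected[0]), "dimensions")
```

The sign of each component is arbitrary, so two runs with different starting vectors may produce mirrored projections. t-SNE has no comparably short closed-form sketch, since it relies on iterative optimization.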
Reducing dimensions necessarily loses information: a cone and a cylinder projected along their axes onto the same flat plane both give circles, and the distinction is gone. The art is preserving the information that matters. Note that dimensionality reduction only cures the curse when the data actually lives on a low-dimensional manifold inside the high-dimensional space (which is common but not guaranteed). If the data genuinely needs all of its dimensions, no projection can recover what’s lost; the curse is structural to the problem, not just a representational inconvenience.