Label noise is disagreement, error, or inconsistency in the labels of training data. Real labels are rarely perfect. Annotators get distracted, careless, or just plain wrong; experts disagree on edge cases; the ground truth itself may be ambiguous. A model trained on noisy labels learns the noise along with the signal, which limits how well it can perform regardless of how clever the architecture is.
Several techniques address label noise, all stemming from the same idea: don’t trust any single label completely.
Majority voting asks multiple annotators to label the same example and takes the most common answer as the consensus. Any individual annotator might be wrong, but as long as annotators make their mistakes independently and each is right more often than not, the majority answer is more likely to be correct than a typical individual's. This is the standard approach for crowdsourced labels.
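A minimal sketch of majority voting, using only the Python standard library. The example ids, the labels, and the choice to return None on a tie are illustrative assumptions, not something the text prescribes.

```python
from collections import Counter

def majority_vote(labels):
    """Return the most common label, or None if there is no clear winner."""
    counts = Counter(labels).most_common()
    if not counts:
        return None
    # A tie for first place means no consensus; flag it rather than guess.
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]

# Hypothetical annotations: example id -> labels from three annotators.
annotations = {
    "img_001": ["cat", "cat", "dog"],
    "img_002": ["dog", "dog", "dog"],
    "img_003": ["cat", "dog", "bird"],
}
consensus = {ex: majority_vote(votes) for ex, votes in annotations.items()}
```

Here img_003 gets no consensus label because all three annotators disagree. In practice such items are often sent to an additional annotator or an expert rather than dropped.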
Confidence-weighted labelling generalizes voting by weighting each annotator’s vote by how trustworthy they’ve been on similar tasks. An annotator with a track record of accurate answers gets more weight; a careless one gets less. This requires some prior knowledge of annotator quality — typically built up through gold-standard test items.
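One way this could look in code: estimate each annotator's accuracy on gold-standard items, then use that accuracy as the weight of their vote. The annotator names, the gold items, and the use of raw accuracy as the weight are all assumptions made for illustration; other weighting schemes, such as log-odds of the accuracy, are also common.

```python
from collections import defaultdict

def annotator_accuracy(gold_answers, gold_truth):
    """Fraction of gold-standard items each annotator labelled correctly.

    Assumes every annotator answered at least one gold item.
    """
    accuracy = {}
    for annotator, answers in gold_answers.items():
        correct = sum(answers[item] == gold_truth[item] for item in answers)
        accuracy[annotator] = correct / len(answers)
    return accuracy

def weighted_vote(votes, weights):
    """votes maps annotator -> label; return the label with the largest total weight."""
    totals = defaultdict(float)
    for annotator, label in votes.items():
        totals[label] += weights[annotator]
    return max(totals, key=totals.get)

# Hypothetical gold-standard items with known answers.
gold_truth = {"g1": "cat", "g2": "dog", "g3": "cat", "g4": "dog"}
gold_answers = {
    "ann_a": {"g1": "cat", "g2": "dog", "g3": "cat", "g4": "dog"},  # 4 of 4 correct
    "ann_b": {"g1": "cat", "g2": "cat", "g3": "dog", "g4": "dog"},  # 2 of 4 correct
    "ann_c": {"g1": "dog", "g2": "cat", "g3": "cat", "g4": "cat"},  # 1 of 4 correct
}
weights = annotator_accuracy(gold_answers, gold_truth)

# A new item where the reliable annotator is outnumbered two to one.
label = weighted_vote({"ann_a": "cat", "ann_b": "dog", "ann_c": "dog"}, weights)
```

A plain majority vote would pick "dog" for the new item. With weights of 1.0, 0.5, and 0.25, the single reliable vote for "cat" (total 1.0) outweighs the two less reliable votes for "dog" (total 0.75).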
Active learning flips the usual order: instead of labelling a random subset of the data, the model identifies the examples it’s most uncertain about and asks human experts to label exactly those. Most of the labelling budget goes toward the examples where labels would teach the model the most.
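The selection step can be sketched as uncertainty sampling: score each unlabelled example by the entropy of the model's predicted class probabilities and send the highest-scoring ones to the experts. Entropy is only one possible uncertainty measure, and the example ids, probabilities, and budget below are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy of a predicted class distribution (higher means less certain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_for_labelling(predictions, budget):
    """Return the ids of the `budget` examples the model is least certain about."""
    ranked = sorted(predictions, key=lambda ex: entropy(predictions[ex]), reverse=True)
    return ranked[:budget]

# Hypothetical model outputs: example id -> predicted class probabilities.
predictions = {
    "x1": [0.98, 0.01, 0.01],  # confident prediction
    "x2": [0.40, 0.35, 0.25],
    "x3": [0.70, 0.20, 0.10],
    "x4": [0.34, 0.33, 0.33],  # close to uniform, so most uncertain
}
to_label = select_for_labelling(predictions, budget=2)
```

With a budget of two, the near-uniform predictions x4 and x2 are selected, while the confident x1 is left unlabelled. In a full loop the model would be retrained on the new labels and the selection repeated.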
Beyond technique, label quality is partly a sourcing question. Labels from a domain expert (a radiologist, a marine biologist) are slow and expensive but accurate. Labels from crowdsourcing are fast and cheap but noisy. Labels from automated labelling, where a pre-trained model labels new data, are essentially free but propagate every error the labelling model makes. The principle the textbook returns to: quality and representativeness beat coverage. A thousand carefully labelled examples that span the variation we care about usually beat ten thousand sloppy ones.