Automated labelling uses a pre-trained machine-learning model to label data, and then trains a new model on the labels the old one produced. It’s fast and cheap — once the labelling model exists, applying it to millions of examples costs only compute, no human time.

The catch is that every error the labelling model makes propagates into the training data. The new model inherits whatever biases and mistakes the old model had, plus whatever new errors it learns from the noisy data. Label noise can be high, and it is systematic rather than random, which is the kind that's hardest to wash out.
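A toy simulation can make the difference between systematic and random noise concrete. The sketch below is an illustration under stated assumptions, not a real labelling pipeline: the four-valued feature, the `systematic_teacher` with a blind spot on one value, the `random_teacher` with a 25% flip rate, and the majority-vote student are all hypothetical. Both teachers mislabel roughly a quarter of the data, but only one of them misleads the student.

```python
import random
from collections import Counter, defaultdict


def true_label(x: int) -> int:
    """Ground truth for the toy task: class 1 when x >= 2, else class 0."""
    return 1 if x >= 2 else 0


def systematic_teacher(x: int) -> int:
    """Hypothetical labelling model with a blind spot: always wrong on x == 2."""
    return 0 if x == 2 else true_label(x)


def random_teacher(x: int, rng: random.Random, flip: float = 0.25) -> int:
    """Hypothetical labelling model that flips each label independently."""
    y = true_label(x)
    return 1 - y if rng.random() < flip else y


def fit_majority(xs, ys):
    """Student model: predict the most common training label per feature value."""
    votes = defaultdict(Counter)
    for x, y in zip(xs, ys):
        votes[x][y] += 1
    return {x: counts.most_common(1)[0][0] for x, counts in votes.items()}


def accuracy(model, xs) -> float:
    """Fraction of examples where the student agrees with the ground truth."""
    return sum(model[x] == true_label(x) for x in xs) / len(xs)


rng = random.Random(0)
xs = [i % 4 for i in range(4000)]  # 1000 examples per feature value

student_sys = fit_majority(xs, [systematic_teacher(x) for x in xs])
student_rand = fit_majority(xs, [random_teacher(x, rng) for x in xs])

acc_sys = accuracy(student_sys, xs)    # student copies the teacher's blind spot
acc_rand = accuracy(student_rand, xs)  # random flips are outvoted per value
print(f"systematic noise: {acc_sys:.2f}, random noise: {acc_rand:.2f}")
```

In this setup the student trained on randomly flipped labels recovers the true rule, because the flips are outvoted within each feature value, while the student trained on the systematic teacher reproduces its blind spot exactly. Real models and real noise are messier, but the asymmetry is the same.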

Automated labelling is appropriate when:

  • We have a lot of data and a tolerance for noise.
  • No better labelling method is feasible (the dataset is too large for human annotators, the labels too specialized for crowdsourcing, the budget too small for experts).
  • A reasonably good pre-trained model exists for the task.

The strongest practical approach is usually a mixture: use automated labelling to produce a first pass, then have a human expert review and correct the labels, focusing expert time on the examples where the labelling model is least confident. This is a form of active learning: the model identifies the cases that need human attention.
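The review loop described above can be sketched in a few lines. This is a minimal illustration, assuming the labelling model outputs a probability per class and that confidence is measured as the top-class probability; the function names, the `probs` values, and the expert's `corrections` are all hypothetical.

```python
from typing import Dict, List, Sequence


def auto_label(probs: Sequence[Sequence[float]]) -> List[int]:
    """First pass: take the most probable class as the automated label."""
    return [max(range(len(p)), key=lambda c: p[c]) for p in probs]


def least_confident(probs: Sequence[Sequence[float]], budget: int) -> List[int]:
    """Indices of the `budget` examples whose top-class probability is lowest."""
    order = sorted(range(len(probs)), key=lambda i: max(probs[i]))
    return order[:budget]


def merge_review(labels: List[int], corrections: Dict[int, int]) -> List[int]:
    """Overwrite automated labels with the expert's corrections, where given."""
    return [corrections.get(i, y) for i, y in enumerate(labels)]


# Hypothetical class probabilities from the labelling model for five examples.
probs = [
    [0.98, 0.02],
    [0.55, 0.45],
    [0.10, 0.90],
    [0.51, 0.49],
    [0.80, 0.20],
]

labels = auto_label(probs)                 # automated first pass
queue = least_confident(probs, budget=2)   # examples sent to the expert
corrections = {3: 1}                       # hypothetical expert fix for example 3
final = merge_review(labels, corrections)
```

The `budget` parameter is where the cost trade-off lives: it caps how many examples the expert sees, so the scarce human time goes to the labels most likely to be wrong. Top-class probability is only one possible confidence measure; margin or entropy would slot into `least_confident` the same way.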

A close variant is partial labelling: instead of labelling every example, label a representative subset deeply and accurately, and rely on the model to generalize. The AVCaffe dataset was built this way, with a small set of carefully labelled examples covering the diversity of the task. The principle is the same one that recurs in labelling work: quality and representativeness beat raw coverage.
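One simple way to aim for a representative subset is stratified sampling: group the pool by some attribute and draw a fixed quota from each group, so rare cases are not crowded out by common ones. The sketch below assumes such a grouping attribute is available; `representative_subset`, `stratum_of`, and the toy `pool` are hypothetical names for illustration, and this is not a description of how any particular dataset was built.

```python
import random
from collections import Counter, defaultdict


def representative_subset(examples, stratum_of, per_stratum, seed=0):
    """Pick up to `per_stratum` examples from every stratum in the pool."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for ex in examples:
        groups[stratum_of(ex)].append(ex)

    chosen = []
    for key in sorted(groups):  # fixed order keeps the selection reproducible
        members = groups[key]
        k = min(per_stratum, len(members))
        chosen.extend(rng.sample(members, k))
    return chosen


# Hypothetical pool: 95 examples of a common kind and 5 of a rare kind.
pool = [("common", i) for i in range(95)] + [("rare", i) for i in range(5)]

subset = representative_subset(pool, stratum_of=lambda ex: ex[0], per_stratum=3)
counts = Counter(kind for kind, _ in subset)
```

A uniform random sample of six examples from this pool would often contain no rare examples at all; the per-stratum quota guarantees both kinds reach the annotators.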