Supervised learning is the family of machine-learning algorithms that learns from labelled data — every training example comes paired with a correct answer. This image shows an owl, this patient has heart disease, this email is spam. The model’s job is to learn a function that takes an input and predicts the correct answer for it.

There are two main types of supervised task, distinguished by the kind of answer the model produces:

  • Regression: the answer is a continuous numerical value. Given a person’s age and weight, predict their blood pressure. Given a year, predict the inflation rate.
  • Classification: the answer is a discrete category. Is this email spam or not? Is this wine high quality or low? Does this patient have diabetes?

In both cases, the training procedure is roughly the same: pick a model with adjustable parameters, define a loss function that measures how far the model’s predictions are from the labels, and use gradient descent (or a closed-form solver when one exists) to find parameters that minimize the loss.
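The three steps above can be sketched for the simplest case, a straight-line model fitted with mean squared error. This is a minimal illustration, not a reference implementation: the toy data, the function names, the learning rate, and the step count are all assumptions chosen for the example. It also shows the closed-form alternative, which exists for this particular model.

```python
# Toy data invented for illustration: y is roughly 2*x + 1 with small noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]


def predict(w, b, x):
    """Step 1: a model with two adjustable parameters (slope w, intercept b)."""
    return w * x + b


def mse_loss(w, b):
    """Step 2: mean squared error between predictions and labels."""
    return sum((predict(w, b, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient(w, b):
    """Partial derivatives of the MSE with respect to w and b."""
    n = len(xs)
    dw = sum(2 * (predict(w, b, x) - y) * x for x, y in zip(xs, ys)) / n
    db = sum(2 * (predict(w, b, x) - y) for x, y in zip(xs, ys)) / n
    return dw, db


def fit_gradient_descent(lr=0.05, steps=2000):
    """Step 3: repeatedly move the parameters against the gradient."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        dw, db = gradient(w, b)
        w -= lr * dw
        b -= lr * db
    return w, b


def fit_closed_form():
    """Ordinary least squares for one feature: no iteration needed."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w = sxy / sxx
    b = mean_y - w * mean_x
    return w, b


w_gd, b_gd = fit_gradient_descent()
w_cf, b_cf = fit_closed_form()
print("gradient descent:", w_gd, b_gd)
print("closed form:     ", w_cf, b_cf)
```

On this small, well-behaved problem both routes should land on essentially the same slope and intercept. Gradient descent earns its keep on models where no closed-form solution is available.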

The other two families of machine learning differ in the feedback the algorithm receives:

  • Unsupervised learning uses unlabelled data — just inputs, no correct answers. The model finds structure on its own: clusters, patterns, anomalies, low-dimensional representations. PCA and k-means clustering are unsupervised.
  • Reinforcement learning is interactive: an agent takes actions in an environment, receives rewards or penalties, and learns a policy that maximizes cumulative reward. Many game-playing agents, including some chess and Go engines, are trained this way.

A simple feel for the distinction: imagine someone shows us a stack of fruit photographs, some of oranges and some of apples. If the photographs are labelled (each orange tagged “orange”, each apple tagged “apple”), we can learn to classify a new one. That’s supervised. If they’re unlabelled, we can still sort them into piles by visual similarity, such as color and shape, and end up with two piles even without knowing what they’re called. That’s unsupervised.
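The fruit example can be made concrete with a small sketch. Everything here is an assumption made for illustration: each photograph is reduced to two invented numbers (a hue score and a roundness score), the supervised learner is a simple nearest-centroid classifier, and the unsupervised learner is a bare-bones k-means with a naive initialisation. Neither is presented as the method any particular system uses.

```python
import math

# Hypothetical features per photo: (hue, roundness), both on an invented 0-1 scale.
photos = [
    (0.10, 0.70), (0.15, 0.75), (0.12, 0.65),   # apples
    (0.80, 0.95), (0.85, 0.90), (0.78, 0.92),   # oranges
]
labels = ["apple", "apple", "apple", "orange", "orange", "orange"]


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


# Supervised: the labels tell us which photos belong together.
def fit_supervised(points, point_labels):
    """Learn one centroid per labelled class."""
    by_label = {}
    for p, lab in zip(points, point_labels):
        by_label.setdefault(lab, []).append(p)
    return {lab: centroid(pts) for lab, pts in by_label.items()}


def classify(model, point):
    """Predict the label whose centroid is closest to the new point."""
    return min(model, key=lambda lab: math.dist(point, model[lab]))


# Unsupervised: no labels, only similarity between photos.
def kmeans(points, k=2, iterations=10):
    """Group points into k piles; returns a pile index for each point."""
    centers = list(points[:k])  # naive initialisation: the first k points
    assignment = [0] * len(points)
    for _ in range(iterations):
        assignment = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centers[j] = centroid(members)
    return assignment


model = fit_supervised(photos, labels)
print(classify(model, (0.82, 0.93)))  # a named class
print(kmeans(photos))                 # anonymous pile indices
```

The supervised model answers with a name because names were in its training data. The clustering only returns pile numbers: it can separate the two kinds of fruit, but it has no way to know what either pile is called.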

The Introduction to Data Science textbook focuses on supervised learning: linear and polynomial regression for continuous outputs and logistic regression for binary classification, both trained with gradient descent on standard losses (mean squared error for regression, binary cross-entropy for classification).
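For the classification side, a rough sketch of logistic regression trained by gradient descent on binary cross-entropy might look like the following. The data, function names, learning rate, and step count are assumptions for illustration and are not taken from the textbook.

```python
import math

# Invented toy data: one feature x and a binary label y.
# The classes overlap slightly, so the best-fit parameters stay finite.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]


def sigmoid(z):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """Estimated probability that the label of x is 1."""
    return sigmoid(w * x + b)


def bce_loss(w, b):
    """Mean binary cross-entropy between predicted probabilities and labels."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for x, y in zip(xs, ys):
        p = predict_proba(w, b, x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(xs)


def fit(lr=0.1, steps=5000):
    """Gradient descent on the BCE loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # With a sigmoid output and BCE loss, the gradient reduces to
        # (prediction - label) times the input (or times 1 for the bias).
        dw = sum((predict_proba(w, b, x) - y) * x for x, y in zip(xs, ys)) / n
        db = sum(predict_proba(w, b, x) - y for x, y in zip(xs, ys)) / n
        w -= lr * dw
        b -= lr * db
    return w, b


w, b = fit()
print("loss before training:", bce_loss(0.0, 0.0))
print("loss after training: ", bce_loss(w, b))
print("P(y=1 | x=0.5):", predict_proba(w, b, 0.5))
print("P(y=1 | x=4.0):", predict_proba(w, b, 4.0))
```

The training loop has the same shape as for regression with mean squared error. Only the model's output (a probability instead of a raw number) and the loss function change.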