StandardScaler is the scikit-learn class that implements z-score normalization, usually called standardization. It subtracts the mean and divides by the standard deviation, column by column, so that each feature ends up with mean 0 and standard deviation 1.
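As a quick sanity check, here is a minimal sketch showing that computing the column-wise z-scores by hand matches StandardScaler. The array `X` is toy data made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data for illustration: 4 samples, 2 features on very different scales
X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0],
              [4.0, 500.0]])

# By hand: subtract each column's mean, divide by each column's (population) std
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

# With scikit-learn
X_scaled = StandardScaler().fit_transform(X)

print(np.allclose(X_manual, X_scaled))   # True
print(X_scaled.mean(axis=0))             # each column's mean is ~0
print(X_scaled.std(axis=0))              # each column's std is ~1
```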
Important detail: StandardScaler divides by the population standard deviation (ddof=0, divisor n), not the sample one. The scikit-learn docs are explicit about this: “We use a biased estimator for the standard deviation, equivalent to numpy.std(x, ddof=0).” This matters if you ever compare StandardScaler’s output to a scaling done in Pandas, whose .std() defaults to ddof=1 (divisor n − 1, Bessel’s correction). The two standard deviations differ by a factor of √(n / (n − 1)), so the outputs disagree slightly for small samples; for large n the difference is negligible and is unlikely to affect model behaviour. Note also that StandardScaler scales to mean 0 and (population) standard deviation 1, not to the range [0, 1]; that’s MinMaxScaler.
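The ddof difference is easy to see on a small column. In this sketch the five values are made up so the numbers come out round: the squared deviations from the mean (5) sum to 36, giving a population variance of 36/5 = 7.2 and a sample variance of 36/4 = 9.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Small toy column (n = 5) so the ddof difference is visible
df = pd.DataFrame({"x": [2.0, 4.0, 4.0, 5.0, 10.0]})
n = len(df)

sc = StandardScaler().fit(df)

pop_std = np.std(df["x"].to_numpy(), ddof=0)   # divisor n: sqrt(7.2), about 2.683
sample_std = df["x"].std()                      # pandas default ddof=1, divisor n - 1: 3.0

# StandardScaler's stored scale_ is the population std
print(sc.scale_[0], pop_std, sample_std)

# Pandas-style z-scores vs StandardScaler's z-scores
z_pandas = (df["x"] - df["x"].mean()) / df["x"].std()
z_sklearn = sc.transform(df)[:, 0]

# They differ by the constant factor sqrt(n / (n - 1))
print(z_sklearn / z_pandas.to_numpy())
```

The first row (x = 2.0) illustrates the gap: pandas gives (2 − 5)/3 = −1.0, while StandardScaler gives −3/√7.2 ≈ −1.118.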
The standard usage:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train_scaled = sc.fit_transform(X_train)  # learn per-column means/stds from X_train, then scale X_train
X_test_scaled = sc.transform(X_test)        # scale X_test with the means/stds learned from X_train
The .fit(X) call measures the per-column means and standard deviations of X and stores them on the scaler object (as the mean_ and scale_ attributes). The .transform(X) call scales X using those stored parameters. The .fit_transform(X) call does both at once: it fits the scaler to X and then transforms X. That is convenient for the training set, but wrong for any data that should only be scaled with parameters learned elsewhere, such as the test set.
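To make the fit/transform split concrete, here is a small sketch with a hypothetical one-feature train/test split. The test values deliberately sit above the training range, so you can see that transform reuses the training statistics rather than recentring the test data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy split: one feature, test values above the training range
X_train = np.array([[10.0], [20.0], [30.0]])
X_test = np.array([[40.0], [50.0]])

sc = StandardScaler()
sc.fit(X_train)               # learns mean_ and scale_ from X_train only
print(sc.mean_, sc.scale_)    # mean 20.0, population std sqrt(200/3), about 8.165

# Uses the *training* statistics: (X_test - 20.0) / 8.165
X_test_scaled = sc.transform(X_test)
print(X_test_scaled.ravel())  # about [2.449, 3.674], not mean 0

# fit_transform(X) is equivalent to fit(X).transform(X)
a = StandardScaler().fit_transform(X_train)
b = StandardScaler().fit(X_train).transform(X_train)
print(np.allclose(a, b))      # True
```

The scaled test values are not centred on 0, which is correct: the test data really is larger than the training data, and the model should see it that way.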
The critical discipline: fit_transform on the training set only, then transform on the test set. Never fit_transform on the test set (it would re-fit the scaler to the test data, leaking test information into the model) and never fit_transform on the entire dataset before the test split (same problem, harder to notice). This is the canonical example of Data leakage and the canonical illustration of why scikit-learn separates fit from transform as different operations.
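The two leaky patterns can be demonstrated on synthetic data. This sketch assumes randomly generated features (the seed and distribution are arbitrary choices for illustration) and compares a correctly fitted scaler against one fitted on the full dataset and one re-fitted on the test set.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic data for illustration only
rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(200, 3))

X_train, X_test = train_test_split(X, test_size=0.25, random_state=0)

# Correct: statistics come from the training rows only
sc_ok = StandardScaler().fit(X_train)

# Leaky: fitting before the split lets the test rows shift the learned means
sc_leaky = StandardScaler().fit(X)
print(sc_ok.mean_ - sc_leaky.mean_)   # non-zero differences

# Also wrong: re-fitting on the test set forces it to mean 0 / std 1
# using its own statistics, so it is no longer on the training scale
X_test_wrong = StandardScaler().fit_transform(X_test)
X_test_right = sc_ok.transform(X_test)
print(np.abs(X_test_wrong - X_test_right).max())   # greater than 0
```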
Inside a scikit-learn pipeline, the discipline is automatic:
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=10000))
clf.fit(X_train, y_train) # scaler fits on X_train, then transforms it
y_pred = clf.predict(X_test) # scaler transforms X_test with stored params
The pipeline’s .fit() fits the scaler on the training data, transforms that data, and passes the scaled result to the classifier. The pipeline’s .predict() transforms the test data with the already-fitted scaler and passes it through to the classifier. As long as you only call .fit() on training data, there’s no way for test data to leak into the scaler’s fit.
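Here is a self-contained version of the pipeline example. It uses synthetic data from make_classification as a stand-in for a real dataset, checks that the scaler inside the pipeline was fitted on X_train only, and shows that cross_val_score re-fits the whole pipeline (scaler included) on each training fold.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic classification data standing in for a real dataset
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=10000))
clf.fit(X_train, y_train)

# make_pipeline names each step after its lowercased class name
scaler = clf.named_steps["standardscaler"]
print(np.allclose(scaler.mean_, X_train.mean(axis=0)))   # True: fitted on X_train only

y_pred = clf.predict(X_test)
print(clf.score(X_test, y_test))   # accuracy on the held-out test set

# Cross-validation re-fits the whole pipeline per fold, so the scaler
# never sees that fold's validation rows
scores = cross_val_score(clf, X_train, y_train, cv=5)
print(scores.mean())
```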
For variants of scaling beyond zero-mean/unit-variance, scikit-learn has MinMaxScaler (scales each feature to a fixed range, [0, 1] by default), RobustScaler (centres on the median and scales by the interquartile range, which makes it more robust to outliers), and MaxAbsScaler (divides each feature by its maximum absolute value). StandardScaler is a sensible default in most contexts; the alternatives address specific problems.
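The practical difference between these scalers shows up most clearly with an outlier. This sketch runs all four, with default settings, on a made-up single feature containing one extreme value.

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler, MinMaxScaler, RobustScaler, StandardScaler

# One feature with a single large outlier (toy data for illustration)
X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

for scaler in (StandardScaler(), MinMaxScaler(), RobustScaler(), MaxAbsScaler()):
    print(type(scaler).__name__, scaler.fit_transform(X).ravel().round(3))
```

With these values, MinMaxScaler computes (x − 1)/99 and MaxAbsScaler computes x/100, so both squeeze the four ordinary points into a sliver near 0. RobustScaler uses the median (3) and interquartile range (4 − 2 = 2), mapping them to −1, −0.5, 0 and 0.5, and leaves the outlier far out at 48.5.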