Sample-and-hold imputation

Sample-and-hold imputation fills a missing value by repeating the most recent valid sample. If sample 5 is missing, we copy the value from sample 4 into the gap. Visually, the missing point gets pulled to the same height as its predecessor.
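A minimal plain-Python sketch of the mechanism follows. It assumes missing samples are marked as `None` or NaN, and the name `sample_and_hold` is hypothetical. It walks the signal once, remembers the last valid value, and emits that value whenever a sample is missing. Samples missing before the first valid one stay `None`, because there is nothing to hold yet.

```python
import math
from typing import Iterable, Iterator, Optional


def sample_and_hold(samples: Iterable[Optional[float]]) -> Iterator[Optional[float]]:
    """Yield each sample, replacing missing ones (None or NaN) with the last valid value.

    Each output depends only on the current and earlier inputs.
    Missing samples before the first valid one are yielded as None.
    """
    last_valid: Optional[float] = None
    for x in samples:
        missing = x is None or (isinstance(x, float) and math.isnan(x))
        if missing:
            yield last_valid  # repeat the most recent valid sample
        else:
            last_valid = x
            yield x


signal = [0.9, 1.0, 1.1, None, 1.3, float("nan"), float("nan"), 1.6]
print(list(sample_and_hold(signal)))
```

Here the gap at index 3 takes the value from index 2, and both NaNs at indices 5 and 6 take the value 1.3 from index 4.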

This beats zero-replacement because it produces a value that’s at least plausible: the imputed value matches the local context. The implicit assumption is that the signal didn’t change much since the last valid sample. That is roughly true for slowly varying signals sampled at high rates. An ECG at 500 Hz changes very little from one sample to the next, an accelerometer in a quiet environment is nearly constant, and a temperature reading doesn’t jump.
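You can put a rough number on "changes very little". For a sine of amplitude A and frequency f sampled at f_s, holding the previous sample for one step errs by at most 2πfA/f_s, which is the peak slope times one sample period. That bound is about 0.013A for f = 1 Hz at 500 Hz, but about 1.26A for f = 100 Hz. The sketch below measures the worst case directly. It assumes NumPy and uses a synthetic unit sine as a stand-in for a real recording. The helper name is hypothetical.

```python
import numpy as np


def worst_one_sample_hold_error(freq_hz: float, fs_hz: float = 500.0, seconds: float = 2.0) -> float:
    """Largest error from holding the previous sample of a unit-amplitude sine for one step."""
    t = np.arange(0.0, seconds, 1.0 / fs_hz)
    x = np.sin(2 * np.pi * freq_hz * t)
    # Using x[n-1] in place of x[n] makes the error exactly the first difference.
    return float(np.max(np.abs(np.diff(x))))


for f in (1.0, 10.0, 100.0):
    err = worst_one_sample_hold_error(f)
    print(f"{f:6.1f} Hz sine sampled at 500 Hz: worst one-sample hold error = {err:.3f}")
```

The error grows with f/f_s. So the same one-sample gap that is harmless on a slow signal can be badly wrong on a fast one.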

The assumption breaks down when the signal is fast-varying or the gap is long. Repeating the same value across many consecutive missing samples produces a flat plateau that doesn’t look like anything in the real signal. The plateau is sometimes acceptable (if downstream code knows to flag it) and sometimes corrosive (if the model learns that flat plateaus signal some particular class).
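One way for downstream code to "know to flag it" is to carry a staleness count alongside each held value. This count records how many samples old the repeated value is. A minimal pandas sketch follows, assuming NaN marks missing samples. The function name `hold_with_staleness` and the cutoff of 2 samples are hypothetical. Staleness can be computed causally, because at each step you know how long you have been holding. The total length of a gap, by contrast, is only known once the gap ends.

```python
import pandas as pd


def hold_with_staleness(s: pd.Series) -> pd.DataFrame:
    """Forward-fill `s` and report how many samples old each held value is."""
    missing = s.isna()
    # Every real sample starts a new group; NaNs join the group of the sample before them.
    group = (~missing).cumsum()
    # 0 on real samples, then 1, 2, 3, ... for each consecutive missing sample.
    staleness = missing.astype(int).groupby(group).cumsum()
    return pd.DataFrame({"value": s.ffill(), "imputed": missing, "staleness": staleness})


s = pd.Series([1.0, None, 3.0, None, None, None, None, 8.0])
out = hold_with_staleness(s)
# Hypothetical cutoff: flag values that have been held for more than 2 samples.
out["stale"] = out["staleness"] > 2
print(out)
```

A model can receive `imputed` or `stale` as extra input features, or the stale rows can be masked out. Either choice makes the plateau explicit instead of letting the model find it on its own.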

In pandas, sample-and-hold is forward fill (ffill):

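```python
import pandas as pd
df = pd.DataFrame({"x": [1.0, None, None, None, None, 6.0]})  # toy signal with a 4-sample gap

df.ffill()         # forward fill: copy previous value (replaces deprecated fillna(method='ffill'))
df.bfill()         # backward fill: copy next value
df.ffill(limit=3)  # fill at most 3 consecutive NaNs per gap; longer gaps are only partly filled
```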

Forward fill is the natural choice for a causal (real-time) signal, since it uses only samples that arrived before the gap. Backward fill is occasionally useful for offline analysis, where we know what came after the gap. The limit= parameter caps how many consecutive missing samples receive the repeated value. In a longer gap, the first `limit` samples are filled and the rest stay NaN. Sometimes you’d rather leave the gap visible than smear a stale value over too long a span.
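Note that `limit=` never leaves a long gap fully untouched: it still fills the start of the gap. If the goal is to fill short gaps completely and leave long gaps entirely NaN, one option is to measure each gap's length first. The sketch below assumes pandas and NaN-marked gaps, and `ffill_short_gaps` is a hypothetical helper, not a pandas method. Because it needs the full length of each gap, it is an offline operation.

```python
import pandas as pd


def ffill_short_gaps(s: pd.Series, max_gap: int) -> pd.Series:
    """Forward-fill only gaps of at most `max_gap` consecutive NaNs; longer gaps stay NaN."""
    missing = s.isna()
    # A new run starts wherever the missing/non-missing status changes.
    run_id = missing.ne(missing.shift()).cumsum()
    # Length of the NaN run each sample belongs to (0 for real samples).
    gap_len = missing.astype(int).groupby(run_id).transform("sum")
    filled = s.ffill()
    # Keep the filled value only where the original gap was short enough.
    return filled.where(~missing | (gap_len <= max_gap))


s = pd.Series([1.0, None, 3.0, None, None, None, None, 8.0])
print(s.ffill(limit=3).tolist())        # the 4-sample gap is filled for 3 samples, then NaN
print(ffill_short_gaps(s, 3).tolist())  # the 4-sample gap is left entirely NaN
```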

For smoother imputation that uses both neighbors, see Linear interpolation. The Imputation note gives the overall framework.

Idriss Rami — Notes