Zero-replacement imputation is the simplest possible scheme for handling Missing data: every missing value gets replaced with 0. It’s fast and trivial to implement, and almost always the wrong choice.

The problem is that 0 is rarely a plausible estimate of what the missing value would have been. If the surrounding samples are all in the 200-400 range, dropping a 0 in among them creates an artifact more disruptive than the original gap was — a downward spike where the data should have been smooth. Any downstream model has to either learn that 0 means missing (and treat it specially) or just be confused by the artifact.
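To make the size of the artifact concrete, here is a minimal sketch using a made-up five-sample signal in the 200-400 range (the values are illustrative, not taken from any real dataset). It compares the mean computed while skipping the gap with the mean after zero-filling it.

```python
import numpy as np
import pandas as pd

# Hypothetical readings in the 200-400 range with a single gap.
signal = pd.Series([250.0, 300.0, np.nan, 350.0, 300.0])

zero_filled = signal.fillna(0)

# pandas skips NaN by default, so this is the mean of the four real readings.
mean_skipping_gap = signal.mean()
# After zero-filling, the artificial 0 is averaged in like a real reading.
mean_zero_filled = zero_filled.mean()

print(mean_skipping_gap)   # 300.0
print(mean_zero_filled)    # 240.0
print(zero_filled.min())   # 0.0, far below every real reading
```

One filled sample out of five is enough to pull the mean down by 60 units and to set a new minimum that no real reading ever came close to.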

Visually, if we plot the data points on a scatter plot, every point whose value is missing gets moved down to the x-axis. For a signal that’s already centered around zero (for example, one normalized to zero mean), zero-replacement is less catastrophic, because 0 is in the data’s natural range. For raw signals that aren’t centered, zero-replacement creates a clearly artificial value.
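One way to see why a centered signal tolerates zero-replacement better: if the signal has been standardized to zero mean, filling with 0 is the same as filling with the observed mean in the original units. The sketch below assumes z-score standardization computed from the observed (non-missing) values; the data is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical raw signal with one gap.
raw = pd.Series([250.0, 300.0, np.nan, 350.0, 300.0])

# Standardize using statistics of the observed values (NaN is skipped).
mu = raw.mean()
sigma = raw.std()
normalized = (raw - mu) / sigma

# Zero-fill in normalized units, then map back to the raw units.
filled_normalized = normalized.fillna(0)
back_in_raw_units = filled_normalized * sigma + mu

# The gap now holds the observed mean rather than an out-of-range 0.
print(back_in_raw_units.iloc[2], mu)
```

The filled value lands in the middle of the data instead of far below it, which is why the same operation that wrecks a raw signal is comparatively mild on a normalized one.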

When zero-replacement is actually OK:

  • The missing rate is very low and any downstream model is robust to the occasional artifact.
  • The signal is already normalized to zero mean.
  • 0 carries the same semantic meaning as missing in the application (a count of events that didn’t happen, for instance).

When it isn’t:

  • The signal has a meaningful baseline far from 0.
  • The downstream model is sensitive to outliers.
  • Multiple consecutive samples are missing, producing a long flat segment of zeros that looks like nothing in the real signal.
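Two of the criteria above, the overall missing rate and the length of consecutive gaps, are easy to measure before choosing a fill strategy. The helper below is an illustrative sketch; `missing_summary` is a hypothetical name, not a pandas function, and what counts as "very low" or "too long" is left to the application.

```python
import numpy as np
import pandas as pd

def missing_summary(series: pd.Series) -> dict:
    """Return the missing rate and the longest run of consecutive NaNs."""
    if series.empty:
        return {"missing_rate": 0.0, "longest_gap": 0}
    is_missing = series.isna()
    # A new run starts wherever the missing/present state changes.
    run_id = is_missing.astype(int).diff().ne(0).cumsum()
    # Summing the boolean mask per run gives its length (0 for valid runs).
    run_lengths = is_missing.groupby(run_id).sum()
    return {
        "missing_rate": float(is_missing.mean()),
        "longest_gap": int(run_lengths.max()),
    }

# Hypothetical signal: one gap of three samples and one isolated gap.
signal = pd.Series([250.0, np.nan, np.nan, np.nan, 310.0, np.nan, 290.0, 300.0])
summary = missing_summary(signal)
print(summary)
```

A high missing rate or a long gap here is a sign that zero-filling would produce the flat segments of zeros described above.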

In pandas:

filled = df.fillna(0)                          # every NaN → 0; returns a new DataFrame, df is unchanged
filled = df.fillna({'col1': 0, 'col2': 1.5})   # a different fill value per column

For better alternatives, see Sample-and-hold imputation (repeats the previous valid value), Linear interpolation (draws a straight line between the nearest valid neighbors on either side of the gap), or Non-linear interpolation (fits a smooth curve through several neighbors). Imputation gives the bigger picture of when each method applies.
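As a rough side-by-side, the sketch below fills the same hypothetical gap three ways. It assumes that `Series.ffill()` is an acceptable stand-in for sample-and-hold and `Series.interpolate(method="linear")` for linear interpolation; non-linear interpolation is left out because the pandas options for it rely on SciPy.

```python
import numpy as np
import pandas as pd

# Hypothetical signal with one gap at position 2.
signal = pd.Series([250.0, 300.0, np.nan, 350.0, 300.0])

comparison = pd.DataFrame({
    "zero": signal.fillna(0),                        # gap becomes 0
    "sample_and_hold": signal.ffill(),               # repeat the previous valid value
    "linear": signal.interpolate(method="linear"),   # straight line between neighbors
})

print(comparison.loc[2])
# zero                 0.0
# sample_and_hold    300.0
# linear             325.0
```

Both alternatives keep the filled value inside the range of the surrounding samples; only zero-replacement leaves it.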