Linear interpolation

Linear interpolation estimates a missing or intermediate value by drawing a straight line between the two neighboring known values and reading the new value off the line. If sample 4 has value $y_{4}$ and sample 6 has value $y_{6}$, and sample 5 is missing, the linear interpolation for sample 5 is

$y_{5} = y_{4} + \frac{y_{6} - y_{4}}{6 - 4} \cdot (5 - 4) = \frac{y_{4} + y_{6}}{2}$

that is, the average of the two neighbors, because they are equally spaced from the missing sample. For known points at general positions $x_{a}$ and $x_{b}$ on either side of the missing point $x$:

$y_{x} = y_{a} + \frac{y_{b} - y_{a}}{x_{b} - x_{a}} \cdot (x - x_{a})$

This is the equation of the straight line through $(x_{a}, y_{a})$ and $(x_{b}, y_{b})$, evaluated at $x$.
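The general formula translates directly into a few lines of Python. This is a minimal sketch: the function name lerp and the sample values (10.0 and 14.0) are made up for illustration and do not come from any dataset.

```python
def lerp(x_a, y_a, x_b, y_b, x):
    """Evaluate the straight line through (x_a, y_a) and (x_b, y_b) at x."""
    if x_b == x_a:
        raise ValueError("x_a and x_b must differ")
    slope = (y_b - y_a) / (x_b - x_a)
    return y_a + slope * (x - x_a)


# Equal spacing: sample 5 sits halfway between samples 4 and 6,
# so the result is the plain average of the two neighbors.
midpoint = lerp(4, 10.0, 6, 14.0, 5)

# Unequal spacing: x = 5 lies one quarter of the way from 4 to 8,
# so the result moves one quarter of the way from y_a to y_b.
quarter = lerp(4, 10.0, 8, 14.0, 5)

print(midpoint, quarter)
```

With equal spacing the result is the average of the neighbors; with unequal spacing the nearer neighbor gets more weight.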

Linear interpolation is more accurate than sample-and-hold when the signal varies smoothly: a straight line through both neighbors usually gets closer to the true value than copying one neighbor would. It's the right default for imputation of moderately smooth signals.

It is less accurate than non-linear interpolation when the signal has natural curvature, as in ECG and EEG signals, where a straight line through two neighbors cuts off the curve between them. For wide gaps and curved signals, fitting a polynomial or spline through several neighbors recovers more of the true shape.
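A small experiment makes the ordering concrete. The sketch below assumes a toy signal, sin(x) sampled at x = 0, 1, 2, 3 with the value at x = 1.5 missing, and compares sample-and-hold, linear interpolation, and a cubic polynomial through all four neighbors. The helper lagrange is a hypothetical name, and a Lagrange polynomial is only one of several possible non-linear methods.

```python
import math


def lagrange(xs, ys, x):
    """Evaluate the unique polynomial through the points (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        weight = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                weight *= (x - xj) / (xi - xj)
        total += yi * weight
    return total


xs = [0.0, 1.0, 2.0, 3.0]          # known sample positions (toy data)
ys = [math.sin(v) for v in xs]     # known sample values
x_missing = 1.5
truth = math.sin(x_missing)

hold = ys[1]                                   # copy the previous neighbor
linear = ys[1] + (ys[2] - ys[1]) / (xs[2] - xs[1]) * (x_missing - xs[1])
cubic = lagrange(xs, ys, x_missing)            # polynomial through 4 neighbors

err_hold = abs(hold - truth)
err_linear = abs(linear - truth)
err_cubic = abs(cubic - truth)
print(err_hold, err_linear, err_cubic)
```

On this curved stretch of the signal, the hold error is the largest, the linear error is somewhat smaller, and the cubic error is several times smaller again. The gap between linear and cubic shrinks when the signal is close to straight between the neighbors.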

In Pandas:

df.interpolate(method='linear', inplace=True)

This walks each column and fills missing values by linear interpolation between the surrounding non-missing values. A missing pH cell that a constant fill would have set to 5.0 might instead come out as 3.219…, closer to the values actually present in the column.

One catch: by default, interpolate doesn't fill missing values at the very start of a column, because it has no value on one side to interpolate from. The first row remains NaN if the column starts with a gap. The limit_direction='both' argument can extend the fill to the boundary, but the values there are extrapolation rather than interpolation: with method='linear', pandas repeats the nearest valid value instead of continuing the line.
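The boundary behavior is easy to check on a tiny frame. The sketch below uses an invented pH column (the values are illustrative, not from a real dataset) that starts with a gap, and compares the default call with limit_direction='both'.

```python
import numpy as np
import pandas as pd

# Hypothetical column: leading gap, one interior gap, then a two-cell gap.
df = pd.DataFrame({"pH": [np.nan, 3.0, np.nan, 3.4, np.nan, np.nan, 4.0]})

# Default direction: interior gaps are filled, the leading NaN is left alone.
default_fill = df.interpolate(method="linear")

# 'both' also fills the leading gap, by repeating the first valid value.
both_fill = df.interpolate(method="linear", limit_direction="both")

print(default_fill["pH"].tolist())
print(both_fill["pH"].tolist())
```

The interior cells land on the straight lines between their neighbors (about 3.2, then about 3.6 and 3.8), while the first row is NaN in the default result and 3.0 in the 'both' result. Whether a repeated boundary value is acceptable depends on the data; leaving the NaN in place and handling it separately is often the safer choice.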

Linear interpolation is also the basic operation behind drawing a line graph: successive samples are connected by straight lines for visualization, which is the same operation as filling a missing value between two neighbors.

Idriss Rami — Notes
