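Linear interpolation estimates a missing or intermediate value by drawing a straight line between the two neighboring known values and reading the new value off that line. If sample 4 has value $x_4$, sample 6 has value $x_6$, and sample 5 is missing, the linear interpolation for sample 5 is

$$\hat{x}_5 = \frac{x_4 + x_6}{2},$$

which is the average of the two neighbors, because they are equally spaced from the missing sample. For general spacing, with known value $x_a$ at position $t_a$ and known value $x_b$ at position $t_b$ around the missing point $t$ (where $t_a < t < t_b$):

$$\hat{x}(t) = x_a + (x_b - x_a)\,\frac{t - t_a}{t_b - t_a}.$$

This is the equation of the straight line through $(t_a, x_a)$ and $(t_b, x_b)$, evaluated at $t$.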
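The general formula translates directly into a small helper. Below is a minimal Python sketch; the function name lerp and its argument order are our own choices for illustration, not part of any library.

```python
def lerp(t_a: float, x_a: float, t_b: float, x_b: float, t: float) -> float:
    """Value at t on the straight line through (t_a, x_a) and (t_b, x_b).

    lerp is a hypothetical helper written for this example.
    """
    if t_b == t_a:
        raise ValueError("neighbors must be at distinct positions")
    weight = (t - t_a) / (t_b - t_a)  # 0 at t_a, 1 at t_b
    return x_a + (x_b - x_a) * weight


# Equally spaced: sample 5 between samples 4 and 6 gives the plain average.
print(lerp(4, 10.0, 6, 14.0, 5))  # 12.0
# Unequal spacing: t = 5 is a quarter of the way from t = 4 to t = 8.
print(lerp(4, 10.0, 8, 30.0, 5))  # 15.0
```

With $t_a = 4$, $t_b = 6$ and $t = 5$ the weight is 0.5, so the general formula reduces to the two-neighbor average above.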

Linear interpolation is usually more accurate than sample-and-hold when the signal varies smoothly: a straight line through both neighbors tends to land closer to the true value than copying one neighbor does. It’s a sensible default for imputation of moderately smooth signals.
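A quick way to see the difference is to compare both fills on a smooth signal with one sample removed. The sketch below assumes pandas, uses forward fill (ffill) as a stand-in for sample-and-hold, and uses a made-up signal x = t² sampled at t = 0..4, whose true value at the gap is 4.

```python
import numpy as np
import pandas as pd

# Made-up smooth signal x = t**2 at t = 0..4, with the t = 2 sample missing.
s = pd.Series([0.0, 1.0, np.nan, 9.0, 16.0])
true_value = 4.0

held = s.ffill()          # sample-and-hold: copy the previous known sample
linear = s.interpolate()  # straight line between the two neighbors

print(held[2], linear[2], true_value)  # 1.0 5.0 4.0
```

Neither fill is exact on a curved signal, but the straight line (error 1.0) is much closer than the held value (error 3.0).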

It’s less accurate than non-linear interpolation when the signal has natural curvature. In ECG and EEG signals, for instance, a straight line between two neighbors cuts across the curve between them. For wide gaps and curved signals, fitting a polynomial or spline through several neighbors usually recovers more of the true shape.
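To illustrate, here is a sketch that removes the peak sample from a half period of a sine wave and compares the two-neighbor average with a least-squares quadratic fitted through two neighbors on each side. The test signal, the gap position, and the choice of a quadratic (rather than a spline) are assumptions made for illustration.

```python
import numpy as np

# Hypothetical curved test signal: half a period of a sine, 9 samples.
t = np.linspace(0.0, np.pi, 9)
x = np.sin(t)
gap = 4  # t[4] == pi/2, the peak, where the true value is 1.0

# Linear: straight line between the two immediate neighbors.
linear = (x[gap - 1] + x[gap + 1]) / 2

# Non-linear: least-squares quadratic through two neighbors on each side.
idx = [gap - 2, gap - 1, gap + 1, gap + 2]
coeffs = np.polyfit(t[idx], x[idx], deg=2)
quadratic = np.polyval(coeffs, t[gap])

print(f"true 1.0  linear {linear:.4f}  quadratic {quadratic:.4f}")
```

The straight line undershoots the peak noticeably, while the quadratic, which can bend with the signal, lands much closer to 1.0.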

In pandas:

```python
df.interpolate(method='linear', inplace=True)
```

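This walks each column and fills each missing value by linear interpolation between the nearest non-missing values above and below it. Note that method='linear' ignores the index and treats the rows as equally spaced; for unevenly spaced samples, method='index' (or method='time' for a DatetimeIndex) uses the index values as positions instead. A previously missing pH cell that a constant fill had set to 5.0 might instead come out as 3.219…, a value in line with its neighbors and closer to what the column’s actual distribution looks like.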
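Because method='linear' treats rows as equally spaced, the two methods disagree whenever the index has uneven gaps. A small sketch with a made-up Series (DataFrame.interpolate applies the same logic column by column):

```python
import numpy as np
import pandas as pd

# Made-up readings at uneven positions t = 0, 1, 3, with t = 1 missing.
s = pd.Series([1.0, np.nan, 4.0], index=[0, 1, 3])

# method='linear' ignores the index: the gap is treated as halfway.
by_row = s.interpolate(method="linear")
print(by_row[1])  # 2.5

# method='index' uses the index values as positions on the x-axis.
by_index = s.interpolate(method="index")
print(by_index[1])  # 2.0
```

Here t = 1 is only a third of the way from t = 0 to t = 3, so the index-aware result matches the general formula, while the row-based result assumes the midpoint.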

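A subtlety: by default, interpolate doesn’t fill missing values at the very start of a column, because there is no known value on one side to interpolate from; a column that starts with a gap keeps its leading NaNs. Gaps at the end are treated differently: with method='linear', trailing NaNs are filled by repeating the last valid value. The limit_direction='both' argument also fills the leading gap, again by repeating the nearest valid value, so values at either boundary are really a flat extrapolation rather than interpolation.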
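The boundary behavior is easy to check on a toy Series. This sketch also uses limit_area='inside', a further interpolate parameter that restricts filling to gaps with known values on both sides; the data is made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up column with gaps at the start, in the middle, and at the end.
s = pd.Series([np.nan, 2.0, np.nan, 4.0, np.nan])

default = s.interpolate().tolist()
print(default)  # leading gap kept, trailing gap repeats the last value

both = s.interpolate(limit_direction="both").tolist()
print(both)  # both edges filled by repeating the nearest valid value

inside = s.interpolate(limit_area="inside").tolist()
print(inside)  # only the interior gap is filled
```

If boundary values should stay missing rather than be silently flattened, limit_area='inside' makes that explicit.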

Linear interpolation is also the basic operation behind drawing a line graph: successive samples are connected by straight lines for visualization, which is the same construction as filling a missing value between two neighbors.