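The directional derivative of a scalar function $f$ at a point $\mathbf{p}$, in the direction of a unit vector $\hat{\mathbf{u}}$, is the rate of change of $f$ along that direction:
$$D_{\hat{\mathbf{u}}} f(\mathbf{p}) = \nabla f(\mathbf{p}) \cdot \hat{\mathbf{u}} = |\nabla f(\mathbf{p})| \cos\theta$$
where $\theta$ is the angle between $\nabla f$ (gradient) and $\hat{\mathbf{u}}$. The dot-product form is the workhorse for computation; the $|\nabla f| \cos\theta$ form is the geometric interpretation.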
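As a quick numerical check of the definition, the sketch below compares the dot-product form, the $|\nabla f| \cos\theta$ form, and a central finite difference taken along the direction itself. The field `f`, the point `p`, and the direction `u` are hypothetical choices made only for illustration.

```python
import math

def f(x, y):
    # Hypothetical scalar field for illustration: f(x, y) = x^2 * y
    return x * x * y

def grad_f(x, y):
    # Analytic gradient of the field above: (2xy, x^2)
    return (2.0 * x * y, x * x)

p = (1.0, 2.0)   # evaluation point (assumed)
u = (0.6, 0.8)   # direction, already unit length since 0.36 + 0.64 = 1

g = grad_f(*p)

# Dot-product form: grad f . u
d_dot = g[0] * u[0] + g[1] * u[1]

# Geometric form: |grad f| cos(theta), theta measured between grad f and u
theta = math.atan2(g[1], g[0]) - math.atan2(u[1], u[0])
d_geo = math.hypot(g[0], g[1]) * math.cos(theta)

# Definition as a rate of change: central difference along u
h = 1e-6
d_fd = (f(p[0] + h * u[0], p[1] + h * u[1])
        - f(p[0] - h * u[0], p[1] - h * u[1])) / (2.0 * h)

print(d_dot, d_geo, d_fd)  # all three are approximately 3.2
```

The finite difference needs only values of `f`, so it is a convenient sanity check when an analytic gradient is suspected to be wrong.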
What it means
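The partial derivative $\partial f / \partial x$ tells you how $f$ changes when you move in $x$. The directional derivative generalizes this to any direction $\hat{\mathbf{u}}$: it tells you how fast $f$ changes when you walk in the direction $\hat{\mathbf{u}}$, per unit distance traveled.
For $\hat{\mathbf{u}} = \hat{\mathbf{x}}$: $D_{\hat{\mathbf{x}}} f = \partial f / \partial x$. For $\hat{\mathbf{u}} = \hat{\mathbf{y}}$: $D_{\hat{\mathbf{y}}} f = \partial f / \partial y$. For arbitrary $\hat{\mathbf{u}} = (u_x, u_y, u_z)$:
$$D_{\hat{\mathbf{u}}} f = u_x \frac{\partial f}{\partial x} + u_y \frac{\partial f}{\partial y} + u_z \frac{\partial f}{\partial z}$$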
The partial derivatives along the three coordinate axes are components of the gradient; the directional derivative in an arbitrary direction is the projection of the gradient onto that direction.
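Why $\hat{\mathbf{u}}$ must be a unit vector
The formula assumes $|\hat{\mathbf{u}}| = 1$, which is what makes the result a “rate per unit distance.” If you use a non-unit vector $\mathbf{v}$ in the same formula, the result would be scaled by $|\mathbf{v}|$. To express the directional derivative for a non-unit $\mathbf{v}$:
$$D_{\mathbf{v}/|\mathbf{v}|} f = \frac{\nabla f \cdot \mathbf{v}}{|\mathbf{v}|}$$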
Forgetting to normalize is one of the more common errors in introductory vector calculus.
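A minimal sketch of the normalization step follows. The helper name `directional_derivative` and the numeric values are assumptions made for illustration; the point is that the raw dot product with a non-unit vector overshoots by exactly the vector's length.

```python
import math

def directional_derivative(grad, v):
    # Normalizes v before projecting, so the result is a rate per unit distance.
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("direction vector must be non-zero")
    return sum(g * c for g, c in zip(grad, v)) / norm

grad = (4.0, 1.0)   # hypothetical gradient at some point
v = (3.0, 4.0)      # non-unit direction, |v| = 5

raw = sum(g * c for g, c in zip(grad, v))    # forgets to normalize
correct = directional_derivative(grad, v)    # divides by |v|

print(raw, correct, raw / correct)  # the ratio is |v|, about 5
```

Raising on a zero vector is a design choice for this sketch: a zero vector has no direction, so returning a number would hide the mistake.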
Geometric interpretation: steepest ascent
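Writing $D_{\hat{\mathbf{u}}} f = |\nabla f| \cos\theta$:
- Maximized at $\theta = 0$ (i.e., $\hat{\mathbf{u}}$ pointing along $\nabla f$): $D_{\hat{\mathbf{u}}} f = |\nabla f|$. This is the “steepest ascent” direction.
- Zero at $\theta = \pi/2$ ($\hat{\mathbf{u}}$ perpendicular to $\nabla f$): $D_{\hat{\mathbf{u}}} f = 0$. Moving along a level surface (where $f$ is constant) means $\nabla f \cdot \hat{\mathbf{u}} = 0$.
- Minimized at $\theta = \pi$ ($\hat{\mathbf{u}}$ antiparallel to $\nabla f$): $D_{\hat{\mathbf{u}}} f = -|\nabla f|$. The “steepest descent” direction, foundational for Gradient descent.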
The gradient is the steepest-ascent vector; its magnitude is the steepest slope; the directional derivative in any other direction is just the projection.
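The three cases above can be checked numerically by rotating a unit vector away from the gradient. The gradient value and the helper `rate_at_angle` below are hypothetical, and the sketch is restricted to two dimensions so that a single angle describes the direction.

```python
import math

grad = (4.0, 1.0)                       # hypothetical gradient at some point
grad_norm = math.hypot(grad[0], grad[1])
grad_angle = math.atan2(grad[1], grad[0])

def rate_at_angle(theta):
    # Unit vector rotated by theta away from the gradient direction,
    # then projected onto the gradient with a dot product.
    u = (math.cos(grad_angle + theta), math.sin(grad_angle + theta))
    return grad[0] * u[0] + grad[1] * u[1]

d_along = rate_at_angle(0.0)            # steepest ascent: +|grad f|
d_perp = rate_at_angle(math.pi / 2)     # along a level curve: 0
d_against = rate_at_angle(math.pi)      # steepest descent: -|grad f|

# Sweep the full circle in 1-degree steps; no direction beats the gradient.
sweep = [rate_at_angle(math.radians(k)) for k in range(360)]

print(d_along, d_perp, d_against, max(sweep), min(sweep))
```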
Worked example
Take .
Gradient:
At point :
Directional derivative in the direction of . First normalize: , so .
Interpretation: walking from in the direction at unit speed, increases at rate per unit distance.
The maximum possible rate at this point: , in the direction .
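The same sequence of steps (gradient, evaluation at a point, normalization, dot product, maximum rate) can be sketched in code. The function, point, and direction below are hypothetical and are not the ones used in the example above; they were picked so that the arithmetic is easy to follow by hand.

```python
import math

def grad_f(x, y, z):
    # Gradient of the hypothetical field f(x, y, z) = x^2 * y + y * z:
    # (2xy, x^2 + z, y)
    return (2.0 * x * y, x * x + z, y)

point = (1.0, 2.0, 3.0)        # assumed evaluation point
direction = (1.0, 2.0, 2.0)    # assumed direction, length 3 (not a unit vector)

# Step 1: gradient at the point, here (4, 4, 2)
g = grad_f(*point)

# Step 2: normalize the direction, here (1/3, 2/3, 2/3)
length = math.sqrt(sum(c * c for c in direction))
u = tuple(c / length for c in direction)

# Step 3: project the gradient onto the unit direction
rate = sum(gi * ui for gi, ui in zip(g, u))        # 16/3, about 5.333

# Step 4: the largest possible rate is the gradient magnitude
max_rate = math.sqrt(sum(gi * gi for gi in g))     # sqrt(36) = 6
steepest = tuple(gi / max_rate for gi in g)        # (2/3, 2/3, 1/3)

print(rate, max_rate, steepest)
```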
Connection to chain rule
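If $\mathbf{r}(t)$ is a parametrized curve passing through a point $\mathbf{p}$ at $t = 0$ with $\mathbf{r}'(0) = \mathbf{v}$, then
$$\left.\frac{d}{dt} f(\mathbf{r}(t))\right|_{t=0} = \nabla f(\mathbf{p}) \cdot \mathbf{v}$$
This is the chain rule. The directional derivative is the special case where $\mathbf{v}$ is a unit vector, measuring rate of change “per unit arc length” along the curve, at the starting point. Otherwise the rate is scaled by the speed $|\mathbf{v}|$.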
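The chain-rule statement can be verified on a concrete curve. In the sketch below the field `f` and the curve `r` are hypothetical; the curve is deliberately bent (it has a $t^2$ term) to show that only its velocity at the point enters the result.

```python
import math

def f(x, y):
    # Hypothetical scalar field: f(x, y) = x^2 * y
    return x * x * y

def grad_f(x, y):
    return (2.0 * x * y, x * x)

def r(t):
    # Hypothetical curve through p = (1, 2) at t = 0 with velocity v = (3, 4).
    # The t^2 term bends the path but does not change the velocity at t = 0.
    return (1.0 + 3.0 * t, 2.0 + 4.0 * t + t * t)

p = r(0.0)
v = (3.0, 4.0)
speed = math.hypot(v[0], v[1])                      # |v| = 5

# Left-hand side: d/dt f(r(t)) at t = 0, by central difference
h = 1e-6
lhs = (f(*r(h)) - f(*r(-h))) / (2.0 * h)

# Right-hand side: grad f(p) . v
g = grad_f(*p)
rhs = g[0] * v[0] + g[1] * v[1]

# Directional derivative along the unit vector v / |v|
d_unit = rhs / speed

print(lhs, rhs, d_unit)  # lhs and rhs agree; d_unit is rhs divided by the speed
```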
In machine learning
In gradient-based optimization, the “step direction” question is: in which direction $\hat{\mathbf{u}}$ should we update parameters to decrease the loss $L$ fastest? Answer: $\hat{\mathbf{u}} = -\nabla L / |\nabla L|$. The negative gradient direction has directional derivative $-|\nabla L|$, the most-negative possible value. See Gradient descent.
In adaptive methods (Adam, RMSprop, etc.), the effective $\hat{\mathbf{u}}$ at each step is a more complex function of past gradients and squared gradients, but the underlying concept, “move in the direction of greatest local decrease,” is the directional-derivative principle.
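For the plain (non-adaptive) case, the step direction can be sketched as follows. The quadratic loss, the starting point, and the learning rate `lr` are assumptions chosen so the behavior is easy to predict; this is not how any particular library implements its optimizer.

```python
import math

def loss(w):
    # Hypothetical quadratic loss with its minimum at w = (3, -1)
    return (w[0] - 3.0) ** 2 + 2.0 * (w[1] + 1.0) ** 2

def grad_loss(w):
    return (2.0 * (w[0] - 3.0), 4.0 * (w[1] + 1.0))

w = (0.0, 0.0)                                   # assumed starting parameters
g0 = grad_loss(w)
g0_norm = math.hypot(g0[0], g0[1])

# Unit vector along the negative gradient and its directional derivative
u = (-g0[0] / g0_norm, -g0[1] / g0_norm)
d_descent = g0[0] * u[0] + g0[1] * u[1]          # equals -|grad L|

lr = 0.1                                         # assumed learning rate
history = [loss(w)]
for _ in range(200):
    g = grad_loss(w)
    # Step against the gradient; the step length is lr * |grad L|
    w = (w[0] - lr * g[0], w[1] - lr * g[1])
    history.append(loss(w))

print(d_descent, -g0_norm, w, history[0], history[-1])
```

Note that the update uses the unnormalized gradient, so the step shrinks as the gradient shrinks near the minimum; the direction is still the steepest-descent one.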
In electromagnetics
The directional derivative appears wherever you want “the rate of change of a field along a specific direction,” typically along a path, a boundary, or a streamline. The line integral $\int_C \nabla V \cdot d\boldsymbol{\ell}$ can be viewed as integrating $D_{\hat{\mathbf{t}}} V$ along the path (where $\hat{\mathbf{t}}$ is the unit tangent and $d\boldsymbol{\ell} = \hat{\mathbf{t}}\, d\ell$): the integral accumulates the rate of potential change along the path, giving the total potential difference.
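A small numerical sketch of this accumulation: summing the directional derivative of a potential along the unit tangent, times the arc-length element, reproduces the difference of the potential between the endpoints. The potential `V`, the quarter-circle path, and the midpoint-rule discretization are all assumptions made for illustration.

```python
import math

def V(x, y):
    # Hypothetical scalar potential: V(x, y) = x^2 + 3y
    return x * x + 3.0 * y

def grad_V(x, y):
    return (2.0 * x, 3.0)

def r(s):
    # Assumed path: quarter of the unit circle from (1, 0) to (0, 1)
    return (math.cos(s), math.sin(s))

def r_prime(s):
    return (-math.sin(s), math.cos(s))

s0, s1 = 0.0, math.pi / 2
n = 1000
ds = (s1 - s0) / n

total = 0.0
for i in range(n):
    s = s0 + (i + 0.5) * ds                  # midpoint of the i-th subinterval
    x, y = r(s)
    tx, ty = r_prime(s)
    speed = math.hypot(tx, ty)               # |r'(s)|, so dl = speed * ds
    t_hat = (tx / speed, ty / speed)         # unit tangent
    g = grad_V(x, y)
    d_t = g[0] * t_hat[0] + g[1] * t_hat[1]  # directional derivative along the path
    total += d_t * speed * ds

delta_V = V(*r(s1)) - V(*r(s0))
print(total, delta_V)  # both are approximately 2
```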