Double hashing

Double hashing resolves hash-table collisions by computing the probe step from a second hash function:

$\mathrm{index}_{i} = (h_{1}(k) + i \cdot h_{2}(k)) \bmod M, \quad i = 0, 1, 2, \dots$

Each key gets its own probe step $h_{2}(k)$, so two different keys with the same initial $h_{1}$ value follow different probe sequences unless their $h_{2}$ values also coincide. This eliminates both the primary clustering of linear probing and the secondary clustering of quadratic probing.
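As a small illustration of the formula above, the sketch below generates the first few probe indices for two keys that share a start bucket but have different steps. The function name `probe_sequence` and the sample values (start 3, steps 7 and 6, table size 11) are chosen here for illustration only.

```python
from itertools import count, islice

def probe_sequence(h1_value, h2_value, M):
    """Yield (h1 + i * h2) mod M for i = 0, 1, 2, ..."""
    for i in count():
        yield (h1_value + i * h2_value) % M

# Same start bucket, different steps: the sequences split after the first probe.
seq_a = list(islice(probe_sequence(3, 7, 11), 4))  # [3, 10, 6, 2]
seq_b = list(islice(probe_sequence(3, 6, 11), 4))  # [3, 9, 4, 10]
print(seq_a, seq_b)
```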

Closely matches uniform hashing

Knuth’s analytical model of an “ideal” hash table is uniform hashing, where every probe sequence is equally likely to be any permutation of the buckets. In terms of the load factor $α = n / M$ ($n$ stored keys in $M$ buckets), it gives the best possible probe-count formulas:

Unsuccessful search: $\frac{1}{1 - α}$ probes on average.
Successful search: $\frac{1}{α} \ln \frac{1}{1 - α}$ probes on average.

Double hashing approximates this very closely in practice when $h_{1}$ and $h_{2}$ are chosen well — the probe sequences are diverse enough to behave nearly like random permutations. So double hashing is the open-addressing scheme that comes closest to the theoretical optimum.

$α$	Unsuccessful (uniform)	Successful (uniform)
0.5	2.0	1.39
0.7	3.33	1.72
0.9	10	2.56
0.99	100	4.65

Compare with linear probing at $α = 0.9$: about 50.5 probes for an unsuccessful search and 5.5 for a successful one, roughly five and two times worse, respectively.
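The table and the comparison can be reproduced numerically. This sketch evaluates the two uniform-hashing formulas given above. For linear probing it assumes Knuth's classical approximations, $\frac{1}{2}(1 + \frac{1}{(1 - α)^{2}})$ for unsuccessful and $\frac{1}{2}(1 + \frac{1}{1 - α})$ for successful searches, which these notes do not state but which match the 50.5 and 5.5 figures. The function names are illustrative.

```python
import math

def uniform_unsuccessful(alpha):
    return 1.0 / (1.0 - alpha)

def uniform_successful(alpha):
    return (1.0 / alpha) * math.log(1.0 / (1.0 - alpha))

# Assumed here: Knuth's approximations for linear probing.
def linear_unsuccessful(alpha):
    return 0.5 * (1.0 + 1.0 / (1.0 - alpha) ** 2)

def linear_successful(alpha):
    return 0.5 * (1.0 + 1.0 / (1.0 - alpha))

for alpha in (0.5, 0.7, 0.9, 0.99):
    print(alpha,
          round(uniform_unsuccessful(alpha), 2),
          round(uniform_successful(alpha), 2),
          round(linear_unsuccessful(alpha), 2),
          round(linear_successful(alpha), 2))
```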

Choosing $h_{2}$

Two requirements:

$h_{2}(k) \neq 0$ for any key $k$. A zero step would never advance.
$h_{2}(k)$ relatively prime to $M$. Otherwise the probe sequence visits only a fraction of the table, specifically $M / \gcd(h_{2}(k), M)$ buckets.
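The coverage claim is easy to check by brute force. The sketch below walks a probe sequence until it revisits a bucket and counts the distinct buckets seen. The helper name `buckets_visited` and the sample table sizes are illustrative.

```python
from math import gcd

def buckets_visited(start, step, M):
    """Count distinct buckets reached by start, start+step, ... (mod M)."""
    seen = set()
    index = start % M
    while index not in seen:
        seen.add(index)
        index = (index + step) % M
    return len(seen)

# Step 4 in a table of 12 shares the factor 4 with M: only 12 / 4 = 3 buckets.
print(buckets_visited(0, 4, 12), 12 // gcd(4, 12))
# Step 5 is coprime to 12: all 12 buckets are reached.
print(buckets_visited(0, 5, 12), 12 // gcd(5, 12))
```

A zero step is the degenerate case of the same rule: $\gcd(0, M) = M$, so exactly one bucket is ever visited.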

The cleanest way to guarantee both is to pick $M$ prime and define $h_{2}(k) = R - (k \bmod R)$ where $R$ is a prime smaller than $M$. Then $h_{2}(k) \in [1, R]$, so it is never zero, and because $M$ is prime and $R < M$ it is never a multiple of $M$ and therefore always coprime to it.

A common alternative: $M = 2^{p}$ and $h_{2} (k) =$ any odd-valued hash. Odd numbers are coprime to $2^{p}$ , satisfying requirement 2.
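Both constructions can be sketched in a few lines. The first function follows the $R - (k \bmod R)$ definition above. The second turns an arbitrary hash value into an odd step by setting its lowest bit, which is one possible way to get an "odd-valued hash" and is an assumption of this sketch rather than something the notes prescribe. The names `h2_prime` and `h2_odd` are illustrative.

```python
from math import gcd

def h2_prime(k, R):
    """Step for a prime-sized table: R - (k mod R), always in [1, R]."""
    return R - (k % R)

def h2_odd(hash_value, p):
    """Step for a table of size 2**p (p >= 1): keep the low p bits, force bit 0 on."""
    return (hash_value | 1) & ((1 << p) - 1)

M, R = 11, 7
print(all(gcd(h2_prime(k, R), M) == 1 for k in range(1000)))      # True
print(all(gcd(h2_odd(x, 4), 2 ** 4) == 1 for x in range(1000)))   # True
```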

Cost per probe

Each probe needs $h_{1} (k) + i \cdot h_{2} (k)$ computed (modulo $M$ ). The two hash computations happen once per lookup ( $h_{1}$ for the start, $h_{2}$ for the step), so the marginal cost per additional probe is just one multiply-add. That’s slightly more than Linear probing (one add per probe), so double hashing wins on probe count but loses a little on per-probe constants.

The bigger cost: probes touch scattered memory, which kills cache behaviour. On modern CPUs, two cache misses cost more than the entire computational difference between linear and double hashing — which is why production hash tables tend to use linear probing despite double hashing’s better theoretical probe count.

Double hashing’s niche today is more academic (the cleanest “approximates uniform hashing” scheme) than practical.

Worked example

Table of size $M = 11$, $h_{1}(k) = k \bmod 11$, $h_{2}(k) = 7 - (k \bmod 7)$.

Insert $k = 25$: $h_{1} = 25 \bmod 11 = 3$. Bucket 3 empty → store at 3.
Insert $k = 14$: $h_{1} = 14 \bmod 11 = 3$. Bucket 3 occupied. $h_{2} = 7 - (14 \bmod 7) = 7 - 0 = 7$. Try bucket $(3 + 7) \bmod 11 = 10$. Empty → store at 10.
Insert $k = 36$: $h_{1} = 36 \bmod 11 = 3$. Bucket 3 occupied. $h_{2} = 7 - (36 \bmod 7) = 7 - 1 = 6$. Try bucket $(3 + 6) \bmod 11 = 9$. Empty → store at 9.

Three keys with the same $h_{1} = 3$ , three different probe sequences. Compare with linear probing where all three would have piled into buckets 3, 4, 5 — extending a cluster.
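The same three insertions can be traced in code. This is a minimal sketch using the $h_{1}$ and $h_{2}$ of the example. The `insert` helper and the `trace` dictionary are names introduced here, and the table stores bare keys with `None` marking an empty bucket.

```python
M = 11

def h1(k):
    return k % 11

def h2(k):
    return 7 - (k % 7)

def insert(table, k):
    """Place k by double hashing and return the list of buckets probed."""
    probed = []
    for i in range(len(table)):
        index = (h1(k) + i * h2(k)) % len(table)
        probed.append(index)
        if table[index] is None:
            table[index] = k
            return probed
    raise RuntimeError("table full")

table = [None] * M
trace = {k: insert(table, k) for k in (25, 14, 36)}
print(trace)  # {25: [3], 14: [3, 10], 36: [3, 9]}
```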

Deletion

Same tombstone trick as in linear and quadratic probing: replace deleted entries with a sentinel that lookups skip but inserts may reuse.
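A minimal sketch of tombstone deletion on top of double hashing, assuming integer keys, the $M = 11$, $R = 7$ functions from the worked example, and no resizing. The class name `DoubleHashSet` and the sentinels `EMPTY` and `TOMBSTONE` are hypothetical names for illustration.

```python
EMPTY = object()      # bucket never used
TOMBSTONE = object()  # bucket whose key was deleted

class DoubleHashSet:
    def __init__(self, M=11, R=7):
        self.M, self.R = M, R
        self.slots = [EMPTY] * M

    def _probe(self, k):
        start, step = k % self.M, self.R - (k % self.R)
        for i in range(self.M):
            yield (start + i * step) % self.M

    def contains(self, k):
        for index in self._probe(k):
            slot = self.slots[index]
            if slot is EMPTY:          # a never-used bucket ends the search
                return False
            if slot is not TOMBSTONE and slot == k:
                return True            # tombstones are skipped, not matched
        return False

    def add(self, k):
        reusable = None
        for index in self._probe(k):
            slot = self.slots[index]
            if slot is EMPTY:
                if reusable is None:
                    reusable = index
                break
            if slot is TOMBSTONE:
                if reusable is None:   # remember the first reusable bucket
                    reusable = index
            elif slot == k:
                return                 # already present
        if reusable is None:
            raise RuntimeError("table full")
        self.slots[reusable] = k

    def remove(self, k):
        for index in self._probe(k):
            slot = self.slots[index]
            if slot is EMPTY:
                return False
            if slot is not TOMBSTONE and slot == k:
                self.slots[index] = TOMBSTONE
                return True
        return False

s = DoubleHashSet()
for key in (25, 14, 36):
    s.add(key)
s.remove(25)            # bucket 3 becomes a tombstone
found = s.contains(14)  # still found: the lookup skips the tombstone at 3
s.add(47)               # 47 also starts at bucket 3 and reuses the tombstone
```

Marking the bucket `EMPTY` instead would cut the probe chains of 14 and 36, which both pass through bucket 3.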

In context

Double hashing is the third and most sophisticated open-addressing scheme; the simpler ones are Linear probing and Quadratic probing. The closed-address alternative is Separate chaining.

For full comparison and the parent table-design discussion, see Hash table.

Idriss Rami — Notes
