Double hashing resolves hash-table collisions by computing the probe step from a second hash function:
Each key gets its own probe step , so two different keys with the same initial value follow different probe sequences. This eliminates both the primary clustering of linear probing and the secondary clustering of Quadratic probing.
Closely matches uniform hashing
Knuth’s analytical model of an “ideal” hash table — uniform hashing, where every probe sequence is equally likely to be any permutation of the buckets — gives the best possible probe-count formulas:
- Unsuccessful search: probes on average.
- Successful search: probes on average.
Double hashing approximates this very closely in practice when and are chosen well — the probe sequences are diverse enough to behave nearly like random permutations. So double hashing is the open-addressing scheme that comes closest to the theoretical optimum.
| Unsuccessful (uniform) | Successful (uniform) | |
|---|---|---|
| 0.5 | 2.0 | 1.39 |
| 0.7 | 3.33 | 1.72 |
| 0.9 | 10 | 2.56 |
| 0.99 | 100 | 4.65 |
Compare with linear probing at : 50.5 unsuccessful and 5.5 successful — five and two times worse, respectively.
Choosing
Two requirements:
- for any key . A zero step would never advance.
- relatively prime to . Otherwise the probe sequence visits only a fraction of the table — specifically buckets.
The cleanest way to guarantee both is to pick prime and define where is a prime smaller than . Then , never zero, never a multiple of .
A common alternative: and any odd-valued hash. Odd numbers are coprime to , satisfying requirement 2.
Cost per probe
Each probe needs computed (modulo ). The two hash computations happen once per lookup ( for the start, for the step), so the marginal cost per additional probe is just one multiply-add. That’s slightly more than Linear probing (one add per probe), so double hashing wins on probe count but loses a little on per-probe constants.
The bigger cost: probes touch scattered memory, which kills cache behaviour. On modern CPUs, two cache misses cost more than the entire computational difference between linear and double hashing — which is why production hash tables tend to use linear probing despite double hashing’s better theoretical probe count.
Double hashing’s niche today is more academic (the cleanest “approximates uniform hashing” scheme) than practical.
Worked example
Table of size , , .
Insert : . Bucket 3 empty → store at 3. Insert : . Bucket 3 occupied. . Try bucket . Empty → store at 10. Insert : . Bucket 3 occupied. . Try bucket . Empty → store at 9.
Three keys with the same , three different probe sequences. Compare with linear probing where all three would have piled into buckets 3, 4, 5 — extending a cluster.
Deletion
Same tombstone trick: replace deleted entries with a sentinel that lookups skip but inserts may reuse.
In context
Double hashing is the third and most sophisticated open-addressing scheme; the simpler ones are Linear probing and Quadratic probing. The closed-address alternative is Separate chaining.
For full comparison and the parent table-design discussion, see Hash table.