Separate chaining

Separate chaining is a Hash table collision-resolution scheme where each bucket holds a Linked list (or another data structure) of all keys that hash to it. Insertion always succeeds — just append to the appropriate bucket’s list. Lookup hashes the key, walks the bucket’s list, returns when the key is found.

The contrast: open-addressing schemes (Linear probing, Quadratic probing, Double hashing) store one key per bucket and probe other buckets when collisions occur. Separate chaining keeps all colliding keys in one bucket, in a chain.

The terminology is confusing: separate chaining is sometimes called “open hashing” (the bucket list is open-ended) while open-addressing is sometimes called “closed hashing” (each bucket is closed to one key). Don’t read too much into the names — they were chosen separately by different authors and never reconciled.

Operations

typedef struct Node {
    char *key;
    void *value;
    struct Node *next;
} Node;
 
Node *table[M];   // M buckets, each pointing to head of a linked list

Insert(k, v):

Compute $i = h (k)$ .
Walk table[i]’s list. If a node with key $k$ exists, replace its value.
Otherwise, allocate a new node with $(k, v)$ and prepend to table[i].

Lookup(k):

Compute $i = h (k)$ .
Walk table[i]’s list, comparing keys.
Return the value if found, otherwise indicate absent.

Delete(k):

Compute $i = h (k)$ .
Walk table[i]’s list, find the node, unlink it, free.

Deletion is straightforward — no tombstones needed, unlike open-addressing schemes.

Worked example

Table size $M = 11$ , $h (k) = k mod 11$ . Insert keys $1, 3, 11, 25, 101$ :

$1 mod 11 = 1$ → bucket 1’s list: [1].
$3 mod 11 = 3$ → bucket 3’s list: [3].
$11 mod 11 = 0$ → bucket 0’s list: [11].
$25 mod 11 = 3$ → bucket 3’s list: [25, 3] (prepend, or append — both fine).
$101 mod 11 = 2$ → bucket 2’s list: [101].

Bucket 3 has a chain of two; the rest have one or zero. Lookup for 25 hashes to bucket 3, walks the chain (one comparison), found.

Probe-count analysis

Define load factor $α = n / M$ where $n$ is items, $M$ is buckets. Note: with chaining, $α$ can exceed 1 — having more items than buckets is fine, you just get longer chains.

Average chain length $= α$ . So:

Unsuccessful search: $Θ (1 + α)$ probes — hash, then walk the entire chain to confirm absence.
Successful search: $Θ (1 + α /2)$ probes on average — hash, then walk half the chain on average before finding the target.

Both are constant as long as $α$ stays moderate (say, $\leq 1$ ). The worst case (every key in the same bucket) is $O (n)$ , but with a good hash function this is vanishingly unlikely except under adversarial input — see Hash table for hash flooding.

Pros vs. open addressing

Pros of separate chaining:

No deletion complications. No tombstones; just unlink the node.
Load factor can exceed 1. Doesn’t refuse insertion when “full.” (Open-addressing tables must resize before $α = 1$ since there are no empty buckets.)
Easier to implement correctly. Lookup, insert, delete all reduce to linked-list operations.
More forgiving of bad hash functions. Long chains slow you down but don’t break correctness.

Cons:

Cache behaviour. Linked-list nodes are scattered in memory. Walking a chain typically incurs one cache miss per node, while open-addressing’s contiguous probe sequence often hits a single cache line.
Per-node memory overhead. Each entry needs a next pointer (8 bytes on 64-bit) plus heap-allocator overhead (typically 16+ bytes for malloc bookkeeping). Open addressing only stores the key + value.
More memory overall for the same load factor.

Variants

Chains as dynamic arrays instead of linked lists. Better cache; harder to delete from the middle.
Chains as balanced trees (AVL tree or red-black). Java’s HashMap “treeifies” buckets that grow past 8 entries — chain operations become $O (lo g k)$ rather than $O (k)$ , mitigating worst-case behaviour under hash flooding.
Cuckoo hashing. Two hash functions, two locations per key; on collision, kick out the existing key and re-insert it at its alternate location, recursively. Worst-case $O (1)$ lookup (only ever check two buckets). Insertion can fail and trigger a rebuild on bad luck.

When to use

Choose separate chaining when:

You expect highly variable load (sometimes empty, sometimes overloaded).
You can’t afford to resize aggressively.
Deletion is frequent (no tombstone cleanup).
Adversarial input is a concern (treeified chains contain damage).

Choose open addressing when:

Cache performance matters and the working set fits in cache.
Memory overhead matters (no per-entry pointers).
Load is well-controlled and resizing is acceptable.

For specific open-addressing schemes, see Linear probing, Quadratic probing, Double hashing. For the parent comparison, see Hash table.

Idriss Rami — Notes

Explorer