Page table

A page table is the data structure that the virtual memory system uses to translate virtual page numbers into physical frame numbers. The hardware MMU consults it on every memory access; the OS maintains it on every page allocation, eviction, or process switch.

The whole reason “virtual memory” works is that this table exists. A program issues ldw r2, 0x1000. The MMU strips off the page-offset bits, looks the rest up in the current page table, finds (or doesn’t find) a physical frame, and reissues the access against the translated physical address. Without the page table the program would just hit raw RAM.

Page table entry (PTE)

For 32-bit virtual addresses with 4 KB pages, the address splits as

[ 20-bit virtual page number | 12-bit page offset ]

The page table has $2^{20} = 1$ M entries — one per virtual page. Each entry (PTE) typically contains:

Physical frame number — the high-order bits of the physical address this virtual page maps to.
Valid / present bit — is the page currently in physical RAM?
Dirty / modified bit — has the page been written since it was loaded? (Tells the OS whether to write back to disk on eviction.)
Accessed / referenced bit — has the page been touched recently? (Used by replacement policies like clock/second-chance to approximate LRU.)
Permission bits — read / write / execute / user-vs-kernel.
Cacheability / write-back hints — cache policy for this page.

A typical x86-64 PTE is 8 bytes wide; a typical 32-bit PTE is 4 bytes wide.

The naive flat layout doesn’t scale

For 32-bit virtual addresses, a flat page table needs $2^{20} \times 4 bytes = 4$ MB per process. Annoying but tolerable.

For 64-bit virtual addresses with 4 KB pages, a flat page table would need $2^{52} \times 8 bytes = 32$ PB per process. Catastrophically impossible. And most of those PTEs would be empty — typical processes use a tiny fraction of the address space.

The fix is to not allocate page table memory for unmapped regions. Three standard approaches.

Multi-level (hierarchical) page table

Split the virtual page number into multiple fields, each indexing a separate level of the table. Only allocate sub-tables that actually contain mapped pages.

For 32-bit x86 with 4 KB pages, the standard 2-level layout splits:

[ 10-bit dir index | 10-bit table index | 12-bit offset ]

Page directory — $2^{10} = 1024$ entries, one per “directory slot.” Each entry points to a second-level page table or is marked empty.
Page table (level 2) — also $2^{10} = 1024$ entries, one per page in that 4 MB region.

Translation walks the levels:

Index the page directory with the top 10 bits of the virtual address. If the entry is empty → page fault.
Follow the pointer to a second-level table. Index it with the next 10 bits.
Read the PTE. If valid → concatenate frame number with the 12-bit offset to form the physical address.

A process with one mapped 4 MB region uses one directory page (4 KB) plus one second-level table (4 KB) — 8 KB total instead of 4 MB. For sparse address spaces, multi-level tables are dramatically cheaper.

x86-64 uses 4 levels (PML4 → PDPT → PDE → PTE), each 9 bits wide, covering 48 of the 64 virtual bits. ARM64 and RISC-V use similar 3- or 4-level schemes. Newer x86-64 chips support an optional 5-level layout for larger address spaces.

Inverted page table

Index by physical frame number instead of virtual page number. The table has one entry per physical frame, not per virtual page — so its size is bounded by RAM, not by virtual address space.

Each entry stores the (process ID, virtual page number) currently occupying that frame. Lookup hashes (PID, VPN) to find the matching entry.

Pro: size scales with physical RAM, not virtual address space — an excellent fit for 64-bit systems.
Con: lookup needs a hash, not a direct index. Hash collisions add probes. The TLB carries most of the load, but TLB misses are slower than with hierarchical tables.

PowerPC, IA-64, and UltraSPARC have used inverted page tables. x86 sticks with hierarchical.

Hashed page table

A compromise: hash the virtual page number to find the PTE, with a chain for collisions. Each chained entry holds (VPN, frame number, attributes). Better cache behaviour than fully-inverted tables for sparse address spaces.

Page table walk

Software-managed (MIPS, early Alpha): TLB miss raises an exception, the OS walks the table in software, installs the PTE in the TLB.

Hardware-managed (x86, ARM, RISC-V): the MMU itself walks the table on a TLB miss. Faster for the common case; less flexible for unusual page table formats.

A walk costs one memory access per level: 4 accesses for x86-64’s 4 levels, before you can satisfy the original access. The TLB caches the result so the walk only happens on a miss.

Per-process page tables

Each process has its own page table, so virtual address 0x1000 in process A and process B map to different physical frames. The CPU has a register (CR3 on x86, TTBR on ARM) holding the physical address of the current process’s page directory. A context switch reloads this register; the TLB is flushed (or partially flushed via PCID/ASID tagging) to prevent stale translations.

Shared page table entries

Several processes can map the same physical frame — used for shared libraries (one libc in RAM, many processes mapping it), shared memory IPC, and copy-on-write fork (parent and child initially share all pages read-only; on the first write, the OS copies the page).

Where it lives

The page table is itself in main memory — the same RAM the program reads from. Some PTEs may temporarily live in the TLB. The page directory lives at a known physical address; the OS sets the CR3 / TTBR register to point at it.

Self-referential mapping (“recursive page tables”) is a clever trick: map the directory into itself, so the OS can read or modify PTEs through the regular virtual-address mechanism rather than by raw physical reads. Used in some kernels for simpler PTE manipulation.

Page table vs TLB vs page fault

Three related concepts often confused:

The page table is the data structure in memory — the source of truth.
The TLB is a cache of recent page-table lookups, kept in MMU hardware.
A page fault is the event raised when a PTE is invalid (page not in RAM). The OS handles it by paging in from disk and updating the PTE.

A TLB miss is fast (re-walk the table). A page fault is slow (millions of cycles for a disk read).

Idriss Rami — Notes

Explorer