Two-pass assembler

A two-pass assembler scans the source file twice during translation, instead of once. The first pass builds a symbol table mapping every label to its address; the second pass uses the symbol table to fill in branch offsets and absolute addresses that depend on label values. The two-pass design solves the forward reference problem that single-pass assembly can’t handle in general.

The forward reference problem

A typical assembly source looks like:

START:
    bne r3, r0, LOOP    # forward reference — LOOP is below
    ...

LOOP:
    addi r4, r4, 1
    br LOOP

The first instruction references LOOP, which the assembler hasn’t seen yet. To translate the bne into machine code, the assembler needs the address of LOOP — but it doesn’t know that address until it processes the line LOOP:.

A single-pass assembler cannot encode a forward branch at the moment it reads it, because the target address is not yet known. Workarounds exist (back-patching, deferred records), but they add bookkeeping for every unresolved reference and do not cover every case cleanly.

Two-pass solution

Walk the source file twice:

Pass 1: build the symbol table

Walk every line. For each label definition (LABEL:), record LABEL → current address in the symbol table. Track the “current address” (often called the location counter) by counting bytes as you go: each instruction advances it by its own size, while a label definition occupies no space.

Don’t generate any machine code yet — just build the symbol table.

After pass 1: the symbol table has every defined label and its address.
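A minimal Python sketch of pass 1 follows. The function name `build_symbol_table`, the `#` comment syntax, and the fixed 4-byte instruction size are assumptions made for illustration and do not describe any particular assembler. The elided lines of the sample source are left out, so LOOP lands at 104 here.

```python
def build_symbol_table(lines, start_address=0, instr_size=4):
    """Pass 1: map each label to the address of the instruction that follows it."""
    symbols = {}
    address = start_address
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.endswith(":"):
            label = line[:-1]
            if label in symbols:
                raise ValueError(f"duplicate label: {label}")
            symbols[label] = address  # a label occupies no bytes
        else:
            address += instr_size     # assumed fixed-width instruction
    return symbols


source = [
    "START:",
    "    bne r3, r0, LOOP    # forward reference",
    "LOOP:",
    "    addi r4, r4, 1",
    "    br LOOP",
]
print(build_symbol_table(source, start_address=100))  # {'START': 100, 'LOOP': 104}
```

No machine code is produced here: the only output is the label-to-address mapping that pass 2 consumes.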

Pass 2: generate machine code

Walk the source file again, this time generating machine code. For each instruction:

If the instruction references a label, look up the label’s address in the symbol table.
Compute the offset for branches (target minus the PC value the ISA uses as its base, which in the worked example is the address of the next instruction), or use the absolute address.
Emit the encoded instruction with the offset filled in.

After pass 2: the object file is complete.
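The pass 2 steps can be sketched in Python as follows. This is an illustration under stated assumptions, not a real encoder: the source does not specify an instruction format, so the hypothetical `resolve_operands` stops at replacing label operands with numbers. It assumes 4-byte instructions, treats `bne` and `br` as PC-relative with the next instruction's address as the base, and takes the symbol table as an input that pass 1 has already produced.

```python
BRANCHES = {"bne", "br"}  # assumed to be PC-relative; other mnemonics get absolute addresses


def resolve_operands(lines, symbols, start_address=0, instr_size=4):
    """Pass 2: replace every label operand with an offset or an absolute address."""
    out = []
    address = start_address
    for raw in lines:
        line = raw.split("#", 1)[0].strip()
        if not line or line.endswith(":"):
            continue  # label definitions were handled in pass 1
        mnemonic, _, rest = line.partition(" ")
        resolved = []
        for op in (part.strip() for part in rest.split(",")):
            if not op:
                continue
            if op in symbols:
                target = symbols[op]
                if mnemonic in BRANCHES:
                    resolved.append(target - (address + instr_size))
                else:
                    resolved.append(target)
            else:
                resolved.append(op)  # register or literal, passed through unchanged
        out.append((address, mnemonic, resolved))
        address += instr_size
    return out


symbols = {"START": 100, "LOOP": 104}  # as produced by a first pass
lines = ["START:", "bne r3, r0, LOOP", "LOOP:", "addi r4, r4, 1", "br LOOP"]
result = resolve_operands(lines, symbols, start_address=100)
for entry in result:
    print(entry)
```

A real second pass would pack each resolved tuple into the ISA's bit fields and report an error for an operand that looks like a label but is missing from the symbol table.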

Worked example flow

For the source above, with each instruction taking 4 bytes starting at address 100:

Pass 1:

Line	Address	Symbol added
`START:`	100	START → 100
`bne r3, r0, LOOP`	100	(none; the instruction occupies 4 bytes, so the next address is 104)
`...`	104	(none shown; the elided lines occupy addresses 104 to 119)
`LOOP:`	120	LOOP → 120
`addi r4, r4, 1`	120	(none)
`br LOOP`	124	(none)

Symbol table after pass 1: {START: 100, LOOP: 120}.

Pass 2:

For bne r3, r0, LOOP at address 100: look up LOOP → 120. The branch offset is 120 - (100 + 4) = 16 bytes (PC-relative addressing, measured from the address of the next instruction). Emit the encoded bne with offset = 16.

For br LOOP at address 124: look up LOOP → 120. Offset is 120 - (124 + 4) = -8. Emit the encoded br with offset = -8.
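The two offsets can be checked with a few lines of Python. The helper name `branch_offset` is made up for this note. It assumes the same convention as the example: 4-byte instructions, with the offset measured from the address of the instruction after the branch.

```python
INSTR_SIZE = 4
symbols = {"START": 100, "LOOP": 120}  # symbol table after pass 1


def branch_offset(target, branch_address, instr_size=INSTR_SIZE):
    """Offset in bytes from the instruction following the branch to the target."""
    return target - (branch_address + instr_size)


print(branch_offset(symbols["LOOP"], 100))  # bne at 100 -> 16
print(branch_offset(symbols["LOOP"], 124))  # br at 124  -> -8
```

The forward branch yields a positive offset and the backward branch a negative one. Both come from the same table lookup, which is why pass 2 does not need to treat them differently.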

Why not always one pass

Single-pass assemblers exist and are useful for very simple ISAs or in scenarios where forward references are forbidden by convention. For real ISAs with forward branches and labels, the two-pass approach is the standard.

There are also fancier alternatives:

One-pass with back-patching: emit code with placeholders, then go back through the partially completed output and fill them in once the labels are known. It works, but the assembler has to track every unresolved reference until its label appears.
Multi-pass: more than two passes, used for very complex assembly languages or when each pass simplifies the source for the next.
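For comparison, here is a rough Python sketch of the back-patching alternative. It makes the same illustrative assumptions as the rest of this note (4-byte instructions, `bne` and `br` as PC-relative branches whose last operand is the label), and the function name `assemble_with_backpatching` is hypothetical.

```python
BRANCHES = {"bne", "br"}  # assumed branch mnemonics; the label is the last operand


def assemble_with_backpatching(lines, start_address=0, instr_size=4):
    symbols = {}  # label -> address, filled in as labels are seen
    pending = {}  # label -> indices into `out` still waiting for that label
    out = []      # entries are [address, mnemonic, offset or None]
    address = start_address
    for raw in lines:
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if line.endswith(":"):
            label = line[:-1]
            symbols[label] = address
            # Back-patch every earlier branch that referenced this label.
            for index in pending.pop(label, []):
                branch_address = out[index][0]
                out[index][2] = address - (branch_address + instr_size)
            continue
        mnemonic = line.split()[0]
        offset = None
        if mnemonic in BRANCHES:
            label = line.split(",")[-1].split()[-1]
            if label in symbols:
                offset = symbols[label] - (address + instr_size)  # backward reference
            else:
                pending.setdefault(label, []).append(len(out))    # placeholder for now
        out.append([address, mnemonic, offset])
        address += instr_size
    if pending:
        raise ValueError(f"undefined labels: {sorted(pending)}")
    return out


source = ["START:", "bne r3, r0, LOOP", "LOOP:", "addi r4, r4, 1", "br LOOP"]
print(assemble_with_backpatching(source, start_address=100))
```

The `pending` dictionary is the extra bookkeeping that the two-pass design avoids: with two passes, the symbol table is complete before any operand is resolved.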

Two passes is the sweet spot for most assembly languages.

In context

For the broader assembly process and the assembler’s role in the toolchain, see Assembler. For the symbol resolution that the linker does across object files (as opposed to within a single source file), see Linker.

Idriss Rami — Notes
