The arithmetic and logic unit (ALU) is the part of the processor that performs arithmetic and bitwise operations on integer binary numbers. Operands come from registers via the datapath, and the result goes back to a register. The ALU itself is a combinational circuit: its output is fully determined by its current inputs and the operation code, with no internal state.

Typical ALU operations are addition, subtraction, multiplication, division, bitwise AND, OR, XOR, and NOT, shifts, and comparisons. The specific set depends on the processor’s ISA.

In a RISC datapath, operands arrive in inter-stage registers (RA, RB) at the start of the Execute stage, the ALU computes its result in one cycle, and the output lands in RZ for the Memory stage to use.
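The register-to-register flow described above can be sketched in a few lines of Python. This is only an illustrative model: the 32-bit width, the operation names, and the helper names `alu` and `execute_stage` are assumptions, not part of any particular ISA.

```python
MASK = 0xFFFFFFFF  # assumption: 32-bit datapath


def alu(op, a, b):
    """Pure function of its inputs: no state is kept between calls."""
    if op == "add":
        return (a + b) & MASK
    if op == "sub":
        return (a - b) & MASK
    if op == "and":
        return a & b
    if op == "or":
        return a | b
    if op == "xor":
        return a ^ b
    raise ValueError(f"unsupported op: {op}")


def execute_stage(regs, op):
    """Model one Execute cycle: read RA and RB, latch the ALU result into RZ."""
    out = dict(regs)  # copy, so the caller's register snapshot is unchanged
    out["RZ"] = alu(op, regs["RA"], regs["RB"])
    return out


regs = {"RA": 7, "RB": 5, "RZ": 0}
print(execute_stage(regs, "sub")["RZ"])  # 2
```

Because `alu` keeps no state, calling it twice with the same arguments always returns the same value, which is the software analogue of a combinational circuit.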

Status flags

Most ALUs maintain status flags alongside their numeric output. These let later instructions branch on the most recent computation’s result without re-comparing.

  • N (negative): 1 if the result is negative, 0 otherwise.
  • Z (zero): 1 if the result is zero, 0 if it is non-zero.
  • C (carry): 1 if there’s a carry-out of the most significant bit, 0 otherwise.
  • V (overflow): 1 if signed overflow occurred, 0 otherwise.

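For an $n$-bit ALU, the flag equations are:

$$N = s_{n-1}, \qquad Z = \overline{s_{n-1} \lor s_{n-2} \lor \cdots \lor s_0}, \qquad C = c_n, \qquad V = c_n \oplus c_{n-1}$$

where $s_i$ are the sum bits and $c_i$ are the carry bits ($c_{n-1}$ is the carry into the sign bit, $c_n$ is the carry out of the sign bit).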
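As a rough illustration, the four flags can be computed for an addition as follows. The function name `add_with_flags` and the default width of 8 bits are assumptions made for this sketch; the carry into the sign bit is recovered by adding only the low $n-1$ bits of each operand.

```python
def add_with_flags(a, b, n=8):
    """Add two n-bit patterns and return (result, flags)."""
    mask = (1 << n) - 1
    a &= mask
    b &= mask

    low_mask = mask >> 1  # selects bits 0 .. n-2
    carry_into_sign = ((a & low_mask) + (b & low_mask)) >> (n - 1)  # c_{n-1}

    total = a + b
    carry_out = total >> n  # c_n
    result = total & mask

    flags = {
        "N": result >> (n - 1),         # sign bit of the result
        "Z": int(result == 0),          # NOR of all result bits
        "C": carry_out,                 # carry out of the sign bit
        "V": carry_out ^ carry_into_sign,
    }
    return result, flags


print(add_with_flags(0x7F, 0x01))  # (128, {'N': 1, 'Z': 0, 'C': 0, 'V': 1})
print(add_with_flags(0xFF, 0x01))  # (0, {'N': 0, 'Z': 1, 'C': 1, 'V': 0})
```

The first call is 127 + 1 in signed terms, which overflows without a carry-out; the second is 255 + 1 in unsigned terms, which carries out without signed overflow (it is -1 + 1 = 0).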

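Why the XOR detects signed overflow: in 2’s complement, adding two same-signed numbers should give a same-signed result. Write $c_{n-1}$ for the carry into the sign bit and $c_n$ for the carry out of it. If both operands are positive (sign bits 0), the sign-bit column adds $0 + 0$, so it can never produce a carry-out. A carry into the sign bit appears only when the sum of the magnitude bits spills into the sign position; that flips the result negative, giving $c_{n-1} = 1$ and $c_n = 0$. If both operands are negative (sign bits 1), the sign-bit column adds $1 + 1$, so the carry-out is always $c_n = 1$ and the result’s sign bit equals the carry-in: a carry-in of 0 gives sign bit 0 (overflow, the sum looks positive); a carry-in of 1 gives sign bit 1 (no overflow, sum still negative). If the operands have opposite signs, the sign-bit column adds $0 + 1$, so the carry-out always equals the carry-in and overflow cannot occur. In both overflow cases the two carries differ; in every non-overflow case they agree. XOR captures exactly that “they differ” condition.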
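The case analysis can also be checked by brute force. The sketch below (helper names are hypothetical) compares the XOR of the two carries against the definition of overflow, "the true signed sum does not fit in $n$ bits", for every pair of operands at a small width.

```python
def to_signed(x, n):
    """Interpret an n-bit pattern as a 2's complement integer."""
    return x - (1 << n) if x >> (n - 1) else x


def carries_differ(a, b, n):
    """Return c_n XOR c_{n-1} for the n-bit addition a + b."""
    low_mask = (1 << (n - 1)) - 1
    c_in = ((a & low_mask) + (b & low_mask)) >> (n - 1)  # carry into sign bit
    c_out = (a + b) >> n                                 # carry out of sign bit
    return c_out ^ c_in


def check_overflow_rule(n=4):
    """True if the XOR rule matches real overflow for all n-bit operand pairs."""
    lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
    for a in range(1 << n):
        for b in range(1 << n):
            true_sum = to_signed(a, n) + to_signed(b, n)
            out_of_range = not (lo <= true_sum <= hi)
            if bool(carries_differ(a, b, n)) != out_of_range:
                return False
    return True


print(check_overflow_rule(4))  # True
```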

The flags are stored in a status register (sometimes inside the processor status register alongside the interrupt enable bit). Branch instructions like “branch if zero” or “branch if negative” read these bits directly.

Implementation

Inside the ALU is a set of functional blocks: an adder (often carry-lookahead for speed), bitwise gates, a shifter, and a multiplexer that selects which block’s result becomes the output, based on the instruction’s opcode field.
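One way to picture this structure in software: every block produces its result, and a select value picks one of them. The select names (`"ADD"`, `"SHL"`, and so on), the 8-bit width, and the function name `alu_mux` are assumptions for illustration; a real design would use an encoded select signal derived from the opcode.

```python
MASK = 0xFF  # assumption: 8-bit ALU


def alu_mux(select, a, b):
    """Compute every block's output, then let the select input pick one."""
    candidates = {
        "ADD": (a + b) & MASK,   # adder block
        "AND": a & b,            # bitwise gates
        "OR": a | b,
        "XOR": a ^ b,
        "NOT": ~a & MASK,        # unary: ignores b
        "SHL": (a << 1) & MASK,  # shifter block, left by one; ignores b
    }
    return candidates[select]    # the multiplexer


print(hex(alu_mux("NOT", 0x0F, 0)))  # 0xf0
print(hex(alu_mux("ADD", 0xFF, 1)))  # 0x0
```

Building the whole dictionary before selecting mirrors the hardware, where all blocks are active at once and only the multiplexer decides which output is forwarded.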

For multiplication and division, modern ALUs either include a dedicated multi-cycle unit or fall back on iterative algorithms. Floating-point arithmetic typically lives in a separate FPU rather than the integer ALU.
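As an example of such an iterative algorithm, here is a minimal sketch of unsigned shift-and-add multiplication, which handles one multiplier bit per step. The function name and the 8-bit default are assumptions, and real multi-cycle units often use more elaborate schemes.

```python
def shift_add_multiply(a, b, n=8):
    """Unsigned n-bit multiply, one multiplier bit per iteration."""
    mask = (1 << n) - 1
    multiplicand = a & mask
    multiplier = b & mask
    product = 0
    for _ in range(n):              # one "cycle" per multiplier bit
        if multiplier & 1:          # low bit set: add the shifted multiplicand
            product += multiplicand
        multiplicand <<= 1          # next bit has twice the weight
        multiplier >>= 1
    return product                  # may need up to 2n bits


print(shift_add_multiply(13, 11))  # 143
```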

In context

In the five-unit model, the ALU plus the control unit together form the processor. In the five-stage RISC datapath, the ALU sits at stage 3 (Execute), between register read and memory access.

The “iterative” part of the historical ALU description (stepping a multi-bit operation through a bit-slice one bit at a time) is mostly obsolete in modern hardware, where parallel adders and barrel shifters compute results in a single cycle. Instructions still appear to the programmer to execute sequentially, but the underlying gates are wide and parallel.
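For contrast, the bit-slice view can be sketched as a loop over one-bit full adders, each passing its carry to the next slice. This is a software model only, with hypothetical names `full_adder` and `ripple_add`; it produces the same result as a single wide addition, just one bit position at a time.

```python
def full_adder(a, b, c_in):
    """One bit slice: returns (sum bit, carry out)."""
    s = a ^ b ^ c_in
    c_out = (a & b) | (a & c_in) | (b & c_in)
    return s, c_out


def ripple_add(a, b, n=8):
    """Add two n-bit values by stepping through the slices, LSB first."""
    carry = 0
    result = 0
    for i in range(n):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result, carry  # carry is the carry out of the top slice


print(ripple_add(0x7F, 0x01))  # (128, 0)
print(ripple_add(0xFF, 0x01))  # (0, 1)
```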