Subroutine nesting is when one subroutine calls another. The first call’s return address is in the link register; the second call would overwrite it. To preserve both, each subroutine that makes its own calls must save the link register on entry and restore it before returning.
The stack handles this naturally — each subroutine pushes its own return address before making any calls, then pops it before returning. The LIFO discipline of the stack matches the call/return order exactly.
Why nesting needs care
Consider three subroutines: calls , which calls .
When calls , the link register holds the return address into . When then calls , the call instruction overwrites the link register with the return address into . The address into is now lost — when eventually rets, it goes back to (good), but when rets, it tries to use whatever’s in the link register, which would be… wrong.
The fix is for to save the link register before its call to :
B:
subi sp, sp, 4
stw ra, 0(sp) # save return-into-A
call C # link register now has return-into-B
ldw ra, 0(sp) # restore return-into-A
addi sp, sp, 4
ret # returns to A correctly
With this pattern, calls can nest arbitrarily deep: each subroutine pushes its own return address, the stack stacks them up, and each ret finds the right address waiting at the top of the stack.
Leaf subroutines skip the dance
A leaf subroutine is one that doesn’t call any other subroutines. It doesn’t need to save the link register — the link register stays valid throughout, and ret works directly.
This is a real optimization. Compilers detect leaf functions and skip the prologue/epilogue overhead. Calling a leaf is significantly cheaper than calling a non-leaf for that reason.
Recursion
Recursion is the special case where a subroutine calls itself. The same nested-call mechanism applies — each recursive call’s return address is saved on the stack, and each return pops back to the previous level.
Recursion depth is limited by stack size. A recursive function that descends 10,000 levels needs 10,000 saved return addresses on the stack — plus any saved registers and locals per call. Deep recursion can blow the stack (stack overflow).
That’s why iterative versions of recursive algorithms exist — primarily to handle very deep recursion that would exhaust the stack, and only secondarily to avoid per-call overhead. The main motivation is correctness on inputs deep enough to overflow.
Tail calls
A tail call is a call where the calling subroutine has nothing left to do after the called one returns — ret’s return value is the caller’s return value too. In that case, the new subroutine could just reuse the current stack frame and the original caller’s return address.
Some compilers perform tail call optimization to do exactly this — replacing the call + ret sequence with a plain jmp. This makes tail-recursive functions effectively iterative, avoiding stack growth.
Nios II doesn’t have a special tail-call instruction; the optimization is up to the compiler to recognize and apply.