Notes on Procedure Call 

From Chapter 9 "A Programmer's View of Computer Architecture: with assembly language examples from MIPS RISC architecture," by Goodman and Miller, 1993

(Prepared by Karren Miller, Revised by Saeid Nooshabadi)


All about Procedures
--------------------

an introduction to procedures

why have procedures?
-- reuse of code simplifies program writing
-- modular code facilitates modification
-- allows different programmers to write different parts of the
same program
-- etc.

Assembly languages typically provide little or no support for
procedure implementation.

So, we get to build a mechanism for implementing procedures out
of what we already know.




First, some terms and what we need.

In C:

{
.
.
.
x = max(x, y); CALL
.
.
.
}
HEADER PARAMETERS
int max (int a, int b)
{
if ( a > b )
max = a; BODY
else
max = b;
}



Steps in the execution of the procedure:
1. save return address
2. procedure call
3. execute procedure
4. return


what is return address? instruction following call

what is procedure call? jump or branch to first instruction
in the procedure

what is return? jump or branch to return address


ARM Assembly implementation of C procedure call:


mov lr, rtn1
b proc1 ; one procedure call
rtn1: ; next instruction here
.
.
.
mov lr, rtn2
b proc1 ; another procedure call
rtn2: ; next instruction here

.
.
.

proc1: ; 1st instruction of procedure here
.
.
.
mov pc, lr


mov pc, lr does an unconditional branch (jump, actually) to the address
contained in register lr specified.






ARM instruction set provides a convenient instruction for procedure
calls.
bl procname

does 2 things
1. it places the address of the instruction following it
into register lr(r14). (The choice of 31 is arbitrary, but
fixed.)
2. it branches (jumps) to the address given by the label (procname).




the example re-written:


bl proc1
.
.
.
bl proc1
.
.
.


proc1: ; 1st instruction of procedure here
.
.
.
mov pc, lr ; lr is the alias for r14




one problem with this scheme. What happens if a procedure
calls itself (recursion), or if a procedure calls another
procedure (nesting) (using bl)?

The value in register lr gets overwritten with each
bl instruction. Return addresses are lost. This
is an unrecoverable error!


What is needed to handle this problem is to have a way to
save return addresses as they are generated. For a recursive
subroutine, it is not known ahead of time how many times
the subroutine will be called. This data is generated
dynamically; while the program is running.

The best way to save dynamically generated data is on a stack.

SYSTEM STACK
------------
A stack is so frequently used in implementing procedure call/return,
that many computer systems predefine a stack, the SYSTEM STACK.


STATIC -- can be defined when program is written (compile time)
DYNAMIC -- is defined when a program is executed (run time)

in this case, it is the amount of storage that cannot be
determined until run time.

The size of the system stack is very large. In theory, it should
be infinitely large. In practice, it must have a size limit.


very | stack | | grows towards smaller addresses
large | here | |
addresses| | |
| | |
| | \ /
| |
| |
| |
| |
| |
| |
| your |
| program |
address | here |
0 | |


The ARM Procedurte Call Standard (APCS) system stack is defined
to grow towards smaller addresses, and the stack pointer points to
an full location at the top of the stack. The stack pointer is
register r13, also called sp, and it is defined before program
execution begins.


push, in ARM Assembly:

sub sp, sp, #4 ; create spcae for the data to be pushed.
 str r?, [sp] ; the r? is replaced by whateger register


OR

str r?, [sp, #-4]
 sub sp, sp, #4

OR

str r?, [sp, #-4]!


pop, in ARM Assembly:

ldr r?, [sp]
add sp, sp, #4 ; the ? is replace by a register number

OR

add sp, sp, #4
ldr r?, [sp, #-4]

OR

ldr r?, [sp], #4



NOTE: if sp (r13) is used for any other purpose, then the stack pointer
is lost.



An example of using the system stack to save return addresses:

bl doit
.
.
.
bl doit
.
.
.

doit: str r?, [sp, #-4]! ; save return address and decrement sp

.
.
.
bl another ; this would overwrite the return
; address if it had not been saved.
.
.
.

ldr r?, [sp], #4 ; restore return address, and increment sp
mov pc, lr



about STACK FRAMES (ACTIVATION RECORDS)
---------------------------------------

from a compiler's point of view, there are a bunch of things
that should go on the stack relating to procedure call/return.
They include:
return address (register)
parameters
other various registers

Each procedure has different requirements for numbers of
parameters, their size, and how many registers (which ones)
will need to be saved on the stack. So, we compose a
STACK FRAME or ACTIVATION RECORD that is specific to a
procedure.


Space for a stack frame gets placed on the stack each time
a procedure is called, and taken off the stack each time a
return occurs. These stack frames are pushed/popped
DYNAMICALLY (while the program is running).


example:
bl A
bl B
.
.

A: bl C
bl D
mov pc, lr

B: bl D
mov pc, lr

C: bl E
mov pc, lr

D: mov pc, lr

E: mov pc, lr


show stack for a trace through this calling sequence




The code (skeleton) for one of these procedures:
A: bl C
bl D
mov pc, lr





A: sub sp, sp, 20 ; allocate frame for A
str lr, [sp, #16] ; save A's return address

bl C
bl D

ldr lr, [sp, #16] ; restore A's return address
add sp, sp, 20 ; remove A's frame from stack
mov pc, lr ; return from A




Some notes on this:
-- the allocation and removal of a frame should be done within
the body of the procedure. That way, the compiler does not
need to know the size of a procedure's frame.
-- Accesses to A's frame are done via offsets from the stack pointer.



parameter passing.
------------------

parameter = argument

Just as there is little/no support for implementing procedures
in many assembly languages, there is little/no support for passing
parameters to those procedures.

Remember, when it comes to the implementation,
-- convention (ARM Procedure Call Standards -APCS)
-- its up to the programmer to follow the conventions



Passing parameters means getting data into a place set aside
for the parameters. Both the calling program and the procedure
need to know where the parameters are.

The calling program places them there, and possibly uses
values returned by the procedure. The procedure uses
the parameters.


A note on parameter passing --
a HLL specifies rules for passing parameters. There are basically
2 types of parameters.

Note that a language can offer 1 or both types.


call by value -- what C has. In Pascal, these are parameters
declared without the var in front of the variable name.
Fortran doesn't have this type of parameter.

The parameter passed may not be modified by the procedure.
This can be implemented by passing a copy of the value.
What call by value really implies is that the procedure can
modify the value (copy) passed to it, but that the value
is not changed outside the scope of the procedure.

call by reference -- what Fortran has. In Pascal, these are
"var type" parameters.

The parameter passed to the subroutine can be modified,
and the modification is seen outside the scope of the
subroutine. It is sort of like having access to global
variable.


There are many ways of implementing these 2 variable types.
If call by value is the only parameter type allowed, how
can we implement a reference type parameter?
Pass the address of the variable as the parameter. Then
access to the variable is made through its address. This
is what is done in C.



Simplest mechanism -- registers
-------------------------------

the calling program puts the parameter(s) into specific registers,
and the procedure uses them.

example:

.
.
.
mov r0, r5 ; put parameter in register r0
bl decrement
mov r5, r0 ; recopy parameter to its correct place.
.
.
.
decrement: sub r0, r0, #-1
mov pc, lr


Notes: -- This is a trivial example, since the procedure is 1 line
long.
-- Why not just use r5 within the procedure?
1. convention -- parameters are passed in specific registers.
2. same procedure could be used to decrement the value
in other registers -- just copy the value to register r0
first, and copy it out afterwards.


Historically more significant mechanism: parameters on stack
---------------------

place the parameters to a procedure (function) in the activation
record for the procedure.

sub sp, sp, #8 ; allocate space for parameters
str r5, [sp,#4] ; place parameter 1 into AR of proc
str r6, [sp] ; place parameter 2 into AR of proc
bl proc
.
.
.
proc:
sub sp, sp, #12 ; allocate remainder of AR for proc
; assume fixed size (too big) activation record
ldr r2, [sp,#12] ; retrieve parameter 1 for use
ldr r3, [sp,#12] ; retrieve parameter 2

; use parameters in procedure calculations

add sp, sp, #20 ; remove AR of proc
mov pc, lr




calling program: allocates space for parameters
places parameters into stack
calls procedure
deallocates remainder of AR of procedure

procedure: allocates AR (or remainder of AR)
deallocates AR of procedure (or at least most of it)



ARM convention -- when passing parameters in registers,
the first 4 parameters are passed in registers r0-r3.
Then, ANY and ALL procedures use those registers for their parameters.


If there are nested subroutine calls, and registers r- - r4 are
used for parameters, the values would be lost (just like the
return address would be lost for 'bl' if not saved). There are
2 possible solutions.

1. For non-recursive nested calls, each procedure has associated
with it a section of memory. Before a nested call is made,
the current parameters are stored in that memory. After the return
from the nested call, the current values are restored.

2. For recursive calls, current parameters are stored on the stack before
a nested call. After the return from the nested call, the
current parameters are restored.


Here's a general layout of how this second option is used
(with 4 or fewer parameters):

procedure layout:
allocate remainder of AR
put return address on stack into AR of procedure

procedure calculations

to set up a call to proc2,
place current parameters (in r0-r3) into AR of procedure
allocate AR for proc2
set up parameters to proc2 in r0-r3
call proc2 (bl proc2)
copy any return values out of r0-r3
mov r0-r3 to other registers if required
pop current parameters from stack back to r0-r3, if needed.
care must be taken not to over write the return values in r0-r3

more procedure calculations (presumably using procedure's return
value and parameters which are now in r0-r3)

get procedure's return address from AR
return (mov pc, lr)




more about parameter passing.

a trivial example that contains nested calls, so saves
current parameters on the stack:

; a set of procedures that do the following,
; if a < b, then switch a with b and decrement both


; a is in register r4
; b is in register r5

.text

subs r6, r5, r4
bge othercode
mov a1, r4 ; place parameters in registers
mov a2, r5 ; in APCS a1 = r0 & a2 = r1
bl s_and_d
mov r4, a1 ; copy out return values
mov r5, a2
othercode:

done

; switch is a procedure to switch its 2 parameters, and then
; decrement each of the 2 parameters
; a1 (r0) -- first parameter
; a2 (r2) -- second parameter
; a3 -- temporary for switching
s_and_d: sub sp, sp, #20 ; allocate frame for switch
str lr, [sp, #16] ; save return address on stack

mov a3, a1 ; switch the 2 parameters
mov a2, a1 ; a3 is register r2
mov a2, a3

bl decrement ; the value to decrement is already
; in a1.
str a1, [sp,#12] ; place current parameter in frame
mov a1, a2 ; set up parameter in a1
bl decrement
mov a2, a1 ; copy return value
ldr a1, [sp,#12] ; restore current parameter

ldr lr, [sp,#16] ; get return address
sub sp, sp, #20 ; de-allocate frame for switch
mov pc, lr

; procedure decrement subtracts 1 from its parameter
; a1 (r0) -- parameter to be decremented
decrement: sub a0, a0, #1
mov pc, lr



Summary and other ideas:
1. use registers
+ easy, and don't have to store data in memory (faster)
- limited number of registers
- doesn't work for recursion, and must be careful when
using it where there are nested subroutines
2. use some registers, and place the rest on the stack
+ since many procedures have few parameters, get the advantages
of (1) most of the time.
- lots of "data shuffling"
3. put all parameters on the stack (an unsophisticated compiler might
do this)
+ simple, clean method (easy to implement)
- lots of stack operations (meaning slow, since the stack is in
memory)
4. put parameters in memory set aside for them
+ simple, clean method
- lots of memory operations (slow)
- doesn't work for recursion

Note: whatever you do, try to be consistant. Don't use all 4 methods
in the same program. (Its poor style.)





About frame pointers
--------------


The stack gets used for more than just pushing/popping stack frames.
During the execution of a procedure, there may be a need for temporary
storage of variables. The common example of this is in expression
evaluation.
Example: high level language statement
Z = (X * Y) + (A/2) - 100
The intermediate values of X*Y and A/2 must be stored somewhere.
On older machines, register space was at a premium. There just weren't
enough registers to be used for this sort of thing. So, intermediate
results (local variables) were stored on the stack.

They don't go in the stack frame of the executing procedure; they
are pushed/popped onto the stack as needed.


So, at one point in a procedure, parameter 6 might be at [sp,#16]



---------
| |
------------|
| | |
--------- |
|param 6| |
--------- |
| | |
--------- |
| | |--- procedure's frame
--------- |
| | |
--------- |
| | |
--------- ---<- sp
| |
---------


and, at another point within the same procedure, parameter 6 might be
at [sp,#20]


---------
| |
------------|
| | |
--------- |
|param 6| |
--------- |
| | |
--------- |
| | |--- procedure's frame
--------- |
| | |
--------- |
| temp1 | |
--------- |
| temp2 | |
--------- ---<- sp
| |
---------


All this is motivation for keeping an extra pointer around that does
not move with respect to the current stack frame.

Call it a FRAME POINTER. Make it point to the base of the current
frame:

---------
| |
------------|
| | |
--------- |<- frame pointer
|param 6| |
--------- |
| | |
--------- |
| | |--- procedure's frame
--------- |
| | |
--------- |
| temp1 | |
--------- |
| temp2 | |
--------- ---<- sp
| |
---------
Now items within the frame can be accessed with offsets from the
frame pointer, AND the offsets do not change within the procedure.


parameter 6 will be at [frame pointer - #4]
A new register is needed for this frame pointer.
(ARM standards chooses fp (r11).

parameter 6 is at [fp, #-4]


NOTE:

-- The frame pointer must be initialized at the start of
every procedure, and restored at the end of every procedure.
-- The ARM architecture doesn't really allocate a register for a
frame pointer. It has something else that it calls a "virtual frame
pointer," but it isn't really the same as described here. On the
ARM, all data with a stack frame is accessed via the stack pointer, sp.
However, Gcc supports fram pointer




New problem:
What happens if you've got lots of variables, and your procedure
runs out of registers to put them in. This occurs when you are
following the conventions for register usage, and you shouldn't
overwrite the values in certain registers.

Most common solution: store register values temporarily on the
stack in AR.


Two types:

CALLEE SAVED
a procedure clears out some registers for its own use

register values are preserved across procedure calls

ARM calls these CALEE SAVED registers, and designates
v1-v8 for this useage.

v1-v8 are aliases for r4-r12

the called procedure saves register values in its AR,
uses the registers for local variables,
restores register values before it returns.
Please note that if v7(r11) is uded as fp it cannot
be used as v7.


CALLER SAVED
the calling program saves the registers that it does not want a
called procedure to overwrite

register values are NOT preserved across procedure calls

ARM calls these SCRATCH/TEMPORARY registers, and designates a1 -a4
 for this useage.

a1-a4 are aliases for r0-r3

procedures use these registers for local variables, because
the values do not need to be preserved outside the scope
of the procedure.
Also, recall that the same registers are used for passing the first
arguments and return values.





what the mechanisms should look like from the compiler's
point of view:


THE CODE:

call setup
procedure call
return cleanup
.
.
.
procedure: prologue

calculations

epilogue




CALL SETUP
place current parameters into current stack frame
save any temporary registers that need to be preserved across the
procedure call
allocate space for ALL parameters frame (AR) for procedure to be called
(move sp to down enough space for the procedure's parameters)
place first 4 parameters to procedure into a1-a4
place remainder of parameters to procedure into newly allocated space




PROLOGUE
allocate space for remainder of stack frame
save return address in stack frame
copy needed parameters from stack frame into registers
save any needed saved registers into current stack frame


EPILOGUE
restore (copy) return address from stack frame into lr
restore from stack frame any saved registers (saved in prologue)
de-allocate stack frame (or most of it)
(move sp so the space for the procedure's frame is gone)


RETURN CLEANUP
copy needed return values and parameters from a1-a4, or stack
frame to correct places
de-allocate remainder of procedure's stack frame
(move sp so the space for the procedure's frame is gone)
restore any temporary registers from stack frame (saved in call setup)




REVISITING PROCEDURES.


What needs to be done depends on HLL.
The order is fairly consistent.
What is done by caller/callee varies from implementation to implementation.


Needed:
--> items to go in activation record.

return address
frame pointer
parameters
local variables --| may be some overlap here
saved registers --|


--> mechanism

before procedure CALL
1. caller gets parameters into correct location
2. space is allocated for part of activation record
then
3. control is transfered to procedure

before procedure RETURN
1. put return values into correct location
2. restore anything that needs to be restored (return address, callee
saved registers, frame pointer)
3. remove activation record
then
4. jump to return location


some guidelines:

-- if parameters passed on stack, want them "between" caller and
callees activation records.

-- use of frame pointer reduces amount of code. It gives a better
level of abstraction.

-- depending on conventions and implementations, the amount of
space allocated for activation record may be different then
the amount of space removed.

If callee allocates space, and parameters are on stack.

If caller and callee each allocate some of the space.

-- ARM:
only allocate space in activation record for all parameters,
if there are more than 4.