Jumping around¶
In machine code, there are no conditionals, loops, or functions—there
are only jumps. Jumps are the building block used for all other forms
of control flow. Jumps are also called branches or ‘goto’, and in
fact there is a goto
keyword one can use in C, but since C also
has conditionals, loops, functions, and such, there isn’t much use for
goto
. In assembly, it’s all about jumping around, but the trick
is to keep the flow of control organized.
We typically introduce students to the branching idea in CS160 using the Little Computer, which might be worth reviewing (or using as a gentle introduction even if you haven’t seen it before).
The most basic form of control flow, so basic it is easy to overlook,
is sequential. In a list of instructions, when one is finished, the
next one takes place, and the next, and so on. The way the machine keeps
track of its place is with the Instruction Pointer, the special-purpose
register rip
. In sequential control flow, as the instruction
at one address is decoded, the instruction pointer is set to the next
location in memory, right after the current instruction, and that’s
where the next one will be found.
Instead, a programmer can use the jmp
(‘jump’) instruction
to set the instruction pointer to any address they like. The usual way
to specify an address is the same, whether it is the address of data or
machine code, and that is to use a label. The following program has an
instruction that gets skipped, because the flow of control jumps right
over it.
.intel_syntax noprefix
.text
.global main
main:
push rbp
mov rbp, rsp
# set rax to zero
mov rax, 0
# increment rax, now it is 1
inc rax
# jump!
jmp .Lhere
# this doesn't happen...
inc rax
.Lhere:
# ...because we go straight here
inc rax
# now rax is 2 (not 3)
pop rbp
ret
Try assembling it and stepping through in gdb!
The label can be anywhere. Jumping forward skips the instructions you jump over; jumping backward repeats them.
.intel_syntax noprefix
.text
.global main
main:
push rbp
mov rbp, rsp
# set rax to zero
mov rax, 0
.Lhere:
# count up
inc rax
# go back up and do it again!
jmp .Lhere
# nothing else ever happens
pop rbp # not this
ret # nope
# just increment forever
Try this one in gdb, too, and stop it when you get bored.
Conditional jumps¶
Generally, code that can never execute and code that can never stop executing aren’t what we want. To have conditionals and general-purpose loops, we need a way to check a condition and only jump sometimes. There is a whole family of related conditional jump instructions. Each one jumps when a particular condition is true, and when the condition is false, it does nothing, falling through to the next instruction as normal.
A conditional jump doesn’t actually test the condition, it relies on
the program already having computed whether the condition was true and
leaving the answer in a special-purpose register named FLAGS
. The
FLAGS
register gets set implicitly as a result of doing other
calculations, all the time; for example, if you perform an unsigned
addition with the add
instruction and the result wraps around, one
of the bits in the FLAGS
register named CF
(the ‘carry
flag’) is set to 1. If you calculate a difference using sub
and the result is negative, the SF
(‘sign flag’) is set. You
are free to use or ignore the flags that result from your calculations,
but they are always being maintained implicitly. The conditional jump
instructions use the flags to determine whether to jump.
Consider the following C code, which declares that there exist two global
variables, a
and b
(but not what their values will be),
and then in main sets another variable based on the condition a
== b
.
extern long a;
extern long b;
long x;
int main()
{
if (a == b)
x = 0;
else
x = 1;
}
According to Compiler Explorer, this might translate into the following machine code.
main:
push rbp
mov rbp, rsp
mov rdx, QWORD PTR a[rip]
mov rax, QWORD PTR b[rip]
cmp rdx, rax
jne .L2
mov QWORD PTR x[rip], 0
jmp .L3
.L2:
mov QWORD PTR x[rip], 1
.L3:
mov eax, 0
pop rbp
ret
After the function boilerplate, this code loads a
into rdx
and b
into rax
. Then it runs the cmp
instruction,
which is the same as sub
except it doesn’t save the result. If
it doesn’t save the result, what is the point? To set up FLAGS
based on the calculation. If a == b
, then their difference will
be zero, and the ‘zero flag’ ZF
bit in FLAGS
will
be set. The conditional jump jne
only jumps if ZF
is
unset, so if the variables are equal, the jump won’t happen and control
will fall through into the next line, setting x
to 0. If the
variables are different, however, that will be skipped and control will
jump to .L2
, where x
is set to 1.
What about the unconditional jmp
, and its target, .L3
?
On the branch where x
is set to 0, we need to skip over the
instruction that would then set x
to 1; whichever instruction
sets x
, control should resume at the boilerplate that leaves
the function.
Repetition¶
Jumps backwards make loops. Consider this program which demonstrates
a for
loop in C.
extern int x;
int main()
{
int i;
for (i = 0; i < 10; ++i)
x += 3;
}
It looks like this compiled.
main:
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-4], 0
jmp .L2
.L3:
mov eax, DWORD PTR x[rip]
add eax, 3
mov DWORD PTR x[rip], eax
add DWORD PTR [rbp-4], 1
.L2:
cmp DWORD PTR [rbp-4], 9
jle .L3
mov eax, 0
pop rbp
ret
The add eax, 3
instruction is the body of the loop, the thing
being repeated. See if you can find the initialization (i =
0
), the condition (i < 10
), and the step (++i
). Here
is an important hint: the variable i
is being stored in memory
at [rbp-4]
. (We will get into why when we discuss stack frames.)
Conclusions¶
To make sense of these jump-based control flow structures, one useful tool is a flow chart. First, group instructions into runs as long as possible where there are no jumps into or out of the middle; on paper or a whiteboard or such, draw a box for each group, with the instructions or a quick description inside so you know which is which. Second, draw an arrow out of the bottom of each box leading to the top of every box it could jump to, and label the arrow with the condition under which the jump will go that way.
There are complex visual symbolic languages that can be used for flow
charting, but I think a few rectangular boxes and labeled arrows,
for a start, are nearly all that you need. An if
in C will
look like a path that goes around the conditional body, where an
if
/else
will look like a diamond. A loop will look
literally like a loop, a circular path you can follow around and around.
If you experiment, you can begin to notice some style. For example,
in my C code where I checked a == b
, the assembly code used
the conditional jump jne
, ‘jump if not equal’. The equal
turned into not equal because the jump is choosing whether to skip over
the conditional body, and control skips the body when the condition is
false. Similarly, the for
loop had a condition of i <
10
, but the compiled version actually tested whether i <= 9
,
and did so at the bottom of the loop rather than up at the top. At the
top, instead, is an unconditional jump down to that condition.
Being able to read the control structures emitted by a compiler and figure out what higher-level, structured programming control structures they represent is one of the big ideas of this course.