Jumping around

In machine code, there are no conditionals, loops, or functions—there are only jumps. Jumps are the building block used for all other forms of control flow. Jumps are also called branches or ‘goto’, and in fact there is a goto keyword one can use in C, but since C also has conditionals, loops, functions, and such, there isn’t much use for goto. In assembly, it’s all about jumping around, but the trick is to keep the flow of control organized.

We typically introduce students to the branching idea in CS160 using the Little Computer, which might be worth reviewing (or using as a gentle introduction even if you haven’t seen it before).

The most basic form of control flow, so basic it is easy to overlook, is sequential. In a list of instructions, when one is finished, the next one takes place, and the next, and so on. The way the machine keeps track of its place is with the Instruction Pointer, the special-purpose register rip. In sequential control flow, as the instruction at one address is decoded, the instruction pointer is set to the next location in memory, right after the current instruction, and that’s where the next one will be found.

Instead, a programmer can use the jmp (‘jump’) instruction to set the instruction pointer to any address they like. The usual way to specify an address is the same, whether it is the address of data or machine code, and that is to use a label. The following program has an instruction that gets skipped, because the flow of control jumps right over it.

    .intel_syntax noprefix
    .text
    .global main
main:
    push    rbp
    mov rbp, rsp

    # set rax to zero
    mov rax, 0

    # increment rax, now it is 1
    inc rax

    # jump!
    jmp .Lhere

    # this doesn't happen...
    inc rax

.Lhere:
    # ...because we go straight here
    inc rax

    # now rax is 2 (not 3)

    pop rbp
    ret

Try assembling it and stepping through in gdb!

The label can be anywhere. Jumping forward skips the instructions you jump over; jumping backward repeats them.

    .intel_syntax noprefix
    .text
    .global main
main:
    push    rbp
    mov rbp, rsp

    # set rax to zero
    mov rax, 0

.Lhere:
    # count up
    inc rax

    # go back up and do it again!
    jmp .Lhere

    # nothing else ever happens

    pop rbp # not this
    ret     # nope

    # just increment forever

Try this one in gdb, too, and stop it when you get bored.

Conditional jumps

Generally, code that can never execute and code that can never stop executing aren’t what we want. To have conditionals and general-purpose loops, we need a way to check a condition and only jump sometimes. There is a whole family of related conditional jump instructions. Each one jumps when a particular condition is true, and when the condition is false, it does nothing, falling through to the next instruction as normal.

A conditional jump doesn’t actually test the condition, it relies on the program already having computed whether the condition was true and leaving the answer in a special-purpose register named FLAGS. The FLAGS register gets set implicitly as a result of doing other calculations, all the time; for example, if you perform an unsigned addition with the add instruction and the result wraps around, one of the bits in the FLAGS register named CF (the ‘carry flag’) is set to 1. If you calculate a difference using sub and the result is negative, the SF (‘sign flag’) is set. You are free to use or ignore the flags that result from your calculations, but they are always being maintained implicitly. The conditional jump instructions use the flags to determine whether to jump.

Consider the following C code, which declares that there exist two global variables, a and b (but not what their values will be), and then in main sets another variable based on the condition a == b.

extern long a;
extern long b;

long x;

int main()
{
    if (a == b)
    x = 0;
    else
    x = 1;
}

According to Compiler Explorer, this might translate into the following machine code.

main:
    push    rbp
    mov     rbp, rsp
    mov     rdx, QWORD PTR a[rip]
    mov     rax, QWORD PTR b[rip]
    cmp     rdx, rax
    jne     .L2
    mov     QWORD PTR x[rip], 0
    jmp     .L3
.L2:
    mov     QWORD PTR x[rip], 1
.L3:
    mov     eax, 0
    pop     rbp
    ret

After the function boilerplate, this code loads a into rdx and b into rax. Then it runs the cmp instruction, which is the same as sub except it doesn’t save the result. If it doesn’t save the result, what is the point? To set up FLAGS based on the calculation. If a == b, then their difference will be zero, and the ‘zero flag’ ZF bit in FLAGS will be set. The conditional jump jne only jumps if ZF is unset, so if the variables are equal, the jump won’t happen and control will fall through into the next line, setting x to 0. If the variables are different, however, that will be skipped and control will jump to .L2, where x is set to 1.

What about the unconditional jmp, and its target, .L3? On the branch where x is set to 0, we need to skip over the instruction that would then set x to 1; whichever instruction sets x, control should resume at the boilerplate that leaves the function.

Repetition

Jumps backwards make loops. Consider this program which demonstrates a for loop in C.

extern int x;

int main()
{
    int i;

    for (i = 0; i < 10; ++i)
    x += 3;
}

It looks like this compiled.

main:
    push    rbp
    mov     rbp, rsp
    mov     DWORD PTR [rbp-4], 0
    jmp     .L2
.L3:
    mov     eax, DWORD PTR x[rip]
    add     eax, 3
    mov     DWORD PTR x[rip], eax
    add     DWORD PTR [rbp-4], 1
.L2:
    cmp     DWORD PTR [rbp-4], 9
    jle     .L3
    mov     eax, 0
    pop     rbp
    ret

The add eax, 3 instruction is the body of the loop, the thing being repeated. See if you can find the initialization (i = 0), the condition (i < 10), and the step (++i). Here is an important hint: the variable i is being stored in memory at [rbp-4]. (We will get into why when we discuss stack frames.)

Conclusions

To make sense of these jump-based control flow structures, one useful tool is a flow chart. First, group instructions into runs as long as possible where there are no jumps into or out of the middle; on paper or a whiteboard or such, draw a box for each group, with the instructions or a quick description inside so you know which is which. Second, draw an arrow out of the bottom of each box leading to the top of every box it could jump to, and label the arrow with the condition under which the jump will go that way.

There are complex visual symbolic languages that can be used for flow charting, but I think a few rectangular boxes and labeled arrows, for a start, are nearly all that you need. An if in C will look like a path that goes around the conditional body, where an if/else will look like a diamond. A loop will look literally like a loop, a circular path you can follow around and around.

If you experiment, you can begin to notice some style. For example, in my C code where I checked a == b, the assembly code used the conditional jump jne, ‘jump if not equal’. The equal turned into not equal because the jump is choosing whether to skip over the conditional body, and control skips the body when the condition is false. Similarly, the for loop had a condition of i < 10, but the compiled version actually tested whether i <= 9, and did so at the bottom of the loop rather than up at the top. At the top, instead, is an unconditional jump down to that condition.

Being able to read the control structures emitted by a compiler and figure out what higher-level, structured programming control structures they represent is one of the big ideas of this course.

You have attempted of activities on this page