Linkage in C

In general, useful C programs span multiple files which are compiled separately and linked together. We have seen how to structure projects in this way and talked about some of the benefits; now let us explore in detail the language features that explicitly support linkage and when each combination makes sense. The following project consists of a pair of functions, up and twice, which count up by ones and repeatedly double a number, respectively. Each is defined in its own source file, along with a global variable to keep track of the number.

In up.c, define the up function and its count variable.

int count = 1;

int up(void)
{
    return ++count;
}

In twice.c, define the twice function and its count variable.

int count = 1;

int twice(void)
{
    count *= 2;
    return count;
}

In steps.c, define a main function for a program that puts the functions through their paces.

#include <stdio.h>

int up(void);
int twice(void);

int main()
{
    printf("%d\n", up());
    printf("%d\n", twice());
    printf("%d\n", up());
    printf("%d\n", twice());
}

(It would be good practice in general to put the declarations of up and twice in a header and #include it, but here I have chosen to directly write the declarations in steps.c to reduce the number of files to talk about and to be more explicit about which declarations and definitions are in each file.)

The following Makefile is adequate to compile each file separately, then link them together to produce a program named steps.

all: steps
.PHONY: all

steps: steps.o up.o twice.o

.PHONY: clean
clean:
	rm -f steps up.o twice.o steps.o

If you try to run make, you will see a linker error. The three object files, up.o, twice.o, and steps.o, can be made, but they cannot be linked together. The problem is that two of them define a variable named count, and global variables are shared across the whole program so those two separate variables cannot coexist.

You can see what the linker sees by using objdump to view the symbol table in each file. Run objdump -t steps.o; the output has a number of debugging symbols not currently of interest, and the following entries mentioning symbols visible in the C source code.

0000000000000000 g     F .text  000000000000006f main
0000000000000000         *UND*  0000000000000000 up
0000000000000000         *UND*  0000000000000000 printf
0000000000000000         *UND*  0000000000000000 twice

Read the man page for objdump for more information about the output of the -t option. The first column gives the value of each symbol; the linker will end up choosing addresses for everything, but in the object file these symbols all have the value zero. Next come some letters indicating various aspects of the symbol; main has g meaning ‘global’ and F meaning ‘function’, as it is a globally visible function definition. The symbol’s section comes next; main is going to be placed in the .text section. The number that follows is the symbol’s size in hexadecimal; the machine code for main occupies 0x6f bytes. Last comes the name of the symbol.

The entries for up, twice, and printf are marked as *UND* for ‘undefined’. Those symbols are used in steps.o, but not defined there; they will need to be linked in from other files (up.o, twice.o, and the standard library, respectively).

The symbol table for up.o shows the following.

0000000000000000 g     O .data  0000000000000004 count
0000000000000000 g     F .text  000000000000001f up

There is a globally visible definition in the .text section for a function named up. The symbol that caused problems at link time, count, is going to take up 4 bytes in the .data section. It is marked g for ‘global’ as well, and O for ‘object’ (i.e. a variable). Compare to the symbol table for twice.o.

0000000000000000 g     O .data  0000000000000004 count
0000000000000000 g     F .text  000000000000001e twice

Now it is possible to see that there are two definitions of count. What can be done to make this program link? That depends on what the programmer intended. Were the two variables given the same name because the programmer wanted a single, shared variable that could be modified through both functions? Or should each function have its own counter, and the name conflict was an unfortunate accident?

External linkage

To have a single variable that multiple files share, it is necessary to declare it with external linkage. For any symbol—function or object—there can only be one definition of that symbol, but there can be other declarations of it. The difference is that a declaration only says that the symbol exists and some information about its type, while a definition sets aside memory for the symbol to refer to and may give it an initial value.

For functions, it is easy to tell apart a declaration and definition. The up and twice functions are declared in steps.c, giving their names, parameter lists, and return types, but ending with a semicolon rather than a function body. Since there is no definition of either function in that file, the symbol table gets an undefined entry that will match at link time with a file that can offer the needed definition. The definitions of those functions have function bodies in curly brackets.
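As a minimal single-file sketch of that distinction (the names square and sum_of_squares are hypothetical and not part of the steps project), a declaration alone is enough for the compiler to accept a call, even though the definition appears later or in another file:

```c
/* Declaration: name, parameter list, and return type, ending in a semicolon. */
int square(int x);

/* This function can call square() knowing only its declaration. */
int sum_of_squares(int a, int b)
{
    return square(a) + square(b);
}

/* Definition: the same signature, followed by a body in curly brackets. */
int square(int x)
{
    return x * x;
}
```

If the definition of square were moved to a different source file, this file's object code would be expected to carry an undefined entry for square, just as steps.o does for up and twice.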

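At file scope, a line such as int count = 1; is a definition, and so is int count; without the initializer (strictly a ‘tentative’ definition, which behaves as a definition when the file contains no other). Either way, memory will be set aside for the variable count. To declare that such a symbol exists without defining it, use the extern keyword: the declaration then refers to a symbol with ‘external linkage’ whose definition lives elsewhere. There must still be exactly one definition somewhere in the program. For example, you could leave the definition in up.c as-is, but change twice.c to the following.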

extern int count;

int twice(void)
{
    count *= 2;
    return count;
}

Running make again should now succeed. Look at the symbol table for the new twice.o.

0000000000000000 g     F .text  000000000000001e twice
0000000000000000         *UND*  0000000000000000 count

Now its count symbol is undefined, and the linker can fill it in with the global object count from up.o. (Try replacing the definition in up.c with an extern declaration too, and see what happens. There has to be a definition somewhere, but it could be in steps.c or any file that gets linked into the final program.)

Now there is a compiled, linked program in the file steps. It, too, has a symbol table, which you can view with objdump -t steps. It has a lot more symbols, enough that you might want to use less to scroll through the output. It is also often useful to pipe the output into grep to search for a particular symbol, as in objdump -t steps | grep count, which shows the following.

0000000000004010 g     O .data  0000000000000004              count

Now the symbol has a concrete address, 0x4010, and there is only one count for the whole program. Running ./steps produces the following output, because counting up by ones and doubling are happening to the same variable.

2
4
5
10
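The same behaviour can be sketched in a single file, which may help when experimenting without the Makefile. This is an illustrative variant rather than the project layout used above; it also shows that an extern declaration can be repeated, and can sit alongside the one definition, without causing a conflict.

```c
/* Declarations: any number of these may appear across the program. */
extern int count;
extern int count;   /* repeating a declaration is harmless */

/* The single definition: memory is set aside and initialized here. */
int count = 1;

int up(void)
{
    return ++count;     /* count up by one */
}

int twice(void)
{
    count *= 2;         /* double the same shared variable */
    return count;
}
```

Calling up, twice, up, twice in that order from a fresh start yields 2, 4, 5, 10, matching the output of the linked program.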

Static linkage

Perhaps the programmer intended to calculate a sequence that counts by ones, and a separate sequence that doubles, but because the code is split between files there was an unintended naming conflict. One way around that, of course, is to rename one or both of the count variables so they have unique names. However, the static keyword in C makes it possible to define variables that have the same allocation and lifetime as globals, but whose names are visible only within the file that defines them (the C standard calls this ‘internal linkage’).

Change up.c to add the static keyword on its definition of count.

static int count = 1;

int up(void)
{
    return ++count;
}

Similarly, give twice.c a static definition of a variable named count.

static int count = 1;

int twice(void)
{
    count *= 2;
    return count;
}

Make the program and look at the symbol tables. Both up.o and twice.o should have an entry like this.

0000000000000000 l     O .data  0000000000000004 count

Now, rather than a g for ‘global’, the count symbol is marked l for ‘local’. Local symbols do not conflict with symbols in other files, and do not fill in undefined references in other files. These two object files can be linked together, and if you look in the symbol table for the finished program, you will see that both variables exist.

0000000000004010 l     O .data  0000000000000004              count
0000000000004014 l     O .data  0000000000000004              count

They have the same name, but you can tell that they are not the same object because they are at different addresses, 0x4010 and 0x4014. Running ./steps will show the following output because the counters are separate.

2
2
3
4

Static linkage is not only useful for file-level globals. If, as in this case, a variable is only being used within one function, the scope of its name can be restricted to just that function—as though it were a local variable of that function—but still have global allocation. So, for example, twice.c could become just the following.

int twice(void)
{
    static int count = 1;
    count *= 2;
    return count;
}

Try making the program and seeing what, if anything, changes about the symbol tables using that style.
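To make the difference in lifetime concrete, here is a small sketch contrasting a static local with an ordinary local. The function names next_ticket and always_one are hypothetical and chosen only for illustration.

```c
int next_ticket(void)
{
    static int ticket = 0;  /* allocated once for the whole run of the program */
    return ++ticket;        /* the updated value survives until the next call */
}

int always_one(void)
{
    int n = 0;              /* ordinary local: created afresh on every call */
    return ++n;
}
```

Successive calls to next_ticket return 1, 2, 3, and so on, while always_one returns 1 every time, because only the static variable keeps its value between calls.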

Finally, static linkage can be used on any definition, not just objects. It is commonly used to write a function in one file that is not linked to other files, making name conflicts easier to avoid and interfaces easier to describe while still allowing function decomposition with useful helpers.
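A sketch of that pattern, assuming a hypothetical file clamp.c with two private helpers and one public function:

```c
/* Helpers marked static: their names are private to this file, so another
   file may define its own min_int or max_int without any conflict. */
static int min_int(int a, int b)
{
    return a < b ? a : b;
}

static int max_int(int a, int b)
{
    return a > b ? a : b;
}

/* The only function meant to be called from other files. */
int clamp(int value, int low, int high)
{
    return min_int(max_int(value, low), high);
}
```

In the symbol table for the resulting object file, clamp would be expected to appear as a global function. The helpers should be marked local if they appear at all, since an optimizing compiler may inline them away.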

Global and static linkage in assembly

We have already seen all of the assembly directives necessary to control whether symbols are global (in C, defined at global scope), local (in C, static), or external (in C, declared, possibly with extern, but not defined). Recall addints.s, with an assembly program to add integers.

        .intel_syntax noprefix

        .text
        .global main
main:   push    rbp
        mov     rbp, rsp

        mov     rax, [rip + a]
        mov     rcx, [rip + b]
        add     rcx, rax
        mov     [rip + c], rcx

        mov     rax, 0
        pop     rbp
        ret


        .data
a:      .quad   123
b:      .quad   456

        .bss
c:      .zero   8

Assemble this into an object file with e.g. make addints.o, then look at the symbol table with objdump -t addints.o. The main function is global, whereas the symbols a, b, and c are local (i.e. static). Labels in your assembly code become symbols in the symbol table; by default, they are local, but you can make them global with the .global directive as with main above.

Now try deleting the lines that define a and b and look at the symbol table that results. The assembler will generate an undefined entry in the symbol table for any symbol that the program uses but which isn’t defined in the same file. If you try to make a linked program from just this file, the linker will complain that those symbols aren’t defined, but you can define them in another file, link the two object files together, and have a finished program. You could even link object files from assembly and object files from C together.
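As a sketch of that last point, a C file could supply the a and b symbols that the trimmed addints.s leaves undefined. The file name values.c is hypothetical. The type int64_t is chosen on the assumption that each variable must occupy 8 bytes to match the .quad directives, and the symbol names are assumed to reach the object file unchanged, as they do on x86-64 Linux.

```c
#include <stdint.h>

/* Hypothetical values.c: global definitions with external linkage, intended
   to satisfy the undefined references to a and b in addints.o. */
int64_t a = 123;
int64_t b = 456;
```

Compiling this to values.o and linking it with addints.o should let the linker resolve both symbols. Running objdump -t values.o would be expected to list a and b as global objects in the .data section.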
