Linkage in C
In general, useful C programs span multiple files which are compiled
separately and linked together. We have seen how to structure projects
in this way and talked about some of the benefits; now let us explore
in detail the language features that explicitly support linkage and when
each combination makes sense. The following project consists of a pair
of functions, up and twice, which count up by ones and
repeatedly double a number, respectively. Each is defined in its own
source file, along with a global variable to keep track of the number.
In up.c, define the up function and its count variable.
int count = 1;

int up(void)
{
    return ++count;
}
In twice.c, define the twice function and its count variable.
int count = 1;

int twice(void)
{
    count *= 2;
    return count;
}
In steps.c, define a main function for a program that
puts the functions through their paces.
#include <stdio.h>

int up(void);
int twice(void);

int main(void)
{
    printf("%d\n", up());
    printf("%d\n", twice());
    printf("%d\n", up());
    printf("%d\n", twice());
}
(It would be good practice in general to put the declarations of up
and twice in a header and #include it, but here I have
written the declarations directly in steps.c to reduce
the number of files to talk about and to be more explicit about which
declarations and definitions are in each file.)
The following Makefile is adequate to compile each file separately,
then link them together to produce a program named steps.
all: steps
.PHONY: all

steps: steps.o up.o twice.o

.PHONY: clean
clean:
	rm -f steps up.o twice.o steps.o
If you try to run make, you will see a linker error. The three
object files, up.o, twice.o, and steps.o, can be
made, but they cannot be linked together. The problem is that two of them
define a variable named count, and global variables are shared
across the whole program, so those two separate variables cannot coexist.
You can see what the linker sees by using objdump to view the
symbol table in each file. Run objdump -t steps.o; the output
has a number of debugging symbols not currently of interest, and the
following entries mentioning symbols visible in the C source code.
0000000000000000 g F .text 000000000000006f main
0000000000000000 *UND* 0000000000000000 up
0000000000000000 *UND* 0000000000000000 printf
0000000000000000 *UND* 0000000000000000 twice
Read the man page for objdump for more information about the
output of the -t option. The first column gives the value of
each symbol; the linker will end up choosing addresses for everything,
but in the object file these symbols all have the value zero. Next come
some letters indicating various aspects of the symbol; main
has g meaning ‘global’ and F meaning ‘function’,
as it is a globally-visible function definition. The symbol’s section
comes next; main is going to be placed in the .text
section. The number that follows is the symbol’s size in hexadecimal;
the machine code for main occupies 0x6f bytes. Finally comes the name
of the symbol.
The entries for up, twice, and printf are
marked as *UND* for ‘undefined’. Those symbols are used in
steps.o, but not defined there; they will need to be linked
in from other files (up.o, twice.o, and the standard
library, respectively).
The symbol table for up.o shows the following.
0000000000000000 g O .data 0000000000000004 count
0000000000000000 g F .text 000000000000001f up
There is a globally-visible definition in the .text section for
a function named up. The symbol that caused problems at link
time, count, is going to take up 4 bytes in the .data
section. It is marked g for ‘global’ as well, and O
for ‘object’ (i.e. a variable). Compare to the symbol table for
twice.o.
0000000000000000 g O .data 0000000000000004 count
0000000000000000 g F .text 000000000000001e twice
Now it is possible to see that there are two definitions of count.
What can be done to make this program? Well, it depends on what the
programmer intended—were the two variables named the same thing because
the programmer wanted a shared, single variable that could be modified
through both functions? Or should each function have its own counter,
and the name conflict was an unfortunate accident?
External linkage
To have a single variable that multiple files share, it is necessary to declare it with external linkage. For any symbol—function or object—there can only be one definition of that symbol, but there can be other declarations of it. The difference is that a declaration only says that the symbol exists and some information about its type, while a definition sets aside memory for the symbol to refer to and may give it an initial value.
For functions, it is easy to tell a declaration from a definition. The
up and twice functions are declared in steps.c,
giving their names, parameter lists, and return types, but ending with
a semicolon rather than a function body. Since there is no definition
of either function in that file, the symbol table gets an undefined
entry that will match at link time with a file that can offer the needed
definition. The definitions of those functions have function bodies in
curly brackets.
A line such as int count = 1; or even int count; without the
initializer is a definition. Memory will be set aside for the variable
count. To declare simply that there is such a symbol, but not
define it, use the extern keyword: the declaration then refers to a
symbol with ‘external linkage’ that is defined elsewhere. There must
still be one definition somewhere. For example,
you could leave the definition in up.c as-is, but change
twice.c to the following.
extern int count;

int twice(void)
{
    count *= 2;
    return count;
}
Running make again should now succeed. Look at the symbol table
for the new twice.o.
0000000000000000 g F .text 000000000000001e twice
0000000000000000 *UND* 0000000000000000 count
Now its count symbol is undefined, and the linker can fill it
in with the global object count from up.o. (Try turning
the definition in up.c into an extern declaration too, and see what happens.
There has to be a definition somewhere, but it could be in steps.c
or any file that gets linked into the final program.)
Now there is a compiled, linked program in the file steps. It,
too, has a symbol table, which you can view with objdump -t
steps. It has a lot more symbols, enough that you might want to use
less to scroll through the output. It is also often useful to
pipe the output into grep to search for a particular symbol,
as in objdump -t steps | grep count, which shows the following.
0000000000004010 g O .data 0000000000000004 count
Now the symbol has a concrete address, 0x4010, and there is only one count
for the whole program. Running ./steps produces the following output,
because counting up by ones and doubling are happening to the same variable.
2
4
5
10
Static linkage
Perhaps the programmer intended to calculate a sequence that counts by
ones, and a separate sequence that doubles, but because the code is
split between files there was an unintended naming conflict. One way
around that, of course, is to rename one or both of the count
variables so they have unique names. However, the static keyword
in C makes it possible to define variables that have the same
allocation and lifetime as globals, but limited scope.
Change up.c to add the static keyword on its definition
of count.
static int count = 1;

int up(void)
{
    return ++count;
}
Similarly, give twice.c a static definition of a variable named
count.
static int count = 1;

int twice(void)
{
    count *= 2;
    return count;
}
Make the program and look at the symbol tables. Both up.o and
twice.o should have an entry like this.
0000000000000000 l O .data 0000000000000004 count
Now, rather than a g for ‘global’, the count symbol
is marked l for ‘local’. Local symbols do not conflict with
symbols in other files, and do not fill in undefined references in other
files. These two object files can be linked together, and if you look
in the symbol table for the finished program, you will see that both
variables exist.
0000000000004010 l O .data 0000000000000004 count
0000000000004014 l O .data 0000000000000004 count
They have the same name, but you can tell that they are not the same
object because they are at different addresses, 0x4010 and 0x4014. Running
./steps will show the following output because the counters
are separate.
2
2
3
4
Static linkage is not only useful for file-level globals. If, as in this
case, a variable is only being used within one function, the scope of
its name can be restricted to just that function, as though it were a
local variable of that function, while it still has global allocation. So,
for example, twice.c could become just the following.
int twice(void)
{
    static int count = 1;
    count *= 2;
    return count;
}
Try making the program and seeing what, if anything, changes about the symbol tables using that style.
Finally, static linkage can be used on any definition, not just objects. It is commonly used to write a function in one file that is not linked to other files, making name conflicts easier to avoid and interfaces easier to describe while still allowing function decomposition with useful helpers.
Global and static linkage in assembly
We have already seen all of the assembly directives necessary to
control whether symbols are global (in C, defined at global scope),
local (in C, static), or external (in C, declared, possibly with
extern, but not defined). Recall addints.s, with
an assembly program to add integers.
        .intel_syntax noprefix

        .text
        .global main
main:   push rbp
        mov rbp, rsp
        mov rax, [rip + a]
        mov rcx, [rip + b]
        add rcx, rax
        mov [rip + c], rcx
        mov rax, 0
        pop rbp
        ret

        .data
a:      .quad 123
b:      .quad 456

        .bss
c:      .zero 8
Assemble this into an object file with e.g. make addints.o,
then look at the symbol table with objdump -t addints.o. The
main function is global, whereas the symbols a, b,
and c are local (i.e. static). Labels in your assembly
code become symbols in the symbol table; by default, they are local,
but you can make them global with the .global directive as with
main above.
Now try deleting the lines that define a and b and look at
the symbol table that results. The assembler will generate an undefined
entry in the symbol table for any symbol that the program uses but which
isn’t defined in the same file. If you try to make a linked program
from just this file, the linker will complain that those symbols aren’t
defined, but you can define them in another file, link the two object
files together, and have a finished program. You could even link object
files from assembly and object files from C together.