Arrays in C
An array in C is a compound type. Every compound type in C is based on a primitive type such as int, double, char, etc. So, you can have an array of int items, an array of char items, etc. To declare a variable with a compound type, you first write the name of the primitive type at its base, and then you decorate the identifier with the syntax you would use to access that primitive type. Let’s say we want an array of integers named array; since the syntax for accessing items in an array is square brackets, the declaration of our variable will look like this.
int array[] = {1, 2, 3};
In this example, I have initialized the array with some values. When you have initial values to fill in, this is the best way to do it. When all you know is how many values there will be, but you need to do some calculating to fill in the values, you can just specify the size instead.
int array[3]; /* space for three ints, uninitialized */
You are allowed to specify the size and also a list of values. If you supply more values than fit in the size, the compiler will complain; if you supply fewer values than fill the size, the remaining items will be filled in with zeroes.
int array[3] = {1, 2, 3, 4, 5}; /* compile error */
int array[5] = {1, 2, 3}; /* same as {1, 2, 3, 0, 0} */
int array[3] = {0}; /* same as {0, 0, 0}; empty braces {} need C23 or a compiler extension */
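To check the zero-fill rule rather than take it on faith, here is a minimal sketch. The function name zeroes_in_partial_init is made up for this example; it counts how many items of a partially initialized array ended up as zero.

```c
#include <stddef.h>

/* Hypothetical helper: builds a five-item array from three values
   and counts how many items are zero afterwards. */
size_t zeroes_in_partial_init(void)
{
    int array[5] = {1, 2, 3}; /* items 3 and 4 get no explicit value */
    size_t count = 0;

    for (size_t i = 0; i < 5; i++) {
        if (array[i] == 0) {
            count++;
        }
    }
    return count;
}
```

Calling zeroes_in_partial_init() from a simple main should give 2, one for each item the initializer list left out.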
As a special case, the initializer list for an array of char can use the string literal syntax. The following two declarations are equivalent (note that the '\0' null character terminator is implied by the string literal syntax).
char array[] = {'h', 'e', 'l', 'l', 'o', '\0'};
char array[] = "hello";
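One way to convince yourself that the two spellings really are equivalent is to compare them byte for byte. This is only a sketch; the names same_hello and quoted_size are invented for illustration, and memcmp from string.h does the comparison.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if both spellings produce
   identical arrays (same size, same bytes), 0 otherwise. */
int same_hello(void)
{
    char spelled[] = {'h', 'e', 'l', 'l', 'o', '\0'};
    char quoted[] = "hello";

    return sizeof(spelled) == sizeof(quoted)
        && memcmp(spelled, quoted, sizeof(quoted)) == 0;
}

/* Hypothetical helper: size in bytes of the array built from "hello". */
size_t quoted_size(void)
{
    char quoted[] = "hello";
    return sizeof(quoted); /* five letters plus the implied '\0' */
}
```

The size comes out as 6 rather than 5 because the terminator occupies an item of its own.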
The size of an array is the sum of the sizes of its items; since they are by definition all the same size, that is the same as the size of one item multiplied by the number of items in the array. Experiment with putting the following snippet into a simple main and running it with various array types and sizes.
int array[3]; /* change 'int' and '3' to other things */
/* sizeof yields a size_t, which is unsigned, so the format is %zu */
printf("%zu bytes for one item\n", sizeof(array[0]));
printf("%zu bytes for the whole array\n", sizeof(array));
printf("%zu items in the array\n", sizeof(array) / sizeof(array[0]));
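The division in the last line is common enough that many programs wrap it in a macro. The sketch below assumes a macro name, ITEM_COUNT, that is not part of C or its standard library; it only gives a sensible answer when applied directly to an array variable.

```c
#include <stddef.h>

/* Hypothetical macro: number of items in an array variable,
   computed as whole-array size divided by the size of one item. */
#define ITEM_COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical helper showing the macro on an array of double. */
size_t count_doubles(void)
{
    double samples[7];
    return ITEM_COUNT(samples);
}
```

Because everything is computed from sizeof, the count stays correct if you later change the item type or the declared size.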
To further explore how arrays are implemented in memory, we can use the & address-of operator to see where the items have been put.
char array[3];
/* %p expects a void *, so each address is cast before printing */
printf("%p array itself\n", (void *)&array);
printf("%p array[0]\n", (void *)&array[0]);
printf("%p array[1]\n", (void *)&array[1]);
printf("%p array[2]\n", (void *)&array[2]);
The actual addresses are likely to be different in different runs, but their relationships will show a consistent pattern. Here is a possible output.
0x7ffc36682f82 array itself
0x7ffc36682f82 array[0]
0x7ffc36682f83 array[1]
0x7ffc36682f84 array[2]
The addresses are (very large) numbers being printed in hexadecimal. Without worrying too much about e.g. what they would be in decimal, it is clear enough that they’re nearly the same, only varying in the last place. The address of the array itself is the same as the address of the first element; you can rely on this. Furthermore, each subsequent item is at the next subsequent address. The items in this array are of type char, so each one occupies one byte and then comes the next.
If I change the type of the array items to int, which on my example machine means each one occupies four bytes, I see output like this.
0x7ffd84ac13dc array itself
0x7ffd84ac13dc array[0]
0x7ffd84ac13e0 array[1]
0x7ffd84ac13e4 array[2]
Still, the array’s address is the same as the first item’s. Now, the items aren’t spaced out at one-byte intervals. Instead, they are spaced out at four-byte intervals (check my math in hexadecimal: dc + 4 = e0, and e0 + 4 = e4), still packed in as tightly as the size of an item will allow.
This means that I can use a pointer variable to navigate an array of items fairly easily. Pointers store addresses and addresses are just numbers (typically quite large and conventionally printed in hexadecimal, but still just numbers). However, most mathematical operations on them don’t produce anything meaningful: I could multiply an address by three, but that doesn’t mean there’s anything interesting there in memory. Within a block of memory that represents an array, however, there is one kind of arithmetic that makes sense: because the items are laid out so regularly, addition and subtraction let us move from one to the other. In C, adding to or subtracting from a pointer is implicitly done in steps of a whole item, rather than single bytes, so this is particularly elegant.
int array[] = {13, 14, 15};
int *ptr = &array[0];
printf("*ptr = %d\n", *ptr);
printf("array[0] = %d\n", array[0]);
printf("*(ptr + 1) = %d\n", *(ptr + 1));
printf("array[1] = %d\n", array[1]);
printf("*(ptr + 2) = %d\n", *(ptr + 2));
printf("array[2] = %d\n", array[2]);
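The claim that ptr + 1 moves by a whole item rather than a single byte can be measured directly. The following is a sketch with an invented function name, bytes_per_step; it views both addresses as char pointers so that subtracting them counts bytes.

```c
#include <stddef.h>

/* Hypothetical helper: how many bytes does ptr + 1 move past ptr? */
size_t bytes_per_step(void)
{
    int array[] = {13, 14, 15};
    int *ptr = &array[0];
    int *next = ptr + 1; /* one whole item further along */

    /* char is one byte, so this subtraction counts bytes, not items */
    return (size_t)((char *)next - (char *)ptr);
}
```

On a machine where int occupies four bytes this returns 4; in general it returns sizeof(int).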
In fact, note that the number you add to the pointer, i.e. the number of items to skip, is the same as the index you use in the square brackets to index the array. This is no coincidence; it is in fact the reason array indexing counts from zero. Furthermore, you can use the square bracket notation on a pointer! The expression ptr[n] is equivalent to *(ptr + n).
printf("ptr[0] = %d\n", ptr[0]);
printf("array[0] = %d\n", array[0]);
printf("ptr[1] = %d\n", ptr[1]);
printf("array[1] = %d\n", array[1]);
printf("ptr[2] = %d\n", ptr[2]);
printf("array[2] = %d\n", array[2]);
In fact, addition is commutative, so…what if this works…
printf("%d\n", ptr[2]);
printf("%d\n", 2[ptr]);
Huh. Yep. Never do that though. Or tell anyone I told you.
Basically, pointers and arrays are interchangeable in a C program nearly everywhere. If you just use the name of an array as a pointer, it will implicitly be converted to the appropriate address (i.e. of the first item).
int *ptr = array; /* nicer than &array[0] */
You will see C programmers throwing around pointers and arrays interchangeably, and using pointer arithmetic to instantly calculate where the item they want is to be found. This is the power of arrays—they directly leverage the strength of random-access memory and the way it is addressed.
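As an illustration of that interchangeability, here is a sketch of the same loop written twice, once with indexing and once with a moving pointer. The function names are made up for this example, and both functions receive the array as a pointer to its first item plus a count.

```c
#include <stddef.h>

/* Hypothetical helper: adds up the items using square brackets. */
int sum_by_index(const int *items, size_t count)
{
    int total = 0;
    for (size_t i = 0; i < count; i++) {
        total += items[i];
    }
    return total;
}

/* Hypothetical helper: adds up the items by stepping a pointer. */
int sum_by_pointer(const int *items, size_t count)
{
    int total = 0;
    const int *end = items + count; /* one past the last item */
    for (const int *p = items; p != end; p++) {
        total += *p;
    }
    return total;
}

/* The array name is converted to a pointer when passed along. */
int sum_example(void)
{
    int array[] = {13, 14, 15};
    return sum_by_pointer(array, 3);
}
```

Both versions visit the same addresses in the same order; which one to write is mostly a matter of taste.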
There are still some ways you might need to be careful and notice that arrays are technically not just pointers. First, and perhaps most obviously, they are not generally the same size.
int array[5];
int *ptr = array;
printf("%zu bytes for array\n", sizeof(array));
printf("%zu bytes for pointer\n", sizeof(ptr));
For me, this prints the following.
20 bytes for array
8 bytes for pointer
The 20 bytes for the array makes sense: that’s 5 times the 4 bytes for an int. Why 8 bytes for the pointer? Oh right, that’s because the pointer isn’t storing 5 integers, it’s storing the address of where the array starts, and addresses on this system are 8 bytes.
For similar reasons, assignment makes sense on pointers but not arrays.
/* initializations, not assignment, don't let the = fool you */
int array1[3] = {3, 5, 7};
int array2[3] = {2, 4, 8};
int *ptr = array1;
/* assign to ptr, now it points to array2 */
ptr = array2;
/* assign to array1? doesn't even compile */
array1 = array2;
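Since array1 = array2 is rejected, copying the contents has to be done some other way. One common option, shown here as a sketch rather than the only approach, is memcpy from string.h; the function name copy_then_read_last is invented for the example.

```c
#include <string.h>

/* Hypothetical helper: copies array2 over array1 and returns
   the last item of array1 to show that the copy happened. */
int copy_then_read_last(void)
{
    int array1[3] = {3, 5, 7};
    int array2[3] = {2, 4, 8};

    /* both arrays hold three ints, so sizeof(array1) bytes fit exactly */
    memcpy(array1, array2, sizeof(array1));
    return array1[2];
}
```

After the copy array1 holds 2, 4, 8, yet it is still a separate block of memory from array2, unlike ptr, which merely points at whichever array it was last assigned.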
This difference seems to come up most often with strings. A character array and a character pointer are typically interchangeable as text, but they are stored differently. A string literal in double quotes being used in an expression will result in the literal text being allocated statically (i.e. at compile time, probably in the read-only data segment of the object file), and its address will be used. A character array, however, even if initialized using the double-quote syntax as a shorthand, will be allocated automatically at runtime.
char *ptr = "hello";
char array[] = "hello";
The two variables here, ptr and array, are interchangeable as long as you just pass them to puts or something like that. However, you can tell them apart if you try to assign to them (e.g. ptr = "blah" versus array = "blah"), change their elements (e.g. ptr[0] = 'y' versus array[0] = 'y'), get their size with sizeof, or look at the compiled object file. Only the pointer can be reassigned, and only the array’s elements can safely be changed: writing through ptr into a string literal is undefined behavior.
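The safe half of each of those comparisons can be tried out directly. This is a minimal sketch with invented function names; it deliberately leaves out ptr[0] = 'y' (undefined behavior) and array = "blah" (does not compile), and it declares the pointer as const char * so the compiler would catch an accidental write.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: the array is a private copy of the text,
   so its items can be changed. Returns 1 if the change took effect. */
int array_is_writable(void)
{
    char array[] = "hello";
    array[0] = 'y';
    return strcmp(array, "yello") == 0;
}

/* Hypothetical helper: the pointer can be aimed at a different
   literal. Returns 1 if it now refers to the new text. */
int pointer_is_reassignable(void)
{
    const char *ptr = "hello";
    ptr = "blah";
    return strcmp(ptr, "blah") == 0;
}

/* Hypothetical helpers: sizeof sees the whole array in one case
   and only the stored address in the other. */
size_t text_array_size(void)
{
    char array[] = "hello";
    return sizeof(array);
}

size_t text_pointer_size(void)
{
    const char *ptr = "hello";
    return sizeof(ptr);
}
```

The array size is 6 (five letters plus the terminator), while the pointer size is whatever an address occupies on your machine, 8 bytes on the system used for the earlier output.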