Declarations, definitions, and numbers

Consider this line of C code.

int x = 2 + 2;

This is called a ‘declaration’. It declares the existence of an object: in this case, one named ‘x’ that holds values which are integers (int) and starts out with the value 4 (i.e. the result of evaluating the expression \(2 + 2\)). It is possible to declare that something exists in general without immediately finding a place for it to actually exist in memory, but this declaration does indeed cause x to exist as soon as the main function is invoked. A declaration that causes storage to be allocated is called a ‘definition’.
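
For a taste of the difference (a sketch, not part of the program here), the extern keyword lets you write, at file scope, a declaration that is not a definition.

extern int x;   /* a declaration only: says x exists, but sets aside no storage here */
int x = 2 + 2;  /* a definition: this line actually allocates storage for x */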

In fact, the code in twoplustwo.c that contains this line has two definitions: one for the function main, and one for the variable x inside of it. (We will talk more about the scope and duration of things, but it suffices to say for now that our x is confined within main.)

Unlike in math, variables in C can be changed to hold different values at different instants in the computation. Somewhat confusingly, C uses the same symbol, =, both to give a variable its starting value as part of a declaration (called initialization) and to say that an existing variable is to be changed to hold some new value (called assignment).

int main()
{
    int x = 2 + 2;
    x = 2 + 3;
}

In this program, x is initialized to 4, and then set to 5 (i.e. the result of evaluating the expression \(2 + 3\)). It is not strictly necessary to initialize a variable; to calculate \(2 + 2\), one could write the following.

int main()
{
    int x;
    x = 2 + 2;
}

The definition int x; declares that a variable named ‘x’ that holds integers exists, and sets aside a suitable part of memory for it. Then the assignment x = 2 + 2 calculates a new value and changes the memory for x to represent the result. This is worse style: between defining x and assigning it a meaningful value, x does still exist and represent an integer, it is just undefined which one! Such uninitialized variables are a common source of bugs. Still, there are often situations where it makes sense to define a variable before there is a meaningful value to put in it; so, strive to put the definition as close as possible to the first assignment, and ideally combine the two by using an initializer in the definition.
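
Here is a sketch of how that bug can bite; the variable y and its use are invented for illustration.

int main()
{
    int x;          /* x exists now, but which integer it holds is undefined */
    int y = x + 1;  /* bug: x is read before it has ever been given a value */
    x = 2 + 2;      /* too late; y was already computed from garbage */
}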

Here is a program that calculates \(2 + 2\) in yet another way.

int main()
{
    int x = 2;
    x = x + 2;
}

First, this program defines an integer x and initializes it to 2. Then, it evaluates the expression \(x + 2\), which, since \(x\) currently holds 2, means \(2 + 2\) or \(4\), and that result is stored into x. You can follow along in the debugger.
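
For example, assuming your toolchain is gcc and gdb (an assumption; other compilers and debuggers have equivalents) and the source file has the hypothetical name xplustwo.c, a session might look like this. The -g flag records the debugging information gdb needs.

$ gcc -g xplustwo.c -o xplustwo
$ gdb ./xplustwo
(gdb) break main
(gdb) run
(gdb) next
(gdb) print x
$1 = 2
(gdb) next
(gdb) print x
$2 = 4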

The code I’ve shown you so far has included two types, int and double. They correspond to types of numbers, but there are some interesting details to discuss because those numbers must be stored in finite computer memory.

Integral types

An int isn’t exactly the same as an integer. For one thing, mathematically the integers are infinite, while a computer with finite mass can only represent a finite amount of information. Furthermore, the way computing hardware is designed, there are certain fixed-size chunks of information (e.g. 32 or 64 bits) that it can handle directly, and anything bigger or smaller must generally be simulated in software.

The C language is designed to respect that sort of design decision and give the programmer an architecture-agnostic way to work within such limitations. So, C has a family of related types, called the integral types, which all store integers but with different architectural characteristics.

A C integral type is either ‘signed’ or ‘unsigned’. A signed integer can store negative, zero, or positive numbers; an unsigned integer stores nonnegative (i.e. zero or positive) numbers. (Thus an unsigned integer is more like a natural number in math than an integer.) You can specify signed int or unsigned int, but if you don’t specify, the default is signed, i.e. int is equivalent to signed int.
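
For example (the variable names are just for illustration):

unsigned int count = 7;    /* a count can never be negative */
signed int balance = -50;  /* a balance can dip below zero */
int debt = -50;            /* plain int means signed int */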

Integral types are also distinguished by width, i.e. how many bits are used to represent a value. The int type is chosen for each architecture to be the most natural width for the hardware’s bus, math instructions, etc.; on many current systems that is 32 bits (4 bytes), but the C language specification leaves it implementation-defined so an int could be a different size on, say, an Arduino.

The char type is an integral type that is the smallest unit the architecture can subdivide memory into, while being large enough to store the numeric code for a character of text (hence the abbreviated name ‘char’). Thus, char is synonymous with ‘byte’. The standard allows different widths on different architectures, requiring only that a char be at least 8 bits, but on virtually every system you are likely to encounter it is exactly 8 bits wide.

A short int, or just short, is at least as wide as a char and at most as wide as an int. Beyond that ordering (and some guaranteed minimum ranges), the exact width is once again implementation-defined, but on a system with an 8-bit char and a 32-bit int, a short is usually 16 bits (2 bytes). Going the other way, a long int, or just long, is at least as wide as an int but possibly wider, and a long long int, or just long long, is at least as wide as a long but possibly wider. Most architectures don’t actually have five different widths of integers (char, short, int, long, long long) built into the hardware, so typically some of them will be the same size; still, each wider C type is at least as wide as the previous, giving the programmer some control over width while remaining portable.
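
You can ask your own system what widths it chose using the sizeof operator, which reports sizes in bytes, and the CHAR_BIT constant from <limits.h>; here is a minimal sketch.

#include <limits.h>
#include <stdio.h>

int main()
{
    printf("char:      %zu bits\n", sizeof(char) * CHAR_BIT);
    printf("short:     %zu bits\n", sizeof(short) * CHAR_BIT);
    printf("int:       %zu bits\n", sizeof(int) * CHAR_BIT);
    printf("long:      %zu bits\n", sizeof(long) * CHAR_BIT);
    printf("long long: %zu bits\n", sizeof(long long) * CHAR_BIT);
}

On a typical 64-bit Linux system this prints 8, 16, 32, 64, and 64 bits respectively, but the whole point is that your system may answer differently.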

When in doubt, use int for whole numbers.

Floating-point types

To represent rational and real numbers, which aren’t necessarily whole numbers, C provides the float type. The name float is an abbreviation of ‘floating point’, as opposed to ‘fixed point’. As an example of a fixed-point scheme, consider US currency: when it is written with a decimal point, there are conventionally 2 places after it, like 1.99. So one could, rather than trying to represent fractions of dollars at all, just use an int for the whole number of pennies. In general, however, you might want 1.99, or 19.9, or 0.199; the point ‘floats’ around based on the value’s order of magnitude.
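
The pennies idea above looks like this in code (a sketch; printf’s %02d just pads the pennies to two digits):

#include <stdio.h>

int main()
{
    int cents = 199;  /* $1.99 stored as a whole number of pennies */
    printf("$%d.%02d\n", cents / 100, cents % 100);  /* prints $1.99 */
}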

Unlike the integral types, real types in C are always signed; however, there are multiple widths available. Being stored in a fixed number of bits limits a floating-point variable not only in magnitude but also in precision; there is much more to explore here, but the main idea is that floats attempt to represent real numbers, but a programmer must remember that they are only finite approximations.
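
Here is a small sketch of that finiteness, assuming the near-universal IEEE 754 format, in which a float carries 24 bits of precision:

#include <stdio.h>

int main()
{
    float big = 16777217.0f;  /* 2 to the 24th power, plus 1 */
    printf("%.1f\n", big);    /* prints 16777216.0: the +1 was lost */
}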

The double type is a larger floating-point type, so named because it has double the width of a float. There is also a long double type, at least as wide as a double and wider still on architectures that can support it.
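
Once again, sizeof can report what your architecture provides; on a typical x86-64 system this sketch prints 4, 8, and 16 bytes, though other answers are possible.

#include <stdio.h>

int main()
{
    printf("float:       %zu bytes\n", sizeof(float));
    printf("double:      %zu bytes\n", sizeof(double));
    printf("long double: %zu bytes\n", sizeof(long double));
}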

When in doubt, use double for real numbers.
