Representing text

Remember the char type? It is an integer type, but it is used to represent characters, i.e. letters and other bits of text. (For this reason, some people pronounce it like ‘care’, as in the first syllable of ‘character’, but I usually pronounce it like ‘char’, as from a fire.)

Letter by letter

For everything your terminal thinks is a character, there is a number. This means not only letters, upper and lower case, but also digits, punctuation, and control codes for things like going to the next line and ringing a bell to get the user’s attention. Which number? It is straightforward to find out by experiment. Here is a program that prints out a character both as text and as a decimal number.

#include <stdio.h>

int main(void)
{
    char c = 'z';
    /* %c prints c as a character; %d prints the same value as a decimal number */
    printf("%c %d\n", c, c);
    return 0;
}

As you can see, a character value can be written in a C program using single quotes, as in 'z'. Most things you can type and have appear on the screen can just go in the single quotes and represent themselves, but some things are hard to type or see. The backslash character is used to ‘escape’ the next character, giving it a special meaning; for example, '\n' is the character for newline. Another useful escape sequence to know is '\t' for tab. Note that although it is typed with extra characters in the source file, an escape sequence represents a single character. Finally, although backslash is perfectly visible and typeable, it is used to escape things, so it is now special too and must itself be written as '\\'.

Feel free to play around with which characters have which numbers. More than likely you will find that they correspond to the ASCII standard, which you can also view on the console with man ascii, but as soon as you’ve figured out which number is which, I encourage you to immediately not worry about it. The numbers might be different on different systems, and it would be less reliable and less readable for you to write your programs based on anything but the single-quoted character literals. Just because you happen to know that 'A' is 65 doesn’t mean you should ever write 65 when you could write 'A'. There are some simple rules you are allowed to assume, however.

  • The upper-case letters are contiguous, i.e. 'A' + 1 == 'B' and so on.

  • The lower-case letters are contiguous, i.e. 'a' + 1 == 'b' and so on.

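  • The digits are contiguous, i.e. '0' + 1 == '1' and so on. (Although note that in general, '3' != 3. Characters and numbers are different concepts.) This is the only one of the three rules that the C standard itself guarantees; the two letter rules hold in ASCII and the encodings derived from it, but not in some older encodings such as EBCDIC.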

Strings

A longer run of text, like ‘hello, world’, is represented as a string, which is really just an array of char values, ended by a special zero-valued character written '\0' that marks where the text stops. You can use the datatype char * or make an array of char, and there is also a syntax for literals. Just as a single character goes in single quotes, a string of characters can go in double quotes. We have seen many examples already, such as "hello, world" or "%c %d\n". Anything you can write as a char in single quotes, including backslash escapes, you can put in a row inside double quotes, and it’s a string!
