Representing text
Remember the char type? It is an integer type, but it is used to
represent characters, i.e. letters and other bits of text. (For this
reason, some people pronounce it like ‘care’, as in the first syllable
of ‘character’, but I usually pronounce it like ‘char’, as from a fire.)
Letter by letter
For everything your terminal thinks is a character, there is a number.
This covers not only upper- and lower-case letters but also digits,
punctuation, and control codes for things like going to the next line
and ringing a bell to get the user’s attention. Which number? That is
straightforward to find out by experiment, in a few ways. Here is a
program that prints out a character both as text and as a decimal number.
#include <stdio.h>

int main(void)
{
    char c = 'z';
    /* %c prints c as a character; %d prints the same value as a number */
    printf("%c %d\n", c, c);
    return 0;
}
As you can see, a character value can be written in a C program using
single quotes, as in 'z'. Most things you can type and have appear on
the screen can just go in the single quotes and represent themselves,
but some things are hard to type or see. For those, the backslash
character is used to ‘escape’ the next character, giving it a special
meaning; for example, '\n' is the character for newline. Another useful
escape sequence to know is '\t' for tab. Note that although it is typed
with extra characters in the source file, an escape sequence represents
a single character. Finally, although backslash is perfectly visible and
typeable, its use as the escape character makes it special too, so a
backslash itself must be written as '\\'.
Feel free to play around with which characters have which numbers. More
than likely you will find that they correspond to the ASCII standard,
which you can also view on the console with man ascii, but as soon as
you’ve figured out which number is which, I encourage you to immediately
not worry about it. The numbers might be different on different systems,
and it would be less reliable and less readable for you to write your
programs based on anything but the single-quoted character literals.
Just because you happen to know that 'A' is 65 doesn’t mean you should
ever write 65 when you could write 'A'. There are some simple rules you
are allowed to assume, however.
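- The upper-case letters are contiguous, i.e. 'A' + 1 == 'B' and so on.
- The lower-case letters are contiguous, i.e. 'a' + 1 == 'b' and so on.
- The digits are contiguous, i.e. '0' + 1 == '1' and so on. (Although
  note that in general, '3' != 3. Characters and numbers are different
  concepts.)

Strictly speaking, the C standard guarantees only the digit rule. The
letter rules hold in ASCII and anything compatible with it, which is
almost certainly what you have, but some older character sets such as
EBCDIC have gaps in the alphabet.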
Strings
A longer run of text, like ‘hello, world’, is represented as a string,
which is really just an array of char values. You can use the datatype
char * or make an array of char, and there is also a syntax for
literals. Just as a single character goes in single quotes, a string of
characters can go in double quotes. We have seen many examples already,
such as "hello, world" or "%c %d\n".
Anything you can write as a char in single quotes, including backslash
escapes, you can put in a row inside double quotes and it’s a string!