5.5. Strings
Strings in assembly are simply arrays of bytes (chars), each interpreted by its ASCII code: ‘A’ is stored as 0x41, ‘B’ as 0x42, and so on. By convention, a string usually ends with a null character, a byte with the value 0. This convention allows us to allocate a large buffer (a chunk of memory), use only the part we need, and later identify how much of the buffer is actually in use.
For example, given this chunk of memory:
0x41 | 0x70 | 0x70 | 0x6c | 0x65 | 0x00 | 0x4f | 0x92 | 0xa0 | 0x07 | 0x1c | 0x1c | 0x3d | 0x82 | 0xbc | 0x0f |
We would interpret it as the five bytes 0x41 0x70 0x70 0x6c 0x65, which spell “Apple”. The 0x00 says “stop reading here”. Anything else in that buffer would be ignored.
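The same rule can be sketched in C++. This is only an illustration under stated assumptions: the helper name readUntilNull and the array name exampleBuffer are hypothetical, and the capacity check is an added safeguard that the convention itself does not provide.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: collects the chars in `buffer` up to (but not
// including) the first 0x00 byte, never reading past `capacity` bytes.
std::string readUntilNull(const unsigned char* buffer, std::size_t capacity) {
    std::string result;
    for (std::size_t i = 0; i < capacity && buffer[i] != 0x00; ++i) {
        result += static_cast<char>(buffer[i]);
    }
    return result;
}

// The 16-byte chunk of memory from the example above.
const unsigned char exampleBuffer[16] = {
    0x41, 0x70, 0x70, 0x6c, 0x65, 0x00, 0x4f, 0x92,
    0xa0, 0x07, 0x1c, 0x1c, 0x3d, 0x82, 0xbc, 0x0f
};
```

Calling readUntilNull(exampleBuffer, 16) stops at the sixth byte and yields the five chars “Apple”. The ten bytes after the 0x00 are never examined.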
The standard way to load a string is via the .asciz directive. (The z stands for the zero at the end.)
It loads a given string into memory and adds a null byte 0x00 to end it. The sample below loads the chars ‘A’ ‘p’ ‘p’ ‘l’ ‘e’ and places a 0x00 after them:
.data
.asciz "Apple"
@ use 0xFF as a marker to show where the string ends
.byte 0xFF
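Displayed in byte mode (so that endianness doesn’t change the order we see the chars in), memory holds the bytes 0x41 0x70 0x70 0x6c 0x65 0x00 0xFF: the five chars, the null byte added by .asciz, and then the 0xFF marker.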
Note
There is also an .ascii directive, which does not add a null byte to the end.
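C++ has a rough analogue of the difference between the two directives, sketched here for comparison. The array names are hypothetical. A string literal behaves like .asciz and carries a trailing null byte, while an array built from individual chars behaves like .ascii and carries none.

```cpp
#include <cstddef>

// Like .asciz "Apple": the string literal includes a trailing 0x00 byte.
const char withNull[] = "Apple";

// Like .ascii "Apple": exactly five bytes, with no terminator.
const char withoutNull[] = {'A', 'p', 'p', 'l', 'e'};

const std::size_t withNullSize = sizeof(withNull);       // 6 bytes
const std::size_t withoutNullSize = sizeof(withoutNull); // 5 bytes
```

Code that scans for a null byte works on the first array. On the second it would run past the end of the data, which is why null-terminated loops need the .asciz form.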
To loop over a string, we use a loop that stops when it finds a null char. The code sample below counts the ‘e’s in a string. It is roughly equivalent to this C++ code:
int eCount = 0;
int i = 0;
while (myString[i] != '\0') {
    char current = myString[i];
    if (current == 'e')
        eCount += 1;
    i++;
}
In the assembly version, r1 holds the eCount, r2 the base address of the string, and r3 the index. Since each char is a single byte, we can use base address + index to calculate the memory address of each char:
.data
myString: .asciz "hello there people"
.align @force alignment to word boundary
.text
_start:
MOV r1, #0 @ r1 = eCount
LDR r2, =myString @ r2 = base address of myString
MOV r3, #0 @ i = 0 - r3 will be index
B looptest @ Jump to loop test
loopstart:
@We know that r4 has current char from doing test
CMP r4, #'e' @ compare current char to e (0x65)
BNE endIf
ADD r1, r1, #1 @ if equal, add one to counter
endIf:
ADD r3, r3, #1 @ i++
looptest:
LDRB r4, [r2, r3] @ r4 = current char (myString[i])
@ Load as a BYTE!!!
CMP r4, #0 @ 0 char signifies end of string
BNE loopstart @ If not null char, go through a loop iteration
end:
B end @stop here
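To connect the listing back to C++, here is a sketch that mirrors the register usage with explicit base address + index arithmetic. The function name countLetterE is hypothetical, and the mapping of variables to registers in the comments is an illustration, not what a compiler would necessarily produce.

```cpp
#include <cstdint>

// Hypothetical helper that follows the assembly listing step by step.
int countLetterE(const char* myString) {
    int eCount = 0;                                        // r1 = eCount
    const std::uint8_t* base =
        reinterpret_cast<const std::uint8_t*>(myString);   // r2 = base address
    int index = 0;                                         // r3 = index
    std::uint8_t current = 0;                              // r4 = current char

    // Load one byte from base + index (like LDRB r4, [r2, r3]) and
    // stop when it is the null char (like CMP r4, #0).
    while ((current = *(base + index)) != 0) {
        if (current == 'e') {                              // CMP r4, #'e'
            eCount += 1;
        }
        index += 1;                                        // i++
    }
    return eCount;
}
```

For the string "hello there people" the function returns 5. Because each char is one byte, adding the index to the base pointer needs no scaling, which is the same reason the assembly can use [r2, r3] directly.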