How to work with floating-point numbers

Floating-point numbers, what we’d call float or double in C, didn’t use to be part of the x86 architecture. At all. You could write routines in software to deal with them, or buy an x87 Floating-Point Unit coprocessor separately. (Yes, like a GPU, but an FPU. Coprocessors are an old idea.) Luckily, the modern x86-64 architecture includes a dedicated extension for floating-point numbers (and a few other features) called SSE. (Strictly speaking, the double-precision instructions used here arrived with its successor, SSE2, which every x86-64 processor also includes.) This history is important to know about, because none of the main x86 instructions or registers can handle floating-point numbers. Instead, there is a whole other set of registers and instructions, including some instructions that let us move information back and forth.

The floating-point registers are called the XMM registers. There are sixteen of them, xmm0, xmm1, etc., through xmm15. Each one is huge (128 bits) and can not only hold a single floating-point number but can also operate on a small array of numbers all at once. We’re not going to use that feature, but it’s relevant because some of the instructions specifically mention ‘scalar’ values (one number at a time), as opposed to vectors.

The movq instruction (‘move quadword’) is like mov, but lets us move a quadword (64 bits) at a time in and out of the XMM registers. A quadword floating-point number corresponds to the C type double. To add two doubles, you can use addsd (‘add scalar double’), which works just like add, except that the destination must be an XMM register and the source can be either an XMM register or a 64-bit memory location. So, the following snippet from the floating-point add example reads in two doubles, a and b, adds them up, and stores the sum in c.

movq    xmm0, [rip + a]
movq    xmm1, [rip + b]
addsd   xmm1, xmm0
movq    [rip + c], xmm1

In order to specify global variables that contain doubles, there is the .double directive.

    .data
a:  .double 12.3
b:  .double 45.6

Setting aside empty space in the bss segment works the same regardless of type.

    .bss
c:  .zero   8
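
To see how the pieces fit together, here is a minimal sketch of a complete program built from the snippet and the two data definitions above. The surrounding scaffolding is an assumption, not the original example file: it assumes Linux, the GNU assembler in Intel syntax, and linking through the C compiler driver (something like gcc fadd.s -o fadd, where fadd.s is a hypothetical file name), so that execution starts at main.

```asm
# Sketch only: assumes Linux, GNU as, and linking via gcc (entry point main).
        .intel_syntax noprefix

        .data
a:      .double 12.3
b:      .double 45.6

        .bss
c:      .zero   8                   # room for one double (8 bytes)

        .text
        .global main
main:
        movq    xmm0, [rip + a]     # xmm0 = a
        movq    xmm1, [rip + b]     # xmm1 = b
        addsd   xmm1, xmm0          # xmm1 = b + a
        movq    [rip + c], xmm1     # c = xmm1
        xor     eax, eax            # return 0 from main
        ret
```

The program prints nothing; the result is only visible by inspecting c or xmm1 in a debugger.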

In addition to addsd, there are also subsd, mulsd, and divsd, among other operations. Unlike with integers, where mul and div have special behavior around how big the numbers can get and accounting for remainders, floating-point numbers can incorporate that directly in their representation, so all four of the arithmetic instructions take two operands and put the answer back into the first one.
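
The contrast with integer division may be easier to see side by side. The following is a sketch under the same assumptions as the rest of this page (GNU assembler, Intel syntax, linked via gcc so that main is the entry point); the names num, den, and quot are made up for illustration.

```asm
# Sketch only: num, den, and quot are hypothetical names for this illustration.
        .intel_syntax noprefix

        .data
num:    .double 7.0
den:    .double 2.0

        .bss
quot:   .zero   8

        .text
        .global main
main:
        # Integer division: the dividend is implicitly rdx:rax, and the
        # instruction names only the divisor.
        mov     rax, 7
        xor     edx, edx            # clear the high half of the dividend
        mov     rcx, 2
        div     rcx                 # rax = 3 (quotient), rdx = 1 (remainder)

        # Floating-point division: two explicit operands, no remainder.
        movq    xmm0, [rip + num]
        divsd   xmm0, [rip + den]   # xmm0 = 7.0 / 2.0 = 3.5
        movq    [rip + quot], xmm0

        xor     eax, eax            # return 0 from main
        ret
```

subsd and mulsd follow the same two-operand pattern as divsd here.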

There is one difference worth noting between, e.g., add and addsd. You can put ‘immediate’ numbers directly in the machine code for an add instruction, like add rax, 1. You can’t in the SSE instructions, so if you want to add 1 to an XMM register, that 1 has to come from another XMM register or from memory. In practice the constant lives in memory, and you either load it into a register first or name the memory location directly as the second operand. So, the half-a-float example has a separate constant to represent the number 2 in order to use it as the divisor.

movq    xmm0, [rip + a]
divsd   xmm0, [rip + .Ltwo]
movq    [rip + c], xmm0

And the constant:

    .section .rodata
.Ltwo:  .double 2.0

The .rodata section is for read-only data; we’re not going to mess with what ‘two’ means. The name I chose for this variable is .Ltwo, not simply two; the .L prefix means this symbol will be local (private to this assembly file, not appearing in the symbol table). That way I can use it here but none of my other code needs to know I had to put this constant anywhere.
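
The same trick covers the ‘add 1’ case mentioned earlier. This sketch shows both ways of getting the constant into the instruction: through a register, and directly as a memory operand. The names val and .Lone are hypothetical, and the scaffolding again assumes the GNU assembler in Intel syntax, linked via gcc.

```asm
# Sketch only: val and .Lone are hypothetical names for this illustration.
        .intel_syntax noprefix

        .data
val:    .double 41.0

        .section .rodata
.Lone:  .double 1.0

        .text
        .global main
main:
        movq    xmm0, [rip + val]       # xmm0 = 41.0

        # Option 1: load the constant into a register, then add.
        movq    xmm1, [rip + .Lone]
        addsd   xmm0, xmm1              # xmm0 = 42.0

        # Option 2: use the constant directly as a memory operand.
        addsd   xmm0, [rip + .Lone]     # xmm0 = 43.0

        movq    [rip + val], xmm0       # val = 43.0
        xor     eax, eax                # return 0 from main
        ret
```

Option 2 saves an instruction and a register; option 1 is handy when the same constant is needed several times in a row.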

There are a lot more operations in the SSE extension, but one that might come in handy is sqrtsd. When you execute, for example, sqrtsd xmm0, xmm1, the square root of xmm1 will be stored into xmm0. In C, you have to call the sqrt library function to compute square roots, but in assembly, you can ask the hardware!
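
As a small illustration, here is a sketch that combines mulsd, addsd, and sqrtsd to compute the hypotenuse of a 3-4-5 right triangle. The names side_a, side_b, and hyp are invented for this example, and the program skeleton assumes the GNU assembler in Intel syntax, linked via gcc.

```asm
# Sketch only: side_a, side_b, and hyp are hypothetical names.
        .intel_syntax noprefix

        .data
side_a: .double 3.0
side_b: .double 4.0

        .bss
hyp:    .zero   8

        .text
        .global main
main:
        movq    xmm0, [rip + side_a]
        mulsd   xmm0, xmm0              # xmm0 = 3.0 * 3.0 = 9.0
        movq    xmm1, [rip + side_b]
        mulsd   xmm1, xmm1              # xmm1 = 4.0 * 4.0 = 16.0
        addsd   xmm0, xmm1              # xmm0 = 25.0
        sqrtsd  xmm0, xmm0              # xmm0 = sqrt(25.0) = 5.0
        movq    [rip + hyp], xmm0       # hyp = 5.0
        xor     eax, eax                # return 0 from main
        ret
```

Note that the source and destination of sqrtsd may be the same register.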

The following screencast shows a gdb session that walks through the halving program step by step, looking at the memory locations and the registers to make sure everything seems right. Because the SSE extension registers can do so many things, they display as complicated structures rather than a single number; we are only interested in the first number in the two-double view (which gdb labels v2_double). We should see that \(12.3 / 2.0\) is about \(6.15\).
