How to work with floating-point numbers
Floating-point numbers, what we’d call float or double in C, didn’t
use to be part of the x86 architecture. At all. You could write
routines in software to deal with them, or buy an x87 Floating-Point Unit
coprocessor separately. (Yes, like a GPU, but an FPU. Coprocessors are an
old idea.) Luckily, the modern x86-64 architecture includes a dedicated
extension for floating-point numbers (and a few other features) called
SSE. This history is important to know about, because none of the main
x86 instructions or registers can handle floating-point numbers—instead
there is a whole other set of registers and instructions, including some
instructions that let us move information back and forth.
The floating-point registers are called the XMM registers. There
are sixteen of them, xmm0, xmm1, etc., through xmm15. Each one is
huge (128 bits) and can not only work with floating-point numbers but
can also operate on a small array of numbers all at once. We’re not
going to use that feature, but it’s relevant because some of the
instructions specifically mention ‘scalar’ values, i.e. one number at a
time, as opposed to vectors.
The movq instruction (‘move quadword’) is like mov, but moves one
quadword (64 bits) at a time in and out of the XMM registers. A quadword
floating-point number corresponds to the C type double. To add two
doubles, you can use addsd (‘add scalar double’), which works just like
add but on XMM registers (and appropriately-sized memory locations). So,
the following snippet from the floating-point add example reads in two
doubles, a and b, adds them up, and stores the answer in c.
movq xmm0, [rip + a]
movq xmm1, [rip + b]
addsd xmm1, xmm0
movq [rip + c], xmm1
To define global variables that contain doubles, there is the .double
directive.
.data
a: .double 12.3
b: .double 45.6
Setting aside empty space in the bss segment works the same regardless of
type; a double needs 8 bytes.
.bss
c: .zero 8
In addition to addsd, there is also subsd, mulsd, and divsd, among
other operations. Unlike with integers, where mul and div have special
behavior around how big the numbers can get and accounting for
remainders, floating-point numbers can incorporate that directly in
their representation, so all four of the arithmetic instructions take
two operands and put the answer back into the first one.
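To make the "answer goes back into the first operand" rule concrete, here is a minimal sketch of the other three operations. The scaffolding is an assumption rather than part of the original examples: it presumes GNU as in Intel-syntax mode (.intel_syntax noprefix), a main entry point linked through gcc, and three made-up result variables named diff, prod, and quot.

```asm
        .intel_syntax noprefix

        .data
a:      .double 12.3
b:      .double 45.6

        .bss
diff:   .zero 8                         # hypothetical slot for a - b
prod:   .zero 8                         # hypothetical slot for a * b
quot:   .zero 8                         # hypothetical slot for a / b

        .text
        .global main
main:
        movq    xmm1, [rip + b]         # xmm1 = b, read by every step below

        movq    xmm0, [rip + a]         # xmm0 = a
        subsd   xmm0, xmm1              # xmm0 = xmm0 - xmm1, about -33.3
        movq    [rip + diff], xmm0

        movq    xmm0, [rip + a]         # reload a: subsd overwrote xmm0
        mulsd   xmm0, xmm1              # xmm0 = xmm0 * xmm1, about 560.88
        movq    [rip + prod], xmm0

        movq    xmm0, [rip + a]         # reload a again
        divsd   xmm0, xmm1              # xmm0 = xmm0 / xmm1, about 0.27
        movq    [rip + quot], xmm0

        xor     eax, eax                # return 0 from main
        ret
```

Because each instruction overwrites its first operand, a has to be reloaded into xmm0 before every step, while xmm1 is only ever read and keeps b throughout.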
There is one difference worth noting between, e.g., add and addsd. You
can put ‘immediate’ numbers directly in the machine code for an add
instruction, like add rax, 1. You can’t in the SSE instructions, so if
you want to add 1 to an XMM register, that 1 has to come from somewhere
else: either another XMM register or, more usually, a constant stored in
memory. So, the half-a-float example has a separate constant to represent
the number 2 in order to use it as the divisor.
movq xmm0, [rip + a]
divsd xmm0, [rip + .Ltwo]
movq [rip + c], xmm0
And the constant:
.section .rodata
.Ltwo: .double 2.0
The .rodata section is for read-only data; we’re not going to mess with
what ‘two’ means. The name I chose for this variable is .Ltwo, not
simply two; the .L prefix means this symbol will be local (private to
this assembly file, not appearing in the symbol table). That way I can
use it here but none of my other code needs to know I had to put this
constant anywhere.
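The same pattern covers the "add 1 to an XMM register" case. This is a minimal sketch, not code from the original examples: .Lone is a made-up name following the .Ltwo convention, and the .intel_syntax directive and main scaffolding assume GNU as with linking through gcc.

```asm
        .intel_syntax noprefix

        .data
a:      .double 12.3

        .bss
c:      .zero 8

        .section .rodata
.Lone:  .double 1.0                     # hypothetical constant; addsd has no immediate form

        .text
        .global main
main:
        movq    xmm0, [rip + a]         # xmm0 = a
        addsd   xmm0, [rip + .Lone]     # xmm0 = a + 1.0, with the 1.0 read from memory
        movq    [rip + c], xmm0         # c = a + 1.0
        xor     eax, eax                # return 0 from main
        ret
```

The constant is used directly as a memory operand here, just as .Ltwo is in the halving snippet, so no second XMM register is needed.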
There are a lot more operations in the SSE extension, but one that
might come in handy is sqrtsd. When you execute, for example,
sqrtsd xmm0, xmm1, the square root of xmm1 will be stored into xmm0.
In C, you have to call the sqrt library function to compute square
roots, but in assembly, you can ask the hardware!
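As a rough sketch of how sqrtsd might sit in a whole program: the value 16.0 is chosen here only because its square root is exactly 4.0, and the .intel_syntax directive and main scaffolding are assumptions (GNU as, linked through gcc), not part of the original examples.

```asm
        .intel_syntax noprefix

        .data
a:      .double 16.0                    # illustrative input with an exact square root

        .bss
c:      .zero 8

        .text
        .global main
main:
        movq    xmm1, [rip + a]         # xmm1 = a
        sqrtsd  xmm0, xmm1              # xmm0 = sqrt(xmm1) = 4.0; xmm1 is left unchanged
        movq    [rip + c], xmm0         # c = 4.0
        xor     eax, eax                # return 0 from main
        ret
```

Unlike the arithmetic instructions, sqrtsd only reads its second operand and writes its first, so the source and destination can be different registers without any extra copying.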
The following screencast shows a gdb session that walks through
the halving program step by step, looking at the memory locations and the
registers to make sure everything seems right. Because the SSE extension
registers can do so many things, gdb displays them as complicated
structures rather than a single number; we are only interested in the
first number in the two-double view. We should see that \(12.3 / 2.0\)
is about \(6.15\).