About the XMM SSE2 floating point instructions
SSE2 instructions:-
These are the SSE2 (Streaming SIMD Extensions 2) floating
point instructions which use the 128-bit XMM registers and which can handle
double-precision (64-bit) floating point values. There are also some
instructions which work on single-precision (32-bit) floating point values.
Support for SSE2 was introduced in the Pentium 4 and Xeon processors.
Generally the instructions are very similar to the
SSE floating point instructions except for the
data size that they work with.
Before using these instructions in your code you need to check if they
are available on the processor which is running your program. This is done
by calling CPUID having set EAX=1. Then test bit 26 of EDX. The bit will
be set if the SSE2 instructions can be used.
In the tests the following data declarations are used:-
DOUBLEFP1 DQ 1.1
DQ 3.3
DOUBLEFP2 DQ 20.66
DQ 40.66
DOUBLEFPN DQ -5.1
DQ +6.3
Since it is possible that these labels may not be on a 16-byte boundary, the
MOVUPD instruction must be used to transfer the data from memory into an
XMM register. MOVUPD (move two unaligned packed double-precision values)
does not care about alignment.
If you specify ALIGN 16 immediately before the data declaration, however,
then the faster MOVAPD (move two aligned packed double-precision values)
can be used instead. If you get this wrong, by using MOVAPD on an address
which is not on a 16-byte boundary, your program will cause an exception.
See more about this in the notes on MOVDQA and MOVDQU, which work in the
same way. When transferring between registers, either MOVUPD or MOVAPD may
be used.
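As a minimal sketch of the aligned case, the lines below place a pair of values on a 16-byte boundary so that MOVAPD can load them. The label ALIGNEDFP is made up for this illustration and is not part of the test program; the data lines are assumed to sit in the data section and the instructions in the code section.

```asm
;in the data section ..
ALIGN 16 ;next declaration starts on a 16-byte boundary
ALIGNEDFP DQ 1.1 ;hypothetical label, not in the test program
DQ 3.3
;in the code section ..
MOVAPD XMM0,[ALIGNEDFP] ;safe, the address is a multiple of 16
MOVUPD XMM1,[ALIGNEDFP] ;also works, whatever the alignment
MOVUPD XMM2,[DOUBLEFP1] ;DOUBLEFP1 is not known to be aligned, so use MOVUPD
```

If anything is later declared between the ALIGN 16 and the label, the alignment is no longer guaranteed and the MOVAPD line would need to become MOVUPD again.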
The instructions we are looking at here are largely of two types.
The first type of instruction deals with two 64-bit floating point numbers at once.
These instructions have "PD" in their mnemonic name, referring to "packed
double-precision". The second type of instruction deals with just one 64-bit
floating point value. These instructions have "SD" in their mnemonic name referring to
"scalar double-precision". They work on the lowest part of the XMM register
only, that is to say the first 64 bits of the register (bits 0 to 63).
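The difference between the two types can be seen in a short sketch using the data declarations given earlier. The same addition is done once with the packed form and once with the scalar form; the values in the comments are approximate because 1.1, 20.66 and so on cannot be represented exactly in binary.

```asm
MOVUPD XMM0,[DOUBLEFP1] ;XMM0 = 1.1 (low), 3.3 (high)
MOVUPD XMM1,[DOUBLEFP2] ;XMM1 = 20.66 (low), 40.66 (high)
MOVAPD XMM2,XMM0 ;keep a copy of the first pair in XMM2
ADDPD XMM0,XMM1 ;packed: XMM0 = about 21.76 (low), about 43.96 (high)
ADDSD XMM2,XMM1 ;scalar: XMM2 = about 21.76 (low), high stays at 3.3
```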
To watch these tests properly you need to set the appropriate breakpoint,
start the test and then single step through the instructions. You can then
watch how they change the XMM registers. Using GoBug you can make the XMM
registers appear in their floating point SSE2 format using the
appropriate button on the toolbar.
Data movement instructions
Arithmetic instructions
Logical instructions
Comparison instructions
Shuffle and unpack instructions
Conversion instructions
SSE2 Data movement instructions
This demonstrates moving data into the registers and
between the registers. MOVUPD, MOVAPD (the aligned version), MOVSD, MOVLPD
and MOVHPD can also be
used to store values from the registers into memory. MOVMSKPD can be used after a
comparison instruction to get the result of the compare
into EAX for analysis.
As an experiment we also try the SSE2 integer instruction MOVDQU and
the SSE floating point instruction MOVUPS to see if they do the same as
MOVUPD. It seems they do, merely performing a bit transfer into the
XMM register. However, Intel warns against using instructions other than those specified for the data type, since this may have unstated performance implications.
The breakpoint is XMMSSE2_FPDATA:-
XMMSSE2_FPDATA:
MOV EAX,1 ;request CPU feature flags
CPUID ;0Fh, 0A2h CPUID instruction
TEST EDX,4000000h ;test bit 26 (SSE2)
JNZ >L20 ;SSE2 available
CALL NOSSE2FPMESS ;displays message if SSE2 not available
RET
L20:
;***** display XMM registers in SSE2 mode ..
MOVUPD XMM0,[DOUBLEFP1] ;move two double precision fp values into XMM0
MOVAPD XMM7,XMM0 ;copying to XMM7
MOVSD XMM2,[DOUBLEFP2] ;move fp value to XMM2 low, high half is zeroed
MOVLPD XMM3,[DOUBLEFP2] ;similar, but the high half of XMM3 is left unchanged
MOVHPD XMM4,[DOUBLEFP2] ;but this moves the value into the high half of XMM4
MOVUPD XMM0,[DOUBLEFPN] ;move two new values, one is negative
MOVMSKPD EAX,XMM0 ;get both sign bits in XMM0 into eax
;************ and as an experiment, see if this does the same as MOVUPD ..
MOVDQU XMM1,[DOUBLEFPN] ;use integer instruction to transfer the bits
;************ and this too (one byte smaller) ..
MOVUPS XMM2,[DOUBLEFPN] ;use SSE instruction to transfer the bits
RET
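The test above only loads registers, so here is a hedged sketch of the other two uses mentioned: storing values back to memory, and reading a comparison result with MOVMSKPD. RESULTFP is a hypothetical label invented for this illustration and is assumed to be declared in the data section.

```asm
;in the data section ..
RESULTFP DQ 0.0 ;hypothetical 16-byte buffer, not in the test program
DQ 0.0
;in the code section ..
MOVUPD XMM0,[DOUBLEFP1] ;XMM0 = 1.1 (low), 3.3 (high)
MOVUPD XMM1,[DOUBLEFPN] ;XMM1 = -5.1 (low), 6.3 (high)
MOVUPD [RESULTFP],XMM0 ;store both values (no alignment needed)
MOVSD [RESULTFP],XMM0 ;store the low value only
MOVHPD [RESULTFP+8],XMM0 ;store the high value only
CMPPD XMM0,XMM1,1 ;=CMPLTPD low: 1.1 < -5.1 is false, high: 3.3 < 6.3 is true
MOVMSKPD EAX,XMM0 ;EAX = 2 (bit 1 set for the high value, bit 0 clear)
```

MOVMSKPD simply copies the two sign bits, which is why it works on a compare result: a true result is a mask of all ones, so its top bit is set.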
SSE2 Arithmetic instructions
This demonstrates the arithmetic instructions which can
work in the XMM registers using double-precision (64-bit) numbers.
The breakpoint is XMMSSE2_FPARITH:-
XMMSSE2_FPARITH:
MOV EAX,1 ;request CPU feature flags
CPUID ;0Fh, 0A2h CPUID instruction
TEST EDX,4000000h ;test bit 26 (SSE2)
JNZ >L22 ;SSE2 available
CALL NOSSE2FPMESS ;displays message if SSE2 not available
RET
L22:
;***** display XMM registers in SSE2 mode ..
MOVUPD XMM0,[DOUBLEFP1] ;move two double precision fp values into XMM0
MOVAPD XMM2,XMM0 ;copying to XMM2
MOVUPD XMM1,[DOUBLEFP2] ;move 2nd tester fp values into XMM1
MOVAPD XMM3,XMM1 ;copying to XMM3
ADDPD XMM0,XMM1 ;add both fp values result in XMM0
MOVAPD XMM0,XMM2 ;restore value in XMM0
SUBPD XMM0,XMM1 ;subtract both fp values result in XMM0
;*******
MOVAPD XMM0,XMM2 ;restore value in XMM0
ADDSD XMM0,XMM1 ;add low fp value result in XMM0
SUBSD XMM0,XMM1 ;subtract low fp value result in XMM0
;*******
MOVAPD XMM0,XMM2 ;restore value in XMM0
MULPD XMM0,XMM1 ;multiply both fp values result in XMM0
;*******
MOVAPD XMM0,XMM2 ;restore value in XMM0
MULSD XMM0,XMM1 ;multiply low fp value result in XMM0
;*******
MOVAPD XMM0,XMM2 ;restore value in XMM0
DIVPD XMM0,XMM1 ;divide both fp values result in XMM0
;*******
MOVAPD XMM0,XMM2 ;restore value in XMM0
DIVSD XMM0,XMM1 ;divide low fp value result in XMM0
;*******
MOVAPD XMM0,XMM2 ;restore value in XMM0
SQRTPD XMM0,XMM1 ;get square roots of both fp values in XMM1, result in XMM0
;*******
MOVAPD XMM0,XMM2 ;restore value in XMM0
SQRTSD XMM0,XMM1 ;get square root of low fp value in XMM1, result in XMM0 (low)
;*******
MOVAPD XMM0,XMM2 ;restore value in XMM0
MAXPD XMM0,XMM1 ;get numerically greater fp values result in XMM0
;*******
MOVAPD XMM0,XMM2 ;restore value in XMM0
MAXSD XMM0,XMM1 ;get numerically greater of low fp values result in XMM0
;*******
MOVAPD XMM0,XMM2 ;restore value in XMM0
MINPD XMM0,XMM1 ;get numerically smaller fp values result in XMM0
;*******
MOVAPD XMM0,XMM2 ;restore value in XMM0
MINSD XMM0,XMM1 ;get numerically smaller of low fp values result in XMM0
RET
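To show how the scalar arithmetic instructions combine in practice, here is a small sketch which works out the square root of (a*a + b*b). The labels SIDEA and SIDEB are invented for this example; the values are chosen so that every intermediate result is exact.

```asm
;in the data section ..
SIDEA DQ 3.0 ;hypothetical labels, not in the test program
SIDEB DQ 4.0
;in the code section ..
MOVSD XMM0,[SIDEA] ;XMM0 low = 3.0
MOVSD XMM1,[SIDEB] ;XMM1 low = 4.0
MULSD XMM0,XMM0 ;XMM0 low = 9.0
MULSD XMM1,XMM1 ;XMM1 low = 16.0
ADDSD XMM0,XMM1 ;XMM0 low = 25.0
SQRTSD XMM0,XMM0 ;XMM0 low = 5.0
```

The same sequence written with MULPD, ADDPD and SQRTPD would process two such pairs at once, one in each half of the registers.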
SSE2 Logical instructions
This demonstrates the logical instructions which can
work in the XMM registers using double-precision (64-bit) numbers.
The breakpoint is XMMSSE2_FPLOGIC:-
XMMSSE2_FPLOGIC:
MOV EAX,1 ;request CPU feature flags
CPUID ;0Fh, 0A2h CPUID instruction
TEST EDX,4000000h ;test bit 26 (SSE2)
JNZ >L24 ;SSE2 available
CALL NOSSE2FPMESS ;displays message if SSE2 not available
RET
L24:
;***** display XMM registers in SSE2 mode ..
MOVUPD XMM0,[DOUBLEFP1] ;move two double precision fp values into XMM0
MOVAPD XMM2,XMM0 ;copying to XMM2
MOVUPD XMM1,[DOUBLEFP2] ;move 2nd tester fp values into XMM1
MOVAPD XMM3,XMM1 ;copying to XMM3
ANDPD XMM0,XMM1 ;perform AND on both fp values result in XMM0
;*******
MOVAPD XMM0,XMM2 ;restore value in XMM0
ANDNPD XMM0,XMM1 ;perform AND NOT on both fp values result in XMM0
;*******
MOVAPD XMM0,XMM2 ;restore value in XMM0
ORPD XMM0,XMM1 ;perform OR on both fp values result in XMM0
;*******
MOVAPD XMM0,XMM2 ;restore value in XMM0
XORPD XMM0,XMM1 ;perform XOR on both fp values result in XMM0
RET
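The logical instructions are not often used on floating point values as numbers, but they are handy for working on the sign bit, which is bit 63 of each value. The sketch below takes the absolute value of both numbers and then negates them. ABSMASK and SIGNMASK are hypothetical labels, and it is assumed that the assembler accepts 64-bit hex constants in a DQ declaration. The masks are loaded with MOVUPD first, since using them directly as a memory operand to ANDPD or XORPD would require 16-byte alignment.

```asm
;in the data section ..
ABSMASK DQ 7FFFFFFFFFFFFFFFh ;hypothetical: every bit set except the sign bit
DQ 7FFFFFFFFFFFFFFFh
SIGNMASK DQ 8000000000000000h ;hypothetical: only the sign bit set
DQ 8000000000000000h
;in the code section ..
MOVUPD XMM0,[DOUBLEFPN] ;XMM0 = -5.1 (low), 6.3 (high)
MOVUPD XMM1,[ABSMASK]
ANDPD XMM0,XMM1 ;clear both sign bits: XMM0 = 5.1, 6.3
MOVUPD XMM1,[SIGNMASK]
XORPD XMM0,XMM1 ;flip both sign bits: XMM0 = -5.1, -6.3
XORPD XMM2,XMM2 ;XOR with itself sets XMM2 to 0.0, 0.0
```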
SSE2 Comparison instructions
This demonstrates the comparison instructions which can
work in the XMM registers using double-precision (64-bit) numbers.
You tell CMPPD and CMPSD what to do by specifying an immediate value in the
third operand. It is not easy to remember what value does what, so some
assemblers (including GoAsm) also provide pseudo mnemonics in the form
recommended by Intel (given here in the comment). COMISD and UCOMISD are
somewhat easier to use because they set the ordinary flags, although they
only work on one floating point value in the XMM register (contained in
bits 0-63).
The breakpoint is XMMSSE2_FPCOMP:-
XMMSSE2_FPCOMP:
MOV EAX,1 ;request CPU feature flags
CPUID ;0Fh, 0A2h CPUID instruction
TEST EDX,4000000h ;test bit 26 (SSE2)
JNZ >L26 ;SSE2 available
CALL NOSSE2FPMESS ;displays message if SSE2 not available
RET
L26:
;***** display XMM registers in SSE2 mode ..
MOVUPD XMM0,[DOUBLEFP1] ;move two double precision fp values into XMM0
MOVAPD XMM2,XMM0 ;copying to XMM2
MOVUPD XMM1,[DOUBLEFP2] ;move 2nd tester fp values into XMM1
MOVAPD XMM3,XMM1 ;copying to XMM3
;********************* compare instructions working on both fp values
CMPPD XMM0,XMM1,0 ;=CMPEQPD see whether equal, result in XMM0
MOVAPD XMM0,XMM2 ;restore original value to XMM0
CMPPD XMM0,XMM1,1 ;=CMPLTPD see whether less than, result in XMM0
MOVAPD XMM0,XMM2 ;restore original value to XMM0
CMPPD XMM0,XMM1,2 ;=CMPLEPD see whether less than or equal, result in XMM0
MOVAPD XMM0,XMM2 ;restore original value to XMM0
CMPPD XMM0,XMM1,3 ;=CMPUNORDPD see whether unordered, result in XMM0
MOVAPD XMM0,XMM2 ;restore original value to XMM0
CMPPD XMM0,XMM1,4 ;=CMPNEQPD see whether not equal, result in XMM0
MOVAPD XMM0,XMM2 ;restore original value to XMM0
CMPPD XMM0,XMM1,5 ;=CMPNLTPD see whether not less than, result in XMM0
MOVAPD XMM0,XMM2 ;restore original value to XMM0
CMPPD XMM0,XMM1,6 ;=CMPNLEPD see whether not less than or equal, result in XMM0
MOVAPD XMM0,XMM2 ;restore original value to XMM0
CMPPD XMM0,XMM1,7 ;=CMPORDPD see whether ordered, result in XMM0
;********************* compare instructions working on low value only
MOVAPD XMM0,XMM2 ;restore original value to XMM0
CMPSD XMM0,XMM1,0 ;=CMPEQSD see whether equal, result in XMM0
MOVAPD XMM0,XMM2 ;restore original value to XMM0
CMPSD XMM0,XMM1,1 ;=CMPLTSD see whether less than, result in XMM0
MOVAPD XMM0,XMM2 ;restore original value to XMM0
CMPSD XMM0,XMM1,2 ;=CMPLESD see whether less than or equal, result in XMM0
MOVAPD XMM0,XMM2 ;restore original value to XMM0
CMPSD XMM0,XMM1,3 ;=CMPUNORDSD see whether unordered, result in XMM0
MOVAPD XMM0,XMM2 ;restore original value to XMM0
CMPSD XMM0,XMM1,4 ;=CMPNEQSD see whether not equal, result in XMM0
MOVAPD XMM0,XMM2 ;restore original value to XMM0
CMPSD XMM0,XMM1,5 ;=CMPNLTSD see whether not less than, result in XMM0
MOVAPD XMM0,XMM2 ;restore original value to XMM0
CMPSD XMM0,XMM1,6 ;=CMPNLESD see whether not less than or equal, result in XMM0
MOVAPD XMM0,XMM2 ;restore original value to XMM0
CMPSD XMM0,XMM1,7 ;=CMPORDSD see whether ordered, result in XMM0
;********************* compare and give result in eflags
MOVAPD XMM0,XMM2 ;restore original value to XMM0
COMISD XMM0,XMM1 ;look at lowest only result in eflags
UCOMISD XMM0,XMM1 ;(unordered compare)
MOVUPD XMM1,[DOUBLEFPN] ;move one -ve and one +ve value into XMM1
COMISD XMM0,XMM1 ;look at lowest only - result in eflags
UCOMISD XMM0,XMM1 ;(unordered compare)
RET
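Since COMISD and UCOMISD report their result in ZF, PF and CF, they are followed by the unsigned conditional jumps rather than the signed ones. The routine below is only a sketch of how the flags might be tested; the label COMPAREFP, the local labels and the return codes in EAX are all invented for this illustration. The unordered case (one of the values is a NaN) sets all three flags, so it must be tested first with JP.

```asm
COMPAREFP: ;hypothetical routine, not in the test program
MOVSD XMM0,[DOUBLEFP1] ;XMM0 low = 1.1
MOVSD XMM1,[DOUBLEFP2] ;XMM1 low = 20.66
COMISD XMM0,XMM1 ;compare low values, result in eflags
JP >L40 ;PF=1 means unordered (a NaN was involved)
JA >L41 ;ZF=0 and CF=0 means XMM0 is greater
JB >L42 ;CF=1 means XMM0 is less (taken with this data)
XOR EAX,EAX ;otherwise ZF=1, the values are equal, return 0
RET
L40:
MOV EAX,2 ;return 2 for unordered
RET
L41:
MOV EAX,1 ;return 1 for greater
RET
L42:
MOV EAX,-1 ;return -1 for less
RET
```

UCOMISD sets the flags in the same way. The difference is that COMISD signals an invalid operation exception for any NaN operand, whereas UCOMISD does so only for a signalling NaN.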
SSE2 Shuffle and unpack instructions
With these instructions you can move the double-precision
(64-bit) floating point values around the XMM registers.
The breakpoint is XMMSSE2_SHUFF:-
XMMSSE2_SHUFF:
MOV EAX,1 ;request CPU feature flags
CPUID ;0Fh, 0A2h CPUID instruction
TEST EDX,4000000h ;test bit 26 (SSE2)
JNZ >L28 ;SSE2 available
CALL NOSSE2FPMESS ;displays message if SSE2 not available
RET
L28:
;***** display XMM registers in SSE2 mode ..
MOVUPD XMM0,[DOUBLEFP1] ;move two double precision fp values into XMM0
MOVAPD XMM2,XMM0 ;copying to XMM2
MOVUPD XMM1,[DOUBLEFP2] ;move 2nd tester fp values into XMM1
MOVAPD XMM3,XMM1 ;copying to XMM3
SHUFPD XMM0,XMM1,3h ;shuffle pack into destination
SHUFPD XMM0,XMM0,1h ;swap the values in XMM0
MOVAPD XMM0,XMM2 ;restore original value to XMM0
UNPCKHPD XMM0,XMM1 ;unpack (high) and put into destination
MOVAPD XMM0,XMM2 ;restore original value to XMM0
UNPCKLPD XMM0,XMM0 ;unpack (low), here copying the low value of XMM0 into both halves
RET
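One common use for these instructions is to add together the two values held in a single register, since the arithmetic instructions shown earlier only add values in the same position. A minimal sketch, using the data declared earlier:

```asm
MOVUPD XMM0,[DOUBLEFP1] ;XMM0 = 1.1 (low), 3.3 (high)
MOVAPD XMM1,XMM0 ;copy both values to XMM1
UNPCKHPD XMM1,XMM1 ;XMM1 = 3.3 (low), 3.3 (high)
ADDSD XMM0,XMM1 ;XMM0 low = about 4.4, high stays at 3.3
```

SHUFPD XMM1,XMM1,1h could be used in place of the UNPCKHPD line; it swaps the two values, which also brings the high value down to the low position.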
SSE2 Conversion instructions
These instructions convert between integers, single-precision
and double-precision floating point values. They should be read together
with the SSE conversion instructions.
The breakpoint is XMMSSE2_CONV:-
XMMSSE2_CONV:
MOV EAX,1 ;request CPU feature flags
CPUID ;0Fh, 0A2h CPUID instruction
TEST EDX,4000000h ;test bit 26 (SSE2)
JNZ >L30 ;SSE2 available
CALL NOSSE2FPMESS ;displays message if SSE2 not available
RET
L30:
;***** display XMM registers in both SSE and SSE2 modes ..
;***** conversion between single and double-precision fp values ..
CVTPS2PD XMM0,[SINGLEFP1] ;put single-precision fp values into XMM0 as double-precision
CVTPD2PS XMM6,XMM0 ;convert double precision to single precision in XMM6
CVTSS2SD XMM1,[SINGLEFP1] ;as CVTPS2PD but working with only one value
CVTSD2SS XMM7,XMM1 ;as CVTPD2PS but working with only one value
;***** conversion between integers and double-precision fp values ..
;***** open the MMX integer pane for these tests ..
CVTPD2PI MM0,XMM0 ;convert fp values in XMM0 to integers in MM0
CVTTPD2PI MM1,XMM0 ;same as above with truncation
CVTPI2PD XMM0,[DINTEGER] ;convert 23 and 24 to double-precision fp values
;***** open the XMM integer display and switch to dword display
CVTPD2DQ XMM7,XMM0 ;and convert 23 and 24 to dword integers into XMM7 (low)
CVTTPD2DQ XMM7,XMM0 ;same as above with truncation
CVTDQ2PD XMM3,XMM7 ;and back into fp values in XMM3
CVTSD2SI EAX,XMM0 ;take low fp value and convert as integer in EAX
CVTTSD2SI EDX,XMM0 ;same as above with truncation
CVTSI2SD XMM4,EAX ;and back again into XMM4 (low)
;***** conversion between single-precision and integers ..
;***** watch these in XMM integer display switched to dword display
CVTPS2DQ XMM0,[SINGLEFP1] ;move 4 single-precision fp values to dwords as integers
CVTTPS2DQ XMM1,[SINGLEFP1] ;same as above with truncation
;***** and watch this in the SSE fp pane ..
CVTDQ2PS XMM6,XMM0 ;and convert back to 4 single-precision fp values
CVTDQ2PS XMM7,XMM1 ;ditto
RET
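The difference between the ordinary and the "with truncation" forms only shows when the value has a fractional part. The sketch below uses a hypothetical label ROUNDFP, and assumes that the rounding control in MXCSR is at its default setting of round to nearest.

```asm
;in the data section ..
ROUNDFP DQ 2.7 ;hypothetical label, not in the test program
;in the code section ..
MOVSD XMM0,[ROUNDFP] ;XMM0 low = 2.7
CVTSD2SI EAX,XMM0 ;rounds using the MXCSR setting: EAX = 3 by default
CVTTSD2SI EDX,XMM0 ;always truncates towards zero: EDX = 2
```

The truncating forms ignore the MXCSR rounding control, which matches the way a floating point value is converted to an integer in C.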