XMM SSE2 floating point instructions

**About the XMM SSE2 floating point instructions**

These are SSE2 (Streaming SIMD Extensions 2) floating
point instructions which use the 128-bit XMM registers and which can handle
double-precision (64-bit) floating point values. There are also some
instructions which work on single-precision (32-bit) floating point values.
Support for SSE2 was introduced in the Pentium 4 and Xeon processors.

Generally the instructions are very similar to the
SSE floating point instructions except for the
data size that they work with.

Before using these instructions in your code you need to check if they
are available on the processor which is running your program. This is done
by calling CPUID having set EAX=1. Then test bit 26 of EDX. The bit will
be set if the SSE2 instructions can be used.
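
The bit test itself is simple arithmetic. Purely as an illustration, here is a small Python model of the check. Python is used only to show the arithmetic; the name `has_sse2` is hypothetical and the EDX values passed to it are made-up samples, not values read from a real processor:

```python
SSE2_BIT = 26                # position of the SSE2 flag in EDX after CPUID with EAX=1
SSE2_MASK = 1 << SSE2_BIT    # equals 4000000h, the mask used in TEST EDX,4000000h


def has_sse2(edx):
    """Return True if bit 26 is set in the given EDX feature-flag value."""
    return (edx & SSE2_MASK) != 0


if __name__ == "__main__":
    # Sample EDX values (assumptions for illustration only).
    print(hex(SSE2_MASK))
    print(has_sse2(0x04000000), has_sse2(0x00000001))
```

This is why the tests below use the constant 4000000h: it is the value with only bit 26 set.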

In the tests the following data declarations are used:-

    DOUBLEFP1 DQ 1.1
              DQ 3.3
    DOUBLEFP2 DQ 20.66
              DQ 40.66
    DOUBLEFPN DQ -5.1
              DQ +6.3

Since it is possible that these labels may not be on a 16-byte boundary, the MOVUPD instruction must be used to transfer the data from memory into an XMM register. MOVUPD (move two unaligned packed double-precision values) does not care about alignment. If you specify ALIGN 16 immediately before the data declaration, however, then the faster MOVAPD (move two aligned packed double-precision values) can be used instead. If you use MOVAPD on a memory operand which is not 16-byte aligned, your program will cause an exception. See more about this in the description of MOVDQA and MOVDQU, which work in the same way. When transferring between registers, either MOVUPD or MOVAPD may be used.

The instructions we are looking at here are largely of two types. The first type of instruction deals with two 64-bit floating point numbers at once. These instructions have "PD" in their mnemonic name, referring to "packed double-precision". The second type of instruction deals with just one 64-bit floating point value. These instructions have "SD" in their mnemonic name, referring to "scalar double-precision". They work on the lowest part of the XMM register only, that is to say the first 64 bits of the register (bits 0 to 63).
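
The difference between the two types can be sketched with a small Python model. This is an illustration only, not part of the tests: an XMM register is represented as a `(low, high)` tuple of doubles, and the function names simply borrow the mnemonics ADDPD and ADDSD.

```python
def addpd(dest, src):
    """Model of ADDPD: add both 64-bit values. A register is a (low, high) tuple."""
    return (dest[0] + src[0], dest[1] + src[1])


def addsd(dest, src):
    """Model of ADDSD: add the low values only; the high value of dest is kept."""
    return (dest[0] + src[0], dest[1])


if __name__ == "__main__":
    xmm0 = (1.5, 2.5)     # sample values chosen to be exactly representable
    xmm1 = (0.25, 4.0)
    print(addpd(xmm0, xmm1))
    print(addsd(xmm0, xmm1))
```

The same pattern applies to the other PD/SD pairs in the arithmetic tests: the scalar form changes only bits 0 to 63 of the destination.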

To watch these tests properly you need to set the appropriate breakpoint, start the test and then single step through the instructions. You can then watch how they change the XMM registers. Using GoBug you can make the XMM registers appear in their floating point SSE2 format using the appropriate button on the toolbar.

**SSE2 instructions:-**

Data movement instructions

Arithmetic instructions

Logical instructions

Comparison instructions

Shuffle and unpack instructions

Conversion instructions

SSE2 Data movement instructions

This demonstrates moving data into the registers and
between the registers. MOVUPD, MOVAPD (the aligned version), MOVSD, MOVLPD
and MOVHPD can also be used to move values from the registers out to memory
as well as in from memory. MOVMSKPD can be used after a
comparison instruction to get the result of the compare
into EAX for analysis.

As an experiment we also try the SSE2 integer instruction MOVDQU and
the SSE floating point instruction MOVUPS to see if they do the same as
MOVUPD. It seems they do, merely performing a bit transfer into the
XMM register. However, Intel warns against using instructions other than those specified for the data type, because this may have unstated performance implications.

The breakpoint is XMMSSE2_FPDATA:-

    XMMSSE2_FPDATA:
    MOV EAX,1                 ;request CPU feature flags
    CPUID                     ;0Fh, 0A2h CPUID instruction
    TEST EDX,4000000h         ;test bit 26 (SSE2)
    JNZ >L20                  ;SSE2 available
    CALL NOSSE2FPMESS         ;displays message if SSE2 not available
    RET
    L20:
    ;***** display XMM registers in SSE2 mode ..
    MOVUPD XMM0,[DOUBLEFP1]   ;move two double precision fp values into XMM0
    MOVAPD XMM7,XMM0          ;copying to XMM7
    MOVSD XMM2,[DOUBLEFP2]    ;move fp value to XMM2 low only (high half is zeroed)
    MOVLPD XMM3,[DOUBLEFP2]   ;similar, but leaves the high half of XMM3 unchanged
    MOVHPD XMM4,[DOUBLEFP2]   ;but this moves the value into the high half of XMM4
    MOVUPD XMM0,[DOUBLEFPN]   ;move two new values, one is negative
    MOVMSKPD EAX,XMM0         ;get both sign bits in XMM0 into eax
    ;************ and as an experiment, see if this does the same as MOVUPD ..
    MOVDQU XMM1,[DOUBLEFPN]   ;use integer instruction to transfer the bits
    ;************ and this too (one byte smaller) ..
    MOVUPS XMM2,[DOUBLEFPN]   ;use SSE instruction to transfer the bits
    RET
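
To see what MOVMSKPD should leave in EAX, here is a small Python model of the instruction. It is a sketch for illustration only: a register is a `(low, high)` tuple of doubles, and `sign_bit` is a hypothetical helper. Since MOVUPD loads the value at the lower address into the low half of the register, DOUBLEFPN appears as `(-5.1, 6.3)`.

```python
import struct


def sign_bit(value):
    """Return bit 63 (the sign bit) of a double-precision value."""
    bits = struct.unpack("<Q", struct.pack("<d", value))[0]
    return bits >> 63


def movmskpd(reg):
    """Model of MOVMSKPD for a (low, high) tuple of doubles.

    Bit 0 of the result is the sign of the low value,
    bit 1 is the sign of the high value.
    """
    low, high = reg
    return sign_bit(low) | (sign_bit(high) << 1)


if __name__ == "__main__":
    print(movmskpd((-5.1, 6.3)))   # the DOUBLEFPN values as loaded by MOVUPD
```

With the DOUBLEFPN values only the low value is negative, so the model gives 1 (bit 0 set, bit 1 clear).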

SSE2 Arithmetic instructions

This demonstrates the arithmetic instructions which can
work in the XMM registers using double-precision (64-bit) numbers.

The breakpoint is XMMSSE2_FPARITH:-

    XMMSSE2_FPARITH:
    MOV EAX,1                 ;request CPU feature flags
    CPUID                     ;0Fh, 0A2h CPUID instruction
    TEST EDX,4000000h         ;test bit 26 (SSE2)
    JNZ >L22                  ;SSE2 available
    CALL NOSSE2FPMESS         ;displays message if SSE2 not available
    RET
    L22:
    ;***** display XMM registers in SSE2 mode ..
    MOVUPD XMM0,[DOUBLEFP1]   ;move two double precision fp values into XMM0
    MOVAPD XMM2,XMM0          ;copying to XMM2
    MOVUPD XMM1,[DOUBLEFP2]   ;move 2nd tester fp values into XMM1
    MOVAPD XMM3,XMM1          ;copying to XMM3
    ADDPD XMM0,XMM1           ;add both fp values result in XMM0
    MOVAPD XMM0,XMM2          ;restore value in XMM0
    SUBPD XMM0,XMM1           ;subtract both fp values result in XMM0
    ;*******
    MOVAPD XMM0,XMM2          ;restore value in XMM0
    ADDSD XMM0,XMM1           ;add low fp value result in XMM0
    SUBSD XMM0,XMM1           ;subtract low fp value result in XMM0
    ;*******
    MOVAPD XMM0,XMM2          ;restore value in XMM0
    MULPD XMM0,XMM1           ;multiply both fp values result in XMM0
    ;*******
    MOVAPD XMM0,XMM2          ;restore value in XMM0
    MULSD XMM0,XMM1           ;multiply low fp value result in XMM0
    ;*******
    MOVAPD XMM0,XMM2          ;restore value in XMM0
    DIVPD XMM0,XMM1           ;divide both fp values result in XMM0
    ;*******
    MOVAPD XMM0,XMM2          ;restore value in XMM0
    DIVSD XMM0,XMM1           ;divide low fp value result in XMM0
    ;*******
    MOVAPD XMM0,XMM2          ;restore value in XMM0
    SQRTPD XMM0,XMM1          ;get square roots of both fp values in XMM1, result in XMM0
    ;*******
    MOVAPD XMM0,XMM2          ;restore value in XMM0
    SQRTSD XMM0,XMM1          ;get square root of low fp value in XMM1, result in XMM0
    ;*******
    MOVAPD XMM0,XMM2          ;restore value in XMM0
    MAXPD XMM0,XMM1           ;get numerically greater fp values result in XMM0
    ;*******
    MOVAPD XMM0,XMM2          ;restore value in XMM0
    MAXSD XMM0,XMM1           ;get numerically greater of low fp values result in XMM0
    ;*******
    MOVAPD XMM0,XMM2          ;restore value in XMM0
    MINPD XMM0,XMM1           ;get numerically smaller fp values result in XMM0
    ;*******
    MOVAPD XMM0,XMM2          ;restore value in XMM0
    MINSD XMM0,XMM1           ;get numerically smaller of low fp values result in XMM0
    RET

SSE2 Logical instructions

This demonstrates the logical instructions which can
work in the XMM registers using double-precision (64-bit) numbers.

The breakpoint is XMMSSE2_FPLOGIC:-

    XMMSSE2_FPLOGIC:
    MOV EAX,1                 ;request CPU feature flags
    CPUID                     ;0Fh, 0A2h CPUID instruction
    TEST EDX,4000000h         ;test bit 26 (SSE2)
    JNZ >L24                  ;SSE2 available
    CALL NOSSE2FPMESS         ;displays message if SSE2 not available
    RET
    L24:
    ;***** display XMM registers in SSE2 mode ..
    MOVUPD XMM0,[DOUBLEFP1]   ;move two double precision fp values into XMM0
    MOVAPD XMM2,XMM0          ;copying to XMM2
    MOVUPD XMM1,[DOUBLEFP2]   ;move 2nd tester fp values into XMM1
    MOVAPD XMM3,XMM1          ;copying to XMM3
    ANDPD XMM0,XMM1           ;perform AND on both fp values result in XMM0
    ;*******
    MOVAPD XMM0,XMM2          ;restore value in XMM0
    ANDNPD XMM0,XMM1          ;perform AND NOT on both fp values result in XMM0
    ;*******
    MOVAPD XMM0,XMM2          ;restore value in XMM0
    ORPD XMM0,XMM1            ;perform OR on both fp values result in XMM0
    ;*******
    MOVAPD XMM0,XMM2          ;restore value in XMM0
    XORPD XMM0,XMM1           ;perform XOR on both fp values result in XMM0
    RET
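
The logical instructions treat the register contents as raw bits rather than as numbers. The following Python sketch models one 64-bit value at a time as an integer bit pattern. It is an illustration only: the helpers `to_bits` and `from_bits` and the mask constants are hypothetical names, and the sign-bit examples are typical uses which are not part of the tests above. Note that ANDNPD inverts the destination, not the source, before the AND.

```python
import struct

MASK64 = 0xFFFFFFFFFFFFFFFF      # keeps results within 64 bits
SIGN_MASK = 0x8000000000000000   # only bit 63 (the sign bit) set
ABS_MASK = 0x7FFFFFFFFFFFFFFF    # every bit except the sign bit


def to_bits(value):
    """Return the 64-bit pattern of a double as an integer."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]


def from_bits(bits):
    """Return the double whose 64-bit pattern is the given integer."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def andpd(dest, src):
    return dest & src


def andnpd(dest, src):
    # (NOT dest) AND src
    return (~dest & MASK64) & src


def orpd(dest, src):
    return dest | src


def xorpd(dest, src):
    return dest ^ src


if __name__ == "__main__":
    print(from_bits(andpd(to_bits(-5.1), ABS_MASK)))    # sign bit cleared
    print(from_bits(xorpd(to_bits(6.3), SIGN_MASK)))    # sign bit flipped
    print(from_bits(xorpd(to_bits(1.1), to_bits(1.1)))) # value XOR itself
```

In the tests themselves the operands are ordinary numbers such as 1.1 and 20.66, so the results you see in the debugger are bit combinations which do not correspond to any obvious arithmetic result.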

SSE2 Comparison instructions

This demonstrates the comparison instructions which can
work in the XMM registers using double-precision (64-bit) numbers.

You tell CMPPD and CMPSD what to do by specifying an immediate value in the
third operand. It is not easy to remember what value does what, so some
assemblers (including GoAsm) also provide pseudo-mnemonics in the form
recommended by Intel (given here in the comments). COMISD and UCOMISD are
somewhat easier to use because they set the ordinary flags, although they
only work on one floating point value in the XMM register (contained in
bits 0-63).

The breakpoint is XMMSSE2_FPCOMP:-

    XMMSSE2_FPCOMP:
    MOV EAX,1                 ;request CPU feature flags
    CPUID                     ;0Fh, 0A2h CPUID instruction
    TEST EDX,4000000h         ;test bit 26 (SSE2)
    JNZ >L26                  ;SSE2 available
    CALL NOSSE2FPMESS         ;displays message if SSE2 not available
    RET
    L26:
    ;***** display XMM registers in SSE2 mode ..
    MOVUPD XMM0,[DOUBLEFP1]   ;move two double precision fp values into XMM0
    MOVAPD XMM2,XMM0          ;copying to XMM2
    MOVUPD XMM1,[DOUBLEFP2]   ;move 2nd tester fp values into XMM1
    MOVAPD XMM3,XMM1          ;copying to XMM3
    ;********************* compare instructions working on both fp values
    CMPPD XMM0,XMM1,0         ;=CMPEQPD see whether equal, result in XMM0
    MOVAPD XMM0,XMM2          ;restore original value to XMM0
    CMPPD XMM0,XMM1,1         ;=CMPLTPD see whether less than, result in XMM0
    MOVAPD XMM0,XMM2          ;restore original value to XMM0
    CMPPD XMM0,XMM1,2         ;=CMPLEPD see whether less than or equal, result in XMM0
    MOVAPD XMM0,XMM2          ;restore original value to XMM0
    CMPPD XMM0,XMM1,3         ;=CMPUNORDPD see whether unordered, result in XMM0
    MOVAPD XMM0,XMM2          ;restore original value to XMM0
    CMPPD XMM0,XMM1,4         ;=CMPNEQPD see whether not equal, result in XMM0
    MOVAPD XMM0,XMM2          ;restore original value to XMM0
    CMPPD XMM0,XMM1,5         ;=CMPNLTPD see whether not less than, result in XMM0
    MOVAPD XMM0,XMM2          ;restore original value to XMM0
    CMPPD XMM0,XMM1,6         ;=CMPNLEPD see whether not less than or equal, result in XMM0
    MOVAPD XMM0,XMM2          ;restore original value to XMM0
    CMPPD XMM0,XMM1,7         ;=CMPORDPD see whether ordered, result in XMM0
    ;********************* compare instructions working on low value only
    MOVAPD XMM0,XMM2          ;restore original value to XMM0
    CMPSD XMM0,XMM1,0         ;=CMPEQSD see whether equal, result in XMM0
    MOVAPD XMM0,XMM2          ;restore original value to XMM0
    CMPSD XMM0,XMM1,1         ;=CMPLTSD see whether less than, result in XMM0
    MOVAPD XMM0,XMM2          ;restore original value to XMM0
    CMPSD XMM0,XMM1,2         ;=CMPLESD see whether less than or equal, result in XMM0
    MOVAPD XMM0,XMM2          ;restore original value to XMM0
    CMPSD XMM0,XMM1,3         ;=CMPUNORDSD see whether unordered, result in XMM0
    MOVAPD XMM0,XMM2          ;restore original value to XMM0
    CMPSD XMM0,XMM1,4         ;=CMPNEQSD see whether not equal, result in XMM0
    MOVAPD XMM0,XMM2          ;restore original value to XMM0
    CMPSD XMM0,XMM1,5         ;=CMPNLTSD see whether not less than, result in XMM0
    MOVAPD XMM0,XMM2          ;restore original value to XMM0
    CMPSD XMM0,XMM1,6         ;=CMPNLESD see whether not less than or equal, result in XMM0
    MOVAPD XMM0,XMM2          ;restore original value to XMM0
    CMPSD XMM0,XMM1,7         ;=CMPORDSD see whether ordered, result in XMM0
    ;********************* compare and give result in eflags
    MOVAPD XMM0,XMM2          ;restore original value to XMM0
    COMISD XMM0,XMM1          ;look at lowest only - result in eflags
    UCOMISD XMM0,XMM1         ;(unordered compare)
    MOVUPD XMM1,[DOUBLEFPN]   ;move one -ve and one +ve value into XMM1
    COMISD XMM0,XMM1          ;look at lowest only - result in eflags
    UCOMISD XMM0,XMM1         ;(unordered compare)
    RET
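
As a guide to what to expect in the registers and flags, here is a Python sketch of the eight immediate values and of the flag results of COMISD. It is a model for illustration only, with hypothetical function names. A register is a `(low, high)` tuple of doubles; each result value is all ones (true) or all zeros (false). `cmpsd_mask` returns only the new low 64 bits, since the real CMPSD leaves the high half of the destination unchanged.

```python
import math

ALL_ONES = 0xFFFFFFFFFFFFFFFF   # 64-bit "true" result of CMPPD/CMPSD


def unordered(a, b):
    """True if either operand is a NaN."""
    return math.isnan(a) or math.isnan(b)


# Immediate value -> predicate, in the order used in the tests above.
PREDICATES = {
    0: lambda a, b: a == b,                # EQ
    1: lambda a, b: a < b,                 # LT
    2: lambda a, b: a <= b,                # LE
    3: unordered,                          # UNORD
    4: lambda a, b: not (a == b),          # NEQ (true if unordered)
    5: lambda a, b: not (a < b),           # NLT (true if unordered)
    6: lambda a, b: not (a <= b),          # NLE (true if unordered)
    7: lambda a, b: not unordered(a, b),   # ORD
}


def cmpsd_mask(a, b, imm):
    """Low 64 bits produced by CMPSD for low values a (dest) and b (src)."""
    return ALL_ONES if PREDICATES[imm](a, b) else 0


def cmppd(dest, src, imm):
    """Model of CMPPD: compare both values, giving a mask for each."""
    return (cmpsd_mask(dest[0], src[0], imm), cmpsd_mask(dest[1], src[1], imm))


def comisd(a, b):
    """Model of COMISD/UCOMISD flag results as a (ZF, PF, CF) tuple."""
    if unordered(a, b):
        return (1, 1, 1)
    if a > b:
        return (0, 0, 0)
    if a < b:
        return (0, 0, 1)
    return (1, 0, 0)


if __name__ == "__main__":
    xmm0 = (1.1, 3.3)       # DOUBLEFP1
    xmm1 = (20.66, 40.66)   # DOUBLEFP2
    print([hex(m) for m in cmppd(xmm0, xmm1, 1)])
    print(comisd(xmm0[0], xmm1[0]))
```

Because the flags are set in the same way as for an unsigned integer compare, COMISD and UCOMISD are normally followed by JA, JB or JZ rather than by the signed conditional jumps. The model does not show the difference between COMISD and UCOMISD, which lies only in which kinds of NaN operand raise an invalid-operation exception.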

SSE2 Shuffle and unpack instructions

With these instructions you can move the double-precision
(64-bit) floating point values around the XMM registers.

The breakpoint is XMMSSE2_SHUFF:-

    XMMSSE2_SHUFF:
    MOV EAX,1                 ;request CPU feature flags
    CPUID                     ;0Fh, 0A2h CPUID instruction
    TEST EDX,4000000h         ;test bit 26 (SSE2)
    JNZ >L28                  ;SSE2 available
    CALL NOSSE2FPMESS         ;displays message if SSE2 not available
    RET
    L28:
    ;***** display XMM registers in SSE2 mode ..
    MOVUPD XMM0,[DOUBLEFP1]   ;move two double precision fp values into XMM0
    MOVAPD XMM2,XMM0          ;copying to XMM2
    MOVUPD XMM1,[DOUBLEFP2]   ;move 2nd tester fp values into XMM1
    MOVAPD XMM3,XMM1          ;copying to XMM3
    SHUFPD XMM0,XMM1,3h       ;shuffle pack into destination
    SHUFPD XMM0,XMM0,1h       ;swap the values in XMM0
    MOVAPD XMM0,XMM2          ;restore original value to XMM0
    UNPCKHPD XMM0,XMM1        ;unpack (high) and put into destination
    MOVAPD XMM0,XMM2          ;restore original value to XMM0
    UNPCKLPD XMM0,XMM0        ;unpack (low) and put into destination
    RET
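
The way the immediate value in SHUFPD selects the values can be sketched in Python. This is a model for illustration only: a register is a `(low, high)` tuple of doubles, and the function names simply borrow the mnemonics. Bit 0 of the immediate chooses which value of the destination goes to the low half of the result, and bit 1 chooses which value of the source goes to the high half.

```python
def shufpd(dest, src, imm):
    """Model of SHUFPD for (low, high) tuples."""
    low = dest[imm & 1]           # bit 0 selects from the destination
    high = src[(imm >> 1) & 1]    # bit 1 selects from the source
    return (low, high)


def unpckhpd(dest, src):
    """Model of UNPCKHPD: high of dest to low, high of src to high."""
    return (dest[1], src[1])


def unpcklpd(dest, src):
    """Model of UNPCKLPD: low of dest to low, low of src to high."""
    return (dest[0], src[0])


if __name__ == "__main__":
    xmm0 = (1.1, 3.3)       # DOUBLEFP1
    xmm1 = (20.66, 40.66)   # DOUBLEFP2
    step1 = shufpd(xmm0, xmm1, 3)
    step2 = shufpd(step1, step1, 1)
    print(step1, step2)
    print(unpckhpd(xmm0, xmm1), unpcklpd(xmm0, xmm0))
```

Following the test through with this model, the first SHUFPD gives (3.3, 40.66), the second swaps that to (40.66, 3.3), UNPCKHPD gives (3.3, 40.66) and UNPCKLPD with the same register for both operands duplicates the low value to give (1.1, 1.1).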

SSE2 Conversion instructions

These instructions convert between integers, single-precision
and double-precision floating point values. They should be read together
with the SSE conversion instructions.

The breakpoint is XMMSSE2_CONV:-

    XMMSSE2_CONV:
    MOV EAX,1                 ;request CPU feature flags
    CPUID                     ;0Fh, 0A2h CPUID instruction
    TEST EDX,4000000h         ;test bit 26 (SSE2)
    JNZ >L30                  ;SSE2 available
    CALL NOSSE2FPMESS         ;displays message if SSE2 not available
    RET
    L30:
    ;***** display XMM registers in both SSE and SSE2 modes ..
    ;***** conversion between single and double-precision fp values ..
    CVTPS2PD XMM0,[SINGLEFP1] ;put single-precision fp values into XMM0 as double-precision
    CVTPD2PS XMM6,XMM0        ;convert double precision to single precision in XMM6
    CVTSS2SD XMM1,[SINGLEFP1] ;as CVTPS2PD but working with only one value
    CVTSD2SS XMM7,XMM1        ;as CVTPD2PS but working with only one value
    ;***** conversion between integers and double-precision fp values ..
    ;***** open the MMX integer pane for these tests ..
    CVTPD2PI MM0,XMM0         ;convert fp values in XMM0 to integers in MM0
    CVTTPD2PI MM1,XMM0        ;same as above with truncation
    CVTPI2PD XMM0,[DINTEGER]  ;convert 23 and 24 to double-precision fp values
    ;***** open the XMM integer display and switch to dword display
    CVTPD2DQ XMM7,XMM0        ;and convert 23 and 24 to dword integers into XMM7 (low)
    CVTTPD2DQ XMM7,XMM0       ;same as above with truncation
    CVTDQ2PD XMM3,XMM7        ;and back into fp values in XMM3
    CVTSD2SI EAX,XMM0         ;take low fp value and convert as integer in EAX
    CVTTSD2SI EDX,XMM0        ;same as above with truncation
    CVTSI2SD XMM4,EAX         ;and back again into XMM4 (low)
    ;***** conversion between single-precision and integers ..
    ;***** watch these in XMM integer display switched to dword display
    CVTPS2DQ XMM0,[SINGLEFP1] ;move 4 single-precision fp values to dwords as integers
    CVTTPS2DQ XMM1,[SINGLEFP1] ;same as above with truncation
    ;***** and watch this in the SSE fp pane ..
    CVTDQ2PS XMM6,XMM0        ;and convert back to 4 single-precision fp values
    CVTDQ2PS XMM7,XMM1        ;ditto
    RET
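
The difference between the ordinary and the "with truncation" conversions (the extra T in the mnemonic) can be sketched in Python. This is a model for illustration only and it rests on two assumptions: that the rounding mode in MXCSR is the default, round to nearest with ties going to the even integer, and that the value fits in the destination. The real instructions return 80000000h for a NaN or an out-of-range value, which the model ignores. The helper `to_single` is a hypothetical name showing the loss of precision when a double is narrowed to single precision, as CVTSD2SS does.

```python
import struct


def cvtsd2si(value):
    """Model of CVTSD2SI under the default rounding mode (nearest, ties to even)."""
    return round(value)


def cvttsd2si(value):
    """Model of CVTTSD2SI: always truncate toward zero."""
    return int(value)


def to_single(value):
    """Round a double to single precision and return it as a Python float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


if __name__ == "__main__":
    for sample in (23.7, 2.5, 3.5, -2.9):   # sample values, not from the tests
        print(sample, cvtsd2si(sample), cvttsd2si(sample))
    print(to_single(1.5) == 1.5, to_single(1.1) == 1.1)
```

With whole-number inputs such as the 23 and 24 used in the tests, both forms give the same result; the difference only shows with a fractional part.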