These are SSE2 integer instructions which use the 128-bit XMM
registers. They were introduced in the Pentium 4 and Xeon processors.
For convenience the tests are divided into two parts.
The first part demonstrates the SSE integer instructions originally
provided for the MMX registers (except for PSHUFW, which has no XMM form).
SSE2 extends these for use with the XMM registers.
The second part demonstrates the other SSE2 integer
instructions for the XMM registers.
Before using the SSE2 instructions in your code you need to
check if they are available on the processor which is running your program.
This is done by testing bit 26 in edx after calling CPUID with eax=1.
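The mask used for this test, 4000000h, is simply 1 shifted left 26 places. As a rough illustration only, the same test can be modelled in C, assuming the EDX value has already been obtained from CPUID. The helper name edx_reports_sse2 is hypothetical and is not part of the test program.

```c
#include <assert.h>
#include <stdint.h>

/* The mask used by TEST EDX,4000000h in the tests. */
#define SSE2_FEATURE_MASK 0x4000000u

/* Compile-time check that the mask has exactly bit 26 set. */
static_assert(SSE2_FEATURE_MASK == (1u << 26), "mask must be bit 26");

/* Hypothetical helper: given the EDX value returned by CPUID with EAX=1,
   return 1 if the SSE2 feature flag (bit 26) is set, otherwise 0. */
int edx_reports_sse2(uint32_t edx)
{
    return (edx & SSE2_FEATURE_MASK) != 0;
}
```

For example, edx_reports_sse2(0x4000000u) returns 1 and edx_reports_sse2(0) returns 0.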
Set the breakpoint to XMMSSE2_INTEGER1 for the first part, or to
XMMSSE2_INTEGER2 for the second part, then run the test and single-step
through the code.
The following tests assume that the following double qwords have been
declared somewhere in the data section. Each double qword is a
total of 128 bits:-
DQWORD_VALUE1 DQ 0204060822222222h
DQ 444444446080C0E0h
DQWORD_VALUE2 DQ 0406080A44444444h
DQ 6666666680A0E100h
Since it is possible that these labels (as declared here without using
ALIGN) may not be on a 16-byte boundary, the
MOVDQU instruction must be used to transfer data from them into an XMM
register. MOVDQU (move unaligned double quadword) does not care about
alignment. If you specify ALIGN 16 immediately before the data declaration,
however, then the faster MOVDQA (move aligned double quadword) can be used
instead. If you use MOVDQA on data which is not on a 16-byte boundary, the
instruction will cause a general-protection exception.
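The alignment rule can be modelled in C: an address is on a 16-byte boundary when its low four bits are all zero. The sketch below is an illustration only. The function name is_aligned_16 is hypothetical, and the C11 _Alignas(16) specifier is assumed here to play the role which ALIGN 16 plays in the assembler source.

```c
#include <assert.h>
#include <stdint.h>

/* C counterpart of writing ALIGN 16 before the data declaration:
   the low qword comes first, then the high qword. */
_Alignas(16) const uint64_t dqword_value1[2] = {
    0x0204060822222222ULL,
    0x444444446080C0E0ULL
};

/* A double qword is two 64-bit values, 16 bytes in total. */
static_assert(sizeof dqword_value1 == 16, "a double qword is 128 bits");

/* Hypothetical helper: returns 1 if the address is on a 16-byte
   boundary (low four bits clear), otherwise 0. */
int is_aligned_16(const void *address)
{
    return ((uintptr_t)address & 15u) == 0;
}
```

Data for which is_aligned_16 returns 1 would be safe to load with MOVDQA. For any other address only MOVDQU is safe.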
To watch these tests properly you need to set the appropriate breakpoint,
start the test and then single step through the instructions. You can then
watch how they change the XMM registers. Using GoBug you can make the XMM
registers appear in their integer format using the appropriate button on the
toolbar.
Integer instructions for XMM registers (extensions to the MMX SSE
integer instructions)
XMMSSE2_INTEGER1: ;MMX integer instructions now for XMM registers
MOV EAX,1 ;request CPU feature flags
CPUID ;0Fh, 0A2h CPUID instruction
TEST EDX,4000000h ;test bit 26 (SSE2)
JNZ >L18 ;SSE2 available
CALL NOSSE2INTMESS
RET
L18:
;********************* xmm average computation
;********************* (switch xmm pane to "byte" view)
MOVDQU XMM0,[DQWORD_VALUE1] ;give 1st tester values to XMM0
MOVDQA XMM2,XMM0 ;copying to XMM2
MOVDQU XMM1,[DQWORD_VALUE2] ;give 2nd tester values to XMM1
MOVDQA XMM3,XMM1 ;copying to XMM3
PAVGB XMM0,XMM1 ;packed average-by-byte result in XMM0
;********************* (switch xmm pane to "word" view)
MOVDQA XMM0,XMM2 ;restore XMM0
PAVGW XMM0,XMM1 ;packed average-by-word result in XMM0
;********************* xmm extract to gp register
PEXTRW EAX,XMM1,2 ;extract 3rd word of XMM1 (low) to eax
PEXTRW EDX,XMM1,0 ;extract 1st word of XMM1 (low) to edx
PEXTRW ESI,XMM1,4 ;extract 5th word of XMM1 (high) to esi
PEXTRW EDI,XMM1,7 ;extract 8th word of XMM1 (high) to edi
;********************* xmm insert from gp register
PINSRW XMM0,EAX,0 ;insert eax to 1st word of XMM0 (low)
PINSRW XMM0,EDX,2 ;insert edx to 3rd word of XMM0 (low)
PINSRW XMM0,ESI,4 ;insert esi to 5th word of XMM0 (high)
PINSRW XMM0,EDI,7 ;insert edi to 8th word of XMM0 (high)
;********************* report xmm byte maximum
;********************* (switch xmm pane to "byte" view)
MOVDQA XMM3,XMM2
PMAXUB XMM3,XMM0 ;report greater-by-byte in XMM3
;********************* report xmm byte minimum
MOVDQA XMM3,XMM2
PMINUB XMM3,XMM0 ;report lesser-by-byte in XMM3
;********************* compute sum of absolute differences
MOVDQA XMM3,XMM2
PSADBW XMM3,XMM0 ;sum of absolute differences in XMM3
;********************* report xmm word maximum
;********************* (switch xmm pane to "word" view)
MOVDQA XMM3,XMM2
PMAXSW XMM3,XMM0 ;report greater-by-word in XMM3
;********************* report xmm word minimum
MOVDQA XMM3,XMM2
PMINSW XMM3,XMM0 ;report lesser-by-word in XMM3
;**** multiply packed unsigned word integers high word result only
MOVDQA XMM0,XMM3
PMULHUW XMM0,XMM3 ;XMM3 * XMM0, high word result in XMM0
;********************* create byte mask from most significant bits
;********************* (switch xmm pane to "byte" view)
PMOVMSKB EAX,XMM0 ;top bit of each byte of XMM0 to low 16 bits of eax
;
RET
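When single-stepping it helps to know what result to expect in the registers. The following is a minimal sketch in C of what three of the above instructions compute (PAVGB, PSADBW and PMOVMSKB), written as plain scalar loops. It is a model for checking results by hand, not the code of the test itself. The type xmm_bytes and all the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One XMM register viewed as 16 bytes, b[0] being the least significant. */
typedef struct { uint8_t b[16]; } xmm_bytes;
static_assert(sizeof(xmm_bytes) == 16, "an XMM register holds 16 bytes");

/* Builds the byte view from the two qwords of a DQ declaration
   (the first DQ is the low qword). */
xmm_bytes xmm_from_qwords(uint64_t low, uint64_t high)
{
    xmm_bytes r;
    for (int i = 0; i < 8; i++) {
        r.b[i]     = (uint8_t)(low  >> (8 * i));
        r.b[i + 8] = (uint8_t)(high >> (8 * i));
    }
    return r;
}

/* PAVGB: rounded unsigned average of each byte pair, (a + b + 1) >> 1. */
xmm_bytes model_pavgb(xmm_bytes dst, xmm_bytes src)
{
    for (int i = 0; i < 16; i++)
        dst.b[i] = (uint8_t)((dst.b[i] + src.b[i] + 1) >> 1);
    return dst;
}

/* PSADBW: for each 8-byte half, the sum of the absolute differences of
   the unsigned bytes. sums[0] is for the low half, sums[1] for the high. */
void model_psadbw(xmm_bytes dst, xmm_bytes src, uint16_t sums[2])
{
    for (int half = 0; half < 2; half++) {
        unsigned sum = 0;
        for (int i = 0; i < 8; i++) {
            int diff = dst.b[half * 8 + i] - src.b[half * 8 + i];
            sum += (unsigned)(diff < 0 ? -diff : diff);
        }
        sums[half] = (uint16_t)sum;
    }
}

/* PMOVMSKB: bit i of the result is the most significant bit of byte i. */
unsigned model_pmovmskb(xmm_bytes src)
{
    unsigned mask = 0;
    for (int i = 0; i < 16; i++)
        mask |= (unsigned)(src.b[i] >> 7) << i;
    return mask;
}
```

Applied directly to DQWORD_VALUE1 and DQWORD_VALUE2, model_pavgb gives 33h in each of the four lowest bytes (the rounded average of 22h and 44h). Note that in the test above some instructions work on registers which earlier steps have already changed, so the values seen in the debugger at those points may differ from what the declared data alone would give.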
Other SSE2 integer instructions for the XMM registers
XMMSSE2_INTEGER2:
MOV EAX,1 ;request CPU feature flags
CPUID ;0Fh, 0A2h CPUID instruction
TEST EDX,4000000h ;test bit 26 (SSE2)
JNZ >L20 ;SSE2 available
CALL NOSSE2INTMESS
RET
L20:
;********************* (switch xmm pane to "qword" view)
MOVDQU XMM0,[DQWORD_VALUE1] ;give 1st tester values to XMM0
MOVDQA XMM2,XMM0 ;copying to XMM2
MOVDQU XMM1,[DQWORD_VALUE2] ;give 2nd tester values to XMM1
MOVDQA XMM3,XMM1 ;copying to XMM3
PADDQ XMM0,XMM1 ;packed quadword add
MOVDQA XMM0,XMM2
PSUBQ XMM0,XMM1 ;packed quadword subtract
MOVDQA XMM0,XMM2
PSLLDQ XMM0,5 ;shift double quadword left logical (5 bytes)
PSRLDQ XMM0,5 ;shift double quadword right logical (5 bytes)
MOVDQA XMM0,XMM2
PUNPCKHQDQ XMM0,XMM1 ;unpack high quadwords
PUNPCKLQDQ XMM0,XMM1 ;unpack low quadwords
;********************* (switch xmm pane to "dword" view)
MOVDQA XMM0,XMM2
PMULUDQ XMM0,XMM1 ;multiply packed unsigned dword integers
MOVDQA XMM0,XMM2
PSHUFD XMM0,XMM1,33h ;shuffle packed doubleword integers
;********************* (switch xmm pane to "word" view)
MOVDQA XMM0,XMM2
PSHUFLW XMM0,XMM1,33h ;shuffle packed low words
PSHUFHW XMM0,XMM1,33h ;shuffle packed high words
;********************* (open mmx pane and switch to "qword" view)
;********************* (switch xmm pane to "qword" view)
MOVDQ2Q MM0,XMM1 ;move qword integer from XMM to MMX
MOVQ2DQ XMM6,MM0 ;move qword integer from MMX to XMM
;
RET
;
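As with the first part, a scalar model can help when checking the registers while single-stepping. The sketch below models in C what PSHUFD, PSLLDQ, PSRLDQ and PMULUDQ compute. It is offered as an illustration under the stated assumptions and is not the test's own code. All the function names are hypothetical, and element 0 of each array is taken to be the least significant part of the register.

```c
#include <assert.h>
#include <stdint.h>

/* PSHUFD: each 2-bit field of the immediate selects the source dword
   which is copied to the corresponding destination dword. */
void model_pshufd(uint32_t dst[4], const uint32_t src[4], unsigned imm8)
{
    uint32_t result[4];               /* temporary, so dst may alias src */
    for (int i = 0; i < 4; i++)
        result[i] = src[(imm8 >> (2 * i)) & 3u];
    for (int i = 0; i < 4; i++)
        dst[i] = result[i];
}

/* PSLLDQ: shift the whole register left by count bytes, filling with
   zeros. A count above 15 clears the register. */
void model_pslldq(uint8_t reg[16], unsigned count)
{
    assert(count <= 255);             /* the count is an 8-bit immediate */
    for (int i = 15; i >= 0; i--)
        reg[i] = ((unsigned)i >= count) ? reg[i - (int)count] : 0;
}

/* PSRLDQ: shift the whole register right by count bytes, filling with
   zeros. A count above 15 clears the register. */
void model_psrldq(uint8_t reg[16], unsigned count)
{
    assert(count <= 255);             /* the count is an 8-bit immediate */
    for (unsigned i = 0; i < 16; i++)
        reg[i] = (i + count < 16) ? reg[i + count] : 0;
}

/* PMULUDQ: multiplies dwords 0 and 2 of the two operands as unsigned
   values, giving two full 64-bit products. Dwords 1 and 3 are ignored. */
void model_pmuludq(uint64_t product[2], const uint32_t a[4], const uint32_t b[4])
{
    product[0] = (uint64_t)a[0] * b[0];
    product[1] = (uint64_t)a[2] * b[2];
}
```

With the immediate 33h used in the test, model_pshufd writes source dwords 3, 0, 3, 0 to destination dwords 0, 1, 2, 3. Shifting left and then right by 5 bytes, as the test does, leaves the low 11 bytes unchanged and clears the top 5 bytes.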