These tests all use the CPUID instruction (called with EAX=1)
to obtain information about the CPU. CPUID returns the feature flags in EDX, and
bit 23 is set if MMX instructions can be used. In the case of the SSE and
SSE2 integer instructions, bits 25 and 26 respectively must be tested instead.
Note also that EMMS should be used at the end of any MMX, SSE or SSE2 sequence
which uses the MMX registers if an instruction using the FPU may
follow. EMMS marks every entry in the Tag word (which is used by the FPU) as empty
(all tag bits set to 1), indicating that all FP registers are available. This is essential because
the MMX registers are in fact the same as the FP registers. For the same
reason you can't freely mix MMX and FP instructions without an EMMS in between. See generally
instructions for the FPU.
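The bit numbers given above correspond to the masks used with TEST EDX in the listings which follow. As a rough illustration only, written in Python rather than assembler, the masks can be derived and applied as shown here. The names FEATURE_BITS, feature_mask and decode_features are made up for this sketch, and sample_edx is an invented value, not one read from a real processor:-

```python
# Feature bits in EDX after CPUID with EAX=1 (bit numbers taken from the text)
FEATURE_BITS = {"MMX": 23, "SSE": 25, "SSE2": 26}

def feature_mask(bit):
    """Return the mask to use with TEST EDX,mask for a given bit number."""
    return 1 << bit

def decode_features(edx):
    """Report which of the three feature sets a given EDX value advertises."""
    return {name: bool(edx & feature_mask(bit)) for name, bit in FEATURE_BITS.items()}

# Hypothetical EDX value for illustration: MMX and SSE set, SSE2 clear
sample_edx = feature_mask(23) | feature_mask(25)

for name, bit in FEATURE_BITS.items():
    print(f"{name}: bit {bit}, mask {feature_mask(bit):X}h")
print(decode_features(sample_edx))
```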
There are 6 demonstrations:-
Moving and shifting
Logical instructions
Main pack instructions
Addressing methods
SSE integer instructions
SSE2 integer instructions
Moving and shifting
Using these instructions you can move 32 bits (MOVD) or 64 bits (MOVQ) at once. MOVD moves data between an MMX register and a general purpose register or memory. MOVQ moves data between two MMX registers or between an MMX register and memory. You can also shift data bit-wise using the "PS" instructions, followed by "L" (left) or "R" (right), "L" (logical) or "A" (arithmetic) and then "W" (16 bits), "D" (32 bits) or "Q" (64 bits). Arithmetic shifts are available only to the right and only for words and dwords (PSRAW and PSRAD).
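To make the naming scheme concrete, here is a small sketch in Python (not assembler) which builds the shift mnemonics from their parts. The function name mmx_shift_mnemonics is invented for this illustration. It leaves out the combinations which do not exist on the MMX registers, namely arithmetic shifts to the left and arithmetic shifts of quadwords:-

```python
# Parts of a shift mnemonic: "PS" + direction + kind + size
DIRECTIONS = {"L": "left", "R": "right"}
KINDS = {"L": "logical", "A": "arithmetic"}
SIZES = {"W": 16, "D": 32, "Q": 64}

def mmx_shift_mnemonics():
    """Return the shift mnemonics available on the MMX registers."""
    names = []
    for direction in DIRECTIONS:
        for kind in KINDS:
            for size in SIZES:
                # Arithmetic shifts exist only to the right, and not for quadwords
                if kind == "A" and (direction == "L" or size == "Q"):
                    continue
                names.append("PS" + direction + kind + size)
    return names

print(mmx_shift_mnemonics())
```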
Set the breakpoint to MMX_MOVESHIFT, run the test and single-step through the code:-
MMX_MOVESHIFT:
MOV EAX,1 ;request CPU feature flags
CPUID ;0Fh, 0A2h CPUID instruction
TEST EDX,800000h ;test bit 23 to see if MMX available
JNZ .20 ;yes
RET
.20
;***************** MOVING AND SHIFTING
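MOV EAX,12345678h
MOVD MM0,EAX ;low 32 bits of MM0 = EAX, high 32 bits zeroed
MOVQ MM1,MM0 ;copy all 64 bits to MM1
MOV D[QWORD_VALUE],80008000h ;low dword of test value
MOV D[QWORD_VALUE+4],80008000h ;high dword of test value
MOVQ MM3,[QWORD_VALUE] ;MM3 = 8000800080008000h
PSLLW MM3,1 ;shift each word left by 1 (top bit of each word is lost)
MOVQ MM4,MM3 ;keep result in MM4 for viewing
PSRLW MM3,1 ;shift each word right by 1
MOVQ MM3,[QWORD_VALUE] ;reload test value
PSLLD MM3,1 ;shift each dword left by 1
MOVQ MM4,MM3
PSRLD MM3,1 ;shift each dword right by 1
MOVQ MM3,[QWORD_VALUE] ;reload test value
PSLLQ MM3,1 ;shift the whole qword left by 1
MOVQ MM4,MM3
PSRLQ MM3,1 ;shift the whole qword right by 1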
;*****************
MOVQ MM3,[QWORD_VALUE]
PSLLW MM3,16
MOVQ MM4,MM3
PSRLW MM3,16
MOVQ MM3,[QWORD_VALUE]
PSLLD MM3,16
MOVQ MM4,MM3
PSRLD MM3,16
MOVQ MM3,[QWORD_VALUE]
PSLLQ MM3,16
MOVQ MM4,MM3
PSRLQ MM3,16
;*****************
MOV D[QWORD_VALUE],81808080h
MOV D[QWORD_VALUE+4],81808080h
MOVQ MM3,[QWORD_VALUE]
PSRAW MM3,2
MOVQ MM4,MM3
PSRAD MM3,2
EMMS
RET
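If you want to check what you see in the MMX pane while single-stepping, the shifts can be modelled outside the debugger. The following is a minimal sketch in Python, not GoAsm code. The helper names split, join, psll, psrl and psra are made up for the illustration, and it assumes the usual behaviour of these instructions, where each 16, 32 or 64 bit lane is shifted independently and bits shifted out of a lane are lost:-

```python
def split(value, width):
    """Split a 64-bit value into lanes of `width` bits, lowest lane first."""
    mask = (1 << width) - 1
    return [(value >> pos) & mask for pos in range(0, 64, width)]

def join(lanes, width):
    """Rebuild a 64-bit value from lanes, discarding bits which overflow a lane."""
    mask = (1 << width) - 1
    return sum((lane & mask) << (i * width) for i, lane in enumerate(lanes))

def psll(value, count, width):
    """Logical left shift of each lane (PSLLW/PSLLD/PSLLQ for width 16/32/64)."""
    return join([lane << count for lane in split(value, width)], width)

def psrl(value, count, width):
    """Logical right shift of each lane (PSRLW/PSRLD/PSRLQ)."""
    return join([lane >> count for lane in split(value, width)], width)

def psra(value, count, width):
    """Arithmetic right shift of each lane (PSRAW/PSRAD): the sign bit is copied in."""
    def signed(lane):
        return lane - (1 << width) if lane >> (width - 1) else lane
    return join([signed(lane) >> count for lane in split(value, width)], width)

first = 0x8000800080008000    # QWORD_VALUE in the first part of the listing
second = 0x8180808081808080   # QWORD_VALUE before the arithmetic shifts

for name, result in [
    ("PSLLW 1", psll(first, 1, 16)),
    ("PSLLD 1", psll(first, 1, 32)),
    ("PSLLQ 1", psll(first, 1, 64)),
    ("PSRAW 2", psra(second, 2, 16)),
]:
    print(f"{name}: {result:016X}h")
```

Note how the same starting value gives a different result for each lane size, because the top bit of every lane is discarded separately.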
Logical instructions
This demonstrates PAND (bitwise logical AND), PANDN (bitwise logical AND NOT), POR (bitwise logical OR) and PXOR (bitwise logical exclusive OR).
Set the breakpoint to MMX_LOGICAL, run the test and single-step through the code:-
MMX_LOGICAL:
MOV EAX,1 ;request CPU feature flags
CPUID ;0Fh, 0A2h CPUID instruction
TEST EDX,800000h ;test bit 23 to see if MMX available
JNZ .20 ;yes
RET
.20
MOV D[QWORD_VALUE],10101010h
MOV D[QWORD_VALUE+4],10101010h
MOVQ MM1,[QWORD_VALUE]
MOV D[QWORD_VALUE],0h
MOV D[QWORD_VALUE+4],0FFFFFFFFh
MOVQ MM0,[QWORD_VALUE]
PAND MM0,MM1
MOVQ MM2,MM0
;********
MOVQ MM0,[QWORD_VALUE]
PANDN MM0,MM1 ;MM0 = (NOT MM0) AND MM1
MOVQ MM3,MM0
;********
MOVQ MM0,[QWORD_VALUE]
POR MM0,MM1
MOVQ MM4,MM0
;********
MOVQ MM0,[QWORD_VALUE]
PXOR MM0,MM1
MOVQ MM5,MM0
EMMS
RET
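The four logical instructions work on all 64 bits at once, so they are easy to model with ordinary integer operators. The sketch below is in Python, for illustration only, and uses the same two values as the listing. The variable names are invented here. The one to watch is PANDN, which inverts the destination operand (not the source) before the AND:-

```python
MASK64 = (1 << 64) - 1

mm1 = 0x1010101010101010   # first value loaded in the listing
mm0 = 0xFFFFFFFF00000000   # second value: low dword 0, high dword 0FFFFFFFFh

pand = mm0 & mm1
pandn = (~mm0 & MASK64) & mm1   # destination is inverted, then ANDed with the source
por = mm0 | mm1
pxor = mm0 ^ mm1

for name, result in [("PAND", pand), ("PANDN", pandn), ("POR", por), ("PXOR", pxor)]:
    print(f"{name}: {result:016X}h")
```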
Main pack instructions
The pack instructions operate on a 64 bit quantity which has packed in it either 8 bytes (packed byte), 4 words (packed word) or 2 dwords (packed doubleword). All pack instructions start with P. PACKSSWB (words to bytes) and PACKSSDW (doublewords to words) narrow the signed values held in two 64 bit operands into a single 64 bit result in packed signed format, saturating any value which is out of range. PACKUSWB narrows the signed words in two 64 bit operands into packed unsigned bytes, again with saturation. The PUNPCK instructions act in reverse, interleaving the high-order (H) or low-order (L) elements of the two operands. For the other instructions, ignore the P and the remainder of the instruction becomes more obvious.
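Saturation is the part of packing which is easiest to get wrong, so here is a minimal model of it in Python (not assembler). The function names are made up for the sketch. It assumes the usual operand order, in which the narrowed elements of the destination fill the low half of the result and those of the source fill the high half:-

```python
def lanes(value, width):
    """Split a 64-bit value into signed lanes of `width` bits, lowest lane first."""
    out = []
    for pos in range(0, 64, width):
        lane = (value >> pos) & ((1 << width) - 1)
        out.append(lane - (1 << width) if lane >> (width - 1) else lane)
    return out

def pack(dest, src, width, unsigned_result=False):
    """Narrow the signed lanes of dest, then src, to half their width with saturation."""
    half = width // 2
    if unsigned_result:
        low, high = 0, (1 << half) - 1
    else:
        low, high = -(1 << (half - 1)), (1 << (half - 1)) - 1
    result = 0
    for i, lane in enumerate(lanes(dest, width) + lanes(src, width)):
        clamped = max(low, min(high, lane))   # out-of-range values stick at the limit
        result |= (clamped & ((1 << half) - 1)) << (i * half)
    return result

def packssdw(dest, src):
    return pack(dest, src, 32)

def packsswb(dest, src):
    return pack(dest, src, 16)

def packuswb(dest, src):
    return pack(dest, src, 16, unsigned_result=True)

a = 0xFFFFFFFF00002222
b = 0xFFFFFFFF00001111
print(f"PACKSSDW: {packssdw(a, b):016X}h")
print(f"PACKSSWB: {packsswb(b, b):016X}h")
print(f"PACKUSWB: {packuswb(b, b):016X}h")
```

In the last two results the word 1111h is too large for a byte, so it becomes 7Fh under signed saturation and 0FFh under unsigned saturation, while the word 0FFFFh (minus one) becomes 0FFh and 0 respectively.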
Set the breakpoint to MMX_PACKINSTRUCTIONS, run the test and single-step through the code:-
MMX_PACKINSTRUCTIONS:
MOV EAX,1 ;request CPU feature flags
CPUID ;0Fh, 0A2h CPUID instruction
TEST EDX,800000h ;test bit 23 to see if MMX available
JNZ .20 ;yes
RET
.20
;***************** pack
MOV D[QWORD_VALUE],2222h
MOV D[QWORD_VALUE+4],0FFFFFFFFh
MOVQ MM0,[QWORD_VALUE]
MOV D[QWORD_VALUE],1111h
MOV D[QWORD_VALUE+4],0FFFFFFFFh
MOVQ MM1,[QWORD_VALUE]
PACKSSDW MM0,MM1
MOVQ MM2,MM0
;******************
MOVQ MM0,[QWORD_VALUE]
PACKSSWB MM0,MM1
MOVQ MM3,MM0
;******************
MOVQ MM0,[QWORD_VALUE]
PACKUSWB MM0,MM1
MOVQ MM4,MM0
;****************** unpack
MOVQ MM0,[QWORD_VALUE]
PUNPCKHBW MM0,MM1
MOVQ MM5,MM0
;******************
MOVQ MM0,[QWORD_VALUE]
PUNPCKHWD MM0,MM1
MOVQ MM6,MM0
;******************
MOVQ MM0,[QWORD_VALUE]
PUNPCKHDQ MM0,MM1
MOVQ MM7,MM0
;******************
MOVQ MM0,[QWORD_VALUE]
PUNPCKLBW MM0,MM1
MOVQ MM2,MM0
;******************
MOVQ MM0,[QWORD_VALUE]
PUNPCKLWD MM0,MM1
MOVQ MM3,MM0
;******************
MOVQ MM0,[QWORD_VALUE]
PUNPCKLDQ MM0,MM1
MOVQ MM4,MM0
;****************** add
MOVQ MM0,[QWORD_VALUE]
PADDB MM0,MM1
MOVQ MM5,MM0
;******************
MOVQ MM0,[QWORD_VALUE]
PADDW MM0,MM1
MOVQ MM6,MM0
;******************
MOVQ MM0,[QWORD_VALUE]
PADDD MM0,MM1
MOVQ MM7,MM0
;******************
MOVQ MM0,[QWORD_VALUE]
PADDSB MM0,MM1
MOVQ MM2,MM0
;******************
MOVQ MM0,[QWORD_VALUE]
PADDSW MM0,MM1
MOVQ MM3,MM0
;******************
MOVQ MM0,[QWORD_VALUE]
PADDUSB MM0,MM1
MOVQ MM4,MM0
;******************
MOVQ MM0,[QWORD_VALUE]
PADDUSW MM0,MM1
MOVQ MM5,MM0
;****************** subtract
MOV D[QWORD_VALUE],67676767h
MOV D[QWORD_VALUE+4],67676768h
MOVQ MM0,[QWORD_VALUE]
MOVQ MM1,[QWORD_VALUE]
PSUBB MM0,MM1
MOVQ MM6,MM0
;******************
MOVQ MM0,[QWORD_VALUE]
MOVQ MM1,[QWORD_VALUE]
PSUBSW MM0,MM1
MOVQ MM7,MM0
;******************
MOVQ MM0,[QWORD_VALUE]
MOVQ MM1,[QWORD_VALUE]
PSUBD MM0,MM1
MOVQ MM2,MM0
;****************** compare
MOV D[QWORD_VALUE],1111h
MOV D[QWORD_VALUE+4],0FFFFFFFFh
MOVQ MM1,[QWORD_VALUE]
MOV D[QWORD_VALUE],2222h
MOV D[QWORD_VALUE+4],0FFFFFFFFh
MOVQ MM0,[QWORD_VALUE]
PCMPEQB MM0,MM1
MOVQ MM3,MM0
;******************
MOVQ MM0,[QWORD_VALUE]
PCMPEQW MM0,MM1
MOVQ MM4,MM0
;******************
MOVQ MM0,[QWORD_VALUE]
PCMPEQD MM0,MM1
MOVQ MM5,MM0
;******************
MOV D[QWORD_VALUE],1111h
MOV D[QWORD_VALUE+4],0FFFFFFFFh
MOVQ MM1,[QWORD_VALUE]
MOV D[QWORD_VALUE],2222h
MOV D[QWORD_VALUE+4],0FFFFFFFFh
MOVQ MM0,[QWORD_VALUE]
PCMPGTB MM0,MM1
MOVQ MM6,MM0
;******************
MOVQ MM0,[QWORD_VALUE]
PCMPGTW MM0,MM1
MOVQ MM7,MM0
;******************
MOVQ MM0,[QWORD_VALUE]
PCMPGTD MM0,MM1
MOVQ MM2,MM0
;******************
MOV D[QWORD_VALUE],222222h
MOV D[QWORD_VALUE+4],333333h
MOVQ MM0,[QWORD_VALUE]
MOVQ MM1,[QWORD_VALUE]
PMADDWD MM0,MM1
MOVQ MM3,MM0
;******************
MOVQ MM0,[QWORD_VALUE]
PMULHW MM0,MM1
MOVQ MM4,MM0
;******************
MOVQ MM0,[QWORD_VALUE]
PMULLW MM0,MM1
MOVQ MM5,MM0
EMMS
RET
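The add, compare and multiply-add instructions in this test can be modelled lane by lane in the same way. The following Python sketch is illustrative only. The helper names are invented, and the mode argument of padd is simply a convenience of this sketch to select between wrap-around, signed saturation and unsigned saturation:-

```python
def split(value, width):
    """Split a 64-bit value into unsigned lanes, lowest lane first."""
    mask = (1 << width) - 1
    return [(value >> pos) & mask for pos in range(0, 64, width)]

def signed(lane, width):
    """Interpret an unsigned lane as a two's complement signed number."""
    return lane - (1 << width) if lane >> (width - 1) else lane

def join(lanes, width):
    """Rebuild a 64-bit value, keeping only `width` bits of each lane."""
    mask = (1 << width) - 1
    return sum((lane & mask) << (i * width) for i, lane in enumerate(lanes))

def padd(a, b, width, mode="wrap"):
    """PADDB/W/D (wrap), PADDSB/W (signed saturation) or PADDUSB/W (unsigned saturation)."""
    out = []
    for x, y in zip(split(a, width), split(b, width)):
        if mode == "signed":
            total = signed(x, width) + signed(y, width)
            total = max(-(1 << (width - 1)), min((1 << (width - 1)) - 1, total))
        elif mode == "unsigned":
            total = min((1 << width) - 1, x + y)
        else:
            total = x + y   # join() discards the carry out of each lane
        out.append(total)
    return join(out, width)

def pcmpeq(a, b, width):
    """PCMPEQB/W/D: a lane becomes all ones where the operands are equal, else zero."""
    return join([-1 if x == y else 0 for x, y in zip(split(a, width), split(b, width))], width)

def pcmpgt(a, b, width):
    """PCMPGTB/W/D: all ones where the destination lane is greater (signed compare)."""
    return join([-1 if signed(x, width) > signed(y, width) else 0
                 for x, y in zip(split(a, width), split(b, width))], width)

def pmaddwd(a, b):
    """PMADDWD: multiply signed words, then add adjacent products into dwords."""
    p = [signed(x, 16) * signed(y, 16) for x, y in zip(split(a, 16), split(b, 16))]
    return join([p[0] + p[1], p[2] + p[3]], 32)

v = 0xFFFFFFFF00001111
print(f"PADDB:   {padd(v, v, 8):016X}h")
print(f"PADDUSB: {padd(v, v, 8, 'unsigned'):016X}h")
print(f"PCMPEQB: {pcmpeq(0xFFFFFFFF00002222, v, 8):016X}h")
print(f"PMADDWD: {pmaddwd(0x0033333300222222, 0x0033333300222222):016X}h")
```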
Addressing methods
These tests show that traditional addressing methods can be used with MMX instructions.
Set the breakpoint to MMX_ADDRESSING, run the test and single-step through the code:-
MMX_ADDRESSING:
MOV EAX,1 ;request CPU feature flags
CPUID ;0Fh, 0A2h CPUID instruction
TEST EDX,800000h ;test bit 23 to see if MMX available
JNZ .20 ;yes
RET
.20
MOV EDI,ADDR MMXBUFF
MOV ECX,64
MOV EAX,12345678h
REP STOSD
MOV ESI,ADDR MMXBUFF
MOV EAX,4
MOVD MM0,[ESI]
MOVD MM0,[ESI+EAX*2]
MOVD MM0,[ESI+EAX*4+44h]
MOVD MM6,[MMXBUFF]
MOVD MM6,[MMXBUFF+EAX*2-8h]
;*****************
MOVQ MM6,[MMXBUFF+8h]
MOVQ MM6,[ESI+EAX*2+25h]
;*****************
XOR EAX,EAX
MOVQ [QWORD_VALUE],MM6
MOVQ [QWORD_VALUE],MM6
MOVQ [QWORD_VALUE+EAX],MM6
MOVQ [QWORD_VALUE+EAX*2],MM6
MOVQ [MMXBUFF+EAX*2+10h],MM6
MOV EAX,20h
MOVQ [MMXBUFF+EAX*2-10h],MM6
;***************** pack and add with a memory operand
MOV EAX,10h
PACKSSWB MM0,[MMXBUFF+EAX*8-8h]
PACKUSWB MM6,[MMXBUFF+EAX*8-8h]
PADDD MM6,[MMXBUFF+EAX*2-10h]
;*****************
MOV EAX,[EBP+14h]
MOV EAX,[EBP-14h]
MOV EAX,[ESP-14h]
MOV EAX,[ESP+14h]
EMMS ;clear floating point tag word
RET
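Each memory operand in this test resolves to base + index*scale + displacement. As a rough cross-check, the Python sketch below (not assembler) works out the offset from the start of MMXBUFF for a selection of the instructions, using the value EAX holds at that point in the listing. It assumes that MMXBUFF is at least 256 bytes long, which is the amount filled by REP STOSD with ECX=64. The names offset and accesses are made up for the sketch:-

```python
BUFFER_SIZE = 64 * 4   # REP STOSD with ECX=64 fills 64 dwords (assumed size of MMXBUFF)

def offset(index=0, scale=1, disp=0):
    """Offset from the start of MMXBUFF for base + index*scale + displacement."""
    return index * scale + disp

# (instruction, offset into MMXBUFF, bytes accessed)
accesses = [
    ("MOVD MM0,[ESI]",                  offset(),               4),
    ("MOVD MM0,[ESI+EAX*2]",            offset(4, 2),           4),
    ("MOVD MM0,[ESI+EAX*4+44h]",        offset(4, 4, 0x44),     4),
    ("MOVD MM6,[MMXBUFF+EAX*2-8h]",     offset(4, 2, -8),       4),
    ("MOVQ MM6,[ESI+EAX*2+25h]",        offset(4, 2, 0x25),     8),
    ("MOVQ [MMXBUFF+EAX*2-10h],MM6",    offset(0x20, 2, -0x10), 8),
    ("PACKSSWB MM0,[MMXBUFF+EAX*8-8h]", offset(0x10, 8, -8),    8),
]

for text, off, size in accesses:
    # Every access should stay inside the filled part of the buffer
    assert 0 <= off and off + size <= BUFFER_SIZE
    print(f"{text:34} offset {off:02X}h, {size} bytes")
```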
SSE integer instructions
The SSE instructions (streaming SIMD extensions) were
introduced by Intel in their Pentium III processors. The additional
instructions set out here all work with integers in the MMX registers.
Note that in later processors, these same instructions (except for PSHUFW) also
work with the XMM registers (if SSE2 instructions are available);
see MMX integer instructions now for XMM registers.
Before using these instructions in your code you need to check if they
are available on the processor which is running your program. This is done
by testing bit 25 of EDX after calling CPUID with EAX=1.
The following tests assume that somewhere in the data section
the following qword has been declared
QWORD_VALUE DQ 0
Set the breakpoint to MMXSSE_INTEGER, run the test and single-step through the code:-
MMXSSE_INTEGER: ;extensions to MMX added in the SSE instructions
MOV EAX,1 ;request CPU feature flags
CPUID ;0Fh, 0A2h CPUID instruction
TEST EDX,2000000h ;test bit 25 (SSE)
JNZ >L16 ;SSE available
PUSH 40h ;information+ok button only
PUSH 'Testbug - SSE integer test'
PUSH 'Sorry the SSE integer instructions are not available on your processor',[hWnd]
CALL MessageBoxA ;wait till ok pressed
RET
L16:
;********************* mmx average computation
;********************* (switch mmx pane to "byte" view)
MOV D[QWORD_VALUE],02040608h
MOV D[QWORD_VALUE+4],22222222h
MOVQ MM0,[QWORD_VALUE] ;give 1st tester values to MM0
MOVQ MM2,MM0 ;copying to MM2
MOV D[QWORD_VALUE],0406080Ah
MOV D[QWORD_VALUE+4],44444444h
MOVQ MM1,[QWORD_VALUE] ;give 2nd tester values to MM1
MOVQ MM3,MM1 ;copying to MM3
PAVGB MM0,MM1 ;packed average-by-byte result in MM0
;********************* (switch mmx pane to "word" view)
MOVQ MM0,MM2 ;restore MM0
PAVGW MM0,MM1 ;packed average-by-word result in MM0
;********************* mmx extract to gp register
PEXTRW EAX,MM1,2 ;extract 3rd word of MM1 to eax
PEXTRW EDX,MM1,0 ;extract 1st word of MM1 to edx
;********************* mmx insert from gp register
PINSRW MM0,EAX,0 ;insert eax to 1st word of MM0
PINSRW MM0,EDX,2 ;insert edx to 3rd word of MM0
;********************* report mmx byte maximum
;********************* (switch mmx pane to "byte" view)
MOVQ MM3,MM2
PMAXUB MM3,MM0 ;report greater-by-byte in MM3
;********************* report mmx byte minimum
MOVQ MM3,MM2
PMINUB MM3,MM0 ;report lesser-by-byte in MM3
;********************* compute sum of absolute differences
MOVQ MM3,MM2
PSADBW MM3,MM0 ;sum of absolute differences in MM3
;********************* report mmx word maximum
;********************* (switch mmx pane to "word" view)
MOVQ MM3,MM2
PMAXSW MM3,MM0 ;report greater-by-word in MM3
;********************* report mmx word minimum
MOVQ MM3,MM2
PMINSW MM3,MM0 ;report lesser-by-word in MM3
;********************* shuffle packed word integers
PSHUFW MM0,MM3,2 ;word 0 of MM0 = word 2 of MM3, words 1-3 = word 0 of MM3
;**** multiply packed unsigned word integers high word result only
MOVQ MM0,MM3
PMULHUW MM0,MM3 ;MM3 * MM0, high word result in MM0
;********************* create byte mask from most significant bits
;********************* (switch mmx pane to "byte" view)
PMOVMSKB EAX,MM0
;
EMMS ;clear floating point tag word
RET
;
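Several of these instructions are easier to follow with a small model to hand. The Python sketch below is illustrative only and is not part of the test program. The function names simply echo the mnemonics. It covers the rounding used by PAVGB and PAVGW, the sum produced by PSADBW, the way the immediate operand of PSHUFW selects words, and the mask built by PMOVMSKB:-

```python
def split(value, width):
    """Split a 64-bit value into unsigned lanes, lowest lane first."""
    mask = (1 << width) - 1
    return [(value >> pos) & mask for pos in range(0, 64, width)]

def join(lanes, width):
    """Rebuild a 64-bit value from lanes, lowest lane first."""
    mask = (1 << width) - 1
    return sum((lane & mask) << (i * width) for i, lane in enumerate(lanes))

def pavg(a, b, width):
    """PAVGB/PAVGW: unsigned average of each lane, rounded up."""
    return join([(x + y + 1) >> 1 for x, y in zip(split(a, width), split(b, width))], width)

def psadbw(a, b):
    """PSADBW: sum of absolute byte differences, returned in the low word."""
    return sum(abs(x - y) for x, y in zip(split(a, 8), split(b, 8)))

def pshufw(src, order):
    """PSHUFW: each 2-bit field of `order` selects the source word for one result word."""
    words = split(src, 16)
    return join([words[(order >> (2 * i)) & 3] for i in range(4)], 16)

def pmovmskb(value):
    """PMOVMSKB: collect the most significant bit of each byte into an 8-bit mask."""
    return sum((byte >> 7) << i for i, byte in enumerate(split(value, 8)))

first = 0x2222222202040608    # 1st tester value in the listing
second = 0x444444440406080A   # 2nd tester value in the listing

print(f"PAVGB:    {pavg(first, second, 8):016X}h")
print(f"PSADBW:   {psadbw(first, second):016X}h")
print(f"PSHUFW 2: {pshufw(0x4444333322221111, 2):016X}h")
print(f"PMOVMSKB: {pmovmskb(0x80FF007F00000081):02X}h")
```

The values passed to pshufw and pmovmskb are arbitrary ones chosen to make the effect visible, and do not come from the listing.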
SSE2 integer instructions
The SSE2 instructions were introduced in the Intel Pentium
4 and Xeon processors. The instructions shown here make use of the MMX
registers.
Before using these instructions in your code you need to check if they
are available on the processor which is running your program. This is done
by testing bit 26 of EDX after calling CPUID with EAX=1.
The following test assumes that somewhere in the data section
the following qword (8 bytes) has been declared
QWORD_VALUE DQ 0
Set the breakpoint to MMXSSE2_INTEGER, run the test and single-step
through the code:-
MMXSSE2_INTEGER:
MOV EAX,1 ;request CPU feature flags
CPUID ;0Fh, 0A2h CPUID instruction
TEST EDX,4000000h ;test bit 26 (SSE2)
JNZ >L16 ;SSE2 available
CALL NOSSE2INTMESS
RET
L16:
;********************* mmx quadword arithmetic
;********************* (switch mmx pane to "qword" view)
MOV D[QWORD_VALUE],02040608h
MOV D[QWORD_VALUE+4h],22222222h
MOVQ MM0,[QWORD_VALUE] ;give 1st tester values to MM0
MOVQ MM2,MM0 ;copying to MM2
MOV D[QWORD_VALUE],0406080Ah
MOV D[QWORD_VALUE+4h],44444444h
MOVQ MM1,[QWORD_VALUE] ;give 2nd tester values to MM1
MOVQ2DQ XMM3,MM1 ;copying to low qword of XMM3
PADDQ MM0,MM1 ;packed quadword add
MOVQ MM0,MM2
PSUBQ MM0,MM1 ;packed quadword subtract
MOVQ MM0,MM2
PMULUDQ MM0,MM1 ;multiply packed unsigned dword integers
EMMS ;clear floating point tag word
RET
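The three quadword instructions can be checked with ordinary integer arithmetic. This is a minimal sketch in Python, using the two tester values from the listing. The function names are chosen to echo the mnemonics and are not part of the test program. Note that PMULUDQ on the MMX registers multiplies only the low dwords of its operands, giving a full 64 bit product:-

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1

def paddq(a, b):
    """PADDQ: 64-bit add, any carry out of bit 63 is discarded."""
    return (a + b) & MASK64

def psubq(a, b):
    """PSUBQ: 64-bit subtract, wrapping round on borrow."""
    return (a - b) & MASK64

def pmuludq(a, b):
    """PMULUDQ: unsigned multiply of the low dwords, giving a 64-bit result."""
    return (a & MASK32) * (b & MASK32)

mm0 = 0x2222222202040608   # 1st tester value
mm1 = 0x444444440406080A   # 2nd tester value

print(f"PADDQ:   {paddq(mm0, mm1):016X}h")
print(f"PSUBQ:   {psubq(mm0, mm1):016X}h")
print(f"PMULUDQ: {pmuludq(mm0, mm1):016X}h")
```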