"Use of mnemonics" demonstrations
SIMD floating point control

This is a demonstration of the SSE and SSE2 (streaming SIMD extensions) control instructions STMXCSR (store the MXCSR control dword) and LDMXCSR (load the MXCSR control dword), and of using FXSAVE to see if the processor supports the Denormals are Zeroes ("DAZ") flag.
This is not intended to be a review of the SIMD control features. For that see the Intel documentation, in particular Chapter 11 of Volume 1 of the IA-32 Intel Architecture Software Developer's Manual available from the Intel web site.
The demonstration is best understood by single stepping through the code with a debugger, having set the breakpoint to SIMD_FPCONTROL and then using Testbug's menu to run the demo.
The first part of the demonstration checks whether the SIMD instructions are supported on the processor and then checks with the user that the program is under debugger control. Click "yes" to proceed.
In your debugger you should have the SSE and SSE2 panes open to see the effect of the control on the display.
Firstly the rounding is changed from the default ("near") to "down", then "up", then "chop", and finally back to the default.
Then the flush-to-zero flag is set and cleared.
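The bit arithmetic behind the rounding and flush-to-zero steps can be sketched in C, working on a plain copy of the MXCSR dword (the value that STMXCSR stores) rather than on the processor itself. The macro and function names are invented for this illustration and the bit positions are those given in the Intel documentation. It is a minimal sketch, not part of the Testbug source.

```c
#include <assert.h>
#include <stdint.h>

/* MXCSR field positions (names are illustrative, not from any header). */
#define MXCSR_RC_SHIFT 13u                      /* rounding control: bits 13-14 */
#define MXCSR_RC_MASK  (3u << MXCSR_RC_SHIFT)
#define MXCSR_FZ       (1u << 15)               /* flush-to-zero flag: bit 15 */

/* Rounding control encodings: near, down, up, chop (truncate). */
enum { RC_NEAR = 0, RC_DOWN = 1, RC_UP = 2, RC_CHOP = 3 };

/* Return a copy of mxcsr with the rounding control field replaced by mode. */
static uint32_t mxcsr_set_rounding(uint32_t mxcsr, unsigned mode)
{
    assert(mode <= RC_CHOP);
    return (mxcsr & ~MXCSR_RC_MASK) | ((uint32_t)mode << MXCSR_RC_SHIFT);
}

/* Return a copy of mxcsr with flush-to-zero switched on or off. */
static uint32_t mxcsr_set_flush_to_zero(uint32_t mxcsr, int on)
{
    return on ? (mxcsr | MXCSR_FZ) : (mxcsr & ~MXCSR_FZ);
}
```

Starting from the usual power-up value 1F80h, `mxcsr_set_rounding(0x1F80, RC_DOWN)` gives 3F80h, which is the same result as the AND/OR pair on AH in the assembler code.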
Then zero is divided by zero to cause an exception. Strictly this raises the Invalid Operation exception rather than Divide by Zero (which needs a non-zero dividend), but all the SIMD floating point exceptions behave in the same way: like the FPU, only a flag is set if the corresponding mask is set. This means that it is up to the application to check the flags for errors (if they matter). Since the flags are "sticky" (they remain set until cleared), an application which does check them needs to clear the exception flags before the instruction to be checked. To do this you need to use the STMXCSR and LDMXCSR instructions as in the next fragment of code. Unfortunately there is no single instruction which does this, like FCLEX in the FPU.
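Because there is no FCLEX equivalent, clearing the sticky flags comes down to masking off the low six bits of the stored dword before it is reloaded. A minimal C sketch of that masking, on a plain 32-bit copy of MXCSR and with names invented for the illustration, might look like this:

```c
#include <assert.h>
#include <stdint.h>

/* Exception flags occupy bits 0-5 of MXCSR (names are illustrative). */
#define MXCSR_FLAGS 0x003Fu   /* IE, DE, ZE, OE, UE, PE */
#define MXCSR_IE    0x0001u   /* invalid operation flag */
#define MXCSR_ZE    0x0004u   /* divide-by-zero flag */

/* Clear all six sticky flags; bit 6 (DAZ) and the masks above it are kept. */
static uint32_t mxcsr_clear_flags(uint32_t mxcsr)
{
    return mxcsr & ~MXCSR_FLAGS;
}

/* Return 1 if the given exception flag is set in mxcsr, otherwise 0. */
static int mxcsr_flag_raised(uint32_t mxcsr, uint32_t flag)
{
    assert((flag & ~MXCSR_FLAGS) == 0);   /* only flag bits may be queried */
    return (mxcsr & flag) != 0;
}
```

An application would clear the flags, reload MXCSR, run the instruction of interest, store MXCSR again and then test the flag it cares about, for example `mxcsr_flag_raised(value, MXCSR_IE)`.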
Then we see what happens if an exception occurs while its corresponding mask has been cleared. In fact this permits a true exception to occur - make sure you jump over this with the debugger to proceed.
Whilst single-step debugging with GoBug you can clear the exception flags by using the "zero exception flags" button (or by pressing the space bar when the pane has the focus). You can try that now, but the demonstration restores the mask and causes another exception for you to try this again.
Finally we try to set the DAZ flag (denormals are zeroes). When this flag is set the processor treats a denormal source operand as zero instead of giving it the slower special handling which denormal numbers otherwise need. Apparently this is useful for streaming media processing. We have to be more careful here because DAZ was only supported in the later P4 processor and in the Xeon processor. If you try to set it on a processor which does not support it, a general protection exception occurs and the program will crash. The code here carries out the procedure recommended by Intel to check for DAZ support, which uses FXSAVE to read the MXCSR_MASK value.
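Intel's check can be summarised as follows: read the MXCSR_MASK dword at byte offset 28 of the FXSAVE image; if it is zero, assume the default mask 0000FFBFh (which has no DAZ bit); otherwise DAZ is available only if bit 6 of the mask is set. The C sketch below models that decision on a byte buffer standing in for the 512-byte FXSAVE area. The function name is hypothetical and the sketch does not execute FXSAVE itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MXCSR_MASK_OFFSET  28u           /* MXCSR_MASK occupies bytes 28-31 */
#define MXCSR_DEFAULT_MASK 0x0000FFBFu   /* mask to assume when the field reads zero */
#define MXCSR_DAZ          0x00000040u   /* bit 6: denormals are zeroes */

/* image points at a 512-byte buffer which was zeroed and then filled by FXSAVE. */
static int daz_supported(const unsigned char *image)
{
    uint32_t mask;

    assert(image != NULL);
    /* Assemble the little-endian dword byte by byte so the sketch is portable. */
    mask = (uint32_t)image[MXCSR_MASK_OFFSET]
         | ((uint32_t)image[MXCSR_MASK_OFFSET + 1] << 8)
         | ((uint32_t)image[MXCSR_MASK_OFFSET + 2] << 16)
         | ((uint32_t)image[MXCSR_MASK_OFFSET + 3] << 24);
    if (mask == 0)
        mask = MXCSR_DEFAULT_MASK;       /* the default mask has the DAZ bit clear */
    return (mask & MXCSR_DAZ) != 0;
}
```

Zeroing the buffer before FXSAVE matters because a processor which does not write MXCSR_MASK would otherwise leave whatever was in memory at that offset.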

Set the breakpoint to SIMD_FPCONTROL:-

DATA
;
ALIGN 16
FXSAVE_BUFFER DB 512 DUP 0   ;this buffer must start on 16 byte boundary
CURRENT_MXCSR DD 0      ;place to keep the current value of MXCSR
ZERO_FPVALUE DD 0.0     ;a zero in single-precision floating point format
;
CODE
;
SIMD_FPCONTROL:
MOV EAX,1                ;request CPU feature flags
CPUID                    ;0Fh, 0A2h CPUID instruction
TEST EDX,6000000h        ;test bit 25 (SSE) or bit 26 (SSE2)
JNZ >L32                 ;SSE/SSE2 available
CALL NOSIMDCONTROL1      ;message box appears saying cannot do test
L31:
RET
L32:
CALL QUERY_USER          ;warn user that an exception will occur and ask to proceed
JNZ L31                  ;cancel by user
;***** display XMM registers in SSE or SSE2 mode ..
;****************** first lets manipulate the rounding control ..
MOV ESI,ADDR CURRENT_MXCSR      ;ESI holds CURRENT_MXCSR throughout
STMXCSR [ESI]           ;store the MXCSR into memory
MOV EAX,[ESI]           ;put into EAX
AND AH,9Fh              ;clear existing rounding bits (bits 13/14 of eax)
OR AH,20h               ;set rounding down
MOV [ESI],EAX           ;put back into memory
LDMXCSR [ESI]           ;and put that into processor
;***********
STMXCSR [ESI]           ;store the MXCSR into memory
MOV EAX,[ESI]           ;put into EAX
AND AH,9Fh              ;clear existing rounding bits (bits 13/14 of eax)
OR AH,40h               ;set rounding up
MOV [ESI],EAX           ;put back into memory
LDMXCSR [ESI]           ;and put that into processor
;***********
STMXCSR [ESI]           ;store the MXCSR into memory
MOV EAX,[ESI]           ;put into EAX
OR AH,60h               ;set rounding chop (truncate) - sets both bits, so no need to clear first
MOV [ESI],EAX           ;put back into memory
LDMXCSR [ESI]           ;and put that into processor
;***********
STMXCSR [ESI]           ;store the MXCSR into memory
MOV EAX,[ESI]           ;put into EAX
AND AH,9Fh              ;clear existing rounding bits - back to default
MOV [ESI],EAX           ;put back into memory
LDMXCSR [ESI]           ;and put that into processor
;****************** now lets change flush to zero control ..
STMXCSR [ESI]           ;store the MXCSR into memory
MOV EAX,[ESI]           ;put into EAX
OR AH,80h               ;set flush to zero
MOV [ESI],EAX           ;put back into memory
LDMXCSR [ESI]           ;and put that into processor
;***********
STMXCSR [ESI]           ;store the MXCSR into memory
MOV EAX,[ESI]           ;put into EAX
AND AH,7Fh              ;clear flush to zero
MOV [ESI],EAX           ;put back into memory
LDMXCSR [ESI]           ;and put that into processor
;****************** lets generate an exception by dividing zero by zero
;                   (masked, so this only sets the Invalid Operation flag)
MOVSS  XMM0,[ZERO_FPVALUE]    ;move zero floating point value into low dword of XMM0
MOVUPS XMM1,XMM0        ;copy it to XMM1
DIVSS  XMM0,XMM1        ;divide lowest fp value, result in XMM0
;****************** now lets clear the exception flags ..
STMXCSR [ESI]           ;store the MXCSR into memory
MOV EAX,[ESI]           ;put into EAX
AND AL,0C0h             ;clear exception flags (bits 0-5), keeping DAZ and Invalid Operation mask
MOV [ESI],EAX           ;put back into memory
LDMXCSR [ESI]           ;and put that into processor
;****************** now lets change some of the masks ..
STMXCSR [ESI]           ;store the MXCSR into memory
MOV EAX,[ESI]           ;put into EAX
AND AH,0EDh             ;clear Precision and Divide by Zero masks
AND AL,7Fh              ;and the Invalid Operation mask
MOV [ESI],EAX           ;put back into memory
LDMXCSR [ESI]           ;and put that into processor
;****************** and lets divide zero by zero again (mask now cleared)
;
;  jump over the exception using the debugger ..
;
MOVSS  XMM0,[ZERO_FPVALUE]    ;move zero floating point value into low dword of XMM0
MOVUPS XMM1,XMM0        ;copy it to XMM1
DIVSS  XMM0,XMM1        ;divide lowest fp value, result in XMM0
;******************
;****************** now lets set all exception masks ..
STMXCSR [ESI]           ;store the MXCSR into memory
MOV EAX,[ESI]           ;put into EAX
OR AH,1Fh               ;set all masks again
OR AL,80h               ;not forgetting the Invalid Operation mask
MOV [ESI],EAX           ;put back into memory
LDMXCSR [ESI]           ;and put that into processor
;******************
;
;  this time clear the exception using GoBug's 
;  "zero exception flags" button (or press the space bar)
;
MOVSS  XMM0,[ZERO_FPVALUE]    ;move zero floating point value into low dword of XMM0
MOVUPS XMM1,XMM0        ;copy it to XMM1
DIVSS  XMM0,XMM1        ;divide lowest fp value, result in XMM0
;******************* and finally the DAZ mask (Denormals are zeroes) ..
;***** check first whether DAZ supported ..
MOV EAX,1               ;request CPU feature flags
CPUID                   ;0Fh, 0A2h CPUID instruction
TEST EDX,1000000h       ;test bit 24 (FXSR available?)
JZ >L33                 ;FXSAVE and FXRSTOR not available - drop out
MOV EDI,ADDR FXSAVE_BUFFER      ;get suitable 512 byte area of memory
XOR EAX,EAX             ;ready to fill it with zeroes
MOV ECX,128             ;128 dwords = 512 bytes
REP STOSD               ;fill area with zeroes (essential)
MOV EDI,ADDR FXSAVE_BUFFER      ;get the 512 byte area of memory again
FXSAVE [EDI]            ;save the FPU/MMX and XMM states
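MOV ECX,[EDI+28]        ;get MXCSR_MASK into ecx
JECXZ >L33              ;it's zero - default mask applies, DAZ not supported
TEST CL,40h             ;is bit 6 (DAZ) set in MXCSR_MASK?
JZ >L33                 ;no - DAZ not supported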
;*****
STMXCSR [ESI]           ;store the MXCSR into memory
MOV EAX,[ESI]           ;put into EAX
OR AL,40h               ;set the DAZ flag
MOV [ESI],EAX           ;put back into memory
LDMXCSR [ESI]           ;and put that into processor
;*****
STMXCSR [ESI]           ;store the MXCSR into memory
MOV EAX,[ESI]           ;put into EAX
AND AL,0BFh             ;clear the DAZ flag
MOV [ESI],EAX           ;put back into memory
LDMXCSR [ESI]           ;and put that into processor
L33:
RET
;