In these tests I concentrate on the 80-bit ("extended precision") format of floating point numbers. This is the format that is actually used inside the FPU itself, and the contents of the FP stack appear in this format in GoBug's FP pane. Whatever format a number is loaded in, the FPU converts it to this format for its own use. And if a number is unloaded in any other format the FPU has to convert it first.
Approximations to pi
Exponent/mantissa tests
    Visualising the floating binary point
    The exponent bias
    Larger exponents
Rounding/precision tests
    Rounding test
    Precision test
    Precision flag test
Binary coded decimal numbers
FP stack operations
Convert to ascii using FPREM1
Convert to ascii using FRNDINT
Approximations to pi
Set a breakpoint to APPROXIMATIONS_TO_PI and run the test, then single-step down the code. Widen the floating point register pane a little to see the precision obtained more clearly. See that despite declaring pi as a real number to 20 decimal places in DECLARED_PI, the actual number in the floating point registers is accurate only to 18 decimal places (using GoAsm). This is because although the declaration is made as an 80-bit number using "DT", of these 80 bits 1 bit is used for the sign, 15 for the exponent and only 64 for the mantissa, which sets a limit of about 18 or 19 significant decimal digits of precision. The value of pi as kept inside the assembler itself (using "DT PI" in GoAsm) is also accurate to 18 decimal places. In this case the value is inserted into the mantissa directly at compile time rather than being calculated. Then we see how well the mnemonic FLDPI does. This provides a value kept within the processor itself and it ought to be accurate to 18 decimal places as well, although this might vary from processor to processor. Compare also the result of 22 divided by 7 produced at compile time by the assembler and at run-time using the FPU.
GoBug itself can provide a pretty accurate representation of the real number actually inside the FPU's stack. The actual accuracy of the conversion to real numbers depends on the value of the exponent and is as follows:-
Negative exponents or zero - 80 digits of precision (computations are to 82 digits then rounded).
Exponents of 1 to 64 - no lost data. Real value shown is an accurate representation of the 80-bit number.
Exponents of 65 and above - 80 digits of precision (computations are to 83/84 digits then rounded).
Here is the code used in this test:-
;******* this is in the data section ..
DECLARED_PI DT 3.14159265358979323846E+0
DIRECT_PI DT 4000C90FDAA22168C235h
ASSEMBLEROWNPI DT PI
SEVEN DW 7 ;value of seven
TWENTYTWO DW 22D
TWORDRESULT DT 0.0
;******* this is in the code section ..
APPROXIMATIONS_TO_PI:
FINIT
FLDCW W[PREC64_CHOP] ;initialise to precision=64 bits, rounding=chop
FLD T[DECLARED_PI] ;3.14159265358979323846E+0 produced by assembler
FLD T[DIRECT_PI] ;inserted directly into the exponent and mantissa
FLD T[ASSEMBLEROWNPI] ;assembler's own PI
FLDPI ;load fpu pi
FILD W[TWENTYTWO]
FILD W[SEVEN]
FDIV ;22/7, a rough approximation to pi
FSTP T[TWORDRESULT]
FSTP T[TWORDRESULT]
FSTP T[TWORDRESULT]
FSTP T[TWORDRESULT]
FSTP T[TWORDRESULT]
RET
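The DIRECT_PI declaration can be checked by hand. The following is a minimal sketch, using Python purely as a calculator rather than GoAsm. It splits an 80-bit hex value into sign, exponent and mantissa and rebuilds the exact value, using the convention followed in these tests (binary point just to the left of the mantissa, bias of 3FFEh). The name decode_tword and the 30-place yardstick value of pi are introduced here for illustration only:-

```python
from fractions import Fraction

def decode_tword(hex_digits):
    """Return the exact value of an 80-bit real given as 20 hex digits."""
    raw = int(hex_digits, 16)
    sign = -1 if raw >> 79 else 1                 # top bit is the sign
    exponent = ((raw >> 64) & 0x7FFF) - 0x3FFE    # remove the bias
    mantissa = raw & 0xFFFFFFFFFFFFFFFF           # low 64 bits
    # The point sits just left of the mantissa, so mantissa/2**64 is below 1.
    return sign * Fraction(mantissa, 2**64) * Fraction(2) ** exponent

# pi to 30 decimal places, used only as a yardstick (an assumption of this sketch)
PI_30 = Fraction("3.141592653589793238462643383279")

direct_pi = decode_tword("4000C90FDAA22168C235")
error = abs(direct_pi - PI_30)
print(float(error))                     # size of the error in the 80-bit value
print(error < Fraction(1, 2**63))       # is it within half a unit in the last place?
```

The same function can be pointed at any of the 10-byte values seen in TWORDRESULT, once the bytes have been written out most significant first.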
Exponent/mantissa test
For all these tests set a breakpoint to EXPONENT_MANTISSA_TEST and run the test. Open a symbol inspector to watch the results appear in the symbol TWORDRESULT (a 10 byte value). There are three tests in all, which run one after the other.
  Visualising the floating binary point
  The exponent bias
  Larger exponents
Visualising the floating binary point
These tests show how the real number is made from the exponent and mantissa, and you can see from them where the name "floating point" comes from. I find the easiest way to envisage the mantissa is to regard it as a 64-bit binary number with a "point" somewhere along the 64 bits, or somewhere to the left of the 64 bits for smaller numbers, or somewhere to the right of the 64 bits for larger numbers. For example, take the first number in the test, the real number 1.25E-1, which is really the number 0.125. When this is put into the FPU, the exponent is -2 and the mantissa is the 64-bit number 8000000000000000 hex, so only the most significant bit of the mantissa is set. An exponent of zero would put the point immediately to the left of the mantissa, giving .1 binary. The exponent of -2 puts the point two bits further to the left, which makes the true value .001 binary, where the point is a binary point. You can regard this as an eighth (because 1000 binary is eight decimal), and an eighth is 0.125 in decimal.
In the same way for the subsequent tests, .01 binary is the same as a quarter or 0.25 decimal, .101 binary is the same as five eighths or 0.625 decimal.
TWORDTEST_J makes an exponent of 1 and sets the two most significant bits of the mantissa, so the point must be regarded as intruding into the mantissa by 1 bit, making the mantissa 1.1 binary. Here the value to the left of the binary point is 1 decimal and to the right of the binary point is a half, making a final value of 1.5 decimal. In the same way in TWORDTEST_K the mantissa becomes 11.101000 which is 3 decimal and 0.625 decimal, to make a value of 3.625.
;******* this is in the data section to declare the real numbers to be put in the FPU
TWORDTEST_G DT 1.25E-1
TWORDTEST_H DT 0.25E0
TWORDTEST_I DT 6.25E-1
TWORDTEST_J DT 1.5E0
TWORDTEST_K DT 3.625E0
;******* this is in the code section ..
EXPONENT_MANTISSA_TEST:
FINIT
FLDCW W[PREC64_CHOP] ;initialise to precision=64 bits, rounding=chop
FLD T[TWORDTEST_G] ;produces exponent of -2, mantissa of 8000000000000000
FLD T[TWORDTEST_H] ;produces exponent of -1, mantissa of 8000000000000000
FLD T[TWORDTEST_I] ;produces exponent of 0, mantissa of A000000000000000
FLD T[TWORDTEST_J] ;produces exponent of 1, mantissa of C000000000000000
FLD T[TWORDTEST_K] ;produces exponent of 2, mantissa of E800000000000000
FSTP T[TWORDRESULT]
FSTP T[TWORDRESULT]
FSTP T[TWORDRESULT]
FSTP T[TWORDRESULT]
FSTP T[TWORDRESULT]
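The way the exponent places the binary point in each of the five test values can be worked through mechanically. This is a small sketch in Python (used only as a calculator), and the function names place_point and binary_to_fraction are made up for the purpose. It takes the exponent and mantissa quoted in the comments of the listing and writes out the binary number with its point:-

```python
from fractions import Fraction

def place_point(mantissa, exponent, width=64):
    """Write the mantissa in binary with the point moved `exponent` bits
    from its starting place at the left-hand end of the mantissa."""
    bits = format(mantissa, "0{}b".format(width))
    if exponent <= 0:
        text = "." + "0" * (-exponent) + bits          # point left of the mantissa
    elif exponent >= width:
        text = bits + "0" * (exponent - width) + "."   # point right of the mantissa
    else:
        text = bits[:exponent] + "." + bits[exponent:] # point inside the mantissa
    whole, frac = text.split(".")
    return whole + "." + frac.rstrip("0")              # drop trailing noughts

def binary_to_fraction(text):
    """Exact value of a binary string such as '11.101'."""
    whole, frac = text.split(".")
    value = Fraction(int(whole, 2) if whole else 0)
    if frac:
        value += Fraction(int(frac, 2), 2 ** len(frac))
    return value

tests = [(-2, 0x8000000000000000), (-1, 0x8000000000000000),
         (0, 0xA000000000000000), (1, 0xC000000000000000),
         (2, 0xE800000000000000)]
for exponent, mantissa in tests:
    text = place_point(mantissa, exponent)
    print(exponent, text, float(binary_to_fraction(text)))
```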
The exponent bias
Open a data inspector to look at the symbols starting from TWORDTEST_G using the GoBug menu item "inspect, exe/dll data symbols". When looking at TWORDTEST_G in memory you need to allow for back-storage (the least significant byte is stored first). You can easily see the mantissa value 8000000000000000. But the exponent is 3FFCh, not -2h as expected. The reason is that in 80-bit real numbers the exponent is biased by +3FFEh. This permits exponents of between -3FFEh and +4001h to be handled without using the most significant bit of the 80-bit number (they become 0 to 7FFFh). Instead this bit is used to indicate whether the real number itself is positive or negative. (Intel's manuals give the bias as 3FFFh because they place the binary point after the first mantissa bit. With the point placed before the first bit, as in these tests, the equivalent bias is 3FFEh.)
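To allow for back-storage when reading the ten bytes of TWORDTEST_G from the data inspector, it can help to do the split on paper or with a calculator. A minimal sketch in Python follows. The byte string is what 0.125 should look like in memory given the exponent and mantissa quoted here, and split_tword is a name invented for this illustration:-

```python
def split_tword(memory_bytes):
    """Split 10 bytes, in the order they sit in memory, into
    (sign bit, biased exponent, mantissa)."""
    raw = int.from_bytes(memory_bytes, "little")   # undo the back-storage
    sign = raw >> 79
    biased_exponent = (raw >> 64) & 0x7FFF
    mantissa = raw & (2**64 - 1)
    return sign, biased_exponent, mantissa

# 0.125 as it would appear in memory, lowest address first
stored = bytes.fromhex("00 00 00 00 00 00 00 80 FC 3F")
sign, biased, mantissa = split_tword(stored)
print(sign, format(biased, "X"), format(mantissa, "016X"))
print(biased - 0x3FFE)     # exponent with the bias taken off
```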
Larger exponents
Here we see how larger exponents are handled. Some known values are put into the floating point registers and we can view the result from the floating point register pane. The first value, 2 to the power of 40, produces a mantissa of 8000000000000000 and an exponent of 29 hex. In other words, the binary point has now moved 41 bits (that is, 29h bits) to the right from its starting position at the left-hand end of the mantissa to make the number.
The second value is the same as the first, but the value is now negative. When this value appears later in TWORDRESULT see how the most significant bit of the 80-bit number is used to signify the negative value of the real number.
The next four values show even smaller and even larger numbers, right up to the possible extremes. Note that the mantissa is only ever 64 bits, which limits the accuracy of the number which can be held. But using an exponent enables the number either to have a large number of noughts after the accurate digits, or a decimal point and a large number of noughts before the accurate digits.
Here is the code:-
;********* this is in the data section
TWORDTEST_L DT 1099511627776.0E0 ;2**40
TWORDTEST_M DT -1099511627776.0E0 ;-(2**40)
TWORDTEST_N DT 1.25E102 ;exp=154h
TWORDTEST_O DT 1.25E-102 ;exp=-152h
TWORDTEST_P DT 6.25E4931 ;exp=4000h
TWORDTEST_Q DT 5.0E-4932 ;exp=-3FFDh
;********* this is in the code section
FLD T[TWORDTEST_L] ;2**40
FLD T[TWORDTEST_M] ;-(2**40)
FLD T[TWORDTEST_N] ;exp=154h
FLD T[TWORDTEST_O] ;exp=-152h
FLD T[TWORDTEST_P] ;exp=4000h
FLD T[TWORDTEST_Q] ;exp=-3FFDh
FSTP T[TWORDRESULT]
FSTP T[TWORDRESULT]
FSTP T[TWORDRESULT]
FSTP T[TWORDRESULT]
FSTP T[TWORDRESULT]
FSTP T[TWORDRESULT]
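The exponent to expect for each declared value can be worked out in advance. The sketch below uses Python only as a calculator, and point_exponent is a name invented here. It uses exact fractions to find the exponent in the sense used in these tests, that is, how far the point must move from the left-hand end of the mantissa. The results can be compared with the exp= comments in the listing and with what the FP pane shows:-

```python
from fractions import Fraction

def point_exponent(value):
    """Exponent e such that 2**(e-1) <= |value| < 2**e."""
    x = abs(Fraction(value))
    # From the bit lengths alone, x lies between 2**(e-1) and 2**(e+1)
    e = x.numerator.bit_length() - x.denominator.bit_length()
    return e + 1 if x >= Fraction(2) ** e else e

for text in ("1099511627776.0E0", "1.25E102", "1.25E-102",
             "6.25E4931", "5.0E-4932"):
    e = point_exponent(text)
    print(text, ("-" if e < 0 else "") + format(abs(e), "X") + "h")
```

Exact fractions are used rather than ordinary floating point because the last two values are far outside the range of a 64-bit real.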
Rounding/precision tests
These tests demonstrate the use of the rounding and precision flags which can be set for the FPU. These flags are kept in the control word. Bits 8 and 9 are the precision flags and since there are two bits they can have a value of 0, 1, 2 or 3, although 1 is not used. Bits 10 and 11 are the rounding flags; they can also have a value of 0, 1, 2 or 3, and all four values are used.
The meanings of the values are:-
Precision 0=24 bits, 2=53 bits, and 3=64 bits.
Rounding 0=round to nearest or even, 1=round down, 2=round up and 3=chop.
The flags cause the FPU to adjust the result of calculations if necessary. As you may have seen earlier, inevitably it is impossible to represent some decimal numbers accurately in base 2. Over a large number of calculations, such inaccuracies can mount up. However, inside the FPU there are some additional bits used in calculations. These are called the "guard bits" and they are used to try to make the result more accurate. They do not appear in the result in the mantissa, nor can the contents of these guard bits be popped from the FP stack into memory. However, the rounding flags will affect how the guard bits are used.
You can use the instruction FLDCW (load control word) together with a memory value to initialise the rounding and precision flags. Here are examples of some memory values which are useful to set these flags:-
PREC64_NEAR DW 033Fh ;precision=64 bits, rounding=nearest
PREC64_DOWN DW 073Fh ;precision=64 bits, rounding=down
PREC64_UP DW 0B3Fh ;precision=64 bits, rounding=up
PREC64_CHOP DW 0F3Fh ;precision=64 bits, rounding=chop
PREC53_NEAR DW 023Fh ;precision=53 bits, rounding=nearest
PREC24_NEAR DW 003Fh ;precision=24 bits, rounding=nearest
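To see how each of these memory values breaks down into the two pairs of flag bits, here is a short sketch in Python (used only as a calculator, with decode_control_word being a name made up for the purpose). It picks bits 8-9 and bits 10-11 out of a control word and looks them up in the tables given above:-

```python
PRECISION = {0: 24, 2: 53, 3: 64}                     # value 1 is not used
ROUNDING = {0: "nearest", 1: "down", 2: "up", 3: "chop"}

def decode_control_word(control_word):
    """Return (precision in bits, rounding mode) for an FPU control word."""
    precision_flags = (control_word >> 8) & 3         # bits 8 and 9
    rounding_flags = (control_word >> 10) & 3         # bits 10 and 11
    return PRECISION.get(precision_flags), ROUNDING[rounding_flags]

for value in (0x033F, 0x073F, 0x0B3F, 0x0F3F, 0x023F, 0x003F):
    print(format(value, "04X") + "h", decode_control_word(value))
```

The low six bits (3Fh in every value above) are the exception masks and are not touched by this sketch.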
The instruction FINIT should always be used when first using the FPU. This sets the precision flags to 64 bits, rounding nearest which is correct for most calculations. However, in critical calculations you should always set the flags yourself to make sure they are correctly set.
If the FPU rounds the result of a calculation the "P" (precision) exception flag is set. Note that once set this flag (as is the case with all the flags) remains set until cleared. It is not cleared automatically on the next calculation. The flags are cleared using the instruction FCLEX. Alternatively in GoBug you can click on the "=0" button to clear all the exception flags.
Set the breakpoint to ROUNDING_PRECISION_TEST and run the test. Widen the FP pane to see more digits in the "value" column, and single-step down the code watching the "P" exception flag carefully.
There are three tests in all, which run one after the other. In each test the values of the symbols TEN, THREE and SIX are 10, 3 and 6 respectively.
  Rounding test
  Precision test
  Precision flag test
Rounding test
This test divides 10 by 3 using rounding near, then down, then up, then chop. We know that the result of this ought to be 3.33 recurring. The FPU achieves an accuracy of 18 decimal places in every case. Note that when rounding up the mantissa is one higher than in the other cases. This demonstrates the use of the guard bits: the FPU must have more information than we can see in the mantissa to make this difference. You can imagine that the guard bits contain the closest possible equivalent to 0.33 recurring. Only when rounding up can this value increment the value of the mantissa.
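This is demonstrated further by the second test, which divides 10 by 6. Because 10/6 is exactly half of 10/3, the two results share the same mantissa and differ only in the exponent, so here too only rounding up should give a different mantissa.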
ROUNDING_PRECISION_TEST:
FINIT ;initialise and clear all exceptions
FLDCW W[PREC64_NEAR] ;initialise to precision=64 bits, rounding=nearest
FILD W[TEN] ;load ten
FIDIV W[THREE] ;divide by 3 and leave on stack
FCLEX ;clear exceptions
FLDCW W[PREC64_DOWN] ;initialise to precision=64 bits, rounding=down
FILD W[TEN] ;load ten
FIDIV W[THREE] ;divide by 3 and leave on stack
FCLEX ;clear exceptions
FLDCW W[PREC64_UP] ;initialise to precision=64 bits, rounding=up
FILD W[TEN] ;load ten
FIDIV W[THREE] ;divide by 3 and leave on stack
FCLEX ;clear exceptions
FLDCW W[PREC64_CHOP] ;initialise to precision=64 bits, rounding=chop
FILD W[TEN] ;load ten
FIDIV W[THREE] ;divide by 3 and leave on stack
;********************************
FINIT ;initialise and clear all exceptions
FLDCW W[PREC64_NEAR] ;initialise to precision=64 bits, rounding=nearest
FILD W[TEN] ;load ten
FIDIV W[SIX] ;divide by 6 and leave on stack
FCLEX ;clear exceptions
FLDCW W[PREC64_DOWN] ;initialise to precision=64 bits, rounding=down
FILD W[TEN] ;load ten
FIDIV W[SIX] ;divide by 6 and leave on stack
FCLEX ;clear exceptions
FLDCW W[PREC64_UP] ;initialise to precision=64 bits, rounding=up
FILD W[TEN] ;load ten
FIDIV W[SIX] ;divide by 6 and leave on stack
FCLEX ;clear exceptions
FLDCW W[PREC64_CHOP] ;initialise to precision=64 bits, rounding=chop
FILD W[TEN] ;load ten
FIDIV W[SIX] ;divide by 6 and leave on stack
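What the four rounding modes ought to do to these two divisions can be predicted with exact arithmetic. The following is only a sketch, in Python, of rounding a positive number to a 64-bit mantissa. It is not how the FPU is built, and the names point_exponent and fpu_round are invented here. The part called rest stands for the information beyond the mantissa, which in the FPU is held in the guard bits:-

```python
from fractions import Fraction
import math

def point_exponent(x):
    """Exponent e such that 2**(e-1) <= x < 2**e (x positive)."""
    e = x.numerator.bit_length() - x.denominator.bit_length()
    return e + 1 if x >= Fraction(2) ** e else e

def fpu_round(x, mode, bits=64):
    """Round positive x to `bits` mantissa bits.
    Returns (mantissa, exponent, True if precision was lost)."""
    e = point_exponent(x)
    scaled = x * Fraction(2) ** (bits - e)     # between 2**(bits-1) and 2**bits
    low = math.floor(scaled)
    rest = scaled - low                        # what lies beyond the mantissa
    if rest == 0:
        return low, e, False
    if mode == "up":
        m = low + 1
    elif mode in ("down", "chop"):             # identical for positive numbers
        m = low
    else:                                      # "near": nearest, ties to even
        half = Fraction(1, 2)
        m = low + 1 if rest > half or (rest == half and low % 2) else low
    if m == 2 ** bits:                         # carry out of the mantissa
        m, e = m >> 1, e + 1
    return m, e, True

for divisor in (3, 6):
    for mode in ("near", "down", "up", "chop"):
        m, e, p_flag = fpu_round(Fraction(10, divisor), mode)
        print(divisor, mode, format(m, "016X"), e, p_flag)
```

The last item printed on each line corresponds to the "P" exception flag.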
Precision test
These tests show the effect of precision set to 64, 53 or 24 bits. First there is divide by 3, then divide by 6. See how the result is accurate to 18-19 decimal places using 64 bits, to 15 decimal places using 53 bits, and to 6-7 decimal places using 24 bits.
Here is the code:-
FINIT ;initialise and clear all exceptions
FLDCW W[PREC64_NEAR] ;initialise to precision=64 bits, rounding=nearest
FILD W[TEN] ;load ten
FIDIV W[THREE] ;divide by 3 and leave on stack
FLDCW W[PREC53_NEAR] ;initialise to precision=53 bits, rounding=near
FILD W[TEN] ;load ten
FIDIV W[THREE] ;divide by 3 and leave on stack
FLDCW W[PREC24_NEAR] ;initialise to precision=24 bits, rounding=near
FILD W[TEN] ;load ten
FIDIV W[THREE] ;divide by 3 and leave on stack
FLDCW W[PREC64_NEAR] ;initialise to precision=64 bits, rounding=nearest
FILD W[TEN] ;load ten
FIDIV W[SIX] ;divide by 6 and leave on stack
FLDCW W[PREC53_NEAR] ;initialise to precision=53 bits, rounding=near
FILD W[TEN] ;load ten
FIDIV W[SIX] ;divide by 6 and leave on stack
FLDCW W[PREC24_NEAR] ;initialise to precision=24 bits, rounding=near
FILD W[TEN] ;load ten
FIDIV W[SIX] ;divide by 6 and leave on stack
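The number of decimal places to expect from each precision setting can be estimated in advance. This sketch, in Python with invented function names, rounds 10/3 to the nearest value with a 64, 53 or 24 bit mantissa and counts the noughts at the start of the error. It assumes round to nearest, as in the listing above:-

```python
from fractions import Fraction
import math

def round_to_bits(x, bits):
    """Nearest value to positive x that fits in a `bits`-bit mantissa."""
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if x >= Fraction(2) ** e:
        e += 1                                 # now 2**(e-1) <= x < 2**e
    scale = Fraction(2) ** (bits - e)
    # round() on a Fraction rounds to the nearest integer, ties to even
    return Fraction(round(x * scale)) / scale

def decimal_places(x, bits):
    """Count of decimal places before the rounded value departs from x."""
    error = abs(round_to_bits(x, bits) - x)
    return math.floor(-math.log10(float(error)))

for bits in (64, 53, 24):
    print(bits, decimal_places(Fraction(10, 3), bits))
```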
Precision flag test
In the first test here a value of just over a half is loaded into the FPU and repeatedly multiplied by 10 until the precision flag stops showing a loss of precision. Only after several such multiplications is a value reached which can be multiplied by 10 without loss of precision in the FPU. In this test the value of just over a half is produced by this declaration in the data section:-
HALFPLUSABIT DT 3FFE8000000000000001h
This is an exponent of 0 (taking into account the bias of 3FFEh) and a mantissa of 8000000000000001h.
The second test loads the value 2**72 into the FPU then repeatedly divides by 10 until the precision flag is not set. The value of 2**72 is created by declaring this exponent and mantissa in the data section:-
TWOBY72 DT 40478000000000000000h
The code is as follows:-
;************** testing multiply by 10 precision flag
FINIT ;initialise and clear all exceptions
FLDCW W[PREC64_NEAR] ;initialise to precision=64 bits, rounding=nearest
MOV ESI,ADDR TWORDRESULT
FLD T[HALFPLUSABIT]
L340:
FCLEX ;clear exceptions
FIMUL W[TEN] ;stack top x 10
FSTSW AX ;get status word
AND AL,20h ;keep only precision flag to view by debugger
TEST AL,20h ;see if loss of precision
JNZ L340 ;yes, so continue
FSTP T[ESI] ;finish
;************** testing divide by 10 precision flag
MOV ESI,ADDR TWORDRESULT
FLD T[TWOBY72]
L360:
FCLEX ;clear exceptions
FIDIV W[TEN] ;stack top/10
FSTSW AX ;get status word
AND AL,20h ;keep only precision flag to view by debugger
TEST AL,20h ;see if loss of precision
JNZ L360 ;yes, so continue
FSTP T[ESI] ;finish
RET
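Why the first multiplication by 10 must lose precision can be seen by counting bits. The sketch below is in Python, with names invented for the illustration, and assumes round to nearest as set by PREC64_NEAR. It tests whether a mantissa times 10 still fits in 64 bits once trailing noughts are discounted, and then repeats the multiply-and-round in the same way as the loop at L340. No count is quoted here: run it alongside the debugger and compare:-

```python
def product_is_exact(mantissa, multiplier, bits=64):
    """True if mantissa*multiplier needs no more than `bits` significant bits."""
    product = mantissa * multiplier
    while product % 2 == 0:            # trailing noughts go into the exponent
        product //= 2
    return product.bit_length() <= bits

def round_nearest(product, bits=64):
    """Round an integer to `bits` significant bits, nearest with ties to even."""
    drop = max(product.bit_length() - bits, 0)
    if drop == 0:
        return product
    low = product >> drop
    rest = product & ((1 << drop) - 1)
    half = 1 << (drop - 1)
    if rest > half or (rest == half and low & 1):
        low += 1
        if low.bit_length() > bits:    # carry out of the mantissa
            low >>= 1
    return low

def steps_until_exact(mantissa, multiplier=10, limit=200):
    """Number of the first multiplication which loses nothing, or None."""
    for step in range(1, limit + 1):
        if product_is_exact(mantissa, multiplier):
            return step
        mantissa = round_nearest(mantissa * multiplier)
    return None

print(product_is_exact(0x8000000000000001, 10))   # HALFPLUSABIT
print(steps_until_exact(0x8000000000000001))
```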
Binary coded decimal numbers
These tests demonstrate how the FPU can be used together with BCD techniques to display a number as an ascii string in decimal. In each case a real number is loaded into the FPU and then taken out of the FPU using the instruction FBSTP, which puts the number into memory as a packed 10 byte BCD number: nine bytes holding 18 decimal digits (two per byte) followed by a sign byte. This is done in the function BCD34. Each nibble of the 10 bytes is then removed, converted to ascii and written to the BUFFER in turn, giving 20 characters of which the first two come from the sign byte.
The third value to be loaded cannot be converted to BCD using FBSTP. The failure is indicated by the "I" flag (invalid operation). This is because the number is beyond the limit of the FBSTP instruction, which can store at most 18 decimal digits. In order to use BCD to display such large numbers you would need to reduce them first by using divide or one or more FSCALE instructions.
See also BCD coding for more information about BCD numbers and how to use the instructions AAA and DAA.
For the test, set the breakpoint to BCD_NUMBERS_TEST and run the test, then single step down the code. Press F5 to trace into the function BCD34. After edi and esi have been given the address of BUFFER and TWORDRESULT respectively open a data inspector on each (use GoBug's menu item "inspect, data by register").
;**** this is in the data section ..
TWOBY55 DT 40368000000000000000h ;2**55, exp=38h
TWOBY59 DT 403A8000000000000000h ;2**59, exp=3Ch
TWOBY60 DT 403B8000000000000000h ;2**60, exp=3Dh
;**** this is in the code section ..
BCD_NUMBERS_TEST:
FINIT ;initialise and clear all exceptions
FLDCW W[PREC64_NEAR] ;initialise to precision=64 bits, round=NEAR
FLD T[TWOBY55]
CALL BCD34
FLD T[TWOBY59]
CALL BCD34
FLD T[TWOBY60]
CALL BCD34
RET
;
BCD34:
MOV EDI,ADDR BUFFER
MOV ESI,ADDR TWORDRESULT
FBSTP T[ESI]
MOV ECX,10D
ADD ESI,9D ;get to end of buffer
L2:
MOV AL,[ESI]
DEC ESI
MOV AH,AL
SHR AL,4 ;get first number
ADD AL,'0'
STOSB ;put into buffer
AND AH,0Fh ;get second number
MOV AL,AH
ADD AL,'0'
STOSB ;put into buffer
LOOP L2
RET
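The contents to expect in TWORDRESULT and BUFFER can be worked out beforehand. This is a sketch in Python, not the FPU's own method: fbstp_bytes imitates the packed layout which FBSTP writes for a whole number (returning None to stand in for the invalid-operation case), and bcd_to_ascii follows the same steps as BCD34. Both names are invented for this illustration:-

```python
def fbstp_bytes(value):
    """Pack a whole number as 10 bytes of packed BCD, lowest digits first.
    Returns None if the number has more than 18 digits."""
    if abs(value) > 10**18 - 1:
        return None
    digits = "{:018d}".format(abs(value))
    packed = bytearray(10)
    for i in range(9):
        pair = digits[16 - 2 * i: 18 - 2 * i]         # lowest pair goes in byte 0
        packed[i] = (int(pair[0]) << 4) | int(pair[1])
    packed[9] = 0x80 if value < 0 else 0x00           # sign byte
    return bytes(packed)

def bcd_to_ascii(packed):
    """As BCD34: start at the last byte, high nibble first, add '0'."""
    out = []
    for byte in reversed(packed):
        out.append(chr((byte >> 4) + ord("0")))
        out.append(chr((byte & 0x0F) + ord("0")))
    return "".join(out)

for power in (55, 59, 60):
    packed = fbstp_bytes(2 ** power)
    print(power, None if packed is None else bcd_to_ascii(packed))
```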
FP stack operations
These tests demonstrate some instructions which manipulate the FP stack directly, the condition codes and also show how the stack pointer works.
Set the breakpoint to FP_STACKOPS_TEST, run the test and single step down the code.
Two numbers are loaded then FXCH swaps these two numbers at the top of the stack.
The next instruction is FXAM. This causes the "condition codes" to give information about the stack top value. The condition codes are 4 bits contained in the FPU and GoBug shows them in the order C3,C2,C1,C0. They have the following meaning (NaN=not a number, + and - means positive and negative respectively):-
+unnormal 0000
+NaN 0001
-unnormal 0010
-NaN 0011
+normal 0100
+infinity 0101
-normal 0110
-infinity 0111
+0.0 1000
empty 1001
-0.0 1010
empty 1011
+denormal 1100
empty 1101
-denormal 1110
empty 1111
Note that when testing for empty you should test only C3 and C0 (look for 1xx1): both are set in every "empty" row above, whatever the values of C2 and C1.
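If you read the status word into AX with FSTSW AX after an FXAM, the four condition codes are scattered: C0, C1 and C2 are bits 8, 9 and 10, and C3 is bit 14. The sketch below, in Python with invented names, gathers them into the C3,C2,C1,C0 order used in the table above and looks up the meaning. The table itself is simply copied from the list given here:-

```python
FXAM_CLASSES = {
    "0000": "+unnormal", "0001": "+NaN", "0010": "-unnormal", "0011": "-NaN",
    "0100": "+normal", "0101": "+infinity", "0110": "-normal", "0111": "-infinity",
    "1000": "+0.0", "1001": "empty", "1010": "-0.0", "1011": "empty",
    "1100": "+denormal", "1101": "empty", "1110": "-denormal", "1111": "empty",
}

def condition_code(status_word):
    """Return the condition codes as a string in the order C3,C2,C1,C0."""
    c0 = (status_word >> 8) & 1
    c1 = (status_word >> 9) & 1
    c2 = (status_word >> 10) & 1
    c3 = (status_word >> 14) & 1
    return "{}{}{}{}".format(c3, c2, c1, c0)

def is_empty(status_word):
    """Test only C3 and C0, as recommended for the 'empty' case."""
    return (status_word & 0x4100) == 0x4100

status = 0x0400                          # only C2 set
print(condition_code(status), FXAM_CLASSES[condition_code(status)])
```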
The next instruction is FCHS which changes the sign of the number at the top of the stack.
Now when continuing through this test, you will need to watch the contents of the stack carefully, paying particular attention to the stack pointer (shows as "SP= "). The stack pointer is a 3-bit number kept in the FPU. In normal use, any instruction which loads data onto the FP stack will decrement the stack pointer by 1 and any instruction which unloads data from the stack will increment it by 1. So at any one time the stack pointer shows how many of the registers are in use (8 minus SP, counting 8 as 0). If SP=0 then there are either none in use or all 8 in use. The stack pointer also indicates which of the FP actual registers in the processor is currently at the stack "top". The stack top is where the next instruction to the FPU will operate, ie. ST0.
This demonstration proves that ST0 to ST7 are just virtual values. Suppose there are two numbers in the FP stack. If you load another number, the existing data is not physically moved at all. Instead the new number is loaded into the next free register and the stack pointer changes to suit.
When considering the value of SP, it is easier to regard it as showing which physical FP register is at the stack top after the last instruction. When the FPU is initialised, SP is 0. After the first load SP is 7. Consider this as meaning that if you now unloaded the value which was just loaded, it would come from physical register number 7 (that is, the top physical register where 7 is at the top and 0 is at the bottom). After loading 8 numbers, SP will be 0 again. If you now unloaded the value which was just loaded, it would come from the very bottom physical register. Of course, GoBug always shows the register which is at the stack top on its first line. This is shown as ST0 irrespective of which physical register actually holds the number. This is because in normal operation all transactions with the FPU are carried out using the stack top.
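The relationship between SP, the physical registers and ST0 to ST7 can be modelled in a few lines. This is only a toy model in Python, with every name invented for the purpose. A load decrements SP, an unload increments it, ST(i) is physical register (SP+i) modulo 8, and a list entry of None stands for a Tag word entry of "empty":-

```python
class FpStack:
    """Toy model of the FP register stack and its stack pointer."""

    def __init__(self):                  # state after FINIT
        self.sp = 0
        self.regs = [None] * 8           # None stands for "empty"
        self.invalid = False             # the "I" flag

    def physical(self, i=0):
        """Physical register number which currently shows as ST(i)."""
        return (self.sp + i) % 8

    def load(self, value):
        self.sp = (self.sp - 1) % 8
        if self.regs[self.sp] is not None:
            self.invalid = True          # register already in use
            value = "bad"                # stands for the bad number reported by FXAM
        self.regs[self.sp] = value

    def unload(self):
        value = self.regs[self.sp]
        if value is None:
            self.invalid = True          # nothing there to unload
        self.regs[self.sp] = None
        self.sp = (self.sp + 1) % 8
        return value

    def fincstp(self):                   # moves SP only, contents untouched
        self.sp = (self.sp + 1) % 8

    def fdecstp(self):
        self.sp = (self.sp - 1) % 8

stack = FpStack()
stack.load("first")
stack.load("second")
print(stack.sp, stack.regs[stack.physical()])
stack.fincstp()
print(stack.sp, stack.regs[stack.physical()])
```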
In these tests FINCSTP and FDECSTP are used to increment and decrement the stack pointer respectively. In normal use you would not want to do this at all, but using these instructions helps to demonstrate what the stack pointer means.
You are about to carry out the very first FINCSTP. Consider what is to happen. SP is now 6 which means that the value now showing as ST0 is the value contained in the physical register just below the top physical register. Increasing SP to 7 should rotate the display so that the value in the top physical register shows as ST0 instead. What's in the top physical register? It would be the value first loaded, which currently shows in ST1. After the FINCSTP the next instruction, FXAM proves that the number at ST0 is valid. Now things have got confused because the number which was at ST0 is now showing at ST7 as you would expect but the value shows as "empty". The number at ST1 which should be nothing is some spurious value. The reason for this is that the processor keeps its own record of the validity of the stack contents. This is kept in the "Tag word". This is not actually shown in GoBug's FP display but this is tested to show the correct thing under "value". This shows "empty" if the number is not really there. Note how even the Tag word is hoodwinked into believing that some numbers are valid after a few FCHS instructions.
Often the FP registers appear to contain numbers but they may not be valid at all, holding only remnants of previous data there. You may well see this even after the FPU is initialised using FINIT, when SP=0.
The FXAM instructions also prove that a number which appears to be in a register may not actually be a valid entry at all.
Finally there is a little light relief. Too many numbers are loaded onto the stack. Watch the "I" flag (invalid operation). Note that the stack circulates as you would expect because the stack pointer circulates, but that the number which is loaded last is bad (as reported by FXAM). Note that the FPU doesn't recover from this, the Tag word then showing the top of the stack as empty. Then too many numbers are unloaded from the stack, the very last one setting the "I" flag again. The moral of the story - count them in and count them out!
Here is the actual code used:-
FP_STACKOPS_TEST:
FINIT ;initialise and clear all exceptions
FLD T[TWORDTEST_1]
FLD D[DWORDTEST] ;load 22.625E+0
FXCH ;swap these two
FXAM ;examine top of the stack
FCHS ;change sign of stack top
FINCSTP ;increment stack pointer
FXAM ;examine top of the stack
FINCSTP ;increment stack pointer
FXAM ;examine top of the stack
FINCSTP ;increment stack pointer
FXAM ;examine top of the stack
FDECSTP ;decrement stack pointer
FDECSTP ;decrement stack pointer
FDECSTP ;decrement stack pointer
FXAM ;examine top of the stack
;*** load too many numbers
FLD T[TWORDTEST_1]
FLD T[TWORDTEST_1]
FLD T[TWORDTEST_1]
FLD T[TWORDTEST_1]
FLD T[TWORDTEST_1]
FLD T[TWORDTEST_1]
FLD T[TWORDTEST_1] ;error
FCLEX ;clear all exceptions
;*** unload too many numbers
FSTP Q[QWORDRESULT]
FSTP Q[QWORDRESULT]
FSTP Q[QWORDRESULT]
FSTP Q[QWORDRESULT]
FSTP Q[QWORDRESULT]
FSTP Q[QWORDRESULT]
FSTP Q[QWORDRESULT]
FSTP Q[QWORDRESULT]
FSTP Q[QWORDRESULT] ;error
FCLEX ;clear all exceptions
RET
Convert to ascii using FPREM1
This test demonstrates how FPREM1 can be used to convert a real number into ascii for display on the screen. There is a full description of the code in the FPU v CPU help file. There is a slight difference in the source code in that the number 1.23456789012345678E18 is loaded for this test to make it more interesting and the result is actually loaded into the buffer.
Set the breakpoint to FPREM1_TEST and run the test. Widen the FP pane and single-step down the code, making a data inspector when edi contains the address of BUFFER (use the menu item "inspect, data by register"). The number appears in the buffer in reverse order.
Convert to ascii using FRNDINT
This test demonstrates how FRNDINT can be used to convert a real number into ascii for display on the screen. The code is similar to that in the FPU v CPU speeds file. There is a slight difference in the source code in that the number 1.23456789012345678E18 is loaded for this test to make it more interesting and the result is actually loaded into the buffer.
Note that it is essential to set the rounding to "chop" when using FRNDINT in this way. This stops it rounding up when converting the floating point number to an integer. See rounding/precision tests to see the correct value to use.
Set the breakpoint to FRNDINT_TEST and run the test. Widen the FP pane and single-step down the code, making a data inspector when edi contains the address of BUFFER (use the menu item "inspect, data by register"). The number appears in the buffer in reverse order.
FINIT ;initialise the fpu
FLDCW W[PREC64_CHOP] ;initialise to precision=64 bits, rounding=chop
MOV EDI,OFFSET BUFFER
MOV ECX,20D ;20 digits only
MOV ESI,OFFSET DWORDRESULT
FLD T[TWORDSPEEDTEST_2] ;load 64 bit number (in 80 bit real format) (exponent=8)
L30: ;number is always at the top of the stack
FLD ST0 ;PUSH copy of number at top of stack
FILD W[TEN] ;0=10 and PUSH: number now pushed to 1 & 2
FDIV ST2,ST0 ;2=2/0 divided value now at 2, orig no. at 1
FLD ST2 ;make copy at stack top of divided value
FRNDINT ;round stack top to an integer (chop mode truncates)
FMUL ;0*1 (div value*10) leaving result on stack top
FSUB ;subtract mul result from orig no., remainder in stack top
FISTP D[ESI] ;convert remainder to integer and POP
MOV AL,[ESI]
OR AL,AL ;see if negative (can happen due to rounding in FRNDINT)
JNS >L34 ;no
XOR AL,AL ;yes so make it zero
L34:
ADD AL,'0'
STOSB ;write the number as ascii
DEC ECX
JZ >L36
FICOM W[ONE] ;compare stack top with 1
FSTSW AX ;get status word
TEST AH,01h ;check C0 condition code only - set if number <1
JZ L30 ;no (number is 1 or more), need to continue
L36:
FFREE ST0 ;free last stack element
FINCSTP ;and increment stack pointer
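The arithmetic carried out in the loop at L30 can be followed more easily away from the FP stack. The sketch below, in Python with an invented function name, goes through the same steps using exact fractions, so it shows the intended result rather than the effect of any rounding inside the FPU. As in the listing, the digits come out in reverse order, a negative remainder is made zero, and the loop stops after 20 digits or when the number left is less than 1:-

```python
from fractions import Fraction
import math

def digits_reversed(number, max_digits=20):
    """Decimal digits of a whole number, least significant first."""
    n = Fraction(number)
    out = []
    while True:
        tenth = n / 10                              # FDIV ST2,ST0
        whole_tenth = math.trunc(tenth)             # FRNDINT with chop
        remainder = n - 10 * whole_tenth            # FMUL then FSUB
        digit = math.trunc(remainder)               # FISTP with chop
        out.append(chr(max(digit, 0) + ord("0")))   # negative becomes zero
        n = tenth                                   # carry on with the divided value
        if len(out) == max_digits or n < 1:
            break
    return "".join(out)

print(digits_reversed(1234))
```

Note that the divided value which is carried forward keeps its fraction (123.4, then 12.34 and so on). The fraction is only thrown away when each remainder is converted to an integer.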