Testbug's speed tests

Comparing ASM and other functions

Writing hex
   leading zeroes allowed
   leading zeroes suppressed

These tests compare various methods of writing a 32-bit number to the screen in hex. This can be done much more quickly than writing the number in decimal since no division is needed. The first series of tests permits leading zeroes, which is the sort of write you would need if the output were required for a table of values. The number to be written is 75EC9310h. The second series of tests suppresses leading zeroes, which is the sort of output you would want if the result were to be written within a string of text. For these tests Testbug gives you a choice of value to pass to the procedures, since they drop out early if the number is small.
In each case the procedure is carried out 5,000 times. The time output in tick counts is measured using QueryPerformanceCounter.

Writing hex - leading zeroes allowed
This first test using HEXWRITE proves that it may be a mistake to convert 16-bit assembler code to 32 bits without thinking about quicker ways to achieve the same end. This is what happened here and the result is a slow procedure - enough to give an assembler programmer sleepless nights!

HEXWRITE:           ;write hex number from eax into [edi]
MOV EDX,EAX         ;keep number in edx
MOV ECX,EAX         ;keep number in ecx too
SHR EDX,16          ;look only at first 2 bytes
CALL HX13           ;write first 4 hex digits held in dx
MOV EDX,ECX         ;restore number into edx
CALL HX13           ;write second 4 hex digits held in dx
RET
;
HX13:               ;called by hexwrite
MOV AL,DH           ;get first byte in al
SHR AL,4            ;get 1st nibble in al
CALL HX12           ;write to [edi]
MOV AL,DH           ;get first byte again
AND AL,0FH          ;only look at 2nd nibble
CALL HX12           ;write to [edi]
MOV AL,DL           ;get second byte in al
SHR AL,4            ;get 1st nibble in al
CALL HX12           ;write to [edi]
MOV AL,DL           ;get second byte again
AND AL,0FH          ;only look at 2nd nibble
CALL HX12           ;write to [edi]
RET
;
HX12:               ;called by hx13
ADD AL,48           ;convert to ascii char
CMP AL,57
JNA >L1
ADD AL,7            ;write hex letter if necessary A=65
L1:
MOV [EDI],AL        ;write the nibble
INC EDI             ;ready for next
RET
This second procedure is the fastest on my machine. It was written by Wayne J. Radburn. It relies on a table of 256 two-character strings already in memory, one for each possible byte value. The procedure simply picks the right one for each byte of the number to be written. First you need to declare this table in the data section (all these numbers appear the right way round for GoAsm: if you are using any other assembler except nasm, then they should be reversed!):-
sHEXw   DW "00","01","02","03","04","05","06","07","08","09","0A","0B","0C","0D","0E","0F"
        DW "10","11","12","13","14","15","16","17","18","19","1A","1B","1C","1D","1E","1F"
        DW "20","21","22","23","24","25","26","27","28","29","2A","2B","2C","2D","2E","2F"
        DW "30","31","32","33","34","35","36","37","38","39","3A","3B","3C","3D","3E","3F"
        DW "40","41","42","43","44","45","46","47","48","49","4A","4B","4C","4D","4E","4F"
        DW "50","51","52","53","54","55","56","57","58","59","5A","5B","5C","5D","5E","5F"
        DW "60","61","62","63","64","65","66","67","68","69","6A","6B","6C","6D","6E","6F"
        DW "70","71","72","73","74","75","76","77","78","79","7A","7B","7C","7D","7E","7F"
        DW "80","81","82","83","84","85","86","87","88","89","8A","8B","8C","8D","8E","8F"
        DW "90","91","92","93","94","95","96","97","98","99","9A","9B","9C","9D","9E","9F"
        DW "A0","A1","A2","A3","A4","A5","A6","A7","A8","A9","AA","AB","AC","AD","AE","AF"
        DW "B0","B1","B2","B3","B4","B5","B6","B7","B8","B9","BA","BB","BC","BD","BE","BF"
        DW "C0","C1","C2","C3","C4","C5","C6","C7","C8","C9","CA","CB","CC","CD","CE","CF"
        DW "D0","D1","D2","D3","D4","D5","D6","D7","D8","D9","DA","DB","DC","DD","DE","DF"
        DW "E0","E1","E2","E3","E4","E5","E6","E7","E8","E9","EA","EB","EC","ED","EE","EF"
        DW "F0","F1","F2","F3","F4","F5","F6","F7","F8","F9","FA","FB","FC","FD","FE","FF"
Now here is the procedure (I've kept Wayne's notation out of deference to him):-
 
D2sHEXw:                ;[esi]=value [edi]=string
xor     eax,eax
mov     al,[esi]
;
mov     dx,[sHEXw+eax*2]
mov     al,[esi+1h]     ;may appear out of place since I optimized the use of eax.
mov     [edi+6h],dx     ;swapping these two instructions would result in a 1 clock penalty.
;
mov     dx,[sHEXw+eax*2]
mov     al,[esi+2h]
mov     [edi+4h],dx
;
mov     dx,[sHEXw+eax*2]
mov     al,[esi+3h]
mov     [edi+2h],dx
;
mov     dx,[sHEXw+eax*2]
mov     [edi],dx
;
ret
Neat?
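
If you want to check the table-lookup idea outside the assembler, here is a minimal C sketch of the same approach. It is not Wayne's code: it builds the 256-entry table at run time instead of declaring it, and the names hex_pairs, init_hex_pairs and write_hex_table are hypothetical ones chosen for this illustration.

```c
#include <stdint.h>

/* Hypothetical stand-in for the sHEXw table: two characters per byte value. */
static char hex_pairs[256][2];

/* Fill the table once: entry n holds the two hex characters for byte n. */
void init_hex_pairs(void)
{
    static const char digits[] = "0123456789ABCDEF";
    for (int n = 0; n < 256; n++) {
        hex_pairs[n][0] = digits[n >> 4];   /* high nibble first */
        hex_pairs[n][1] = digits[n & 15];   /* then low nibble   */
    }
}

/* Write exactly 8 characters to out. No terminating null is added,
   which matches what the assembler procedures do. */
void write_hex_table(uint32_t value, char *out)
{
    for (int i = 3; i >= 0; i--) {          /* lowest byte lands at out+6 */
        const char *pair = hex_pairs[value & 0xFF];
        out[i * 2]     = pair[0];
        out[i * 2 + 1] = pair[1];
        value >>= 8;
    }
}
```

After calling init_hex_pairs once, write_hex_table(0x75EC9310, buf) should leave the eight characters 75EC9310 in buf. The cost of the method is the table itself, which takes 512 bytes.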

Here is another of Wayne's procedures. This one is not as fast as D2sHEXw but it uses a lot less space.
Again you need to declare a small table in the data section:-
sHEXb DB "0123456789ABCDEF"
And here is the procedure itself:-

D2sHEXb:                ;eax=value (allow for reverse storage) [edi]=string
add     edi,7h          ;point to end of string and translate
mov     ecx,8h          ;the eight characters right to left
;
L100:
mov     ebx,eax
shr     eax,4h
and     ebx,0Fh
mov     dl,[sHEXb+ebx]
mov     [edi],dl
dec     edi
loop    L100
ret
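
The right-to-left idea in D2sHEXb can be sketched in C as follows. This is only an illustration of the logic under my own naming (write_hex_small is a hypothetical name), not a translation that a compiler would necessarily turn into the same instructions.

```c
#include <stdint.h>

/* The digits string mirrors the sHEXb table. */
void write_hex_small(uint32_t value, char *out)
{
    static const char digits[] = "0123456789ABCDEF";
    for (int i = 7; i >= 0; i--) {      /* start at the last character */
        out[i] = digits[value & 0x0F];  /* lowest nibble picks the char */
        value >>= 4;                    /* bring down the next nibble   */
    }
}
```

The table here is only 16 bytes, but the loop has to run eight times, once for each nibble, instead of four times, once for each byte.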
Here is a variation on Wayne's D2sHEXw which uses the same table of values but takes the number in eax. This seems to be about as fast as D2sHEXw, but uses a little less code.
REG2HEXw:               ;eax=value [edi]=string
XOR ECX,ECX
MOV CL,AL
MOV DX,[sHEXw+ECX*2]
MOV [EDI+6h],DX
MOV CL,AH
MOV DX,[sHEXw+ECX*2]
MOV [EDI+4h],DX
SHR EAX,16D
MOV CL,AL
MOV DX,[sHEXw+ECX*2]
MOV [EDI+2h],DX
MOV CL,AH
MOV DX,[sHEXw+ECX*2]
MOV [EDI],DX
RET
Here is another method which is small in code and reasonably fast. It uses the rotate instruction ROL:-
HEXROTATE1:             ;write hex number from eax into [edi]
MOV ECX,8
L20:
ROL EAX,4               ;get high order nibble into al
MOV DL,AL
AND DL,0Fh              ;use only least sig nibble
ADD DL,48               ;convert to ascii char
CMP DL,57
JNA >L21
ADD DL,7                ;write hex letter if necessary A=65
L21:
MOV [EDI],DL            ;write the nibble
INC EDI                 ;ready for next
LOOP L20
RET
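
The arithmetic used in HEXROTATE1 (add 48, then add a further 7 if the result is past the character '9') can be checked with a short C sketch. The names nibble_to_hex and write_hex_rotate are hypothetical, and since C has no rotate operator the rotate is written out as two shifts.

```c
#include <stdint.h>

/* Convert one nibble (0..15) to its ASCII hex character without a table. */
char nibble_to_hex(unsigned nibble)
{
    unsigned c = nibble + 48;   /* 0..9 become '0'..'9' (48..57)       */
    if (c > 57)
        c += 7;                 /* 58..63 become 'A'..'F' (65..70)     */
    return (char)c;
}

/* Write 8 hex characters left to right, highest nibble first. */
void write_hex_rotate(uint32_t value, char *out)
{
    for (int i = 0; i < 8; i++) {
        value = (value << 4) | (value >> 28);   /* rotate left by 4 bits */
        out[i] = nibble_to_hex(value & 0x0F);
    }
}
```

The gap of 7 arises because there are seven characters between '9' (57) and 'A' (65) in the ASCII table.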
Here is another method which uses the small table sHEXb and the rotate instruction ROL to produce nice compact code, although not as fast as the others.
HEXROTATE2:             ;write hex number from eax into [edi]
MOV ECX,8
L20:
ROL EAX,4               ;get high order nibble into al
MOV DL,AL
AND EDX,0Fh             ;use only least sig nibble
MOV DL,[sHEXb+EDX]
MOV [EDI],DL            ;write the nibble
INC EDI                 ;ready for next
LOOP L20
RET
This procedure gets rid of the loop in HEXROTATE2 and therefore ought to be faster. However this may not be the case on modern processors.
HEXROTATE3:             ;write hex number from eax into [edi]
ROL EAX,4               ;get high order nibble into al
MOV DL,AL
AND EDX,0Fh             ;use only least sig nibble
MOV DL,[sHEXb+EDX]
MOV [EDI],DL            ;write the nibble
INC EDI                 ;ready for next
ROL EAX,4               ;get high order nibble into al
MOV DL,AL
AND EDX,0Fh             ;use only least sig nibble
MOV DL,[sHEXb+EDX]
MOV [EDI],DL            ;write the nibble
INC EDI                 ;ready for next
ROL EAX,4               ;get high order nibble into al
MOV DL,AL
AND EDX,0Fh             ;use only least sig nibble
MOV DL,[sHEXb+EDX]
MOV [EDI],DL            ;write the nibble
INC EDI                 ;ready for next
ROL EAX,4               ;get high order nibble into al
MOV DL,AL
AND EDX,0Fh             ;use only least sig nibble
MOV DL,[sHEXb+EDX]
MOV [EDI],DL            ;write the nibble
INC EDI                 ;ready for next
ROL EAX,4               ;get high order nibble into al
MOV DL,AL
AND EDX,0Fh             ;use only least sig nibble
MOV DL,[sHEXb+EDX]
MOV [EDI],DL            ;write the nibble
INC EDI                 ;ready for next
ROL EAX,4               ;get high order nibble into al
MOV DL,AL
AND EDX,0Fh             ;use only least sig nibble
MOV DL,[sHEXb+EDX]
MOV [EDI],DL            ;write the nibble
INC EDI                 ;ready for next
ROL EAX,4               ;get high order nibble into al
MOV DL,AL
AND EDX,0Fh             ;use only least sig nibble
MOV DL,[sHEXb+EDX]
MOV [EDI],DL            ;write the nibble
INC EDI                 ;ready for next
ROL EAX,4               ;get high order nibble into al
MOV DL,AL
AND EDX,0Fh             ;use only least sig nibble
MOV DL,[sHEXb+EDX]
MOV [EDI],DL            ;write the nibble
INC EDI                 ;ready for next
RET
And finally let's compare all these assembler routines against the offering from the Win32 API. Unlike when writing decimal, the API cannot compete at all. It does require a control string to be declared as follows:-
HCONTROL_STRING DB '%lX',0  

MOV EDI,ADDR BUFFER
PUSH EAX                    ;send number to convert as argument
PUSH ADDR HCONTROL_STRING   ;control input
PUSH EDI                    ;address of output string
CALL wsprintfA
ADD ESP,12D                 ;this is unusual but required

Writing hex - leading zeroes suppressed
This first procedure is an extension of REG2HEXw which suppresses leading zeroes and which uses the large table of values:-

NZHEXROTATE:            ;eax=value [edi]=string
XOR ECX,ECX
ROL EAX,8               ;get high order byte into al
MOV CL,AL
JECXZ >L40              ;no action if both digits are zero
MOV DX,[sHEXw+ECX*2]
TEST CL,0F0h            ;see if 1st digit is zero
JNZ >L58                ;no, so treat normally
MOV [EDI],DH            ;only write 2nd digit
INC EDI
JMP >L60
L40:
ROL EAX,8               ;get high order byte into al
MOV CL,AL
JECXZ >L44              ;no action if both digits are zero
MOV DX,[sHEXw+ECX*2]
TEST CL,0F0h            ;see if 1st digit is zero
JNZ >L62                ;no, so treat normally
MOV [EDI],DH            ;only write 2nd digit
INC EDI
JMP >L64
L44:
ROL EAX,8               ;get high order byte into al
MOV CL,AL
JECXZ >L48              ;no action if both digits are zero
MOV DX,[sHEXw+ECX*2]
TEST CL,0F0h            ;see if 1st digit is zero
JNZ >L66                ;no, so treat normally
MOV [EDI],DH            ;only write 2nd digit
INC EDI
JMP >L68
L48:
ROL EAX,8               ;get high order byte into al
MOV CL,AL
JECXZ >L50              ;if both digits are zero, write zero
MOV DX,[sHEXw+ECX*2]
TEST CL,0F0h            ;see if 1st digit is zero
JNZ >L70                ;no, so treat normally
MOV [EDI],DH            ;only write 2nd digit
INC EDI
RET
L50:
MOV B[EDI],'0'
INC EDI
RET
L58:
MOV [EDI],DX
ADD EDI,2
L60:
ROL EAX,8               ;get high order byte into al
MOV CL,AL
MOV DX,[sHEXw+ECX*2]
L62:
MOV [EDI],DX
ADD EDI,2
L64:
ROL EAX,8               ;get high order byte into al
MOV CL,AL
MOV DX,[sHEXw+ECX*2]
L66:
MOV [EDI],DX
ADD EDI,2
L68:
ROL EAX,8               ;get high order byte into al
MOV CL,AL
MOV DX,[sHEXw+ECX*2]
L70:
MOV [EDI],DX
ADD EDI,2
RET
This procedure is an extension of HEXROTATE2 which suppresses leading zeroes. Since it keeps the loop and uses the small table of values it is reasonably compact in code:-
NZHEXROTATE2:           ;eax=value [edi]=string (no leading zeroes)
MOV ECX,8
L200:
ROL EAX,4               ;get high order nibble into al
MOV DL,AL
AND EDX,0Fh             ;use only least sig nibble
CMP DL,CH               ;see if zero
JNZ >L203               ;no so don't look for leading zeroes any more
LOOP L200               ;yes so ignore it
MOV B[EDI],'0'          ;all zero - write at least one zero
INC EDI
RET
L202:
ROL EAX,4               ;get high order nibble into al
MOV DL,AL
AND EDX,0Fh             ;use only least sig nibble
L203:
MOV DL,[sHEXb+EDX]
MOV [EDI],DL            ;write the nibble
INC EDI                 ;ready for next
LOOP L202
RET
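
The zero-suppression logic can also be sketched in C. This is a rough equivalent for checking results rather than a line-by-line translation: write_hex_nz is a hypothetical name, and returning the character count is my own addition (the assembler versions leave edi pointing past the last character instead).

```c
#include <stdint.h>

/* Write value in hex without leading zeroes.
   Returns the number of characters written (1 to 8); no null is added. */
int write_hex_nz(uint32_t value, char *out)
{
    static const char digits[] = "0123456789ABCDEF";
    int shift = 28;
    int n = 0;

    /* Skip leading zero nibbles, but always keep the last one
       so that a value of zero is still written as "0". */
    while (shift > 0 && ((value >> shift) & 0x0F) == 0)
        shift -= 4;

    /* Write every remaining nibble, zero or not. */
    for (; shift >= 0; shift -= 4)
        out[n++] = digits[(value >> shift) & 0x0F];

    return n;
}
```

As with the assembler procedures, the smaller the number the sooner this drops out, which is why the value passed makes a difference to the timings in this series of tests.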