Testbug's speed tests

Comparing basic coding

Data in the code section
I was in a rebellious mood, but got my knuckles rapped over this one (until the AMD64 came along!). Since we moved to Win32, assembler programmers have felt much more relaxed about regarding the whole virtual memory area as available for a program's data and code. Yet programs are still divided into data and code - but why should they be? We dutifully declare a data section and put all data in there, then declare a code section and put all the code in there. Writers and teachers tell us it is "bad programming practice" to do otherwise. But how do they know? When we look at an executable file with a file viewer or with a debugger, the only difference between the sections appears to be the names. Why are we being such good boys and girls? Why not abolish the difference and just put everything in the code section?

This test shows why, on processors below the P4 at least. On those processors access to data in the code section is much slower than access to data in the data section. Why? The answer must be that the processor and its cache are optimised to read and write data in the data section rather than the code section. The processor knows which is which from the information fed to its descriptor registers and segment registers by the operating system. Note that Windows will not permit a "write" to the code section anyway, without a call to the API VirtualProtect to change the section's attributes. This is so even if the code section is marked "writeable" in the executable.

On the P4, it seems to make no difference to speed whether data is held in the data section or in the code section. Now that could transform the way assembler programmers write their code!
This test searches a table and returns when the correct value in the table has been found. The only difference between the "code" and the "data" tests is that in the code tests the table is declared in the code section, and in the data tests the table is declared in its rightful place, the data section. The test is carried out 5000 times and the time taken for completion is measured using the API QueryPerformanceCounter. The table has 60 entries. The "early" find stops at the 5th entry, the "middle" find stops at the 30th entry and the "late" find stops at the 56th entry.
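The shape of the test harness can be sketched in portable C (the table values are illustrative, and clock() stands in for QueryPerformanceCounter, which is Windows-only):

```c
#include <time.h>

#define TABLE_SIZE 60
#define REPEATS 5000

/* Linear search: return the index of value in table, or -1 if not found. */
int find_value(const int *table, int size, int value)
{
    for (int i = 0; i < size; i++)
        if (table[i] == value)
            return i;
    return -1;
}

/* Repeat the search REPEATS times and return the elapsed clock ticks.
   clock() is a portable stand-in for QueryPerformanceCounter. */
long time_search(const int *table, int size, int value)
{
    clock_t start = clock();
    volatile int index;         /* volatile so the loop is not optimised away */
    for (int r = 0; r < REPEATS; r++)
        index = find_value(table, size, value);
    (void)index;
    return (long)(clock() - start);
}
```

For the "early", "middle" and "late" cases, time_search would be called with the values at table[4], table[29] and table[55] respectively.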
I don't give the code here because it is basically the same as the sample code I wrote for a standardised callback procedure in Windows. That code gets rid of the long chains of comparisons and conditional jumps in a window procedure. Instead it uses a table of procedures and message values: when the correct message value in the table is found, the corresponding procedure is called to carry out the required action.
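The dispatch-table idea can be sketched in C as follows (the message values match the real Windows constants, but the handler names are hypothetical, and a real window procedure would also take HWND, WPARAM and LPARAM arguments):

```c
#include <stddef.h>

typedef unsigned int MSG_T;
typedef int (*HANDLER)(void);

/* Hypothetical handlers standing in for real message processing. */
static int on_create(void)  { return 1; }
static int on_paint(void)   { return 2; }
static int on_destroy(void) { return 3; }

/* Table pairing message values with the procedures that handle them. */
static const struct { MSG_T msg; HANDLER proc; } dispatch[] = {
    { 0x0001, on_create  },   /* WM_CREATE  */
    { 0x000F, on_paint   },   /* WM_PAINT   */
    { 0x0002, on_destroy },   /* WM_DESTROY */
};

/* Replaces the chain of compares and conditional jumps: search the
   table, and call the matching procedure when the value is found. */
int handle_message(MSG_T msg)
{
    for (size_t i = 0; i < sizeof dispatch / sizeof dispatch[0]; i++)
        if (dispatch[i].msg == msg)
            return dispatch[i].proc();
    return 0;   /* not handled; a real window procedure would pass
                   the message on to DefWindowProc here */
}
```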

Note on AMD64

The AMD64 makes use of RIP-relative addressing. RIP is the new 64-bit instruction pointer (it replaces EIP). The AMD64 processor can read data at an address relative to the current RIP using a signed 32-bit offset value. Until now, relative addressing was used only for jumps to a new instruction address. For this to work it probably makes no difference to the processor whether the data is in an executable or a non-executable area, and the processor may in fact be blind to this. Using RIP-relative addressing can shorten code considerably (there is no need for a full address to be encoded in the instruction as a displacement), so development tools should endeavour to use it as much as possible.
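The difference can be sketched in NASM-style 64-bit syntax (the label name is illustrative):

```asm
section .data
value   dd 1234

section .text
        ; Legacy absolute addressing: the full address of 'value'
        ; is encoded in the instruction as a displacement.
        mov eax, [abs value]

        ; RIP-relative addressing: only a signed 32-bit offset from
        ; the next instruction's RIP is encoded, so the instruction
        ; can be shorter and the code is position-independent.
        mov eax, [rel value]
```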
See my standardised callback sample code