GoAsm
Assembler Manual

A "Go" development tool: http://www.GoDevTool.com

Version 0.61

by Jeremy Gordon
with assistance from Wayne Radburn
GoAsm Assembler and Tools forum
(in the MASM Forum)

Introduction

How to use this manual
Why is a new assembler needed?
Versions and updates
Discussion forum
Integrated Development Environments (IDEs)
Legal stuff
Acknowledgements

GoAsm's design

GoAsm's features in a nutshell
Syntax aims and compatibility with other assemblers
Why GoAsm does not type or parameter check
Why GoAsm requires square brackets for writing to, and reading from, memory
Supported mnemonics

Beginners

Make an asm file
Insert some code and data
Assemble the file with GoAsm
Link the object file to make the exe

Basic GoAsm elements

Starting GoAsm
Sections - declaration and use
Data declaration
Code and the starting address
Labels: unique, re-usable and scoped
Short and long code jumps
Accessing labels
Calling (or jumping to) procedures
Calling Windows APIs in 32-bits and 64-bits
Calling Windows APIs using INVOKE
PUSH or ARG pointers to strings and data
Moving pointers to strings and data into registers
Type indicators
Repeat instructions
Using character immediates in code
Numbers and arithmetic
Characters in GoAsm
Operators

Advanced features

Structures - different types and uses
Unions
Definitions: equates, macros and #defines
Importing: using run-time libraries
Importing: data, by ordinal, by specific Dll
Exporting procedures and data
Automated register/flags save and restore using USES...ENDU
Callback stack frames in 32-bits and 64-bits
Automated stack frames using FRAME...ENDF, LOCALS, and USEDATA
Conditional assembly
Include files (#include and INCBIN)
Merging: using static code libraries
Unicode support
64-bit assembly
x86 compatibility mode (32-bit assembly using 64-bit source)
Sections - some advanced use
Adapting existing source files for GoAsm

Miscellaneous

Special push instructions
Segment overrides
Using source information
Using the location counters $ and $$
Alignment and the use of ALIGN
Using SIZEOF
Using branch hints
Syntax to use FPU, MMX, and XMM registers
Reporting assemble time
Other GoAsm interrupts
GoAsm list file
GoAsm's error and warning messages
Using GoAsm with various linkers

Appendices

"Hello World 1" (Windows console program)
"Hello World 2" (Windows GDI program -1)
"Hello World 3" (Windows GDI program -2)
"Hello Dialog" (create a dialog and see various ways of writing to it)

"Hello Unicode 1" (draws Unicode characters to console)
"Hello Unicode 2" (draws Unicode characters in dialog and message box)
"Hello Unicode 3" (draws Unicode characters using TextOutW, and also demonstrates Unicode/ANSI switching)

"Hello 64World 1" (simple 64-bit console program)
"Hello 64World 2" (simple 64-bit windows program)
"Hello 64World 3" (switchable 32-bit or 64-bit windows program)

Writing Unicode programs
Writing 64-bit programs
"Run Time Loading" (demonstrates how to use run-time loading in large application running on both W9x/ME and NT/2000/XP/Vista and using both ANSI and Unicode APIs)
Do nothing Linux program by V.Krishnakumar
The windows character set


Selection of Tutorials
Full list

Quick start to making a simple Windows program
For those new to Programming
For those new to Assembly Language
For those new to Windows
For those new to Symbolic Debugging
Understand bits, binary and bytes
Understand hex numbers
Understand finite, negative, signed and two's complement numbers
Understand registers
Understand the stack - Part 1
Understand flags and conditional jumps
Understand reverse storage
Some programming hints and tips
Standardized window and dialog procedure
Understand the stack - Part 2
FAQ "When I click on the GoAsm or GoLink or GoRC icon something just flashes on the screen but nothing else happens".

Introduction

How to use this manual

If you are interested in why I wrote GoAsm, and the legal and licensing stuff, read on.

If you want a quick view in a nutshell of some of GoAsm's features, then click here.

If you are a beginner and want to see how to make a simple Windows program then click here.

If you would like to see some sample GoAsm code then click here to have a look at a simple "Hello World" Windows console program
or here to see a straightforward "Hello World" GDI (with window) program
or there is a "Hello World" GDI program making full use of stack frames, structures, locally scoped labels, INVOKE and definitions (macros).
There is a list of sample Unicode and also 64-bit programs in the list of appendices.

If you want to read about the aims which drove my design for GoAsm then click here.

If you are just interested in how to use GoAsm then click here for GoAsm's basic elements, here for more advanced ones or here to read about miscellaneous matters.

If you are new to assembler -
welcome to the joys of assembler programming! Make fast, compact working programs. Assembler works well with Windows. It is true that assembler is a low level language but the Windows API (Applications Programming Interface) is a very high level language. The two are very compatible, in 64-bit as well as in 32-bit programming. This document will help you to pick up assembler programming. Have a look at the beginner's section, the appendices and the tutorials for some starter articles.

Why is a new assembler needed?

There are a number of assemblers available, the most popular being Microsoft's MASM, NASM (from a team originally headed by Simon Tatham and Julian Hall), Borland's TASM and Eric Isaacson's A386. For my part, in the context of Windows programming, none of these assemblers can be regarded as perfect. Some have annoying defects. In writing GoAsm I have attempted to produce an assembler which always produces the minimum size code and which has a clear and obvious syntax, reduced source script requirements and extensions to help with programming for Win32 and Win64. This has also given me the opportunity to write a linker GoLink, which is finely tuned to work together with GoAsm.
Others have also tried to escape the mould, most notably René Tournois who wrote the executable-maker Spasm (now called RosAsm) and Tomasz Grysztar with his flat assembler (FASM).

Versions and updates

My intention is to keep GoAsm free from known bugs. So I usually work on bug fixes as soon as I discover them (unless I am on holiday). I usually provide a fix to those who report bugs by sending them (or posting) a copy of GoAsm with a version number which has a suffix letter. Relatively minor bugs are usually accumulated in this way and then eventually published as an update. Such updates can be obtained from my web site at www.GoDevTool.com. A serious bug may result in an immediate publication of a GoAsm update. I am also continuing to enhance GoAsm from time to time: this tends to result in a beta version of GoAsm which is available for test. These test versions are often also available from my web site. Only after testing do these beta changes become an official update.

Discussion forum

There is a GoAsm Assembler and Tools forum run by Hutch as part of the MASM forum, where you can air your views about the "Go" tools, ask questions of me and other users, and check for updates. The forum also provides an opportunity for me to consult with you about proposed enhancements to GoAsm and the other "Go" tools.

Integrated Development Environments (IDEs)

IDEs are editors which help you to get the correct programming syntax and then run the development tools to create the output files. A full list of IDEs which are suitable for use with GoAsm is available from here.

Legal stuff

Copyright

GoAsm is Copyright © Jeremy Gordon 2001-2016 [MrDuck Software] - all rights reserved.

GoAsm - licence and distribution

You may use GoAsm for any purpose including making commercial programs. You may redistribute it freely (but not for payment nor for use with a program or any material for which the user is asked to pay). You are not entitled to hide or deny my copyright.

Disclaimer

I have made every effort to ensure that the output of GoAsm and its accompanying program AdaptAsm is accurate, but you do use them entirely at your own risk. I cannot accept any liability for them failing to work properly or giving the wrong output nor for any errors in this manual.

Acknowledgements

I owe particular thanks recently to Wayne J Radburn, of Gatineau, Québec, who has undertaken and successfully implemented enhancements and bug fixes in later versions of GoAsm (0.57 to 0.61). I would also thank Edgar Hansen of Kelowna, British Columbia, Canada ("Donkey") for his continued support and encouragement to Wayne and to myself and to all GoAsm users. The three of us now hold the GoAsm and GoLink source code, and this will help to secure the future of the "Go" project. I also give thanks to those others who have encouraged me to write this program and have offered helpful comments, reports and guidance, in particular:-
Leland M. George of West Virginia, Daniel Fazekas of Budapest, Greg Heller of the Congo ("Bushpilot"), René Tournois of Louisville, Meuse, France ("betov"), Ramon Sala of Barcelona, Spain, Bryant Keller of Cartersville, Georgia, Emmanuel Zacharakis (Manos), and Brian Warburton of Weybridge, UK. Thanks also for the support, suggestions and bug reports from grv, Jeff Aguilon, Jonne Ahner, Thomas Hartinger, Martyn Joyce, Kazó Csaba, Dmitry Ilyin, Patrick Ruiz, and from all contributors to the GoAsm and Tools Forum, and other forums which I have not already mentioned.


GoAsm's design

GoAsm's features in a nutshell


Syntax aims and compatibility with other assemblers

The syntax acceptable to the assembler is of central importance to any assembler programmer. It varies between assemblers. GoAsm does not create 16 bit code and it works only in "flat" mode (no segments). Because of this, its syntax is very simple. I have chosen what I regard as the best syntax with the main aim of clarity and consistency. You may disagree with my opinion on this. If so I would be interested to hear your views.
When writing GoAsm I toyed with the idea of making it wholly compatible with the syntax used by other assemblers, but this was simply not possible because of the variations between them, some of which would produce inconsistency. I also decided against making GoAsm fully compatible with any one assembler.
You will recognise syntax from other assemblers. Where possible I have tried to keep close to what I regard as the best syntax from each assembler in general use. You will also recognise some syntax used in "C" programming. I have mainly followed the "C" "preprocessor" syntax since it seems pointless to do otherwise. This also makes GoAsm's use of preprocessor commands compatible with my resource compiler GoRC.

Why GoAsm does not type or parameter check

After some thought I decided that GoAsm ought not to type or parameter check. This, I believe, reduces the size of the source script enormously, and adds to its flexibility and readability. I have concluded that even basic type checking in assembler programming for Windows is not at all essential, and is more trouble than it is worth.
Let me explain.
In type checking the Assembler must check that references to areas of memory are made with the correct size and type of data to suit what the area of memory is intended to be used for. This is achieved through a two-stage process. Firstly when the area of memory is declared the programmer must allocate it a certain "type". Then when the area of memory is used, the programmer has again to state the type of memory intended to be used. If there is a mismatch the assembler or compiler will show an error.

Some assemblers, like NASM, do no type checking at all. Others, like A386, do only basic type checking based on the byte, word, dword, qword and tword types. MASM and TASM, like "C", allow you to specify your own types using TYPEDEF and then type-check based on those specified types.
Parameter checking checks that the correct number of parameters are passed to an API and also type checks the parameters. Most assemblers do not parameter check but MASM permits parameter checking when the INVOKE pseudo mnemonic is used.
The overheads required to achieve full type and parameter checking like a "C" compiler are enormous. Just look through a Windows header file and see the long lists of various types allocated to various structures and to the parameters of APIs. Then look at the efforts of the programmer which are required in the source script to ensure that no error is thrown up by the assembler or compiler.

I decided to follow the NASM example and not even offer basic type checking as A386 provides. I have used A386 over many years and have enjoyed its clean syntax, but I have only found its basic type checking a hindrance when programming for Windows. This is because there are often occasions when you want to write to data, or read from data using a different size of data than used to declare the data in the first place.

As for parameter checking, again I have not even tried to offer this since in my view it unnecessarily complicates things. It again requires enormous lists of APIs and parameters to be provided to the assembler or compiler so that it can check that these match what you are giving the API. Miss one and your program does not compile. Take the example of

PUSH 40h,EDX,EAX,[hwnd]
CALL MessageBoxA
Here is a call to an API which takes 4 parameters. Now it is said that you would like the assembler to tell you if you send the wrong number of parameters. But you don't need this warning. Your program would simply crash if you sent the wrong number of parameters, and you are going to test this call aren't you? Yes! There is no hidden, latent, fault here which will not be noticed at testing stage. Then, it is said that you need type checking in case you send the wrong data size to an API as a parameter. I just can't see this. All parameters to APIs are dwords (with one or two exceptions out of thousands). So you won't be sending the wrong size of data to an API.
I agree that it may be possible to send the wrong type of data to an API. For example you might send a constant when it ought to be a handle. Or you might send the contents of a memory address when it ought to be a pointer to a memory address. However, the API simply will not work if there is an error - again there is no hidden, latent, fault here which will not be noticed at testing stage.
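
To make the parameter order in the MessageBoxA call discussed above easier to follow, here is a minimal sketch for 32-bit assembly. It assumes that EAX already holds a pointer to the message text and EDX a pointer to the title, which the original fragment does not state. The two forms are alternatives, so you would use one or the other and not both in sequence.

```asm
;hypothetical fragment: [hwnd], EAX and EDX are assumed to be set up earlier
;form 1: explicit PUSHes, last parameter first
PUSH 40h                ;uType: 40h is MB_ICONINFORMATION (MB_OK is 0)
PUSH EDX                ;lpCaption: pointer to the title (assumed to be in EDX)
PUSH EAX                ;lpText: pointer to the message (assumed to be in EAX)
PUSH [hwnd]             ;hWnd: handle of the owner window
CALL MessageBoxA
;
;form 2: the same call using INVOKE, parameters in the order documented for the API
INVOKE MessageBoxA,[hwnd],EAX,EDX,40h
```

With either form a missing or surplus parameter leaves the stack unbalanced after the call, which is why such a mistake shows up at once in testing.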

Abolishing parameter and type checking not only frees the assembler from a great deal of work, making it faster in operation, but it also frees the programmer from the headache of manipulating header and include files. It provides greater fluidity in memory addressing, since errors will not be thrown up if you want to use data of a size which does not match the size of the data declaration. So in GoAsm even if lParam has been declared as a dword,

MOV [lParam],AL
is still allowed. And if LOGFONT is a simple structure of dwords, GoAsm is quite happy for example with
MOV B[LOGFONT+14h],1
which you might want to use to set a font to italic.

By not type and parameter checking I have been able to abolish EXTRN. GoAsm does not need to know the type of symbols which are declared outside the source file (ie. to be found during linking). I hope you will agree this relieves you from a lot of hard work and anguish in having to add those EXTRNs in your larger programs.
The corollary to the abolition of type and parameter checking is that you must tell GoAsm the size of the data to be worked on, if this is not obvious.
So, for example,
MOV [MemThing],23h is an error. To load 23h as a byte into MemThing you need to code MOV B[MemThing],23h. This is because GoAsm will not know at assembly time whether the 23h should be loaded as a byte, word, or dword, all of which are permitted by the MOV instruction.
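
As a sketch of how the size gets decided, the fragment below stores to the same label with each of the byte, word and dword type indicators, and then with a register source where no indicator is needed. MemThing is assumed to be declared elsewhere in a data section with room for at least 4 bytes.

```asm
;hypothetical fragment: MemThing is assumed to be at least 4 bytes of data
MOV B[MemThing],23h     ;writes 1 byte
MOV W[MemThing],23h     ;writes 2 bytes (23h,00h)
MOV D[MemThing],23h     ;writes 4 bytes (23h,00h,00h,00h)
MOV [MemThing],AL       ;no indicator needed: AL makes this a byte operation
MOV [MemThing],EAX      ;no indicator needed: EAX makes this a dword operation
```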

In some ways the requirement for a type indicator (when the type is not obvious) is helpful. This is because you can see from the instruction itself how much memory is affected by the instruction. You don't have to look up a particular data declaration to see its type in order to see what the instruction will do. So, for example:-

MOV B[MemByte],23h  ;comforting to see this is limited to a byte operation
FLD Q[NUMBER]       ;useful to know real number loaded with double precision
INC B[COUNT]        ;essential to know this can count only up to 255
Another advantage arising from no parameter checking is that there is no need for GoAsm to decorate the names of calls to other modules or to imports. When using GoLink this is a considerable advantage since there is no need for LIB files at the linking stage. But it does mean that GoAsm object files will differ from object files made using a "C" compiler or with MASM because those files will contain decorated symbols while the GoAsm ones will not. Since Version 0.26.10 GoLink has been able to accept object files from both sets of tools and link them with GoAsm object files (just use the GoLink /mix switch - see the GoLink help file).

Why GoAsm requires square brackets for writing to, and reading from, memory

Assembler programmers have debated for a long time whether for consistency all memory addressing should be done using square brackets. The argument is that since you must use square brackets when the address is contained in a register for example MOV EAX,[EBX], then you should also use square brackets when the address is pointed to by a label for example MOV EAX,[lParam]. I have followed the debate with interest. MASM and A386 made it optional, so that these two instructions did exactly the same thing:-
MOV EAX,lParam
MOV EAX,[lParam]
However, A386 differentiated between labels with and without colons so that the above was only true if lParam was declared as follows
lParam DD 0
but not if it was declared as:-
lParam: DD 0
In that case MOV EAX,lParam in A386 would act the same way as MOV EAX,OFFSET lParam. Very confusing!
NASM took the plunge by making it a requirement for any memory addressing to be in square brackets. However it still allowed:-
MOV EAX,lParam
In NASM this is the same as MOV EAX,OFFSET lParam for other assemblers.
So when looking at assembler code, without knowing the syntax of the assembler concerned, you can never be really sure what MOV EAX,lParam does. The same instruction can do two entirely different things depending on which assembler is used.
Borland TASM when switched to "Ideal" mode outlawed MOV EAX,lParam altogether, and only allowed
MOV EAX,[lParam]
or
MOV EAX,OFFSET lParam
I tend to agree with this approach. The main aim here is to ensure that coding is unambiguous.
For this reason I have decided that GoAsm must be strict about this question. This avoids all ambiguity. Therefore in GoAsm
MOV EBX,wParam
is completely outlawed, unless wParam is a defined word. In order to get the offset in GoAsm you must use
MOV EBX,ADDR wParam
or if you prefer
MOV EBX,OFFSET wParam
which means the same thing.
In order to address memory in GoAsm you must use
MOV EBX,[wParam]
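
The three permitted forms can be put side by side in a small 32-bit sketch. The declaration of wParam as a dword holding 1234h is an assumption made purely for illustration.

```asm
DATA SECTION
;
wParam DD 1234h         ;hypothetical dword holding the value 1234h
;
CODE SECTION
;
START:
MOV EBX,[wParam]        ;EBX = 1234h, the contents of the dword at wParam
MOV EBX,ADDR wParam     ;EBX = the address of wParam
MOV EBX,OFFSET wParam   ;same result as the ADDR form
MOV EAX,[EBX]           ;EAX = 1234h, read through the pointer now in EBX
XOR EAX,EAX             ;return eax=0
RET
```

In every line the square brackets, or their absence, tell you whether memory is being accessed.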

Supported mnemonics

What is a "mnemonic"?

A mnemonic is an instruction in word form which you use in your assembler source script. GoAsm assembles the mnemonic instructions and converts them into the opcodes which the processor executes. This is sometimes called machine code. The mnemonics themselves are recommended by the processor manufacturers. They are intended to convey in short form as precisely as possible what the instruction does. Although there are now over 550 mnemonics, in fact in everyday use an assembler programmer only uses about 20 or 30 of these on a regular basis. See for those new to Assembler for a list of the most commonly used mnemonics.
For the sake of transportability of source scripts and consistency in the light of possible updates, all assemblers normally recognise all mnemonics at the level at which they assemble. But the processor knows nothing about the mnemonics and only works in machine code itself. Non-assembler programmers never use mnemonics. A compiler working solely in "C" for example still produces machine code, yet it does not work with mnemonics as such (unless switched to in-line assembly mode).

Which mnemonics are supported by GoAsm?

GoAsm supports all the mnemonics at its level of assembly including the x87 floating point instructions, MMX, 3DNow! (with extensions), SSE, SSE2, SSE3, SSSE3, and SSE4 instructions, as well as AES, ADX, and a few other miscellaneous newer instructions. GoAsm supports the CMP pseudo instructions which may be used with the XMM registers.
GoAsm does not support some mnemonics because they are used solely for 16-bit programming. These are IBTS, IRETW, JCXZ, RETF, and XBTS.
GoAsm does not support the string mnemonics which require additional operands, and where there is an easier mnemonic to use. These are:-
CMPS - use CMPSB or CMPSD
INS  - use INSB or INSD
LODS - use LODSB or LODSD
MOVS - use MOVSB or MOVSD
OUTS - use OUTSB or OUTSD
SCAS - use SCASB or SCASD
STOS - use STOSB or STOSD
XLAT - use XLATB
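
As an illustration of using one of the sized string mnemonics listed above, here is a minimal 32-bit sketch which copies a short string with REP MOVSB. The labels SOURCE and DEST are hypothetical, and the DUP form used to reserve the destination is assumed to follow GoAsm's data declaration syntax.

```asm
DATA SECTION
;
SOURCE DB 'Hello',0     ;6 bytes including the terminating zero
DEST   DB 6 DUP 0       ;hypothetical destination buffer of the same size
;
CODE SECTION
;
START:
PUSH ESI                ;preserve the registers used by MOVSB
PUSH EDI
CLD                     ;make string operations run forwards
MOV ESI,ADDR SOURCE     ;ESI = where to read from
MOV EDI,ADDR DEST       ;EDI = where to write to
MOV ECX,6               ;number of bytes to copy
REP MOVSB               ;copy ECX bytes from [ESI] to [EDI]
POP EDI
POP ESI
XOR EAX,EAX             ;return eax=0
RET
```

The operand size is carried by the mnemonic itself (MOVSB for bytes, MOVSD for dwords), so no extra operands are needed.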

Beginners

1. Make an asm file

The asm file is a file which you make and edit using an ordinary text editor, such as Paws which you can download from my web site, www.GoDevTool.com, or a program like Notepad or Wordpad which comes with Windows. If you use Notepad or Wordpad you should make sure you save the file in a format which adds no control or formatting characters, other than the usual end of line characters (carriage return and line-feed). This is because GoAsm only looks for plain text. You can achieve this by saving the file as a "text" document. If you don't use an extension for the file (the extension is the characters after the "dot") then the editor may give the file a ".txt" extension but you can change this by renaming the file (you can rename the file by right-clicking on the name using Windows Explorer or My Computer).
It may be that you cannot see the extension on your computer, because it may be set that way. To see the extensions of your files from Windows Explorer, choose the menu item "View", "Folder options", then the "View" tab and ensure that the "Hide file extensions for known file types" is not checked. The procedure may differ slightly in different versions of Windows.
It is traditional amongst programmers to give their source scripts an extension which matches the language in which the source code is written. For example you might have an assembler file called "myprog.asm". Similarly you will usually find source code written in the "C" language with the extension ".c" or ".cpp" (for "C++"), ".pas" for pascal and so on. However, there is no magic in these extensions. GoAsm will accept files of any extension or files which do not have an extension.
The .asm file contains your instructions to the processor in words and numbers. These are executed by the processor when the program is run. It is said therefore, that the .asm file contains your "source code" or your "source script".

2. Insert some code and data

As an example let's look at the code and data in a simple 32-bit Windows program which writes "Hello World" to the MS-DOS (command prompt) window (the "console"). This is what you would put into your asm file:-

DATA SECTION
;
KEEP DD 0               ;temporary place to keep things
;
CODE SECTION
;
START:
PUSH -11                ;STD_OUTPUT_HANDLE
CALL GetStdHandle       ;get, in eax, handle to active screen buffer
PUSH 0,ADDR KEEP        ;KEEP receives output from API
PUSH 24,'Hello World (from GoAsm)'    ;24=length of string
PUSH EAX                ;handle to active screen buffer
CALL WriteFile
XOR EAX,EAX             ;return eax=0 as preferred by Windows
RET
Note that anything after a semi-colon is ignored, so you can insert comments. See operators for other comment forms. See provide good comments and descriptions for the importance of comments.
The first line in this file opens the data section. See sections - declaration and use for the importance of sections and how to use them.
Then we declare a data area of 4 bytes (DD means a "dword" or "doubleword" which is 4 bytes), identify it with a label "KEEP" and initialise it to zero. See data declaration for an explanation of "data" and how to declare it.
Then we open the code section and provide the label "START" to tell the processor where to start executing the instructions. See code and the starting address for an explanation of "code" and the starting address.
The next instruction "PUSH -11" puts minus 11 decimal on the stack ready for the call to the Windows API GetStdHandle on the next line. See understand the stack for an explanation of the stack and the PUSH instruction. See understand finite, negative, signed and two's complement numbers for an explanation of what is meant by "-11 decimal". See for those new to Windows for an explanation of an API. The instruction "CALL" transfers execution to the API and on return from the API, execution continues on the next line. See transferring execution to a procedure.
Then there are five more PUSHes onto the stack. Note that some of these are repeated PUSH instructions separated by commas. See repeat instructions to see how this works. These PUSHes get ready for the call to the API WriteFile. In order, they are PUSH 0 (zero); then the address of KEEP (see accessing labels); then the number 24 decimal which is the length of the string (ie. the words in quotes); then a pointer to the string (see PUSH or ARG pointers to strings); then the register EAX (see understanding registers).
Then the value of zero is put in the register EAX using the instruction XOR EAX,EAX. This does the same as MOV EAX,0 but produces less code. See some programming hints and tips for similar tips.
Finally RET finishes the program by returning to the caller (in this case Windows itself). See an explanation of this.
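
To show what the repeated PUSH instructions and the quoted string in the walkthrough above amount to, here is a sketch of the same 32-bit program written out longhand. The label HELLO is hypothetical: it stands in for the string which the original passes directly in the PUSH instruction. The comments name the WriteFile parameters as Microsoft documents them.

```asm
DATA SECTION
;
KEEP  DD 0                              ;receives the count of bytes written
HELLO DB 'Hello World (from GoAsm)'     ;hypothetical label for the 24-character string
;
CODE SECTION
;
START:
PUSH -11                ;STD_OUTPUT_HANDLE
CALL GetStdHandle       ;get, in eax, handle to active screen buffer
PUSH 0                  ;lpOverlapped: not used
PUSH ADDR KEEP          ;lpNumberOfBytesWritten
PUSH 24                 ;nNumberOfBytesToWrite: length of string
PUSH ADDR HELLO         ;lpBuffer: pointer to the string
PUSH EAX                ;hFile: handle to active screen buffer
CALL WriteFile
XOR EAX,EAX             ;return eax=0 as preferred by Windows
RET
```

The parameters are pushed last one first, so the handle in EAX, which is WriteFile's first parameter, goes onto the stack last.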

3. Assemble the file with GoAsm

Having put in the code and data to your file you are ready to make your program. This is done in two steps. First you need to assemble your file and then you need to link it. In order to do this you need to open an MS-DOS (command prompt) window. See how to do this. In this case you use the command line:-

GoAsm /fo HelloWorld.obj filename
where filename is the name of your asm file. See starting GoAsm for how to use the command line for GoAsm.
GoAsm makes an "object" file containing your code and data; this file has the extension ".obj" and is in a format suitable for the linker. See more information about the object file.

4. Link the object file to make the exe

The final step is to "link" your program to make the final executable. You can use GoAsm's companion program GoLink to do this. This is what you need in the command line:-
GoLink /console helloworld.obj kernel32.dll
(add "-debug coff" if you want to watch the program in the debugger).

Note that the GetStdHandle and WriteFile calls are to kernel32.dll which is why the name of that Dll appears in the GoLink command line. See for more information about Dlls. See using GoAsm with various linkers for more information about using GoLink and other linkers if you prefer. See the GoLink help file for other GoLink options.

GoLink creates the file HelloWorld.exe. You can then run this program from the MS-DOS (command prompt) window. Type in HelloWorld and press enter. You will see the string you sent to WriteFile is written in the console.

So let's recap by looking back at the lines in your source script.
See that first you asked Windows for a handle to the console window. This was returned by the API GetStdHandle and held in the EAX register. This handle and the string to write were passed to WriteFile. In other words you told Windows to write the specified string to the console. Information about exactly how to use the APIs and the parameters which need to be passed to them is available from Microsoft from the MSDN site (look for the "Platform SDK"). Finally see suggestions how to organise your programming work.


Basic GoAsm elements

Starting GoAsm

The command line syntax is:-

GoAsm [command line switches] filename[.ext]

Where,

filename is the name of the source file

Command-line Switches
/b beep on error
/c always put output file in current directory
/d define a word (eg. /d WINVER=0x400)
/e empty output file allowed
/fo specify output path/file eg. /fo asm\myprog.obj
/gl retain leading underscore in external "C" library calls
/h or /? help
/l create listing output file
/ms decorate for mslinker
/ne no error messages
/ni no information messages
/nw no warning messages
/no no output messages at all
/sh share header files (header files may be opened by other programs during assembly)
/x64 assemble for AMD64 or EM64T
/x86 assemble 64-bit source in 32-bit compatibility mode

If no extension is given for the input file, GoAsm looks for the file without any extension. If that file is not found then GoAsm looks for the file with an assumed .asm extension.
If no path is given for the input file it is assumed to be in the current directory.
If no filename is given for the output file an object file with the same name as the input file is created. For example MyAsm.asm will create a file called MyAsm.obj.

The directory which receives the output file is as follows:-

  • the path given if /fo is specified, or if it is not specified:
  • the current directory if /c specified, or if it is not specified:
  • the path given for the input file, or if no path given:
  • the current directory

If no extension is given for the output file, the extension .obj is used by default. The listing file is given the same name as the output file with the extension .lst and is created in the same directory as the output file.

Sections - declaration and use

Why sections are needed

You must declare a section before you can start coding. The reason for this is that the processor needs to know the attributes of the instructions it is being given. Also the Windows system relies on the attributes to identify parts of your code. Some common attributes are read-only (cannot be written to), read-write (can be written to) and execute (code instructions). Internally the processor deals with the instruction in the most appropriate and speedy manner to suit the attribute. For example, code instructions use the processor code cache, non-code material is regarded as data and may be kept in the data cache.
When you declare a section in your source script, GoAsm automatically sets the attribute of the section. Once this is done you can start to write the code or data in your program.

How to declare a section

In Windows programming we are only interested in four section types: code, data, const, and uninitialised data. You declare code, data or const sections as follows:-

CODE SECTION
DATA SECTION
CONST SECTION or CONSTANT SECTION

The words "code", "data", "const" and "constant" are reserved to section declarations and an error will be signalled if these words are used elsewhere in your source.

GoAsm also allows shortened forms to declare a section as follows:-

CODE
DATA
CONST

You can also use .CODE, .DATA and .CONST if you wish.

GoAsm automatically adds the attributes to suit the processor and Windows. A code section is given the attributes read, execute, code. A data section is given the attributes read, write, initialised data. A const section is given the attributes read, initialised data (you won't be able to write to a const section). Uninitialised data has the attributes read, write, uninitialised data.
Except to add the shared attribute, you can't override these attributes yourself. This is because to do so is pointless in the Windows system which has control over the attributes of the section as loaded and running. For example even if you give a code section the write attribute, Windows will not allow you to write to it. Also Windows will not permit you to execute code in a data section. You can change this behaviour however, by calling the API VirtualProtect at run-time.
In GoAsm you can use a code section to hold read-only data, although there may be a reduction in performance if you do this.

    Declaring a section also sets certain switches in GoAsm which affect syntax and coding. The rules are as follows:-

    • All labels in a code section must have a colon. The reason for this is to identify to GoAsm what is a label and what is not, ensuring that misspelt mnemonics and directives will always be reported as an error.
    • Re-usable labels are only permitted in a code section. If you use these label types in a data section they will be regarded as unique labels and put in the symbol table.
    • GoAsm will report an error if you have an instruction which would write to a const section. The const section is intended for initialised data and strings which will not be written over.

    Uninitialised data section

    GoAsm will make an uninitialised data section in the object file if you declare uninitialised data. This will be called ".bss" to suit other tools. GoAsm names this for you and you cannot change the name because some linkers expect this name. With most linkers, including GoLink, the .bss section does not find its way into the exe. Instead it is merged with a read/write section in the exe. The attributes of the uninitialised data section are read, write, uninitialised data.
    The advantage in declaring uninitialised data, rather than initialised data, is that the executable will be smaller. This is because only the amount of uninitialised data is specified in the executable, and the data itself is not kept in the executable. There is no need to do so since it has no specific value. See declaring ordinary uninitialised data.

    Switching to and from sections

    It's easy to switch to a section and back again. Just use one of
    CODE SECTION
    DATA SECTION
    CONST SECTION
    or their shortened forms as appropriate. You can do this as often as you like through your source script. GoAsm and the linker will concatenate all instructions intended for each section.
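
    As an illustration of switching, here is a minimal sketch of a source script which moves between sections several times. The labels Counter, Banner and SHOW_COUNT are made up for this example, and it assumes 32-bit assembly with the default START entry point:-

```asm
DATA SECTION
Counter DD 0                 ;read-write dword (hypothetical label)

CODE SECTION
START:
INC D[Counter]               ;write to the data section
CALL SHOW_COUNT
RET

CONST SECTION
Banner DB 'Count shown',0    ;read-only string (hypothetical label)

CODE SECTION                 ;back to code again
SHOW_COUNT:
MOV ESI,ADDR Banner          ;address of the const string
MOV EAX,[Counter]            ;value of the counter
RET
```

    Both pieces of code should end up together in the code section of the output, since everything intended for one section is concatenated.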

    See also sections - some advanced use on naming sections, shared sections, section ordering, and section alignment.


    Data declaration

    What is "data"?

    In a way all the instructions given to a processor are "data". But assembler programmers use the word to mean information which is either fixed or which can be changed at run-time and which is not actually executed. Data is of four main types:-
    1. Read-only data specified at assemble-time (when the program is compiled) and which is kept in the const section which has a read-only attribute. This is called "initialised" data because its contents are fixed in your source script. At run-time this data can be read but it cannot be written to. In your source script you can give this data labels so that it can be referenced easily.
    2. Data specified at assemble-time and kept in the data section in the executable. Again the contents of the data will be fixed in your source script, but at run-time this data can be read, or written to and read using the data labels.
    3. Data not specified at assemble-time but held in an area which is reserved for it. This is called "uninitialised data" and the amount reserved is recorded in the executable. In your assembler source script you specify how much data to be reserved. You can give it labels but you cannot initialise its contents. The advantage of this type of data is that it takes up no space in the executable. At loading-time the data is established but it has no particular contents at that time. At run-time the data can be read, or written to and read in the same way as data in the data section.
    4. Data established at run-time either by the program itself or by the system. This type of data is not established at assemble-time from your assembler source script. Instead it is established by the operating system when your code is executed.

    Declaring initialised numerical data

    GoAsm follows the traditional assembler syntax for declaring data in your source script.
    In a data or const section a label need not be terminated in a colon. In a code section this is necessary, to help catch syntax errors. Some examples (using data section):-
    HELLO1 DB 0                 ;one byte with label "HELLO1" set to zero
           DB 0                 ;second byte set to zero
    HELLO2 DW 34h               ;two bytes (a word) set to 34h
    HELLO3 DD 12345678h         ;four bytes (a dword) set to 12345678h
    HELLO4 DD 12345678D         ;four bytes (a dword) set to 12345678 decimal
    HELLO5 DD 1.1               ;four bytes (a dword) set to real number 1.1
    HELLO6 DQ 0.0               ;8 bytes (a qword) set to real number 0.0
    HELLO7 DQ 123456789ABCDEFh  ;8 bytes (a qword) set to 123456789ABCDEFh
    HELLO8 DQ 1234567890123456  ;8 bytes (a qword) set to 1234567890123456 decimal
    HELLO9 DT 1.1E0             ;10 bytes (a tword) set to real number 1.1
    HELLOA DT 123456789ABCDEFh  ;10 bytes (a tword) set to 123456789ABCDEFh
    
    Note that DB, DW, DD and DQ accept numbers in both decimal and hex; DD, DQ and DT accept real numbers too.
    See also declaring real numbers, directly loading the exponent and mantissa and loading a file using INCBIN.

    Declaring more data on each line

    A comma after an initialiser means that another initialiser is expected which declares more data, as follows:-
    Label DB 0,0,0,0             ;four bytes set to zero
          DW 33h,44h,55h,66h     ;four initialised words
          DD 33h,44h,55h,66h     ;four initialised dwords
          DD 1.1,2.2             ;two DD real numbers
          DQ 1.1,2.2             ;two DQ real numbers
          DQ 3333h,4444h         ;two DQ hex numbers
          DT 1.1,2.2             ;two DT real numbers
          DT 5555h,6666h         ;two DT hex numbers
    

    Declaring ordinary uninitialised data

    GoAsm follows the traditional assembler syntax here but like A386 does not require the uninitialised section (the ".bss" section) to be declared. Instead, a simple ? ensures that the data is treated as uninitialised. Some examples (within data or const section text):-
    HELLO1 DB ?   ;one byte with label "HELLO1" recorded as uninitialised
    HELLO2 DW ?   ;two bytes (a word)
    HELLO3 DD ?   ;four bytes (a dword)
    HELLO4 DQ ?   ;8 bytes (a qword)
    HELLO5 DT ?   ;10 bytes (a tword)
    
    Orphaned uninitialised data is not allowed: you cannot mix initialised and uninitialised data so this is an error:-
    DATA6    DD 5 DUP 0
             DB ?
             DB 0
    
    However this is ok:-
    DATA6    DD 5 DUP ?        ;5 dwords for the customer
             DB ?              ;a byte to hold the main course
             DB ?              ;and a byte to hold the sauces
    
    This is to allow you to separate areas of uninitialised data so that each separate area can have its own comment.
    Uninitialised data cannot be declared until a section has been opened. You can declare uninitialised data within code section text, but the labels must end in colons as usual for the code section, for example:-
    HELLO1: DB ?   ;one byte with label "HELLO1" recorded as uninitialised
    HELLO2: DW ?   ;two bytes (a word)
    

    Declaring duplicate data

    GoAsm uses the well established DUP syntax, but does not require any initialiser to be in brackets. Some examples (using data section):-
    HELLO1 DB 2 DUP 0      ;two bytes with label "HELLO1" both set to zero
    HELLO1A DB 800h DUP ?  ;2K buffer not initialised
    HELLO2 DW 2 DUP 0      ;four bytes all set to zero
    HELLO3 DD 2 DUP ?      ;eight bytes in uninitialised section
    HELLO4 DD 2 DUP 1.1    ;real number 1.1 repeated twice as dwords
    HELLO5 DQ 2 DUP 1.1    ;real number 1.1 repeated twice as qwords
    HELLO6 DQ 2 DUP 333h   ;qword repeated twice
    HELLO7 DT 2 DUP 1.1    ;real number 1.1 repeated twice as twords
    HELLO8 DT 2 DUP 444h   ;tword repeated twice
    
    You can use DUP to declare some data and then initialise each data component individually for example:-
    HELLO300 DB 3 DUP <23,24,25>    ;declare three bytes and set them to 23,24,25
    
    which does the same as:-
    HELLO300 DB 23,24,25            ;declare three bytes and set them to 23,24,25
    
    Although it may seem pointless to do this, the syntax does make it easier to initialise a member of a structure if it contains DUP. See initialising structure members which have DUP data declarations.

    Initialising using character values

    You can initialise to character values by putting characters in quotes, for example:-
    Letters DB 'a'
            DW 'xy'
    Sample  DD 'form'
    ZooDay  DQ 'Saturday'
    
    Unless inserting Unicode strings GoAsm carries out no conversion to the character, so that the actual value inserted in the object file will depend on the current character set at the time of assembly.
    GoAsm does not store the word and dword character declarations shown here in reverse order in memory. This follows NASM's lead but is different from MASM's handling of such strings. This means that the first byte in the word above is 'x' and the second is 'y'. The dword at Sample is stored as 'f' then 'o' then 'r' then 'm' when viewed as bytes. This then permits you to code
    MOV EDI,ADDR BUFFER
    MOV EAX,[Sample]
    STOSD
    
    which inserts into the buffer the string: form.
    Any bytes not initialised are given a value of zero, for example:-
            DW 'a'        ;first byte is a, second is zero
            DD 'ab'       ;'a' then 'b' then two zero bytes
    
    Repeat character value initialisations are allowed, for example:-
    DD 3 DUP "Hi"
    
    This inserts H then i then two zeroes and this is done three times.

    Declaring strings

    Strings may be in single or double quotes. Some examples are:-
    String1 DB 'This is a string'
            DB 'This is a string with "internal" quotes'
    String2 DB "A string in double quotes"
            DB "I enjoyed the string's contents"
    String3 DB '"A string itself in double quotes"'
            DB "'A string itself in single quotes'"
            DB "'A string's own single quotes'"
    String4 DB """A string's own single and double quotes"""
            DB '''A string itself with "internal" quotes'''
    
    In String4 the doubled-up quotation marks are retained as part of the string itself as one quote. This only occurs for the leading and trailing cases as shown (unlike GoRC, which also does this within the string).

    Declaring more than one string per line

    A comma after a string means that another initialiser is expected which may declare more data, or another string as follows:-
    String1 DB 'This is a string with null terminator',0
            DB 'First string',0,'And another string',0
    String2 DB 22h,"A string's own double quotes",22h
    
    The ASCII values you can use here if you wish are 22h for double quotes and 27h for single quotes.

    Longer strings

    For longer strings you can straddle lines using another DB on the next line, for example:-
    LongString1 DB 'His first program looked like it would be a great success '
                DB 'until he ran it for the first time',0
    LongString2 DB 'His fundamental error:',0Dh,0Ah
                DB 'he did not test it as he went along',0
    
    The ASCII values 0Dh and 0Ah are carriage return and line feed respectively, used to start a new line when the string is drawn on the screen.

    Unicode strings

    In Windows programming you sometimes need to declare Unicode strings in the data or const section, for example in a dialog template. There are several ways to do this in GoAsm and they are described in detail in writing Unicode programs. Briefly, you can use the following methods:-
  • Rely on the basic Unicode format of the source script (GoAsm can read Unicode UTF-16 and UTF-8 files).
  • Use the Lquote symbol used in "C" programming for example:-
                DB L'Hello how are you?'
    
  • Declare Unicode sequence using DUS:-
                DUS 'I am a Unicode string with new line and null terminator',0Dh,0Ah,0
    
    See also overriding using the STRINGS directive.

    Inserting blocks of data

    For larger blocks of data than will conveniently fit on one line, you can use INCBIN to load the contents (or part of the contents) of a file. Alternatively you can use DATABLOCK_BEGIN and DATABLOCK_END if it is more convenient to hold the block of data in the source file itself.

    The syntax for a DATABLOCK is as follows:-

    MyBlockData DATABLOCK_BEGIN      ;comment
        .
        . data is inserted here
        .
    DATABLOCK_END
    
    Here all the material between DATABLOCK_BEGIN and DATABLOCK_END is inserted in the output file, and you can then address the data using the label MyBlockData.
    GoAsm regards the data as starting just after the end of the line holding DATABLOCK_BEGIN, and as ending at the end of the last line before DATABLOCK_END.
    The data is inserted in its raw state (no conversion takes place). This means that characters which may not be displayed in an ordinary editor such as spaces or tabs will also be loaded. It also means that the format of the data and the characters which can be used in the data are limited only by the editor you are using to write the source code.

    Initialising using addresses of labels

    Often you need to initialise a dword to the address of a label. In other words after this has been assembled and linked the dword will hold a pointer to the label. The label can be either a data or a code label. For example:-
    MS1 DB 'First string to use',0
    MS2 DB 'Second string to use',0
    Strings DD MS1,MS2            ;Strings to hold address of the strings
    
    then to get ready to use the string MS2 instead of coding
    MOV ESI,ADDR MS2
    
    you can code
    MOV ESI,[Strings+4]
    
    Whole tables can be created using this method and addressed by taking advantage of the * index register multiplier (scaling) for example
    MOV ESI,[Strings+EAX*4]
    
    Here eax, which is zero indexed, holds which string to use. When eax is zero the first string will be used, when eax is one the second string and so on if there are more strings.
    Here is an example using code labels:-
    PROCEDURE_TO_CALL DD FIRSTPROC,SECONDPROC
    MOV ESI,ADDR PROCEDURE_TO_CALL     ;get procedures in esi
    MOV ESI,[ESI+EAX*4]                ;get correct procedure
    CALL ESI                           ;call the procedure
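
    The same idea can be put together as a complete sketch. This assumes 32-bit assembly, and the table name PROCS and the procedure bodies are placeholders invented for the example. Here the call is made through the table entry itself, so no intermediate register is needed:-

```asm
DATA SECTION
PROCS DD FIRSTPROC,SECONDPROC    ;table holding the addresses of two code labels

CODE SECTION
START:
MOV EAX,1                        ;zero-based index, so this chooses SECONDPROC
CALL [PROCS+EAX*4]               ;read the address from the table and call it
RET

FIRSTPROC:                       ;placeholder procedure
MOV EDX,1
RET

SECONDPROC:                      ;placeholder procedure
MOV EDX,2
RET
```

    The square brackets matter here: they cause the address to be read from the table in memory, and it is that address which is called.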
    

    Code and the starting address

    What is "code"?

    Code is made up of the instructions contained in a "code" section, that is, with the attributes "code" and "execute". You tell the processor which code instructions to execute. The processor takes the instructions byte by byte and executes them. Each byte of executable code is called an "opcode".

    What is the "starting address"?

    This is also known as the "entry point". In an ordinary executable (.exe file) this is where execution starts immediately after loading. In a dynamic link library (.dll file) it is where some execution takes place during the loading process.

    How is execution controlled?

    Once execution starts in your executable at the starting address your program has control of where execution will continue. In practice you divert execution using the CALL, JMP and conditional jump mnemonics.

    How do you set the starting address?

    From the above you can see that unless your source script is data-only it is essential to provide a starting address for your program. In GoAsm this is achieved very simply by giving the starting address a label and then telling the linker what this is. Different linkers take this instruction in different ways.
    My linker GoLink assumes the starting address to be START unless told otherwise. So, to use this default you would have the following line in your source script where you want execution to commence:-
    START:
    
    This can be upper or lower case or a mixture.

    If you don't want to use START, you can specify the starting address using one of these methods in GoLink's command line or command file:-

    -entry STARTINGADDRESS
    /entry STARTINGADDRESS
    
    If you are using ALINK only the first method works.

    If you are using the MS linker you need to make a slight change to your label. It must be preceded by an underline character. So your label is _START: in your source script. Then you would use one of these instructions to the linker (without the underline character):-

    -ENTRY START
    /ENTRY START
    
    What is happening here is that the MS linker is designed to work with a "C" compiler which will decorate global labels with the underline character. So the linker looks for the label _START, rather than START. Assembler programmers have had to put up with such quirks in Windows tools for many years but now we have our independence!
    See also using GoAsm with various linkers.

    Labels: unique, re-usable and scoped

    What is a label?

    A label is a name which you provide to identify a particular place in data or in code. It is like a bookmark. You can refer to that place and access it by using the label. A data label refers to data, and a code label refers to executable code. A symbol is a label which appears in the symbol table of the object file and which can therefore be seen by the debugger if a debug version of the executable is made.

    Unique labels

    A unique label is one which can only be used once in your source script and in linked object files. It is a label with "global" scope, that is to say, at link-time it can be accessed by other object files. Usually you would provide a name which describes the data or code function, for example NAME_LIST or CALCULATE_RESULT. If you have set your linker to provide debug output, all unique labels are put in the symbol list and passed to the debugger. In GoAsm you make a unique label as follows:-
    NAMEOFLABEL:
    
    This does not output any code, but sets a bookmark called NAMEOFLABEL at the point in data or code where it appears. If you are in a data section, the colon is not obligatory, nor is it obligatory if the label gives the name of an automated stack frame. Therefore the following lines all create unique labels:-
    (in data section)
    HELLO  DB 0        ;label HELLO
    BYE:   DB 0        ;label BYE
    MEAGAIN            ;label MEAGAIN
    (in code section)
    RICE:              ;label RICE
    PEAS: FRAME        ;label PEAS
    BEANS FRAME        ;label BEANS
    
    You can see from this that a single word which is not known to GoAsm to be a directive, mnemonic, data declaration, initialisation of data, or a defined word will be regarded as a label. GoAsm expects a colon after a code section label. This is because there are numerous words which must be used in a code section and if they are misspelt, it is important that an error is declared rather than the word being misconstrued as a label.

    Re-usable labels

    Sometimes you need to label parts of your source script with names which you have used before. GoAsm provides two levels of such re-usable labels which can be used in a code section:-
  • locally scoped re-usable labels beginning with a period, and
  • unscoped re-usable labels made up of digits or a character+digits

    The scope of a label defines from where it can be accessed using its own unmodified name. Let's look at these two types of re-usable labels in turn.

    Locally scoped re-usable labels

    These types of labels are created using a period followed by the label for example
    .looptop           ;label looptop
    .fin               ;label fin
    
    The boundary of the scope of these labels is defined by the unique code labels in the source script. In other words the label can be jumped to provided there is no unique label in the way. So for example:-
    JZ >.fin
    CALCULATE:
    .fin
    RET
    
    here the jump instruction will not find .fin because the label CALCULATE is a unique code label in the way.

    If you want to jump past a unique code label to a locally scoped re-usable label, you can either use another unique code label as the destination of the jump, or you can use an unscoped re-usable label. Or, for advanced use, you can use the locally scoped label within an automated stack frame; see re-usable label scope in automated stack frames.

    Locally scoped re-usable labels are sent to the debugger as symbols together with their "owner". Therefore the symbol sent to the debugger in the above example is CALCULATE.fin, and another way to jump past that unique label would be with JZ >CALCULATE.fin.

    Unscoped re-usable labels

    You will often have very insignificant jump destinations and loops in your code which do not need any name at all. For these you can use a label whose name will not be passed to the debugger as a symbol. This is useful when debugging to limit the symbol table to the most significant names in your code. These labels are made up either of all digits, or one character then one or more digits. You may also use a period as a decimal point which makes it much easier to add new local labels to existing code. The label itself must always end in a colon. Here are examples of unscoped re-usable labels:-
    L1:
    24:
    24.6:
    
    You can even use a single stand-alone colon. You might use this for those extremely insignificant jump destinations in your code.
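
    To show what "re-usable" means in practice, here is a short sketch in which the same unscoped label name is used for two separate loops. It assumes 32-bit assembly and that a backward jump resolves to the nearest matching label above it, which is how the direction rules in this manual read; the arithmetic is just filler for the example:-

```asm
CODE SECTION
START:
XOR EAX,EAX
MOV ECX,4
L1:                  ;first use of the name L1
ADD EAX,ECX
DEC ECX
JNZ L1               ;backwards to the L1 just above
MOV ECX,3
L1:                  ;same name used again for a different loop
ADD EAX,EAX
DEC ECX
JNZ L1               ;backwards to the second L1
RET
```

    Neither L1 should appear in the symbol table, so only START would be passed to the debugger from this fragment.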

    Jumping to labels: short and long code jumps

    There are several jump instructions. Some will jump if the flags are in a particular state. These are called the "conditional jump instructions". Then there is the JMP instruction which will always jump to the destination irrespective of the flags. Then there are the LOOP instruction and its conditional variants, which decrement ecx and drop through when it reaches zero (that is, if ecx=1 on entry). The CALL instruction jumps and returns afterwards. All these instructions need a label to jump to.

    The direction indicators

    In order to make your source script more readable, GoAsm uses direction indicators to indicate the direction of the jump. The "back" direction indicator is optional. For example, using locally scoped re-usable labels:-
    JZ >.fin         ;jump forward to .fin
    JMP >.exit       ;jump forward to .exit
    LOOP .looptop    ;loop backwards to .looptop
    LOOP <.looptop   ;loop backwards to .looptop (alternative form)
    
    Here is an example using unscoped labels:-
    JZ >L10          ;jump forward to L10
    JNC L3           ;jump backwards to L3
    JNC <L3          ;jump backwards to L3 (alternative form)
    JMP 100          ;jump backwards to 100
    

    Jumps to unique labels

    These are treated differently, depending on whether the jump is made using a conditional jump mnemonic or not.

    Conditional jumps to unique labels

    You can code conditional jumps to unique labels in the same way as you would for jumps to locally scoped or unscoped labels. In other words use the forward indicator ">" if the jump is to a place later on in the source script. Optionally you may use the backward indicator < to show that the jump is to a place earlier in the source script, or you can omit it.
    Basically GoAsm will not permit you to jump out of a file using a conditional jump. So instead of coding:-
    JZ EXTERNALLABEL
    
    you should code
    JNZ >
    JMP EXTERNALLABEL
    :
    
    This is to help with error checking. GoAsm assumes a conditional jump was meant to be to a place inside the existing source script.

    Unconditional jumps to unique labels

    You may use a direction indicator for these jumps if you wish, but you don't have to.
    If you do use a direction indicator, this will tell GoAsm only to look for the label in the source script: GoAsm will not tell the linker to look for the label in other source scripts.
    If you don't use a direction indicator, GoAsm will find the label if it is in the source script, but if not, it will tell the linker to look for the label in other source scripts.
    For example:-
    JMP LABEL               ;look for label in all source scripts
    JMP <INTERNALLABEL1     ;only look for label earlier in source script
    JMP >INTERNALLABEL2     ;only look for label later in source script
    

    Jumps to single colons

    Single colons are treated as unscoped labels and can be used for your most insignificant jumps, for example:-
    :
    CALL PROCESS
    LOOPZ <
    
    or
    CMP EAX,EDX
    JZ >
    CALL PROCESS
    :
    RET
    

    The importance of long or short jumps

    A short jump is coded using the short relative jump form of instruction which is only 2 bytes. This tells the processor to jump back or ahead in the range +127 bytes or -128 bytes. The actual amount of the jump is contained in the second opcode byte, which is why this type of instruction is limited to this range.
    To jump outside this range a longer instruction is needed: 6 bytes for a conditional jump, or 5 bytes for an unconditional JMP. This is called the long relative jump form of instruction.
    Using short jumps not only tightens up your code but also increases speed of execution because the processor has to read and execute fewer bytes in order to carry out the instruction. This will be important for looped instructions which are executed many times in sequence.

    Telling GoAsm to code a long jump

    Use either the LONG operator or << or >>, for example
    JZ >>.fin         ;long forward jump to .fin
    JZ LONG >.fin     ;long forward jump to .fin (alternative form)
    JC <<A1           ;long backward jump to A1
    JC LONG A1        ;long backward jump to A1 (alternative form)
    JC LONG <A1       ;long backward jump to A1 (alternative form)
    
    Note that there is no long form of LOOP and its variations, nor of JECXZ. If you need a long jump for these instructions use this instead:-
    DEC ECX
    JNZ LONG L2          ;long jump replacing LOOP
    OR ECX,ECX           ;test for ecx=0
    JZ LONG >L44         ;long jump replacing JECXZ
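
    To see the two forms side by side, here is a small sketch for 32-bit code. The labels and the padding are invented for the example, and the byte counts in the comments are those of the standard x86 encodings rather than anything specific to GoAsm:-

```asm
CODE SECTION
START:
XOR EAX,EAX          ;sets the zero flag
JZ >L1               ;short form: 2 bytes (opcode 74h plus one displacement byte)
NOP
L1:
JZ >>L2              ;long form: 6 bytes (0Fh,84h plus four displacement bytes)
DB 200 DUP 90h       ;200 bytes of padding, too far for a short jump to cross
L2:
RET
```

    If the second jump were written as JZ >L2, the destination would be out of range for the short form and you would expect GoAsm to report an error.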
    

    Whether long or short jumps are coded

    GoAsm always tries to make the smallest possible code, consistent with being a one-pass assembler. Here are the rules which are followed:- GoAsm will show an error if a short jump is specified but cannot be achieved. This is to ensure that you have not made an error in your source script. For example you might intend to jump only a short distance but have forgotten to add the destination of the jump to your source script.

    Accessing labels

    Getting the address of a label

    This must be done using the ADDR or OFFSET operator. In the final executable under Windows this gives the distance to the label from the start of the section, plus the position of the section in virtual memory. In other words, the address of the label in memory when the executable is loaded and running.

    Here are examples using unique labels:-

    MOV ESI,ADDR Process_dabs  ;get in esi the address of the code label Process_dabs
    MOV ESI,ADDR Hello2        ;get in esi the address of the string labelled Hello2
    MOV ESI,ADDR HelloX+10h    ;get in esi the address 16 bytes beyond HelloX
    
    Here is an example using a locally scoped re-usable label:-
    MOV ESI,ADDR CALCULATE.fin ;get in esi the address of the code label .fin in the CALCULATE procedure
    
    Here is an example using a formal structure:-
    MOV ESI,ADDR Lv1.pszText   ;get in esi the address of the psztext member in the formal structure Lv1
    
    For 64-bit code, note that a PUSH, ARG, or MOV to memory of an ADDR or OFFSET for a non-local label (local labels are handled differently) will make use of the R11 register and take advantage of the shorter RIP-relative addressing of the LEA instruction as follows:-
    LEA R11,ADDR Non_Local_Label
    PUSH R11
    
    LEA R11,ADDR Non_Local_Label
    MOV [MEMORY64],R11
    
    This will also take place with INVOKE when pushing arguments with ADDR, which also includes use of pointers to a string or raw data (ex. 'Hello' or <'H','i',0>).

    Reading data from the place pointed to by a label

    Reading data from the place pointed to by a label is quite different from getting the address of a label. Here you are reading the data value in the area of memory concerned. This must be done using square brackets. Examples are:-
    MOV ESI,ADDR Hello1     ;get in esi the address of the dword Hello1
    MOV EAX,[ESI]           ;get in eax the value of Hello1
    
    or this which does the same thing:-
    MOV EAX,[Hello1]        ;get in eax the value of Hello1
    

    Writing to the place pointed to by a label

    Here you cause a write to data as follows:-
    MOV ESI,ADDR Hello1     ;get in esi the address of the dword Hello1
    MOV [ESI],EAX           ;write the value in eax to Hello1
    
    or this which does the same thing:-
    MOV [Hello1],EAX        ;write the value in eax to Hello1
    

    Reading and writing to labels using displacement

    Suppose you have a simple structure of data declared as follows:-
    PARAM_DATA DD 0     ;+0h
               DD 0     ;+4h
               DD 55h   ;+8h
               DD 0     ;+0Ch
               DD 0     ;+10h
    
    Then you can use the label to read from and write to a particular part of the structure using a displacement value as follows:-
    MOV ESI,ADDR PARAM_DATA
    MOV EAX,[ESI+8h]           ;get in eax value of third dword
    MOV [ESI+8h],EDX           ;and insert edx instead
    
    or this which does the same thing:-
    MOV EAX,[PARAM_DATA+8h]    ;get in eax value of third dword
    MOV [PARAM_DATA+8h],EDX    ;and insert edx instead
    
    The displacement value can be any value up to 0FFFFFFFFh. It can be positive or negative. Non-numeric elements must be separated by the plus sign.
    See more about structures.

    Reading and writing to labels using indexation

    Suppose you have 16 dwords of data declared as follows:-
    PARAM_DATA DD 10h DUP 0
    
    Then you could use indexation (scaling) to multiply the index register to suit:-
    MOV ESI,ADDR PARAM_DATA
    MOV EAX,[ESI+ECX*4]        ;get in eax value of ecx dword
    MOV [ESI+ECX*4],EDX        ;and insert edx instead
    
    or this which does the same thing:-
    MOV EAX,[PARAM_DATA+ECX*4]        ;get in eax value of ecx dword
    MOV [PARAM_DATA+ECX*4],EDX        ;and insert edx instead
    
    You can use indexation of 2, 4 or 8, or none at all (as in the first instruction below). The following instructions are all valid:-
    MOVZX EAX,B[PARAM_DATA+ECX]       ;get in eax value of ecx byte
    MOVZX EAX,W[PARAM_DATA+ECX*2]     ;get in eax value of ecx word
    MOV Q[PARAM_DATA+ECX*8],EDX       ;insert edx at ecx qword
    
    Non-numeric elements must be separated by the plus sign.
    In 32-bit coding, only the general purpose 32-bit registers can be used as an index register - EAX,EBX,ECX,EDI,EDX,ESI, or EBP. You cannot use ESP as an index register.
    In 64-bit coding, you can use the general purpose 32-bit registers or 32-bit addressing versions of the new registers (R8D to R15D). Also you can use the 64-bit extensions of the general purpose registers - RAX,RBX,RCX,RDI,RDX,RSI, or RBP, and the new 64-bit registers R8 to R15. You cannot use RSP as an index register. Note that the above instructions using PARAM_DATA and indexation do not use RIP-relative addressing, so the Image Base should be well below 7FFFFFFFh.

    Reading and writing to labels using indexation and displacement

    Suppose you have 24 dwords of data declared as follows where the final dword in each case holds the result required:-
    PARAM_DATA DD 19h,0,0,22222h
               DD 1Ah,0,0,44444h
               DD 1Bh,0,0,66666h
               DD 1Ch,0,0,88888h
               DD 1Dh,0,0,0AAAAAh
               DD 1Eh,0,0,0CCCCCh
    
    Then you could use indexation (scaling) and displacement as follows:-
    MOV ESI,ADDR PARAM_DATA
    CMP EAX,[ESI+ECX*4]        ;see if there is eax value at ecx dword
    JNZ >L2                    ;no
    MOV EDX,[ESI+ECX*4+0Ch]    ;yes so get the result in edx
    
    or this which does the same thing:-
    CMP EAX,[PARAM_DATA+ECX*4]        ;see if there is eax value at ecx dword
    JNZ >L2                           ;no
    MOV EDX,[PARAM_DATA+ECX*4+0Ch]    ;yes so get the result in edx
    
    You can scale the index register by 2, 4 or 8, or use it unscaled (a scale of 1). The displacement can be any value up to 0FFFFFFFFh, and in your source script it can be written as positive or negative. Non-numeric elements must be separated by the plus sign.
    In 32-bit coding, only the general purpose 32-bit registers can be used as an index register - EAX,EBX,ECX,EDI,EDX,ESI, or EBP. You cannot use ESP as an index register.
    In 64-bit coding, you can use the general purpose 32-bit registers or 32-bit addressing versions of the new registers (R8D to R15D). Also you can use the 64-bit extensions of the general purpose registers - RAX,RBX,RCX,RDI,RDX,RSI, or RBP, and the new 64-bit registers R8 to R15. You cannot use RSP as an index register. Note that the above instructions using PARAM_DATA and indexation do not use RIP-relative addressing, so the Image Base should be well below 7FFFFFFFh.
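
    As a rough model of the lookup above, here is a Python sketch which packs the same 24 dwords into memory and then reads them back using an index and a displacement. The surrounding loop and the function names are assumptions added for the illustration; the manual's example shows only the compare and the load:-

```python
import struct

# The 24 dwords of the PARAM_DATA table, packed little-endian as they sit in memory.
ROWS = [(0x19, 0x22222), (0x1A, 0x44444), (0x1B, 0x66666),
        (0x1C, 0x88888), (0x1D, 0xAAAAA), (0x1E, 0xCCCCC)]
PARAM_DATA = b"".join(struct.pack("<4I", key, 0, 0, result) for key, result in ROWS)

def read_dword(memory, offset):
    """Read one little-endian dword at a byte offset."""
    return struct.unpack_from("<I", memory, offset)[0]

def lookup(eax):
    """Scan a record at a time; ECX counts dwords, so it steps by 4 per record."""
    for ecx in range(0, len(PARAM_DATA) // 4, 4):
        if read_dword(PARAM_DATA, ecx * 4) == eax:          # CMP EAX,[PARAM_DATA+ECX*4]
            return read_dword(PARAM_DATA, ecx * 4 + 0x0C)   # MOV EDX,[PARAM_DATA+ECX*4+0Ch]
    return None

print(hex(lookup(0x1B)))   # 0x66666
```

    The displacement of 0Ch skips the first three dwords of the record, so the load picks up the fourth one, which holds the result.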

    Calling (or jumping to) procedures

    What is a "procedure"?

    A procedure is a series of code instructions with a label to which execution can be transferred. Other names for this are "function", "routine" or "subroutine". Here is an example of a short procedure:-
    PROCESS_HASH:       ;label to the procedure
    XOR EAX,EAX
    MOV EDX,ESI
    CALL PH23
    MOV EDX,866h        ;return from the procedure with edx=866h
    RET
    

    Transferring execution to a procedure

    Usually execution is transferred to the procedure by the use of the CALL instruction. This instruction causes the processor to PUSH onto the stack the position in code just after the CALL instruction and then execution will continue in the procedure being called. At the end of the procedure there will be a RET. This instruction causes the processor to POP from the stack the position in code immediately after the CALL and then execution will continue from that point.
    Unusually execution can be transferred to the procedure by the use of the JMP instruction. At the end of the procedure there could also be another JMP instruction, as in this example:-
    PROCESS_HASH:
    XOR EAX,EAX
    MOV EDX,ESI
    CALL PH23           ;transfer execution to the PH23 procedure and return
    MOV EDX,866h        ;return from the procedure with edx=866h
    JMP >SOMEWHERE_ELSE
    ;
    START:              ;start place for execution
    JMP PROCESS_HASH
    ;
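
    The difference between CALL and JMP can be modelled with a toy stack in Python. This is only a sketch: the stack is a Python list, and the addresses are made-up numbers, not ones GoAsm would produce:-

```python
# A toy model of the processor's stack and instruction pointer.
stack = []

def call(return_address, target):
    """CALL: push the address of the next instruction, then continue at target."""
    stack.append(return_address)
    return target

def ret():
    """RET: pop the saved address and continue from there."""
    return stack.pop()

def jmp(target):
    """JMP: continue at target; nothing is pushed, so there is nothing for RET to pop."""
    return target

eip = call(return_address=0x401005, target=0x401100)   # CALL PROCESS_HASH
print(hex(eip), [hex(a) for a in stack])               # 0x401100 ['0x401005']
eip = ret()                                            # RET at the end of the procedure
print(hex(eip), stack)                                 # 0x401005 []
```

    Because JMP pushes nothing, a procedure entered by JMP has to leave by another JMP (or arrange its own return address), as in the example above.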
    

    CALL and JMP to procedure syntax

    The usual way to call or jmp to a procedure is to use its code label, for example:-
    CALL PROCESS_HASH
    JMP PROCESS_HASH
    
    Sometimes the address of the procedure to go to is held in memory pointed to by a label or a register or even held at a known place in memory in which case you can use for example:-
    CALL [PROCADDRESS]
    CALL [PROCTABLE+20h]
    CALL [ESI]
    CALL [ESI+EDX]
    JMP [4000000h]
    
    Sometimes the address of the procedure to go to is held in a register in which case you can use for example:-
    CALL EAX
    JMP EDI
    

    More complex syntax

    Hopefully you will never have to use any of these forms but GoAsm does allow them (using either CALL or JMP):-
    #define Hello PROCESS_HASH
    CALL Hello       ;treated as a call to PROCESS_HASH
    CALL 100h        ;treated as a call to a relative address
    CALL [HELLO3+ECX+EDX*4]
    CALL [HELLO3+ECX+EDX*4+9000h]
    CALL $$          ;a call to the start of the current section
    CALL $+20h       ;a call 20h bytes ahead
    

    CALL and JMP to procedures outside the object file or section

    Some assemblers require you to say in your source script whether a call is to somewhere outside the object file which is being made, using EXTRN. They also require the destination of the call to be marked as GLOBAL or PUBLIC. You don't have to do either of these things with GoAsm because if the destination of the call is not found when assembling, it is assumed to be an external call. Also all labels, other than locally scoped labels and labels with re-usable names, are assumed to be "global". GoAsm works in the same way when a call or jump is to be made to a code section with another name.

    So if you want to call a procedure in another source script (which will be producing another object file) just call it in the usual way. Similarly if you have a procedure in another executable (usually a Dll) you can do the same.

    For example, suppose you have written My.Dll containing a calculation algorithm you wish to use with the label CALCULATE. You could call it as follows:-

    CALL CALCULATE
    

    In your list of Dlls you give to GoLink you will specify My.Dll. GoLink will first look for the code label CALCULATE in the object files, but will then look in the specified Dlls. Most other linkers look in library files (.lib files) for the functions they contain, which means you have to make a lib file. Either way, in GoAsm syntax there is nothing further for you to do in your source script. If the linker does not find the destination of the call, an error will be shown.
    This form of the call is a relative call using the opcode E8.

    You could also use this form:-

    CALL [CALCULATE]
    
    For this type of call GoAsm uses the opcodes FF15. This is a call to an absolute address. In 32-bit assembly this is a call to a 32-bit address, but in 64-bit assembly it's a call to a 64-bit address.
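
    To show what "relative" means for the E8 form, this Python sketch builds the five bytes of such a call. The function name and the addresses are assumptions for the illustration; the encoding itself (E8 followed by a signed 32-bit distance measured from the end of the instruction) is the standard x86 one:-

```python
import struct

def encode_relative_call(call_address, target_address):
    """Return the 5 bytes of an E8 call placed at call_address.

    The operand is the distance from the end of the instruction
    (call_address + 5) to the target, stored as a signed little-endian dword.
    """
    rel32 = target_address - (call_address + 5)
    return b"\xE8" + struct.pack("<i", rel32)

# Made-up addresses: one forward call and one backward call.
print(encode_relative_call(0x401000, 0x401100).hex())   # e8fb000000
print(encode_relative_call(0x401100, 0x401000).hex())   # e8fbfeffff
```

    Since only a distance is stored, the same bytes work wherever the code is loaded, provided caller and destination move together.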

    See also:-
    using static code libraries
    direct importing by ordinal or specific Dll
    using the C Run-time library


    Calls to Windows APIs - 32-bits and 64-bits

    Calling Windows APIs (which reside in Windows system Dlls) is very simple where there are no parameters, for example in 32-bit Windows you can use:-

    CALL GetModuleHandle
    
    or its more advanced alternative which can be used either for 32-bit or 64-bit Windows:-
    INVOKE GetModuleHandle
    
    There is nothing else to put in the source script. Since the function being called resides outside the executable you are making, it is the linker's job to find the Dll which contains the GetModuleHandle procedure and it will record the name of the Dll in your executable. GoLink does this from a list of Dlls which you supply.

    Most Windows APIs, however, expect to be sent parameters (also known as "arguments") when they are called. It is the programmer's job to ensure that these parameters are sent to the API correctly. The parameters contain the information, or pointers to information, which tell the API what to do. Sometimes the parameters contain addresses of places in memory where the API will insert information.

    How you send the parameters depends on whether you are assembling for 32-bit or 64-bit Windows. This is because they each use different calling conventions, and this affects the way parameters are sent and used. 32-bit Windows uses the standard calling convention (STDCALL) and 64-bit Windows uses the so-called fast calling convention (FASTCALL).

    GoAsm provides ARG and INVOKE which can be used for both platforms. GoAsm creates the correct code to suit the calling convention to be used. If you are writing only for 32-bits you can use PUSH and CALL to send the parameters, but if you want to port your code to 64-bit Windows later, you will need to change these to ARG and INVOKE. In both 32-bit and 64-bit source code you would use CALL to call procedures in your own executables, unless you are sending parameters to them using one of these calling conventions.

    In the STDCALL calling convention used in 32-bit Windows, all the parameters are put on the stack by the caller, and the stack pointer (ESP) is moved to the top of the parameters on the stack. Then the API is called. The API uses the parameters on the stack and before returning it restores the stack to equilibrium by moving the stack pointer to the position it was before the first parameter was put on the stack.
    In the FASTCALL calling convention used in 64-bit Windows, the first four parameters are put in the RCX,RDX,R8 and R9 registers instead of on the stack. However, subsequent parameters are put on the stack. The caller needs to ensure that the stack pointer (in this case RSP) is moved to the top of the parameters as usual, allowing for the first four parameters which are held in registers (this is to permit the API to keep them on the stack as if they had been put there in the first place). Another difference is that the API does not restore the stack into equilibrium before returning from the call (this change makes it easier for a handful of APIs which do not have a fixed number of parameters).

    To enable the same source to be used both for 32-bit and 64-bit programming you would send the parameters using ARG and then call the API using INVOKE, for example:-

    ARG 40h,RDX,RAX,[hwnd]
    INVOKE MessageBoxA
    
    In 32-bit assembly the ARG simply does the same as PUSH, and INVOKE does the same as CALL. GoAsm accepts a PUSH instruction of a 64-bit General Purpose register, so PUSH RDX is treated the same as PUSH EDX. Therefore the above call works on both platforms. In 32-bit assembly it translates as:-
    PUSH 40h,EDX,EAX,[hwnd]
    CALL MessageBoxA
    
    However in 64-bit assembly, the same code translates as:-
    MOV R9,40h
    MOV R8,RDX
    MOV RDX,RAX
    MOV RCX,[hwnd]
    SUB RSP,20h
    CALL MessageBoxA
    ADD RSP,20h
    
    See writing 64-bit programs for more details.
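
    The way the two calling conventions place parameters can be summarised with a small Python model. It only records where each operand ends up (a register or the stack) and does not try to reproduce the instructions GoAsm emits; the function name is an assumption for the illustration:-

```python
REGISTERS_64 = ["RCX", "RDX", "R8", "R9"]   # first four parameters in 64-bit Windows

def parameter_locations(arg_operands, bits):
    """Map each operand to where the API expects to find it.

    arg_operands is given in ARG/PUSH order, that is last parameter first.
    """
    params = list(reversed(arg_operands))            # first parameter first
    locations = []
    for position, operand in enumerate(params):
        if bits == 64 and position < len(REGISTERS_64):
            locations.append((operand, REGISTERS_64[position]))
        else:
            locations.append((operand, "stack"))
    return locations

# ARG 40h,RDX,RAX,[hwnd] followed by INVOKE MessageBoxA
for operand, where in parameter_locations(["40h", "RDX", "RAX", "[hwnd]"], 64):
    print(f"{operand:8} -> {where}")
```

    Note that in the 64-bit translation shown above RDX is copied to R8 before RDX itself is overwritten with RAX, so the order in which the registers are loaded matters when a source operand is also one of the parameter registers.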

    Calls to Windows APIs - using INVOKE

    It is obviously important to send the parameters to the API in the right order. INVOKE helps you to do this by permitting you to put the parameters after the name of the API like in "C". This also helps when working with Windows documentation which always describes the parameters for APIs using "C" syntax. For example here is how the API MessageBox is described:-
    int MessageBox(
        HWND hwnd,            // handle of owner window
        LPCTSTR lpText,       // address of text in message box
        LPCTSTR lpCaption,    // address of title of message box
        UINT uType            // style of message box
       );
    
    Using INVOKE you can follow the same order, for example:-
    INVOKE MessageBoxA, [hwnd],EAX,EDX,40h
    
    which is the same as:-
    ARG 40h,RDX,RAX,[hwnd]
    INVOKE MessageBoxA
    
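    Note that ARG (like PUSH) takes the parameters in the order they are put on the stack, that is last parameter first, whereas the parameters after INVOKE are written first parameter first, as in the "C" declaration.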

    INVOKE lets you straddle two or more lines using the continuation character:-

    INVOKE CreateWindowExA, WS_EX_OVERLAPPEDWINDOW, ADDR szClassName, \
                            ADDR szWindowName,\
                            WS_OVERLAPPEDWINDOW+THING,\
                            100,16,400,0,0,0,[hInstance],0
    
    Since GoAsm looks at the parameters to INVOKE starting from the end, errors near the end will be found first.

    When using INVOKE, if you like to tuck away your parameters in a defined word then GoAsm will still get them in the correct order, for example:-

    z_function_params=3,2,1
    INVOKE z_function, z_function_params
    
    produces the same code as:-
    ARG 1,2,3
    INVOKE z_function
    

    Calls to Windows APIs - ANSI and Unicode versions

    Windows APIs which accept character input or give character output (usually in the form of character strings) tend to have two different versions, an ANSI version and a Unicode version. The ANSI version accepts and/or outputs strings in ANSI, where a single byte of value 0 to 255 represents a single character based on the current character set. These characters are also sometimes called "multibyte" characters. The ANSI version of the API will end in "A" as in the CreateWindowExA example below. The Unicode version accepts and/or outputs strings in Unicode, that is two bytes per character based on the current Unicode character set. These are also sometimes called "wide" characters. The Unicode version of the API will end in "W".
    In your source script you need to specify which API you wish to call by adding A or W at the end of the API name. If, when you link your object file, the linker cannot find the API in the other executable (or in the .lib files if you are not using GoLink), this is probably because you have forgotten to add the required A or W. Another reason, however, could be that you haven't provided GoLink with the name of the Dll holding the API (or the correct .lib files if you are not using GoLink).
    If you want automatically to make the correct "A" or "W" call depending on whether you are making an ANSI or Unicode version of your application this can be done using Unicode/ANSI switching. See writing Unicode programs for information about this in detail.
    There is no difference between 32-bit assembly and 64-bit assembly in this respect. This is because 64-bit Windows has ANSI and Unicode versions of the APIs just like 32-bit Windows.
     

    PUSH or ARG pointers to strings and data

    Pointers to null terminated strings

    GoAsm supports an extension of PUSH or ARG which is very helpful when programming in Windows. Often in Windows you need to send to an API a parameter which is a pointer to a null-terminated string for example (in 32-bits):-
    MBTITLE   DB 'Hello',0
    MBMESSAGE DB 'Click OK',0
    PUSH 40h, ADDR MBTITLE, ADDR MBMESSAGE, [hwnd]
    CALL MessageBoxA
    
    To make this easier GoAsm permits the use of PUSH or ARG like this:-
    PUSH 40h,'Hello','Click OK',[hwnd]
    CALL MessageBoxA
    
    or, if you were writing source for 32-bit or 64-bit platforms:-
    ARG 40h,'Hello','Click OK',[hwnd]
    INVOKE MessageBoxA
    
    or if you prefer to send parameters after INVOKE:-
    INVOKE MessageBoxA, [hwnd],'Click OK','Hello',40h
    
    You can also use this with Unicode strings as follows:-
    ARG 40h,L'Hello',L'Click OK',[hwnd]
    INVOKE MessageBoxW
    INVOKE MessageBoxW, [hwnd],L'Click OK',L'Hello',40h
    
    When you use any of these forms the string will always be null-terminated. What is happening here is that GoAsm places the string in the const section if there is one, otherwise in the data section if there is one, and failing that in the code section, and adds a null-terminator. Then GoAsm creates the correct instruction and gives it a pointer to the string. No symbol is made for debugging purposes.

    In 64-bit assembly, GoAsm ensures that Unicode strings are aligned on a word boundary as required by the system. Note that this is similar to PUSH ADDR and will make use of the R11 register and take advantage of the shorter RIP-relative addressing of the LEA instruction.

    Pushing pointers to raw data

    You can do a similar thing with ordinary raw data (in bytes) using the < and > operators. For example:-
    PUSH <23,24,25>                  ;push a pointer to the bytes 23,24,25
    
    or
    PUSH <23,6 DUP 20h,23>           ;push a pointer to the bytes 23,six spaces then 23
    
    or
    PUSH <'Hi',0Dh,0Ah,'There',0>    ;push a pointer to the null terminated string on two lines
    
    You can also use the < and > operators in this way with ARG and after INVOKE. What is happening here is that GoAsm places the data declaration between the < and > operators in the const section if there is one, otherwise in the data section if there is one, and failing that in the code section. Then GoAsm creates the correct instruction and gives it a pointer to the data. No symbol is made for debugging purposes.
    Note that when using the < and > operators in this way no null terminator is added to strings.

    In 64-bit assembly, GoAsm ensures that data is aligned on a word boundary as would be required by the system if the data contains Unicode strings.
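
    The difference between the quoted-string form and the < > form comes down to which bytes are stored. The Python sketch below models this; it assumes plain ASCII text for the ANSI case and a two-byte terminator for the Unicode case, and the function names are made up for the illustration:-

```python
def quoted_string(text, unicode=False):
    """Bytes stored for 'text' or L'text': a null terminator is always added."""
    if unicode:
        return text.encode("utf-16-le") + b"\x00\x00"   # assumed: terminator is one wide null
    return text.encode("ascii") + b"\x00"

def raw_data(*items):
    """Bytes stored for <...>: exactly what is declared, nothing appended."""
    out = bytearray()
    for item in items:
        out += item.encode("ascii") if isinstance(item, str) else bytes([item])
    return bytes(out)

print(quoted_string("Hello"))                    # b'Hello\x00'
print(quoted_string("Hi", unicode=True))         # b'H\x00i\x00\x00\x00'
print(raw_data("Hi", 0x0D, 0x0A, "There", 0))    # b'Hi\r\nThere\x00'
print(raw_data(23, 24, 25))                      # b'\x17\x18\x19'
```

    In the third line the terminating zero is there only because it was written explicitly inside the < and > operators.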

    Moving pointers to strings and data into registers

    You can also establish null terminated strings and data and move pointers to them into registers using the following syntax (for example):-
    MOV EAX,ADDR 'This is a string'
    MOV EAX,ADDR <'String',0Dh,0Ah>
    
    When GoAsm deals with this code it places a null terminated string or the data between the < and > operators in the const section if there is one, otherwise in the data section if there is one, and failing that in the code section. Then GoAsm gives the pointer to the data so created to the instruction. No symbol is made for debugging purposes.
    Note that when using the < and > operators no null terminator is added to strings.
    Note also how this differs from the syntax for inserting character immediates into a register. The difference is in the use of the ADDR operator.

    This works the same way in 64-bit programming except that GoAsm ensures that a Unicode string or data is word aligned in memory as required by the system. Note that this is similar to PUSH ADDR and will make use of the R11 register and take advantage of the shorter RIP-relative addressing of the LEA instruction.


    Using character immediates in code

    GoAsm does not reverse store word and dword character immediates as MASM does. So for example,
    MOV AL,'1'
    MOV AX,'12'          ;regarded as bytes - 1 first then 2
    MOV EAX,'ABCD'       ;regarded as bytes - A first, then B then C then D
    
    This makes it much easier to add short strings to memory eg. to add the extension .fil to a filename in memory you can code:-
    MOV [EDI],'.fil'     ;or
    MOV EAX,'.fil'
    MOV [EDI],EAX
    
    and not
    MOV [EDI],'lif.'     ;or
    MOV EAX,'lif.'
    MOV [EDI],EAX
    
    CMP works in the same way for example:-
    CMP AL,'1'
    CMP EAX,'ABCD'
    CMP [EDI],'.fil'
    
    This does not change the usual reverse order of material not in quotes so for example when you want to add a carriage return and then a linefeed to text you can still use:-
    MOV AX,0A0Dh
    STOSW
    
    Here the carriage return (0Dh) which is in AL, is loaded into memory first, then the linefeed (0Ah) in AH is loaded into memory.
    If the string is shorter than the register or memory type, zero bytes are added to fill the remainder, for example,
    MOV EAX,'ABC'        ;codes as A then B then C then zero
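
    The byte ordering described here can be checked with a short Python model. The function name is an assumption for the illustration; it treats the quoted text as bytes laid down in memory order and shows the number that results when those bytes are read as a little-endian value:-

```python
def goasm_immediate(text, size):
    """Numeric value of a quoted immediate occupying `size` bytes."""
    data = text.encode("ascii").ljust(size, b"\x00")    # short strings are zero-filled
    if len(data) > size:
        raise ValueError("string too long for operand")
    return int.from_bytes(data, "little")               # first character at the lowest address

print(hex(goasm_immediate("ABCD", 4)))    # 0x44434241
print(hex(goasm_immediate("ABC", 4)))     # 0x434241
print((0x0A0D).to_bytes(2, "little"))     # b'\r\n'
```

    The last line shows the unquoted case: the number 0A0Dh is stored low byte first, so the carriage return reaches memory before the linefeed.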
    
    When writing source code for Unicode programs you can ensure that character immediates are Unicode or, if necessary, switched between ANSI and Unicode. See using the correct string in quoted immediates and switching quoted strings and immediates.

    In 64-bit programming you can use the 64-bit registers to contain character immediates which are 8 characters long, for example:-

    MOV RAX,'Saturday'
    
    However, the CMP instruction can only take an immediate of up to 32 bits, so for example
    CMP RAX,'Saturday'
    
    would show an error.

    Type indicators

    Why they are needed and how they are provided

    Consider the instruction:-
    MOV [ESI],20h
    
    This puts the number 20h into a place in memory whose address is contained in the register esi. But what is missing from this instruction is whether the number should be loaded as a byte, as a word or as a dword. In other words should one, two or four bytes of memory be altered? All assemblers require a type indicator in instructions of this sort. The syntax in other assemblers is (using dword as an example):-
    MOV DWORD PTR [ESI],20h       ;MASM
    MOV DWORD [ESI],20h           ;NASM
    MOV D[ESI],20h                ;A386
    
    Of course I have used the A386 syntax which requires a lot less typing so that in GoAsm the type indicators you can use are:-
    B meaning byte
    W meaning word (two bytes)
    D meaning dword (four bytes)
    Q meaning qword (eight bytes)
    T meaning tword (ten bytes)
    
    You can also use these two switchable type indicators:-
    S meaning string (default of 1 for ANSI byte, or 2 for Unicode word)
    P meaning pointer (default of 4 for 32-bit dword, or 8 for 64-bit qword)
    
    See Unicode support for more on using the switched type indicator for Unicode/ANSI switching.
    See 64-bit assembly for more on using the switched type indicator for 32-bit/64-bit switching.
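
    To see why the size matters, this Python sketch models writing the number 20h to the same place in memory with each of the B, W, D and Q type indicators. The function name and the use of a bytearray as "memory" are assumptions for the illustration:-

```python
import struct

# struct format codes standing in for the type indicators (T is left out)
TYPE_INDICATORS = {"B": "<B", "W": "<H", "D": "<I", "Q": "<Q"}

def mov_immediate(memory, offset, indicator, value):
    """Model MOV <indicator>[offset],value on a bytearray."""
    struct.pack_into(TYPE_INDICATORS[indicator], memory, offset, value)

for indicator in "BWDQ":
    memory = bytearray(b"\xFF" * 8)      # fill with FFh so the altered bytes stand out
    mov_immediate(memory, 0, indicator, 0x20)
    print(indicator, memory.hex())
```

    Each line of output shows one, two, four or eight bytes replaced, which is exactly the information missing from MOV [ESI],20h without a type indicator.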

    Type indicator also required for named memory references

    Like NASM, GoAsm does not type-check, so it will not know the size of this sort of operation:-
    INC [COUNT]
    
    Here GoAsm does not know (and in fact does not care) whether COUNT is a byte, word or dword. Therefore you must give this a type indicator too for example:-
    INC B[COUNT]
    
    Although this is a little more work for the programmer, in fact it can be argued that it makes your source script easier to read and understand, since you can always see the size of the operation from the instruction itself, rather than having to go back to see if COUNT was declared as a byte, word or dword.

    What instructions require a type indicator?

    Generally all instructions where the size of the operation is not obvious. Some of these examples use named memory references, others memory references pointed to by registers:-
    AND B[MAINFLAG],0FEh
    ADC W[EAX],66h
    ADD D[MEM_AREA],66h
    BT D[EBX],31D
    CMP D[HELLOWORD],0Dh
    DEC D[ECX]
    DIV B[HELLO]
    INC D[EDX]
    MOV B[MEM_AREA],23h
    MOVSX EDX,B[EDI]
    MUL B[HELLO]
    NEG W[ESI]
    NOT D[HELLO3]
    OR B[MAINFLAG],1h
    SETZ B[BYTETEST]
    SHL W[IAMAWORD],23h
    SHL D[IAMADWORD],CL
    SUB D[EBP+10h],20D
    TEST B[ESP+4h],1h
    XOR D[IMAWORD],11111111h
    
    And in 64-bit programming you might also see, for example
    ADC W[RAX],66h
    BT D[R12],31D
    INC Q[RDX]
    NEG W[R15D]
    

    What instructions do not require a type indicator?

    Where the size of the operation is obvious from the use of a register for example
    AND [MAINFLAG],CL
    CMP [HELLOWORD],EDI
    MOV [IAMABYTE],AL
    MOV [IAMADWORD],ESI
    OR [MAINFLAG],BH
    XCHG CL,[ESI]
    
    Also none of the mmx, xmm or 3DNow! instructions require a type indicator. Several of the x87 floating point instructions do not need a type indicator. Those which do can take more than one operand size. There are also several instructions which can only take one operand size so with these there is no need for a type indicator. For example CALL, JMP, PUSH, and POP always take a dword. See half stack operations for the use of PUSHW and POPW. Also some less common instructions do not need a type indicator, for example ARPL, BOUND, BSF, BSR, CMOV (in all forms), CMPXCHG, and CMPXCHG8B.
     

    Repeat instructions

    Repeat instructions are available for PUSH, POP, INC, DEC, and of course when declaring data, for example:-
    PUSH 0,23h,[hwnd],ADDR lParam,EAX
    POP EAX,[EBP+2Ch],[hwnd]
    DEC ECX,EDX,D[COUNT]
    INC D[EBP+10h],EDI
    DB 23h,24h,25h
    
    The instructions here are always assembled in left-to-right order.
     

    Numbers and arithmetic

    Numbers

    Most assemblers use the following syntax for numbers:-
    66ABCDEh      ;a hex number
    34567789      ;a decimal number
    1100011B      ;a binary number
    1.0           ;a real number
    1.0E0         ;a real number
    
    GoAsm accepts these numbers but also supports numbers in these formats:-
    9999999D      ;a decimal number
    0x456789      ;a hex number
    
    A hex number which begins with a letter (that is A to F, being values 10 to 15 decimal) must begin with a zero, for example:-
    0A789ABCDh
    or
    0xA789ABCD
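
    The integer formats listed above can be told apart by their prefix or suffix. Here is a minimal Python sketch of such a reader; it is an illustration of the rules as stated, not GoAsm's own parser, and it does not handle real numbers:-

```python
def parse_number(text):
    """Convert an integer literal in one of the listed formats to an int."""
    t = text.strip().lower()
    if t.startswith("0x"):        # 0x456789 - checked first, since hex digits may end in b or d
        return int(t[2:], 16)
    if t.endswith("h"):           # 66ABCDEh
        return int(t[:-1], 16)
    if t.endswith("b"):           # 1100011B
        return int(t[:-1], 2)
    if t.endswith("d"):           # 9999999D
        return int(t[:-1], 10)
    return int(t, 10)             # 34567789

for literal in ("66ABCDEh", "34567789", "1100011B", "9999999D", "0x456789", "0A789ABCDh"):
    print(literal, "=", parse_number(literal))
```

    The leading zero in 0A789ABCDh is what lets a reader (human or assembler) see at once that this is a number and not a name.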
    

    Arithmetic

    GoAsm can perform limited arithmetic in data declarations, in duplicate amounts in DUP, when declaring and using definitions, and in operands to code instructions. You are not allowed to use the multiply sign (asterisk) inside square brackets other than when using an index register.

    Be careful using the OR, AND and NOT logical operators, since these are actually mnemonics. Although GoAsm recognises them if you use them in places where mnemonics are not expected, you can use instead | for OR, & for AND, and ! for NOT.

    Arithmetic in brackets is carried out first, otherwise calculations are carried out in strict left-to-right order. Here are some examples:-

    DB 2*3
    DB (2+30h)/(2+1)
    DD (2000h+40h-20h)/2
    DD SIZEOF HELLO/2
    DD 444444h & 226222h
    DB 20h/2 DUP 44h
    DB 6+2 DUP 0
    #define globule (2*3)/2
    DB globule
    DD globule|100h
    DD 2D00h>>8
    DQ 2D00h<<48
    MOV EAX,globule|100h
    MOV EAX,SIZEOF HELLO*2
    MOV EAX,ADDR HELLO+10h
    MOV EAX,0x68+0x69-0x70
    MOV EAX,[MemName+0x68+0x69-0x70]
    MOV EAX,[ESI*4+45000h]
    MOV EAX,[ESI*4+SIZEOF HELLO/2]
    MOV EAX,8+8*2       ;result is 32
    MOV EAX,8+(8*2)     ;result is 24
    
    Divisions are rounded to the nearest whole number, with an exact half rounded up, eg.
    MOV EAX,32/3        ;puts 11 into eax
    MOV EAX,31/3        ;puts 10 into eax
    MOV EAX,10/4        ;puts 3 into eax
    
    GoAsm assumes that all multiplication and division is carried out using unsigned numbers. MUL and DIV are used at compile-time and not their signed counterparts IMUL and IDIV. See understand signed numbers for more about signed numbers.
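
    The evaluation rules in this section (brackets first, otherwise strict left-to-right, unsigned division rounded to nearest) can be modelled with a small Python evaluator. This is a sketch under assumptions: the rounding formula is inferred from the three division examples above, unary operators and names such as SIZEOF are not handled, and none of this is GoAsm's own code:-

```python
import re

TOKEN = re.compile(r"\s*(0x[0-9a-fA-F]+|[0-9][0-9a-fA-F]*[hH]|[0-9]+|<<|>>|[-+*/|&()])")

def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError(f"unexpected text at {text[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def to_int(token):
    lowered = token.lower()
    if lowered.startswith("0x"):
        return int(lowered, 16)
    if lowered.endswith("h"):
        return int(lowered[:-1], 16)
    return int(lowered)

OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: (a + b // 2) // b,   # unsigned, rounded to nearest (assumed rule)
    "|": lambda a, b: a | b,
    "&": lambda a, b: a & b,
    "<<": lambda a, b: a << b,
    ">>": lambda a, b: a >> b,
}

def evaluate(text):
    """Evaluate with no operator precedence: brackets first, then left to right."""
    tokens = tokenize(text)

    def operand():
        token = tokens.pop(0)
        if token == "(":
            value = expression()
            tokens.pop(0)                  # discard the closing bracket
            return value
        return to_int(token)

    def expression():
        value = operand()
        while tokens and tokens[0] != ")":
            op = tokens.pop(0)
            value = OPS[op](value, operand())
        return value

    return expression()

print(evaluate("8+8*2"))      # 32
print(evaluate("8+(8*2)"))    # 24
print(evaluate("10/4"))       # 3
```

    The first two results differ because, without brackets, the addition is done before the multiplication simply because it comes first.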

    Declaring real numbers

    Real numbers are numbers which can contain a representation of a value of less than 1. GoAsm expects all real numbers in the source script to be in the form of a floating point number, that is a number made up of digits which have a point within them. The point must be represented by a "period" (a full stop, ASCII character value 2Eh). The point can be anywhere within the digits. The real number may have a signed decimal exponent at the end of the number (using "e" or "E" following the IEEE Floating Point Standard). The x87 floating point registers of the processor can accept real numbers to 32, 64 or 80 bit resolutions. The 3DNow! and SSE instructions work with 32-bit real numbers and the SSE2 instructions use 64-bit real numbers.
    Sometimes these types are called:-
    32-bit single-precision
    64-bit double-precision
    80-bit extended-precision
    So real numbers can be declared as dwords (32-bit), qwords (64-bit) or twords (80-bit). Here are some examples of real number data declarations:-
    DD 1.6789E3
    DQ 1.6789E3
    DT 1.6789E3
    DD 3 DUP 7.6789E-2
    DQ 678.27896435E3
    DT 1.2
    
    You may also declare PI directly either as a tword, qword or dword as follows:
    DD PI            ;pi as a dword
    DQ PI            ;pi as a qword
    DT PI            ;pi as a tword
    
    GoAsm tries to achieve maximum accuracy in providing pi by writing a known number directly into the mantissa.

    You can also declare real numbers as follows:-

    PUSH 1.1
    MOV EAX,1.1
    
    Both of these use a 32-bit format for the real number. The first places that number on the stack and the second moves it into the specified register.

    GoAsm's conversion accuracy

    GoAsm uses special algorithms to ensure optimum accuracy in loading the real number data declaration to data. In the case of a tword (80 bit) data declaration the calculation is performed if necessary to a maximum of 92 bits and then rounded down to fit into the 64 bit mantissa. Conversion to a qword (64 bits) is carried out using the maximum available precision (53 bit mantissa) with "near" rounding. Conversion to a dword (32 bits) is carried out using the maximum available precision (24 bit mantissa) with "near" rounding.
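
    The effect of the 24 bit and 53 bit mantissas can be seen with Python's struct module, which also converts with round-to-nearest. This is only a model of the dword and qword cases (struct has no 80-bit format, so DT is not covered), and the function name is an assumption for the illustration:-

```python
import struct

def declare_real(value, directive):
    """Little-endian bytes for a real number declared with DD (32-bit) or DQ (64-bit)."""
    fmt = {"DD": "<f", "DQ": "<d"}[directive]
    return struct.pack(fmt, value)

print(declare_real(1.0, "DD").hex())        # 0000803f
print(declare_real(1.6789E3, "DD").hex())
print(declare_real(1.6789E3, "DQ").hex())

# Reading the dword back shows the precision lost by the shorter mantissa
as_dword = struct.unpack("<f", declare_real(1.6789E3, "DD"))[0]
print(as_dword == 1.6789E3)                 # False
```

    The qword keeps the value to about 15 or 16 significant decimal digits, the dword to about 7, which is why the dword read back does not compare equal.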

    Directly loading the exponent and mantissa

    Instead of using real numbers to load the floating point registers you can declare a tword and load the exponent and mantissa directly using the FLD instruction. In order to do this you will need to know the exponent and mantissa values to load (these can either be calculated or found and checked using one of the fpu panes in GoBug). Suppose, for example you want a representation of pi which is as accurate as possible and you know that this is an exponent of +0002 and a mantissa of +C90FDAA22168C235h. Then you can declare this number using:-
    DIRECT_PI DT 4000C90FDAA22168C235h
    
    and load it using:-
    FLD T[DIRECT_PI]
    
    The most significant bit (bit 79) in this tword declaration is a sign bit indicating whether the real number is positive or negative. In this case the number is positive because the sign bit is not set. The remainder of the first four hex digits contain the exponent. This is biased by a value of +3FFEh in 80 bit real numbers. This permits exponents of between -3FFEh and +4001h to be handled without using the most significant bit (the exponents become 0 to 7FFFh). The remainder of the hex digits contain the mantissa.
    It is much more difficult to load the exponent and mantissa directly using dword and qword data declarations. This is because the division between exponent and mantissa in those types of numbers is not at a 4 bit boundary. This makes it difficult to work out the correct hex numbers to declare, bearing in mind the bias which needs to be applied.
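
    The layout of the tword can be checked by taking DIRECT_PI apart again. The Python sketch below follows the convention used in this section (mantissa read as a fraction below 1, bias of 3FFEh); the function name is an assumption, and zero, denormal and infinite values are ignored:-

```python
import math

def decode_tword(value):
    """Split an 80-bit real into sign, exponent and mantissa, and approximate its value."""
    sign = value >> 79
    biased_exponent = (value >> 64) & 0x7FFF
    mantissa = value & 0xFFFFFFFFFFFFFFFF
    exponent = biased_exponent - 0x3FFE              # remove the bias
    approx = (mantissa / 2**64) * 2.0**exponent      # mantissa as a fraction below 1
    return sign, exponent, mantissa, -approx if sign else approx

sign, exponent, mantissa, approx = decode_tword(0x4000C90FDAA22168C235)
print(sign, exponent, hex(mantissa))                 # 0 2 0xc90fdaa22168c235
print(abs(approx - math.pi) < 1e-15)                 # True
```

    The approximation is only as good as a Python float (64 bits), so it cannot show the extra precision the tword actually holds.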
     

    Characters in GoAsm

    Strings of characters

    In your source script you will often be relying on character representation for example:-
    Mess DB 'I am a string of characters',0
    PUSH 'This is supposed to be a caret ^'
    MOV EAX,'$|@'
    
    What actual values does GoAsm load when it assembles these instructions? At assemble time GoAsm views your source script using Windows file mapping, and then reads it character by character. In other words GoAsm is given the value of the characters in the source script by Windows. When GoAsm loads strings of the sort shown above into the object file, it loads the same character values as given to it by Windows. In the case of conversions from ANSI to Unicode strings, these are passed first through the API MultiByteToWideChar. This means that the value given to GoAsm by Windows will match that in the current character set (code page). Accordingly you need to ensure that the character set used in the computer which runs GoAsm is the character set for which your program is designed to run.

    If you are using a source script which is in a Unicode format (UTF-8 or UTF-16) then the codepage issue disappears. The correct characters are given by their Unicode value.
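
    The code page issue can be illustrated in Python, whose codecs stand in here for the conversion Windows performs. The two code pages chosen (cp1252 and cp437) and the function name are assumptions picked for the example:-

```python
def to_unicode(byte_value, code_page):
    """Interpret one ANSI byte under a given code page."""
    return bytes([byte_value]).decode(code_page)

# The same byte in an ANSI source file means different characters...
print(hex(ord(to_unicode(0xE9, "cp1252"))))   # 0xe9  (e with acute accent)
print(hex(ord(to_unicode(0xE9, "cp437"))))    # 0x398 (Greek capital theta)

# ...whereas a byte in the ASCII range is the same under both
print(to_unicode(0x7C, "cp1252") == to_unicode(0x7C, "cp437"))   # True
```

    So a Unicode string built from an ANSI source script depends on the code page in force when the conversion is done, which is why a Unicode source script avoids the problem.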

    Characters specified directly

    Sometimes you will specify characters by their actual values to try to deal with character set variations for example,
    CMP AL,124D       ;see if character is an OR as in some character sets
    JZ >L4            ;yes
    CMP AL,221D       ;see if character is an OR as in some character sets
    JZ >L4            ;yes
    
    Here you have already allowed for a possible variation in the user's own character set. If necessary you can arrange for your code to test the user's character set at run-time, and to test for the correct characters or use the correct strings accordingly. You can also test the language of the user's machine and provide strings in the correct language. The resource APIs provide a way this can be done automatically - see the manual to GoRC, my resource compiler.
     

    Operators

    These are some operators which may be used in the source script which have a special meaning to GoAsm.
    ,               - the instruction is not finished, continue
    ; or //         - a comment line - ignore to end of line
    /*.........*/   - continuous comment - ignore between the marks
    \               - the material is continuing on the next line
    - number        - the number is negative
    ! number        - invert the number (like NOT)
    NOT             - invert the number
    ~ number        - same
    +               - the plus sign
    -               - the minus sign
    *               - the multiply sign
    /               - the divide sign
    |               - bitwise OR
    OR              - bitwise OR
    &               - bitwise AND
    AND             - bitwise AND
    << number       - bit shift left by the number
    >> number       - bit shift right by the number
    (....)          - perform calculation in brackets first
    

    ## in a definition has a special meaning: see using double hashes in definitions.
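
    As a rough cross-check of what the arithmetic operators in the table above do, here is a small sketch in Python (not GoAsm source). It assumes the results are held in a dword, which is my assumption for the illustration only, and the helper name invert is hypothetical.

```python
# Python stand-ins for some of the assemble-time operators listed above.
MASK32 = 0xFFFFFFFF  # assumption: results are truncated to a 32-bit dword


def invert(number):
    """Models '! number', 'NOT number' and '~ number' on a dword."""
    return ~number & MASK32


print(hex(invert(0x0F)))   # inversion       -> 0xfffffff0
print(hex(0x20 | 0x40))    # bitwise OR      -> 0x60
print(hex(0xFF & 0x0F))    # bitwise AND     -> 0xf
print(hex(1 << 4))         # shift left      -> 0x10
print(hex(0x80 >> 3))      # shift right     -> 0x10
print(hex((2 + 3) * 4))    # brackets first  -> 0x14
```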
     


    Advanced features

    Structures - different types and uses

    What are structures?

    Structures are data areas of a fixed size which hold data in various components (structure members). They can range from very loose arrangements to highly formalised ones with structures within structures (nested structures). They can be data areas established by ordinary data declaration or from STRUCT templates. Structures are very important in Windows programming and GoAsm supports all types.
    See also unions.

    Using simple structures in Windows programming

    Let's take the LV_COLUMN structure which is used to organise the columns in a listview control. The following code sends the LVM_INSERTCOLUMN message (value 101Bh) to the ListView control to make a new column with the index number of the column in eax. The column details are contained in the LV_COLUMN structure. Here is how it might be used in 32-bit code:-
    PUSH ADDR LV_COLUMN,EAX,101Bh,hListView
    CALL SendMessageA              ;insert eax column
    
    Now let's look more closely at the LV_COLUMN structure.
    In the Windows header file Commctrl.h (pre-Win_IE 300 version) which contains information about the structure it is described as a structure of six dwords. In one sense therefore the structure can be regarded as 6 dwords which can be declared very simply as follows:-
    LV_COLUMN DD 6 DUP 0
    
    However, in the Windows information, each of the six dwords has a name which gives some idea of what it is used for, which is useful. Also the very first dword is a mask which identifies which of the later members of the structure are valid. This mask is important because a later version of the structure has another two members, and the mask needs to be different. So it might be better to declare the structure in data like this so that the mask can be initialised with a value, and so that you can see the names in your source script:-
    LV_COLUMN
      DD 0Fh       ;+0h mask
      DD 2h        ;+4h fmt=LVCFMT_CENTER=2
      DD 0         ;+8h cx
      DD 0         ;+0Ch pszText
      DD 0         ;+10h cchTextMax
      DD 0         ;+14h iSubItem
    
    Here you can see that, whilst declaring the structure in data, we have taken the opportunity to initialise two of the members with values which will not change, and have included in the comments the offset details, member names and other information.

    Reading from and writing to the simple structure

    It is very easy to read from and write to the simple structure shown above for example:-
    MOV EDI,ADDR LV_COLUMN
    MOV ESI,ADDR ColumnText     ;get the column text to use
    MOV [EDI+0Ch],ESI           ;and give it to the structure
    MOV D[EDI+8h],50D           ;and make the width 50 pixels
    
    or you can use:-
    MOV ESI,ADDR ColumnText     ;get the column text to use
    MOV [LV_COLUMN+0Ch],ESI     ;and give it to the structure
    MOV D[LV_COLUMN+8h],50D     ;and make the width 50 pixels
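
    The offsets used above (+8h for cx, +0Ch for pszText) are simply the running total of the sizes of the earlier members. The following Python sketch (not GoAsm source) shows that arithmetic; the helper member_offsets is a hypothetical name of my own.

```python
# Offsets of consecutive members are the running total of the earlier sizes.
def member_offsets(members):
    """Return ({name: offset}, total_size) for a list of (name, size) pairs."""
    offsets, position = {}, 0
    for name, size in members:
        offsets[name] = position
        position += size
    return offsets, position


# The six dwords of the (pre-Win_IE 300) LV_COLUMN structure.
LV_COLUMN = [("mask", 4), ("fmt", 4), ("cx", 4),
             ("pszText", 4), ("cchTextMax", 4), ("iSubItem", 4)]

offsets, total = member_offsets(LV_COLUMN)
for name, offset in offsets.items():
    print(f"+{offset:02X}h {name}")
print("size in bytes:", total)
```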
    

    More formalised structures using STRUCT

    Some programmers prefer to be more formal when using structures by using a structure template. This is done in two stages. The first stage is to make a template by using STRUCT and give a name to the template. This does not actually declare any data.

    Here is an example of a structure template made with the name LV_COLUMN:-

    LV_COLUMN STRUCT
      mask       DD 0Fh       ;mask
      fmt        DD 2h        ;LVCFMT_CENTER=2
      cx         DD 0
      pszText    DD 0
      cchTextMax DD 0
      iSubItem   DD 0
    ENDS
    
    I have added some comments here to help understand the initialisation of two members of the structure. Note ENDS (literally END STRUCT) marks the end of the template. If you prefer you can also mark the end of the template by giving the structure name again followed by ENDS eg.
    LV_COLUMN ENDS
    
    The second stage is to use the template. You do this by using the template in the data section, usually preceded by a label, for example:-
    Lv1 LV_COLUMN
    
    Here you have declared six dwords using the LV_COLUMN structure template and you have given the structure declaration the label Lv1.

    The symbols created by formalised structures

    In GoAsm, symbols are made for the label of the structure itself and also for each named member of the structure. These can then be referenced directly and can also be passed to the debugger.
    So for example:-
    RECT STRUCT
         left   DD
         top    DD
         right  DD
         bottom DD
    ENDS
    rc RECT
    
    creates the following symbols:-
    rc
    rc.left
    rc.top
    rc.right
    rc.bottom
    

    Reading from and writing to the formalised structure

    Using the formalised structure allows you to be more specific in your source script when reading from and writing to the structure, for example:-
    MOV ESI,ADDR ColumnText     ;get the column text to use
    MOV [Lv1.pszText],ESI       ;and give it to the structure
    MOV D[Lv1.cx],50D           ;and make the width 50 pixels
    
    or even
    MOV ESI,ADDR ColumnText     ;get the column text to use
    MOV EDX,ADDR Lv1.pszText    ;get the psztext member
    MOV [EDX],ESI               ;and load the text to use
    MOV EDX,ADDR Lv1.cx         ;get the cx member
    MOV D[EDX],50D              ;and make the width 50 pixels
    
    But there is still nothing to stop you from doing this which is the same thing:-
    MOV ESI,ADDR ColumnText     ;get the column text to use
    MOV [Lv1+0Ch],ESI           ;and give it to the structure
    MOV D[Lv1+8h],50D           ;and make the width 50 pixels
    
    Although it is more complex to set up, the advantage of the formalised method is that when you look at your code in the symbolic debugger the symbols in the structure will appear in full, with both the structure label and the member name. This is because GoAsm creates symbols for all the members of the structure and passes these to the linker. As far as I am aware this is unique to GoAsm and other assemblers do not do this.

    Getting the offset of structure members

    Sometimes you need to get the offset of a member within the structure. You do this by referring to the structure by name followed by a period and the name of the member, for example
    POINT STRUCT
       left  DD 0
       right DD 0
    ENDS
    
    Then
    MOV EBX,POINT.right
    
    This loads the value 4 into EBX, which is the distance of the member from the beginning of the structure.

    This way of getting an offset is sometimes useful to get information sent by Windows in a structure. As an example, the OFNHookProc callback procedure receives from Windows information in a WM_NOTIFY message. The lParam parameter contains a pointer to an OFNOTIFY structure. This is a nested structure with the following form:-

    OFNOTIFY STRUCT
      hdr     NMHDR
      lpOFN   DD
      pszFile DD
    ENDS
    
    where the NMHDR structure is:-
    NMHDR STRUCT
      hwndFrom DD
      idFrom   DD
      code     DD
    ENDS
    
    So within your window procedure you can get the value of the member idFrom in the NMHDR (identifier of the control sending the message) as follows:-
    MOV ESI,[EBP+14h]                 ;get the pointer to the OFNOTIFY structure
    MOV EAX,[ESI+OFNOTIFY.hdr.idFrom]
    MOV EDX,[ESI+OFNOTIFY.pszFile]
    
    In fact what is happening here is that OFNOTIFY.hdr.idFrom resolves to a value of 4; OFNOTIFY.pszFile resolves to a value of 10h. These are their correct offsets from the beginning of the OFNOTIFY structure. Of course the structures concerned must be known to GoAsm. This is done by including the structure templates in the assembler source script, somewhere earlier in the file.
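
    If you want to check offsets like these independently, one way is to describe the same templates with Python's ctypes module. This is only a sketch (Python, not GoAsm source), and it assumes every member is a 32-bit dword as in the templates above.

```python
import ctypes


class NMHDR(ctypes.Structure):
    # Three dwords, as in the NMHDR template above.
    _fields_ = [("hwndFrom", ctypes.c_uint32),
                ("idFrom", ctypes.c_uint32),
                ("code", ctypes.c_uint32)]


class OFNOTIFY(ctypes.Structure):
    # A nested NMHDR followed by two dwords.
    _fields_ = [("hdr", NMHDR),
                ("lpOFN", ctypes.c_uint32),
                ("pszFile", ctypes.c_uint32)]


# Equivalent of OFNOTIFY.hdr.idFrom and OFNOTIFY.pszFile.
id_from_offset = OFNOTIFY.hdr.offset + NMHDR.idFrom.offset
psz_file_offset = OFNOTIFY.pszFile.offset

print(hex(id_from_offset))   # 0x4
print(hex(psz_file_offset))  # 0x10
```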

    Overriding the initialisation of the structure

    Suppose you have a structure called RECT as follows:-
    RECT STRUCT
        left   DD 10
        top    DD 10
        right  DD 120
        bottom DD 90
    ENDS
    
    You can override the initialisation of the structure using the < and > or the { and } operators, for example:-
    rc1 RECT <0,20,120,300>
    
    sets the dwords in the data structure to 0, 20, 120 and 300 respectively.
    You can use the question mark and the comma, or just the comma to ignore some members, for example:-
    rc1 RECT <0,?,?,300>
    rc1 RECT <0,,,300>
    
    here you override only the first and fourth members of the structure.

    Using braces you can pick and choose which members to override:-

    rc1 RECT {left=2,top=5}
    
    or you can mix the two methods:-
    rc1 RECT <{left=2,top=5},300h>
    
    When using braces you don't need to specify the full symbol name (in the above example this would be "rc1.left" and "rc1.top"). Instead you only specify the ultimate name ("left" and "top"). The override is also carried out into nested structures, so if you use the same names for members within a nested structure it is possible to initialise several members at once using one brace override.
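
    The two override styles can be modelled as follows. This is a Python sketch of the rules just described, not of GoAsm's own code; the function names are hypothetical, and None stands in for a question mark or an empty slot.

```python
# Defaults taken from the RECT template above.
RECT_TEMPLATE = {"left": 10, "top": 10, "right": 120, "bottom": 90}


def override_positional(template, values):
    """Models <a,b,c,d>: values are matched to members in order."""
    result = dict(template)
    for name, value in zip(template, values):
        if value is not None:        # None models ? or an empty slot
            result[name] = value
    return result


def override_named(template, **named):
    """Models {member=value,...}: only the named members change."""
    result = dict(template)
    result.update(named)
    return result


print(override_positional(RECT_TEMPLATE, [0, None, None, 300]))
print(override_named(RECT_TEMPLATE, left=2, top=5))
```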

    Initialising structure members which have DUP data declarations

    If members of a structure are established using DUP, you can either override the initialisation by using a string or by specifying each element within < and > brackets:-
    UP STRUCT
      DB 27 DUP 0
      DB 2  DUP 0
    ENDS
    Pent UP <'My cat was born on 23 April',<23h,4h>>
    
    So, for example here is the GUID structure and a typical initialisation for COM:-
    GUID STRUCT
        Data1 dd ?
        Data2 dw ?
        Data3 dw ?
        Data4 db 8 dup ?
    GUID ENDS
    IID_IShellLink GUID <0000214eeh, 00000h, 00000h, <0c0h, 00h, 00h, 00h, 00h, 00h, 00h, 46h>>
    

    Some syntax rules when using STRUCT

    One important rule is that, since GoAsm is a one-pass assembler, structure templates must be made in the source script before they are used. This is because GoAsm cannot know in advance how large the structure is going to be. GoAsm is rather more relaxed with its syntax for STRUCT than other assemblers. STRUC means the same as STRUCT. There is no need to provide any initial values at all, and it does not matter if members are not named, so
    RECT STRUCT
         left   DD
         top    DD
         right  DD
         bottom DD
    ENDS
    
    and
    RECT STRUCT
         left   DD 0
                DD 2 DUP 0
         bottom DD 0
    ENDS
    
    and
    RECT STRUCT DD 4 DUP 0 ENDS
    
    are equally valid structure declarations. However, where members are named they must be on a new line.
    You may reuse the name for structure members, provided the structure's name is different, eg.
    RECT STRUCT
         left   DD 0
         top    DD 0
         right  DD 0
         bottom DD 0
    ENDS
    RECT2 STRUCT
         left   DD 0
         top    DD 0
         right  DD 0
         bottom DD 0
    ENDS
    
    If you use ? in the initialisation of the structure members this has the same effect as using zero. This does not result in the data being recorded as uninitialised, as it would do with an ordinary data declaration, so
    RECT STRUCT
         left   DD ?
         top    DD ?
         right  DD ?
         bottom DD ?
    ENDS
    rc1 RECT
    
    is perfectly valid, but the data will go in the section initialised to zero as if zeroes had been used.

    In a structure template you can make additional data on one line in the usual way so that this would be a structure template of four dwords:-

    RECT STRUCT
         lefttop     DD 0,0
         rightbottom DD 0,0
    ENDS
    

    Repeat structure declarations

    It may be useful to create arrays and tables using structure templates. For example:-
    RECT <>,<>,<>,<>
    
    Creates four RECT structures (four dwords in each). Since no label has been used in front of the RECT, no symbols at all will be created and passed to the debugger. In this example:-
    Buffer RECT <0,0,10,10>,<5,5,20,20>,<8,8,30,30>
    
    an array is made of three RECT structures (four dwords in each) initialised to the values provided. Symbols will only be made for the very first structure. This is to avoid duplication of symbol names.

    If you want the members of the array to have unique symbol names you would need to use (for example):-

    Buffer1 RECT <0,0,10,10>
    Buffer2 RECT <5,5,20,20>
    Buffer3 RECT <8,8,30,30>
    
    or
    Buffer RECT3 <0,0,10,10,  5,5,20,20,  8,8,30,30>
    
    where RECT3 is a structure of 3 RECTS.

    If you don't need to initialise the structures you can repeat them using either:-

    Buffer RECT <>,<>,<>
    
    which creates three RECT structures, or
    Buffer RECT,RECT,RECT
    
    which does the same thing.

    You can also use DUP to repeat structures for example:-

    ThreeRects RECT 3 DUP <>
    FiveRects  RECT 5 DUP <23,24,25,26>
    
    In the second example each RECT is initialised to the same value. Initialisation of duplicated structures in this way can only be done at the top level and not in nested structures.
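
    To picture the data that the DUP examples produce, here is a Python sketch (not GoAsm source) which builds the same bytes. It assumes little-endian dwords and, for ThreeRects, a RECT template whose members default to zero; the helper rect_bytes is a hypothetical name.

```python
import struct


def rect_bytes(left, top, right, bottom):
    """Four little-endian dwords, the data image of one RECT."""
    return struct.pack("<4I", left, top, right, bottom)


three_rects = rect_bytes(0, 0, 0, 0) * 3       # like RECT 3 DUP <>
five_rects = rect_bytes(23, 24, 25, 26) * 5    # like RECT 5 DUP <23,24,25,26>

print(len(three_rects), len(five_rects))       # sizes in bytes
print(struct.unpack_from("<4I", five_rects, 64))  # the fifth RECT
```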

    Nested structures using STRUCT

    Structures can be nested by using a structure within another, so
    RECT STRUCT
         left   DD 0
         top    DD 0
         right  DD 0
         bottom DD 0
    ENDS
    StructTest STRUCT
        a DD    6
        b RECT
        c DD    7
        d DD    8
    ENDS
    
    Then
    Hello StructTest
    
    Creates seven dwords. The symbols created (and passed to the debugger) are:-
    Hello
    Hello.a
    Hello.b
    Hello.b.left
    Hello.b.top
    Hello.b.right
    Hello.b.bottom
    Hello.c
    Hello.d
    
    and they can be read from or written to in the usual way, for example
    MOV D[Hello.b.left],100h     ;make rectangle start at 256 pixels
    
    Like structure members, nested structures need not be named, so that this is perfectly valid:-
    StructTest STRUCT
          DD     6
          RECT
        c DD     7
        d DD     8
    ENDS
    

    Internally nested structures

    Structures can be nested by declaring a structure within a structure, so
    StructTest STRUCT
        a DD    6
        b STRUCT
          left   DD 0
          top    DD 0
          right  DD 0
          bottom DD 0
          ENDS
        c DD    7
        d DD    8
    ENDS
    
    Then
    Hello StructTest
    
    produces the same result as StructTest in the previous example. The only difference is that the RECT structure is not available for use elsewhere.

    Overriding initialisation in nested structures

    You need to use the < and > brackets carefully to initialise the correct members of the nested structure. Each < bracket will go deeper into the nest and each > bracket will come back by one out of that nest. When coming out of a nest a comma is expected after the > bracket. So in the StructTest nested structure (given above):-
    rc1 StructTest <23,<10,20,120,300>,44,55>
    
    will initialise the main structure and also its nested RECT member
    rc1 StructTest <,<10,20,120,?>,44,55>
    
    will only override the initialisation of some members as will
    rc1 StructTest <,<10,20,120,>,44,55>
    
    but this will not change the nested RECT member:-
    rc1 StructTest <,,44,55>
    
    A good way to keep track of the brackets is to visualise a question mark for those members which you do not want to alter. Or you can even insert the mark for easier reading, for example the last example can be written:-
    rc1 StructTest <?,?,44,55>
    

    Override priority

    The higher the level of override the greater its priority so for example, suppose you have these structure templates:-
    RECT STRUCT
         left   DD 1
         top    DD 2
         right  DD 3
         bottom DD 4
    ENDS
    StructTest STRUCT
        a DD    6
        b RECT  <3333h,4444h,5555h,>
        c DD    7
        d DD    8
    ENDS
    
    Then
    Hello StructTest <,<,0Bh,0Ch,>,,>
    
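    The nested RECT in Hello would then be initialised to 3333h,0Bh,0Ch,4. The override given when the template is used takes priority over the override inside the StructTest template, which in turn takes priority over the values in the RECT template.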

    Overrides naming members using { } braces have a higher priority than overrides using the < and > brackets.
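
    The priority rule for the < and > overrides in the example above can be sketched as layers applied from the lowest level to the highest. This is Python, not GoAsm source, and apply_layers is a hypothetical helper; None stands for a slot left empty.

```python
def apply_layers(defaults, *layers):
    """Apply override layers in order; later (higher) layers win."""
    values = list(defaults)
    for layer in layers:
        for index, value in enumerate(layer):
            if value is not None:    # an empty slot leaves the value alone
                values[index] = value
    return values


rect = apply_layers(
    [1, 2, 3, 4],                      # RECT template
    [0x3333, 0x4444, 0x5555, None],    # override inside StructTest
    [None, 0x0B, 0x0C, None],          # override when Hello is declared
)
print([hex(value) for value in rect])  # ['0x3333', '0xb', '0xc', '0x4']
```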

    Using strings in structures

    You can use strings in structures in the same way as you would in an ordinary data declaration for example:-
    StringStruct STRUCT
       DB 'I am a lonely string in a struct',0
       DB 'I will keep you company',0
    ENDS
    

    Structures with strings: initialisation and override

    If a structure is declared with ?, then it could be any size when used with strings eg.
    Rect STRUCT
       a DB ?
         DB 0
       b DB ?
         DB 0
    ENDS
    RC1 Rect <'Hello',,'Goodbye'>
    
    will set the Rect structure to null terminated strings of 5 and 7 bytes respectively.
    When the size of the member of a structure is already set eg.
    Rect STRUCT
       a DB 'Hello'
         DB 0
       b DB 'Goodbye'
         DB 0
    ENDS
    
    Then overriding the initialisation will not change the size of the members, so that eg.
    RC1 Rect <'Goodbye',,'Hello'>
    
    would result in a string at label RC1.a of 'Goodb' and a string at label RC1.b of 'Hello' followed by two nulls (the unused part of the member is padded with nulls).

    The initialisation of structure members established using DUP can also be overridden by strings, for example:-

    UP STRUCT
       DB 20 DUP 0
    ENDS
    Pent UP <"Hello">
    
    results in the string Hello followed by 15 nulls.
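
    The truncation and null padding described in this section can be sketched like this. It is Python rather than GoAsm source, it assumes ANSI (one byte per character) strings, and fit_member is a hypothetical helper name.

```python
def fit_member(text, size):
    """Cut the string to the member's size, or pad it out with nulls."""
    data = text.encode("ascii")[:size]
    return data.ljust(size, b"\x00")


print(fit_member("Goodbye", 5))   # member sized by 'Hello'   -> b'Goodb'
print(fit_member("Hello", 7))     # member sized by 'Goodbye' -> b'Hello\x00\x00'
print(len(fit_member("Hello", 20)))  # DB 20 DUP 0 member stays 20 bytes
```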

    See also Using Unicode strings in structures.

    Conditional assembly in structures

    You can use conditional assembly directly within a structure, for example
    NMHDR STRUCT
    hwndFrom DD
    idFrom   DD
    code     DD
    ENDS
    ;
    NMTTDISPINFO STRUCT
    hdr NMHDR
    lpszText DD
    #if STRINGS UNICODE
    szText DW 80 DUP ?
    #else
    szText DB 80 DUP ?
    #endif
    hinst    DD
    uFlags   DD
    lParam   DD
    ENDS
    
    DATA
    Use1 NMTTDISPINFO
    Use2 NMTTDISPINFO <<>,,"Hello",,,,>
    
    The second use of the structure will assemble the string "Hello" either in Unicode or in ANSI depending on whether STRINGS are defined as Unicode.
    See conditional assembly.

    Unions

    What are unions?

    Unions, like structures, are data areas of a fixed size which hold data in various components (union members). Like structures, no data area is actually created when you declare a union template. This is done when you use the template. Unions differ from structures in that each member of the union starts at the same address in memory. Unions are useful if you want to use different labels to address the same data area. Which label you address at run-time might then depend on eventualities such as the version of the operating system on which the program is being run. The size of a union is always set to the largest data declaration within it. You can mix unions with structures to form complex templates. You can declare unions in local data and repeat them in the same way as you can for structures.

    For example, take the union template declared as follows:-

    Thing UNION
          Cat DD 0
          Dog DW 0
          Rat DB 0
    ENDS
    
    Then you can use this template as follows:-
    Hungry Thing
    
    This then sets aside a data area of 4 bytes (a dword). Why only 4 bytes? Because each member starts in the same place. The end of the union template is marked by ENDS although you can use ENDUNION if you prefer.

    The symbols created by this union are:-

    Hungry
    Hungry.Cat
    Hungry.Dog
    Hungry.Rat
    
    And you can address these labels in the usual way, for example:-
    MOV [Hungry.Cat],EAX
    MOV AL,[Hungry.Dog]
    MOV ESI,[Hungry.Rat]
    
    Which, of course, since each member starts in the same place, is the same as:-
    MOV [Hungry.Cat],EAX
    MOV AL,[Hungry.Cat]
    MOV ESI,[Hungry.Cat]
    

    Nested unions

    You can nest unions in structures, or nest structures in unions, or nest unions in unions, for example

    Laugh STRUCT
         Balm     DW 0
         Ointment DB 0
    ENDS
    ;
    Zebra UNION
         Tiger  DD 0
         Hyaena Laugh
    ENDS
    ;
    Lion STRUCT
         BagPuss DB 3 DUP 0
         Striped Zebra
    ENDS
    ;
    Fierce Lion
    
    Which produces the following symbols at the following offsets from Fierce:-
    Fierce                         +0
    Fierce.BagPuss                 +0
    Fierce.Striped                 +3
    Fierce.Striped.Tiger           +3
    Fierce.Striped.Hyaena          +3
    Fierce.Striped.Hyaena.Balm     +3
    Fierce.Striped.Hyaena.Ointment +5
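
    The offsets in this table follow from two rules: structure members follow one another, and union members all start at the same place. The following Python sketch (not GoAsm source) applies those rules recursively. The layout function and the tuple format for templates are hypothetical, and no alignment padding is assumed, in line with the offsets listed above.

```python
def layout(template, base=0, prefix=""):
    """Return ({symbol: offset}, size) for a ('struct'|'union', members) template."""
    kind, members = template
    symbols, position, size = {}, base, 0
    for name, item in members:
        start = base if kind == "union" else position
        full_name = f"{prefix}.{name}"
        symbols[full_name] = start
        if isinstance(item, tuple):            # nested template
            inner, item_size = layout(item, start, full_name)
            symbols.update(inner)
        else:                                  # plain data of item bytes
            item_size = item
        if kind == "union":
            size = max(size, item_size)        # a union is as big as its largest member
        else:
            position += item_size              # structure members follow one another
            size = position - base
    return symbols, size


LAUGH = ("struct", [("Balm", 2), ("Ointment", 1)])
ZEBRA = ("union", [("Tiger", 4), ("Hyaena", LAUGH)])
LION = ("struct", [("BagPuss", 3), ("Striped", ZEBRA)])

symbols, size = layout(LION, 0, "Fierce")
for name, offset in symbols.items():
    print(f"{name:32}+{offset}")
```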
    

    Internally nested unions

    Unions can be nested by declaring them within a structure or union, so
    Lion STRUCT
         BagPuss DB 3 DUP 0
         Striped UNION
                 Tiger  DD 0
                 Hyaena STRUCT
                        Balm     DW 0
                        Ointment DB 0
                        ENDS
                 ENDS
    ENDS
    ;
    Fierce Lion
    
    produces the same result as the previous example. The only difference is that Laugh and Zebra are not available for use elsewhere.

    Initialising union members

    In the same way as in structures you can use the < and > operators to initialise unions with strings or numeric values, for example:-
    Cat UNION
    Ginger DB
    Tortie DW
    Grey   DD
    Tabby  DQ
    ENDS
    Hungry    Cat <"a string for Ginger">
    Anxious   Cat <,4444h>                  ;initialises the word
    Sleepy    Cat <,,55555555h>             ;initialises the dword
    Insistent Cat <,,,6666666666666666h>    ;initialises the qword
    
    It is even more difficult to keep track of the < and > operators when using unions. If you prefer, you can instead specify the name of the member inside the { and } operators, or you can initialise members at run time, for example:-
    Scaredy Cat {Ginger="a string for Ginger"}
    GString DB "a string for the Grey cat"
    MOV [Scaredy.Grey],ADDR GString         ;loads a pointer to GString
    
    Remember that since union members are at the same place a later initialisation can rub over an earlier one.
    The optional question mark (?) is useful to show that you do not want to rub out an earlier override.

    Example:-

    Laugh STRUCT
         Balm     DW 6666h
         Ointment DB
    ENDS
    ;
    Zebra UNION
         Tiger  DD 88888888h
         Hyaena Laugh
    ENDS
    ;
    Lion STRUCT
         BagPuss DB
                 DB
                 DB
         Striped Zebra
    ENDS
    ;
    Fierce Lion <{Ointment=0AAh}22h,33h,44h,<?,<?,55h>>>
    
    Which initialises the data area as follows:-
    At Fierce.BagPuss            (at offset +0) 22h,33h,44h
    Then at Fierce.Striped.Tiger (at offset +3) 66h,66h,0AAh,88h
    
    What happened here is that the Tiger dword at +3 in the Zebra union was initialised to 88888888h but then overridden by the values of Balm and Ointment (which were within the same union). Only the very last byte survived.
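
    The byte values in this example can be reproduced by writing each initialisation over the same 7-byte area in turn. This is a Python sketch (not GoAsm source) assuming little-endian storage; the order of the writes simply follows the explanation above.

```python
import struct

area = bytearray(7)                           # 3 bytes of BagPuss + 4-byte union

area[0:3] = bytes([0x22, 0x33, 0x44])         # BagPuss and the two unnamed bytes
struct.pack_into("<I", area, 3, 0x88888888)   # Tiger, from the Zebra template
struct.pack_into("<H", area, 3, 0x6666)       # Balm, overlaying Tiger's first two bytes
struct.pack_into("<B", area, 5, 0xAA)         # Ointment: the brace override beats 55h

print(" ".join(f"{byte:02X}h" for byte in area))
# 22h 33h 44h 66h 66h AAh 88h
```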
     

    Definitions: equates, macros and #defines

    Making something mean something else

    Equates, macros and #defines give a meaning to a word. The word is defined. Once defined, from that point on in the source script, the definition is used instead of the original word if the context permits it. If the word as defined means a number, then assembler programmers tend to call the definition an "equate", because traditionally the word would be defined using the EQU operator or the = operator. If the word as defined means a string, then traditionally it would be called a text equate. If the word as defined means something more than just a number or string, perhaps for example a series of instructions, then assembler programmers tend to call it a "macro". This is because traditionally the word would be defined using the MACRO operator.

    When using GoAsm, for definitions which can be fitted onto one line, you might like to use EQU or =, or #define as you would in "C". Just use the one you like best. You can use the continuation character ("\") to allow definitions to span more than one line, but it is better if you use MACRO...ENDM instead. This avoids syntax problems.

    Since GoAsm is a one-pass assembler, you must ensure that your definitions are not used before they are declared in the source script. Once a word has been defined you can change its definition but GoAsm will warn you of this since it may not be intended.

    Here are some examples of how definitions can be used.

    Defining words to mean numbers or strings (data examples)

    Here are three examples which define a word as a constant value. The first uses =, the second uses EQU and the third uses #define. They all do the same thing.
    WS_CHILD=40000000h
    WS_CHILD EQU 40000000h
    #define WS_CHILD 40000000h
    
    You can use arithmetic or strings or even other definitions when you define a word. Here are some examples:-
    SKIP_VALUE EQU 20h|40h
    #define SKIP_VALUE 20h|40h
    HelloText='Hello world'
    #define HelloText "Hello world"
    MANIA=SKIP_VALUE+WS_CHILD
    
    If you don't give a value for the equate, it is set to a value of 1, that is, in Windows-speak, TRUE. For example:-
    NT_VERSION=
    NT_VERSION EQU
    #define NT_VERSION
    
    Once a word is defined you can use the word in almost any situation where the definition is valid, for example:-
    DB HelloText
    PUSH WS_CHILD|WS_VISIBLE|SS_OWNERDRAW
    MOV EAX,WS_CHILD
    MOV EAX,[ESI+SKIP_VALUE]
    MOV EAX,MANIA+800h
    

    Defining words to provide code instructions

    Here is an example of how you can define a word to mean a code instruction:-
    #define lParam [EBP+14h]
    
    Then to use the definition you could code as follows:-
    MOV lParam,EAX        ;same as MOV [EBP+14h],EAX
    

    Using arguments when defining words

    Arguments are values which are given when the definition is used. These values are then used within the definition or macro itself. So, for example:-
    RECTB(%a,%b,%c,%d) = DD %a,%b,%c,%d
    
    Then, you can declare four dwords initialised as specified in the arguments:-
    rc1 RECTB (10,10,100,200)      ;same as DD 10,10,100,200
    
    Here is another example using #define:-
    #define DBDATA(%a,%b) DB %a DUP %b
    DBDATA(3,'x')                    ;same as DB 3 DUP 'x'
    
    There is an important syntax rule when using arguments in definitions. When giving the definition the arguments in brackets must be tight against the word which is defined so that
    RECTB(%a,%b,%c,%d)
    
    is good but
    RECTB (%a,%b,%c,%d)
    
    is bad. This rule is to ensure that GoAsm knows the things in brackets are arguments and not something else.
    You must also ensure that the names of the arguments are unusual and will not be found in any other material used in the definition or macro. This avoids other things being replaced inadvertently. Hence the percentage sign is used in the above examples.

    Multi-line definitions

    The definition can straddle lines to make them more readable or to allow you to add comments. For example:-
    #define WS_POPUP            0x80000000L
    #define WS_BORDER           0x00800000L
    #define WS_SYSMENU          0x00080000L
    #define WS_POPUPWINDOW      (WS_POPUP          | \
                                 WS_BORDER         | \
                                 WS_SYSMENU)
    
    This last example was taken straight out of the Windows header file Winuser.h and you can see that it is in typical "C" syntax. GoAsm is quite happy with this. In fact in GoAsm the brackets are optional, so this is also perfectly good syntax:-
    #define WS_POPUPWINDOW       WS_POPUP          | \
                                 WS_BORDER         | \
                                 WS_SYSMENU
    
    You may prefer to use MACRO...ENDM instead of the continuation character. The above example can then be re-written as:-
    WS_POPUPWINDOW MACRO WS_POPUP    |
                         WS_BORDER   |
                         WS_SYSMENU
                   ENDM
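
    Whichever of the three forms is used, the word ends up meaning the same number. A quick check of that value in Python (not GoAsm source), using the three constants from the Winuser.h extract above:

```python
WS_POPUP = 0x80000000
WS_BORDER = 0x00800000
WS_SYSMENU = 0x00080000

WS_POPUPWINDOW = WS_POPUP | WS_BORDER | WS_SYSMENU
print(hex(WS_POPUPWINDOW))   # 0x80880000
```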
    

    Multi-line definitions: stack frame example (windows callback)

    This applies only to 32-bit programming.

    You can use the multi-line definition method to make a word mean several lines of code instruction, for example:-

    OPEN_STACKFRAME(a) =  PUSH EBP \
                          MOV EBP,ESP \
                          SUB ESP,a*4 \
                          PUSH EBX,EDI,ESI
    CLOSE_STACKFRAME   =  POP ESI,EDI,EBX \
                          MOV ESP,EBP \
                          POP EBP
    
    Using MACRO...ENDM this is:-
    OPEN_STACKFRAME(a) MACRO PUSH EBP
                          MOV EBP,ESP
                          SUB ESP,a*4
                          PUSH EBX,EDI,ESI
                       ENDM
    CLOSE_STACKFRAME   MACRO POP ESI,EDI,EBX
                             MOV ESP,EBP
                             POP EBP
                       ENDM
    
    In this example the word OPEN_STACKFRAME is defined to make a stack frame which could typically be used in a window procedure called by the Windows system. It has an argument which holds the number of dwords in the stack frame to accept local data (the stack pointer is moved by that amount so that the stack can be used to hold local data). The second definition in this example closes the stack frame. Now here is how to use these definitions. In the code section:-
    WndProc:              ;name of this procedure
    OPEN_STACKFRAME (6)   ;create space for 6 dwords of local data
            ;----------------- insert window procedure code here
    CLOSE_STACKFRAME
    RET 10h               ;remove from stack 4 parameters sent by windows
    
    Now let's add some refinement so that the stack can be accessed easily:-
    lParam =[EBP+14h]     ;
    wParam =[EBP+10h]     ; get ready to access the parameters which
    uMsg   =[EBP+0Ch]     ; are sent by Windows to the window procedure
    hwnd   =[EBP+8h]      ;
    ;
    hDC    =[EBP-4h]      ; some names of data things
    hBrush =[EBP-8h]      ; often used in different
    hPen   =[EBP-0Ch]     ; window procedures
    DATA1  =[EBP-10h]     ;
    DATA2  =[EBP-14h]     ; space for more local data
    DATA3  =[EBP-18h]     ;
    
    Inside the stack frame which has been made here, the parameters sent by Windows (hwnd, uMsg, wParam and lParam) will always be on the stack at EBP+8h to EBP+14h. At EBP+4h we find the return address after the call. At EBP we have the previous value of EBP, which we pushed when the stack frame was made. Then at EBP-4h to EBP-18h we have the space for our local data, which in this example can be accessed using the definitions hDC, hBrush, hPen, DATA1, DATA2 and DATA3 (or whatever you want to call them). At EBP-1Ch we have the value of EBX which was pushed when the stack frame was made. Likewise EDI is at EBP-20h and ESI is at EBP-24h. All these values are protected while the stack frame remains open (they will not be written over by other functions until the callback is finished).

    To access the data within the stack frame you must make sure you don't change EBP (or if you do use it, you restore it to its original value). You don't have to access the data by name. In this example MOV EAX,[hBrush] is the same as MOV EAX,[EBP-8h]. This is a matter of style and taste. Using these methods you can establish as much local data as you want, and if you stick to a fixed method like this you will always know where your local data is. In this example, the first dword of local data is always at EBP-4h. Subtract 4 from this value to access each additional dword of local data.
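
    The layout just described can be worked out for any number of local dwords. The following Python sketch (not GoAsm source) does the arithmetic for a 32-bit frame opened with OPEN_STACKFRAME; the function frame_layout and the names local1, local2 and so on are hypothetical.

```python
def frame_layout(local_dwords, saved=("EBX", "EDI", "ESI")):
    """Offsets from EBP for a frame made by PUSH EBP / MOV EBP,ESP / SUB ESP,n*4."""
    layout = {"lParam": 0x14, "wParam": 0x10, "uMsg": 0x0C, "hwnd": 0x08,
              "return address": 0x04, "previous EBP": 0x00}
    for index in range(1, local_dwords + 1):
        layout[f"local{index}"] = -4 * index                 # first local at EBP-4h
    for index, register in enumerate(saved, start=1):
        layout[f"saved {register}"] = -4 * (local_dwords + index)
    return layout


for name, offset in frame_layout(6).items():
    sign = "+" if offset >= 0 else "-"
    print(f"EBP{sign}{abs(offset):02X}h  {name}")
```

    With 6 local dwords this gives the same positions as in the text, with the saved EBX, EDI and ESI just below the local data.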

    There are many other ways of dealing with the stack in callbacks. See callback stack frames in 32-bits and 64-bits, and automated stack frames using FRAME...ENDF, LOCALS, and USEDATA.
    See also "understand the stack" part 1 and part 2.

    Conditional assembly in macros

    If you use conditional assembly in your definitions it is recommended that you use MACRO...ENDM instead of the continuation character method.

    Here is an example:-

    STRINGS UNICODE
    CODE
    ;
    #define REPORT
    ;
    MBMACRO(%lpTextW) MACRO
    #ifdef REPORT
    INVOKE MessageBoxW,0,addr %lpTextW,"Report",40h
    #endif
    ENDM
    ;
    MBMACRO("This code was assembled!")
    
    In the above code the Message Box is displayed. But if the line defining REPORT is commented out, then it is not displayed. The "This code was assembled!" string is sent as an argument to the macro.
    See conditional assembly.

    Counting the arguments with ARGCOUNT

    This applies only to 32-bit programming.

    ARGCOUNT returns the number of arguments given when the definition is used and this can be used with conditional assembly in definitions. For example, suppose you have macro26 defined as:-

    macro26(%a,%b,%c,%d,%e,%f) = #if ARGCOUNT=6   \
                                PUSH %f           \
                                #endif            \
                                #if ARGCOUNT >=5  \
                                PUSH %e           \
                                #endif            \
                                #if ARGCOUNT >=4  \
                                PUSH %d           \
                                #endif            \
                                #if ARGCOUNT >=3  \
                                PUSH %c           \
                                #endif            \
                                #if ARGCOUNT >=2  \
                                PUSH %b           \
                                #endif            \
                                #if ARGCOUNT >=1  \
                                PUSH %a           \
                                #endif
    
    and then you use macro26 as follows
    macro26(4,3,2,1)
    
    then the value of ARGCOUNT would be four and the code would be the same as if you had coded:-
    PUSH 1,2,3,4
    
    In the above example, the first two pushes are skipped over because ARGCOUNT is neither 6 (in the first test) nor greater than or equal to 5 (in the second test).

    This can be enlarged to provide a "C" function call macro where the stack is cleared up after the call (the correct number of bytes is added to the stack pointer ESP after the call):-

    macro26(%x,%a,%b,%c,%d,%e,%f) = #if ARGCOUNT=7   \
                                   PUSH %f           \
                                   #endif            \
                                   #if ARGCOUNT >=6  \
                                   PUSH %e           \
                                   #endif            \
                                   #if ARGCOUNT >=5  \
                                   PUSH %d           \
                                   #endif            \
                                   #if ARGCOUNT >=4  \
                                   PUSH %c           \
                                   #endif            \
                                   #if ARGCOUNT >=3  \
                                   PUSH %b           \
                                   #endif            \
                                   #if ARGCOUNT >=2  \
                                   PUSH %a           \
                                   #endif            \
                                   CALL %x           \
                                   ADD ESP,ARGCOUNT-1*4
    
    and here is another way to do it:-
    cinvoke(funcname,%1,%2,%3,%4,%5) = \
                                   #if ARGCOUNT=1 \
                                   invoke funcname  \
                                   #elif ARGCOUNT=2   \
                                   invoke funcname,%1  \
                                   #elif ARGCOUNT=3      \
                                   invoke funcname,%1,%2  \
                                   #elif ARGCOUNT=4        \
                                   invoke funcname,%1,%2,%3 \
                                   #elif ARGCOUNT=5          \
                                   invoke funcname,%1,%2,%3,%4 \
                                   #elif ARGCOUNT=6             \
                                   invoke funcname,%1,%2,%3,%4,%5 \
                                   #endif                          \
                                   #if ARGCOUNT>1                   \
                                   ADD ESP,ARGCOUNT-1*4               \
                                   #endif
    
    These would then be used as follows:-
       cinvoke(_cprintf,23,24,25,26,27)
       macro26(_cprintf,23,24,25,26,27)
    
    If you don't like using the continuation character, you can use MACRO...ENDM instead.
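
    As a minimal sketch of that alternative, here is a cut-down version of the "C" call macro written with MACRO...ENDM, following the syntax of the MBMACRO example above. The names ccall and MyCFunction, and the limit of two arguments, are hypothetical choices made only for this illustration:-

    ccall(%x,%a,%b) MACRO
    #if ARGCOUNT=3
    PUSH %b                   ;second argument, if given, is pushed first
    #endif
    #if ARGCOUNT>=2
    PUSH %a                   ;first argument, if given
    #endif
    CALL %x                   ;call the "C" function
    ADD ESP,ARGCOUNT-1*4      ;same stack clean-up expression as used above
    ENDM
    ;
    ccall(MyCFunction,23,24)  ;MyCFunction is a hypothetical "C" function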

    Using double hashes in definitions

    A double hash in a definition joins two elements, removing all spaces in between. This enables you to create one single word out of two or more components, for example:-
    LVERS=0030
    MVERS=0044h
    VERSION=LVERS##MVERS
    ;
    MOV EAX,VERSION
    
    Here VERSION is defined as the number 00300044h.

    When to use definitions and when not to

    Some programmers use definitions as often as possible. In my opinion this makes the source script more difficult to read. My criterion here is how easy it is for someone else to run through the source code and see what it is doing. If that person must frequently refer to other files or to lists of definitions to understand the source code, then it is bad coding. Used in moderation in Windows programming, however, definitions can be helpful to explain the source script. For example:-
    PUSH WS_CHILD|WS_VISIBLE|SS_OWNERDRAW
    
    means a lot more than:-
    PUSH 5000000Dh
    
    although the comment could provide clarity without using a definition:-
    PUSH 5000000Dh        ;WS_CHILD, WS_VISIBLE, SS_OWNERDRAW
    
    This example reduces clarity in your code and should be avoided:-
    #define wParam EBP+10h
    MOV EAX,[wParam]      ;same as MOV EAX,[EBP+10h]
    
    The reason this is bad is that the reference [wParam] makes it appear that wParam is a label. Better is:-
    #define wParam [EBP+10h]
    MOV EAX,wParam
    
    This is clearer because in GoAsm the only thing you can address in this way without using square brackets is a definition.

    This is also bad and should be avoided at all costs:-

    #define GET_LPARAM MOV EAX,[EBP+14h]
    GET_LPARAM
    
    Better programming practice is to use
    CALL GET_LPARAM
    
    and call it properly as a function. However when manipulating the stack it is very difficult to use a procedure since the CALL and RET themselves alter the stack. So in this instance it may be convenient to use a definition if source script clarity does not suffer. See for an example the OPEN_STACKFRAME and CLOSE_STACKFRAME examples above.

    Also there seems little point in doing this:-

    THOUSAND=1000D
    MOV EAX,THOUSAND
    
    when this would do perfectly well:-
    MOV EAX,1000D
    
    One use for definitions is if you want your source to be understandable to non-English speakers. Then you can translate all mnemonics using equates: see using defined words in Unicode files.
     

    Importing: using run-time libraries

    Using the C Run-time library

    With GoAsm it's easy to use the C run-time library. This is a set of functions contained in Crtdll.dll or Msvcrt.dll (or their variants), which are usually found on a Windows computer in the system folder. Information about the library is available from Microsoft's developer site (MSDN). The main thing to remember when using these functions is that although you send parameters to the function on the stack, the function does not restore the stack. Therefore you will need to use ADD ESP,x after the call, x being the number of bytes used for the parameters.
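
    As a minimal sketch of this, the following 32-bit code calls the run-time function sprintf and then restores the stack itself. It assumes that the linker has been given Msvcrt.dll so that the import can be resolved; the labels Format and Buffer are hypothetical names used only for this illustration:-

    DATA SECTION
    Format DB 'Value is %d',0 ;format string for sprintf
    Buffer DB 64 DUP 0        ;receives the formatted string
    ;
    CODE SECTION
    PUSH 23                   ;parameters are pushed in reverse order
    PUSH ADDR Format
    PUSH ADDR Buffer
    CALL sprintf              ;the function does not restore the stack
    ADD ESP,0Ch               ;so remove 3 dwords (12 bytes) of parameters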
     

    Importing: data, by ordinal, by specific Dll

    Importing data

    At run-time, data can only be imported indirectly. That is to say you can only import a pointer to data. However, using that pointer you can get the data itself.
    For example, assuming DATA_VALUE is a data export in another program, in GoAsm you get the pointer and the data as follows:-
    MOV EBX,[DATA_VALUE]       ;get pointer to DATA_VALUE
    MOV EAX,[EBX]              ;get the value
    
    In the same way as for importing procedures from other programs at link-time you give the linker (GoLink anyway) the name of the executable containing the import.

    Direct importing by ordinal

    The type of importing we have been looking at using CALL procedure will import the procedure by name. What happens here is that when the Windows loader starts the program it searches through the Dlls for the imports required by the program. This is done by comparing the names of the Dll exports against the names of the program's imports. To speed up this process with private Dlls sometimes exporting and importing by ordinal is used. Then the loader can find the correct import by using an index to a table within the Dll. Note that it is unwise to do this in the case of Windows system Dlls since the ordinal numbers of the exports are not guaranteed to be the same across different Dll versions.

    Using GoAsm and its companion program GoLink, you can import by ordinal using this simple syntax:-

    CALL MyDll:6
    
    This will call procedure number 6 in MyDll.dll. Note that the extension "dll" is assumed if no extension is given. Suppose you want a Dll to call a function in the main executable by ordinal, then you could use:-
    CALL Main.exe:15
    
    This calls the function exported with ordinal number 15 in Main.exe.
    Calls to ordinals using the absolute form (using opcodes FF15) will result from using this syntax:-
    CALL [Main.exe:15]
    
    You should not include the path of the file in the call. GoLink carries out a wide search for specified files, but if it is necessary to provide a path this should be given to the linker and not incorporated in the call in the assembler script.
    Obviously, in order to use this method of calling a function by ordinal you must ensure that the ordinal number of the function is fixed: see exporting by ordinal.

    Note: the above only applies to GoLink

    There is another way to use ordinals: use LoadLibrary to load the Dll (or return a handle if it is already loaded), then call GetProcAddress, passing the ordinal value, to get the address of the procedure to call. Finally you call the procedure at the address returned by GetProcAddress.
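
    A minimal 32-bit sketch of this run-time method follows. The Dll name MyDll.dll and the ordinal 6 are carried over from the example above purely for illustration, and the label DllName is hypothetical. GetProcAddress accepts an ordinal in place of a name pointer when the value is in the low word and the high word is zero:-

    DATA SECTION
    DllName DB 'MyDll.dll',0
    ;
    CODE SECTION
    PUSH ADDR DllName
    CALL LoadLibraryA         ;returns the module handle in EAX, or zero on failure
    OR EAX,EAX
    JZ >L1                    ;the Dll could not be loaded
    PUSH 6                    ;ordinal in the low word, high word zero
    PUSH EAX                  ;module handle
    CALL GetProcAddress       ;returns the address of the procedure, or zero
    OR EAX,EAX
    JZ >L1                    ;no export with that ordinal
    CALL EAX                  ;call the procedure (any parameters it needs must be pushed first)
    L1: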

    Importing by specific Dll

    Occasionally you may want to force the linker to use an import from a specific Dll. You might need to do this if two or more Dlls (given to the linker at link-time) offer functions with the same name. You can therefore force GoLink to link to a particular Dll using the syntax:-
    CALL NameOfDll:NameOfAPI
    
    Note: this only applies to GoLink
     

    Exporting procedures and data

    You can make your procedures and your data available to other executables by exporting them. In Windows it is usual for Dlls to be used for exports, but sometimes a Dll will need to call a procedure or use data in an Exe file, and in that case the Exe file will also export. Exporting can be done either at link time (you tell the linker which symbols to export), or using GoAsm, you can also do it at assemble time. GoAsm then gives the linker the export information via the .drectve section in the object file (note that not all linkers support this).
    There are two ways to declare exports in GoAsm. You can either declare all the exports at the top of your file (before any sections are declared) or you can declare them within your source code as you go along. You may use either or both of these methods to suit your own preference.
    Here is an example of declaring all the exports before any sections are declared:-
    EXPORTS CALCULATE, ADJUST_DATA, DATA_VALUE
    
    Here is an example of declaring an export as you go along:-
    EXPORT CALCULATE:
    CMP EAX,EDX                ;code for the
    JZ >4                      ;procedure
    
    This code exports the label to the procedure CALCULATE.
    If you prefer you can have the two words on separate lines, for example:-
    EXPORT
    CALCULATE:
    CMP EAX,EDX                ;code for the
    JZ >4                      ;procedure
    

    Exporting data

    Data can only be exported indirectly. That is to say you can only export a pointer to data. However, using that pointer the importing program can get the data itself.
    You may use exactly the same method to export a data label as used to export a code label. There is no need to declare the label as data or code. This is because it is the linker's job to check whether the label is within a data or code section. Here is an example of a data label export:-
    EXPORT DATA_VALUE DD 0
    
    This exports the data label DATA_VALUE. The recipient would obtain the value of DATA_VALUE in the following way:-
    MOV EBX,[DATA_VALUE]       ;get pointer to DATA_VALUE
    MOV EAX,[EBX]              ;get the value
    

    Exporting by ordinal

    Normally exports are conducted by name. What happens here is that when the Windows loader starts the program it searches through the Dlls for the imports required by the program. This is done by comparing the names of the Dll exports against the names of the program's imports. To speed up this process with private Dlls sometimes exporting and importing by ordinal is used. Then the loader can find the correct import by using an index to a table within the Dll. Note that it is unwise to do this in the case of Windows system Dlls since the ordinal numbers of the exports are not guaranteed to be the same across different Dll versions.
    In order to use this method it is clearly imperative that the exporting program specifies an ordinal value for a particular export and the linker must not change this. Again, using GoAsm you can specify the correct ordinal value and pass this to the linker via the .drectve section (not all linkers support this).
    Here is how to specify an export by ordinal if exports are listed before any sections are declared:-
    EXPORTS CALCULATE:2, DATA_VALUE:6
    
    Here the linker will be instructed to use the ordinals 2 and 6 for the exports. If you are using the alternative method of declaring exports (within a section) you can use for example:-
    EXPORT:2 CALCULATE:
    
    or in the case of data:-
    EXPORT:6 DATA_VALUE DD 0
    

    Exporting by ordinal without a name

    Exporting by ordinal does not stop the name of the export appearing in the final executable. This is because it is the importing program which decides whether to import by ordinal or by name. All the exporting program can do is to fix the ordinal value. However sometimes a programmer might like to ensure no name for the export appears in the final executable. You sometimes see such "no name" exports in system Dlls for example, probably in order to hide the job carried out by particular functions. In order to do this in GoAsm add the word NONAME to the end of the export for example:-
    EXPORT:2:NONAME
    CALCULATE:
    
    Here the value of the code label CALCULATE will be exported as ordinal number 2, but the name of the export will not appear in the final executable. This means that if another program tried to call the CALCULATE function by name it would fail. The function can only be called by ordinal.
     

    Automated register and flags save and restore using USES...ENDU

    The USES statement followed by a list of registers, causes the registers to be PUSHed on the stack in the same order in which they appear in the list. Then, until ENDU is found in the source script (or ENDUSES if you prefer) any RET encountered will cause the registers to be POPed from the stack in reverse order. For example:-
    ProcX:
    USES EAX,EBX,ECX     ;ready to PUSH the registers on the stack
    CMP EAX,ESI          ;first mnemonic causes the PUSHes to occur
    ;
    ; code for the procedure
    ;
    .finnc
    CLC                  ;ready to return carry flag not set
    RET                  ;POP all registers in reverse followed by RET
    ;
    .finc
    STC                  ;ready to return carry flag set
    RET                  ;POP all registers in reverse followed by RET
    ENDU                 ;end all special POP action when RET found
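
    As an illustration only, the effect of the USES statement on the first exit from ProcX is roughly the same as writing the PUSH and POP instructions by hand. This sketch follows the description above and is not GoAsm's literal output:-

    ProcX:
    PUSH EAX                  ;registers pushed in the order given to USES
    PUSH EBX
    PUSH ECX
    CMP EAX,ESI               ;first mnemonic of the procedure
    ;
    ; code for the procedure
    ;
    CLC                       ;ready to return carry flag not set
    POP ECX                   ;registers popped in reverse order
    POP EBX                   ;(POP does not alter the flags)
    POP EAX
    RET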
    
    You can also automatically push and pop the flags, using FLAGS which is a reserved word in GoAsm:-
    USES FLAGS
    
    You cannot change or add to the list of registers from within the procedure. To do that you would need an ENDU followed by a fresh USES list. If you need to RET without popping the registers you can use RETN ("normal" RET).

    In 64-bit programming you can use the extended versions of the general purpose registers (RAX to RSP) and also the new 64-bit registers (R8 to R15). You can also use the 32-bit versions of the general purpose registers (EAX to ESP). This is because when PUSHing registers the 32-bit forms and the 64-bit forms are interchangeable (they produce the same opcode). So, whether you are assembling for 32-bits or for 64-bits,

    USES RAX,RBX,RCX
    
    will code the same as
    USES EAX,EBX,ECX
    
    This helps towards transportability of your code between the two platforms.
     

    Callback stack frames in 32-bits and 64-bits

    Introduction
    Stack frames in 32-bit Windows
            Accessing the parameters from the stack
            Restoring the stack to equilibrium before returning to the caller
            Provide space on the stack for local data
            Preserve for Windows the EBX, ESI, EDI and EBP registers
    Stack frames in 64-bit Windows
            Record and access the parameters
            Provide space on the stack for local data
            Preserve for Windows the non-volatile registers

    See also "understand the stack" part 1 and part 2.

    Introduction

    In Windows programming, stack frames are needed for window procedures, hook procedures, superclassing and subclassing, Dll load and unload procedures, and for other callbacks. Callback procedures are all called by Windows, using your program's own thread.

    In GoAsm the creation and use of stack frames is all automated when you use FRAME...ENDF.
    See automated stack frames for how to use this in practice.

    Since 32-bit and 64-bit Windows stack frames are different, they need to be treated separately. But the syntax for using FRAME...ENDF and its companion instructions such as LOCALS and USEDATA...ENDU is the same on both platforms. For this reason when you use FRAME...ENDF it is possible to use the same source script for both. See writing 64-bit programs for more information about 64-bit programming generally.

    Stack frames in 32-bit Windows

    The job that the stack frame has to do is dictated by the calling convention which is used. 32-bit Windows uses the standard calling convention (STDCALL). In this case the stack frame in a window procedure needs to do four jobs:-
    • Access the parameters sent by Windows (which are sent on the stack)
    • Restore the stack to equilibrium before returning to the caller
    • Provide space on the stack for local data
    • Preserve for Windows the EBX, ESI, EDI and EBP registers if they are altered

    All four of these are equally important.

    Accessing the parameters from the stack

    Windows pushes the parameters on the stack before calling your window procedure. This is exactly the same as your code pushing parameters on the stack before calling an API. When Windows calls a window procedure it sends the following parameters as dwords placed on the stack (the words used here are the ones commonly used to label these parameters):-
    hwnd        window handle
    uMsg        message identifier
    wParam      data
    lParam      data
    
    Your window procedure needs to access these parameters. One way is to POP them off the stack into static data, but an easier (and safer) way is to keep them on the stack and reference them directly from the stack. This is better because window procedures sometimes call themselves. This may seem odd, but one example will suffice. Suppose your window procedure needs to fill the window with the correct material at the right time. This is called "painting" the window. This is done by responding to the Windows message WM_PAINT (message number 0Fh). Now the proper way to respond to this message is first to call the API BeginPaint, then to paint the window, then to call the API EndPaint. One of the things BeginPaint does is to prepare the window for the paint. In doing so it sends another message to your window procedure, this time WM_ERASEBKGND (message number 14h). So while your window procedure is dealing with this second message it has not yet returned from the API BeginPaint. After the second message has been dealt with BeginPaint will return. So the window procedure is recursive, that is to say, it can come back to itself. For each message the parameters (except for hwnd) will be different. If they are kept on the stack it means that each time the window procedure is called the parameters will be kept on a different part of the stack and cannot be written over.

    A typical 32-bit stack frame will be set up as follows:-

    TypicalStackFrame:
    PUSH EBP          ;save the value of ebp which will be altered    } called the
    MOV EBP,ESP       ;give current value of stack pointer to ebp     } "prologue"
    ;                 ;POINT "X"
    ;                 ;code to isolate WM_PAINT message
    PUSH ADDR PAINTSTRUCT
    PUSH [hwnd]
    CALL BeginPaint   ;get ready to paint window
    ;                 ;paint window and call EndPaint
    ;
    MOV ESP,EBP       ;restore the stack pointer to previous value    } called
    POP EBP           ;restore the value of ebp                       } the
    RET 10h           ;return to caller adjusting the stack pointer   } "epilogue"
    
    During any recursion esp will be changed since further use of the stack will be made. But ebp is always saved and restored by this procedure so it can always be relied on to access the correct part of the stack for the message being dealt with.
    This is probably best illustrated by an example. In a typical window procedure responding to the message WM_ERASEBKGND, taking a snapshot of the stack when execution is at point "X", the stack will look similar to this (I have missed out a lot of use of the stack for clarity):-

    ebp-4h      the next push will go here
    ebp         saved value of ebp
    ebp+4h      caller's return address
    ebp+8h      hwnd
    ebp+0Ch     uMsg
    ebp+10h     wParam
    ebp+14h     lParam
    ebp+18h     )
    ebp+1Ch     ) other use of stack
    ebp+20h     ) (BeginPaint etc)
    ebp+24h     )
    ebp+28h     saved value of ebp
    ebp+2Ch     caller's return address
    ebp+30h     hwnd
    ebp+34h     uMsg
    ebp+38h     wParam
    ebp+3Ch     lParam

    Here the parameters lParam, wParam, uMsg and hwnd which are on the stack at ebp+3Ch to ebp+30h are those sent to the window procedure on the WM_PAINT message. Then at ebp+2Ch there is the return address of the caller sending the WM_PAINT message (this will be a call from a Windows Dll). Then ebp+28h is the value of ebp saved on the first instruction coming into the window procedure on the WM_PAINT message. Between ebp+24h and ebp+1Ch is some use of the stack within your window procedure before you called the API BeginPaint. In fact there would be two PUSHes, and then on calling the API the return address (back into your window procedure) would be put on the stack. The fourth PUSH here could be something PUSHed onto the stack by BeginPaint itself prior to sending the WM_ERASEBKGND message. In practice there would be much more use of the stack within BeginPaint here. The next thing you see at ebp+14h is lParam again. But this is a different lParam from that at ebp+3Ch. This is the lParam sent with WM_ERASEBKGND. The remainder of the entries up to the current value of EBP are those relating to the WM_ERASEBKGND message.
    Now let's see what happens when BeginPaint returns. Since we know that BeginPaint will always restore ebp to the value it was on entry to the API, we know that ebp on return will point to the stack at ebp+28h in the above stack frame. From that point, you can see the earlier parameters (those sent with WM_PAINT) can be accessed using ebp+8h, ebp+0Ch, ebp+10h and ebp+14h as before. They were preserved and were not written over by the call to BeginPaint and by the recursion into the window procedure.

    Restoring the stack to equilibrium before returning to the caller

    It is also the job of the window procedure to restore the stack to equilibrium before it returns to the caller. This involves moving the stack pointer to a higher value by 4 bytes (a dword) for each argument which is sent (each PUSH by the caller reduced the value of ESP by four). This is exactly what Windows itself does when you call an API. For example, in
    PUSH ADDR PAINTSTRUCT
    PUSH [hwnd]
    CALL BeginPaint   ;get ready to paint window
    
    the stack pointer (esp) was made more negative by 8 bytes by reason of the two pushes before the call to the API BeginPaint. But after the return from BeginPaint, the stack pointer is back where it started. This is because within BeginPaint it was made more positive by 8 bytes before returning to your code.
    Window procedures always have 4 parameters so you know the stack pointer must be moved by 16 bytes to restore it. Other types of callbacks have different numbers of parameters. The Windows SDK gives the appropriate information about those.
    The caller could use ADD ESP,10h to add the 16 bytes after the call, but the easiest way is for the procedure itself to move the stack pointer before returning to the caller using the RET instruction with a number, for example RET 10h. This instruction POPs from the stack the caller's return address, moves the stack pointer (ESP register) by the correct number of bytes (in this case 16), then finally diverts execution to the caller's return address.
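
    As a sketch of a callback with a different number of parameters, the callback used with the EnumWindows API receives two dwords (hwnd and lParam), so it returns with RET 8h rather than RET 10h. The label EnumProc is a hypothetical name used only for this illustration:-

    EnumProc:
    PUSH EBP                  ;prologue
    MOV EBP,ESP
    MOV EDX,[EBP+8h]          ;first parameter (hwnd)
    MOV ECX,[EBP+0Ch]         ;second parameter (lParam)
    ;
    MOV EAX,1                 ;return TRUE to continue the enumeration
    MOV ESP,EBP               ;epilogue
    POP EBP
    RET 8h                    ;2 parameters of 4 bytes each = 8 bytes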

    Provide space on the stack for local data

    Another important task for the callback procedure is to provide space for local data. Local data is data which is kept on the stack while execution is continuing in the callback procedure. It is lost after execution leaves the procedure. We have already seen how data on the stack (in the form of parameters) is preserved by making a stack frame. Space for local data uses the same principle. Suppose we need space for 3 dwords on the stack because we want to preserve this data however deep the window procedure recurses. Then we would code:-
    TypicalStackFrame:
    PUSH EBP          ;save the value of ebp which will be altered    }
    MOV EBP,ESP       ;give current value of stack pointer to ebp     } "prologue"
    SUB ESP,0Ch       ;make space for local data                      }
    ;                 ;POINT "X"
    ;
    ;                 ;window procedure code
    ;
    MOV ESP,EBP       ;restore the stack pointer to previous value    }
    POP EBP           ;restore the value of ebp                       } "epilogue"
    RET 10h           ;return to caller adjusting the stack pointer   }
    
    Here we have moved the stack pointer by 12 bytes, which is exactly the same as if we had done 3 PUSHes. This provides an area of the stack which cannot be used for any other purpose.

    Using FRAME...ENDF you can create the stack frame automatically.
    Here is a typical use of FRAME...ENDF which does the same as the TypicalStackFrame code above, as well as providing names for the parameters which are sent to the window procedure and names for each dword of local data:-

    WndProc FRAME hwnd,uMsg,wParam,lParam
    LOCALS hDC,hInst,KEEP
    ;                 ;POINT "X"
    ;
    ;                 ;code goes here
    ;
    RET
    ENDF
    
    The stack actually looks like this (at point "X" again):-

    ebp-10h     the next push will go here
    ebp-0Ch     space for local data KEEP     ← esp is currently here (top of local data)
    ebp-8h      space for local data hInst
    ebp-4h      space for local data hDC
    ebp         saved value of ebp            ← ebp given value of esp when it was here
    ebp+4h      caller's return address
    ebp+8h      hwnd
    ebp+0Ch     uMsg
    ebp+10h     wParam
    ebp+14h     lParam
    ebp+18h     )
    ebp+1Ch     ) other use of stack

    Of course if the stack pointer is adjusted in this way it must also be restored. But this time it is automatic because it is simply restored to the value of ebp before returning to the caller, and ebp was saved before the stack pointer was moved to make space for local data.

    Preserve for Windows the EBX, ESI, EDI and EBP registers

    Finally your window procedure must preserve the ebx, esi, edi and ebp registers. EBP is saved and restored already by the prologue and epilogue code. As for EBX, ESI and EDI, it is probably a good idea to save and restore these even if they are not actually changed by your window procedure, then you need not worry about this any further. In the same way, you can rely on a Windows API preserving these registers. This is particularly useful in assembler programming because you can keep useful handles and other values in these registers whilst calling APIs. You can easily do this using the USES statement, for example:-
    WndProc FRAME hwnd,uMsg,wParam,lParam
    USES EBX,ESI,EDI
    LOCALS hDC,hInst,KEEP
    ;
    ;                 ;code goes here
    ;
    RET
    ENDF
    

    Stack frames in 64-bit Windows

    The job that the stack frame has to do is dictated by the calling convention which is used. 64-bit Windows uses the so-called fast calling convention (FASTCALL). Instead of the caller putting the parameters on the stack as in the STDCALL convention, the first four parameters are put in the RCX,RDX,R8 and R9 registers. If there are any more parameters these are put on the stack. So the window procedure needs to do these jobs:-
    • Record on the stack the parameters sent in the registers and access any additional parameters sent by Windows on the stack
    • Provide space on the stack for local data
    • Preserve for Windows the "non-volatile" registers if they are altered. These are the RBP,RBX,RDI,RSI,R12 to R15 and XMM6 to XMM15 registers.
    There is no need for the window procedure to restore the stack to equilibrium before returning to the caller. This job is done by the caller.

    Record and access the parameters

    The RCX,RDX,R8 and R9 registers are "volatile". That is to say Windows does not guarantee that they will be maintained across an API call. The first four parameters are, however, sent in these registers. This means that as soon as your window procedure makes an API call it is possible that the parameters will be written over. For this reason it is necessary to keep these parameters safely. They could be kept in the non-volatile registers, but the Windows documentation recommends that they are kept on the stack itself. Apparently this is done within the APIs themselves. In the FASTCALL calling convention as documented and implemented in 64-bit Windows, the caller is obliged to move RSP more negatively by 32 bytes before making the call to provide space on the stack for this to be done. The stack places so reserved are called "placeholders". Each parameter has its own known placeholder. In the light of this one might be forgiven for querying why FASTCALL was chosen if the parameters end up on the stack anyway. They might as well have been put there by the caller in the first place!

    To deal with the requirement to record the parameters sent in the registers, when you use FRAME...ENDF in 64-bit code, GoAsm creates instructions like this at the beginning of the stack frame:-

    MOV [RSP+8h],RCX
    MOV [RSP+10h],RDX
    MOV [RSP+18h],R8
    MOV [RSP+20h],R9
    PUSH RBP
    MOV RBP,RSP
    
    This code puts the parameters in their placeholders on the stack. If there are fewer than four parameters not all these instructions are emitted. Note that parameter five (if present) is already on the stack at [RSP+28h], parameter six is at [RSP+30h] etc. The final two instructions set up RBP as a pointer to the data, having saved RBP first so it can be restored later.
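
    As an illustration of the point about fewer parameters, here is a sketch of a frame with only two parameters. The name EnumProc and its parameters are made up for this example, and the prologue shown in the comments is what would be expected from the description above rather than output taken from GoAsm. Check the list file for what is actually emitted:-

    ```asm
    EnumProc FRAME hwnd,lParam   ;hypothetical callback with only two parameters
    ;
    ;expected prologue (only the two placeholders in use are filled):
    ;   MOV [RSP+8h],RCX         ;hwnd
    ;   MOV [RSP+10h],RDX        ;lParam
    ;   PUSH RBP
    ;   MOV RBP,RSP
    ;
    MOV RCX,[hwnd]               ;hwnd read back from its placeholder
    MOV EAX,1                    ;return TRUE
    RET
    ENDF
    ```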

    In the epilogue you would expect to see something like:-

    LEA RSP,[RBP]
    POP RBP
    RET
    
    The LEA instruction here is used instead of the simpler MOV RSP,RBP to help the Windows exception handler to identify the epilogue.

    Provide space on the stack for local data

    This works in exactly the same way as in a 32-bit stack frame except that each local data item must be at least a qword in size. So for example if there were three local data qwords, the instruction to make space for it would be SUB RSP,18h.

    Preserve for Windows the non-volatile registers

    RBP is saved and restored already by the prologue and epilogue code. As for the general purpose registers RBX,RDI,RSI and R12 to R15, if they are altered by the window procedure, they will need to be restored in value. The easiest way to do this is to use the USES statement. This keeps them on the stack. The XMM6 to XMM15 registers can be saved and restored en bloc, using the FXSAVE and FXRSTOR instructions.

    Taking a typical use of FRAME...ENDF as follows:-

    WndProc FRAME hwnd,uMsg,wParam,lParam
    USES RBX,RSI,RDI
    LOCALS hDC,BUFFER[256]:B
    ;               ;POINT 'X'
    ;               ;code goes here
    ;
    RET
    ENDF
    
    Here is how the stackframe turns out in 64-bit assembly, using the RSP and RBP values at the start of code proper at POINT 'X' (note that RBP is 32 bytes less than RSP was when entering the procedure: this is because of the PUSH RBP,RBX,RSI and RDI instructions):-

    rbp-118h   next push goes here
    rbp-110h   256 byte buffer (start)        ← rsp is currently here (top of local data)
       :       256 byte buffer (continued)
    rbp-8h     hDC
    rbp        saved value of rdi             ← rbp given value of rsp when it was here
    rbp+8h     saved value of rsi
    rbp+10h    saved value of rbx
    rbp+18h    saved value of rbp
    rbp+20h    caller's return address
    rbp+28h    place holding hwnd (originally in rcx)
    rbp+30h    place holding uMsg (originally in rdx)
    rbp+38h    place holding wParam (originally in r8)
    rbp+40h    place holding lParam (originally in r9)
    rbp+48h    param #5 if present
    rbp+50h    param #6 if present

    Automated stack frames using FRAME...ENDF, LOCALS and USEDATA


    Introduction
          The basics
          Accessing the parameters and local data within an automated frame
          Using structures as local data in a stack frame

    Practice
          Some practical considerations
          Calling procedures within the stack frame - use of RETN
          Calling procedures outside the stack frame - use of USEDATA

    Advanced use
          Declaring message-specific local data - positioning the LOCAL statement
          Creating a minimised window procedure
          Locally defined words using #localdef or LOCALEQU
          Re-usable label scope in automated stack frames
          Inheritance and scope when using USEDATA..ENDU
          Releasing local data and making new local data - use of LOCALFREE

    General
          Some syntax points when using FRAME...ENDF
          Some syntax points when using USEDATA...ENDU
          What you can see in the debugger

    The basics

    FRAME...ENDF is similar to MASM's PROC...ENDP, but with GoAsm you can do a lot more. Subroutines can also use data on the stack using USEDATA...ENDU. And you can declare local data dynamically. This permits you within the window procedure to declare only the local data which is actually required for a particular message. You can use locally defined words which only operate within their own FRAME..ENDF envelopes or USEDATA..ENDU areas associated with them.
    Also in GoAsm the syntax is more relaxed and the source script is much easier to understand because there is no type or parameter checking.

    You use FRAME...ENDF to make an automated stack frame. Here is how you would use it:-

    WndProc FRAME hwnd,uMsg,wParam,lParam
    USES EBX,ESI,EDI
    LOCALS hDC,BUFFER[256]:B
    ;
    ;               ;code goes here
    ;
    RET
    ENDF
    
    When you use FRAME..ENDF in this way GoAsm creates a stack frame "behind your back". For this reason it may be mistrusted a little. Assembler programmers like to know what is going on, which is why they are assembler programmers in the first place! So I'm going to describe this in some detail. You don't need to know all these details.
    If you want you can skim over these details and see how FRAME...ENDF works in practice from the Hello World 3 GoAsm sample program.

    In the code above, FRAME tells GoAsm that you want to make an automated stack frame.
    ENDF signifies the end of the automated stack frame.
    The words after "FRAME" are the parameters. In this case there are four parameters which are being given the names shown here. There is no need to add anything else since GoAsm knows the size of the parameters. In 32-bit coding they are always dwords; in 64-bit coding they are always qwords.
    USES tells GoAsm which registers need to be preserved in the frame. Here we use the 32-bit registers, but in 64-bit assembly this is read as USES RBX,RSI,RDI without having to change the source code (in the case of PUSH register it's the same opcode which is generated for each platform).
    LOCALS permits you to declare and label local data in the frame. GoAsm adds up the size of this local data and sets aside space for it on the stack. When declaring local data, in 32-bit assembly dword is the default, and in 64-bit assembly qword is the default. The default is used if you don't give a size for the data. So in the example, hDC is a dword in 32-bit assembly (and a qword in 64-bit assembly). There is also an area on the stack called BUFFER. This is 256 bytes because of the [256]:B. Instead of B you can use W,D,Q or T for words, dwords, qwords or twords respectively, or you can use the name of a structure (see using structures as local data in a stack frame).

    GoAsm automatically creates the prologue code as described in the 32-bit or 64-bit sections above.
    GoAsm will add the epilogue code each time it sees a RET within the frame delineated by FRAME...ENDF.

    Accessing the parameters and local data within an automated frame

    Within a frame made in this way you can access the parameters sent to the window procedure. In the following 32-bit example the offsets from EBP which are generated by GoAsm are given on the assumption there is no USES statement (64-bit coding is very similar but it uses RBP and each stack element is 8 bytes instead of 4):-
    PUSH [hwnd]          ;equivalent to PUSH [EBP+8h]
    MOV EAX,[uMsg]       ;equivalent to MOV EAX,[EBP+0Ch]
    MOV EBX,ADDR wParam  ;equivalent to LEA EBX,[EBP+10h]
    PUSH ADDR lParam     ;equivalent to PUSH EBP then ADD D[ESP],14h
    MOV EBX,[hDC]        ;equivalent to MOV EBX,[EBP-10h]
    MOV EBX,ADDR BUFFER  ;equivalent to LEA EBX,[EBP-110h]
    
    In this coding GoAsm finds on the stack the correct position of the label you are accessing and codes it appropriately for you. If you use the /l option on the command line you can see this in the list file output. Alternatively you can see this in the debugger.

    Note that the address of the buffer is given at its most negative point. This is correct, so if you code:-

    MOV D[BUFFER+10h],44h
    
    You are inserting the value 44h at a dword which is 16 bytes into the buffer.
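
    As a further illustration (a sketch only, assuming the 32-bit WndProc frame shown earlier with its 256-byte BUFFER), the first and last bytes of the buffer could be written as follows. The B type indicator is used because single bytes are being written:-

    ```asm
    MOV B[BUFFER],41h        ;write 'A' to the first byte (lowest address) of the buffer
    MOV B[BUFFER+0FFh],0     ;write zero to the last byte, 255 bytes into the buffer
    MOV EBX,ADDR BUFFER      ;EBX now points to the first byte of the buffer
    ```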

    Note that GoAsm sets the value of EBP after PUSHing the registers named in the USES statement. This arrangement permits the amount of local data to be adjusted dynamically on a message-specific basis. But it also means that if you have a USES statement in a FRAME the offset of the parameters from EBP will be greater than otherwise. So, for example if you PUSH three registers in a FRAME with the USES statement as follows:-

    USES EBX,EDI,ESI
    
    then EBP will be pushed further away from the parameters by 12 bytes. So in this example hwnd would be at [EBP+14h], uMsg at [EBP+18h], wParam at [EBP+1Ch] and lParam at [EBP+20h]. When coding you need not worry about the exact position of the parameters relative to EBP, but you will need to be aware of this when looking at your code in the debugger. See also what you can see in the debugger.
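
    To make the arithmetic concrete, here is a sketch of the same frame with the three-register USES statement. The offsets in the comments are simply the values given above (each is 0Ch more than it would be without USES). They are not taken from a list file, so check the list file or the debugger for what GoAsm emits in your own build:-

    ```asm
    WndProc FRAME hwnd,uMsg,wParam,lParam
    USES EBX,EDI,ESI
    MOV EAX,[hwnd]       ;expected to be coded as MOV EAX,[EBP+14h]  (8h+0Ch)
    MOV EAX,[uMsg]       ;expected to be coded as MOV EAX,[EBP+18h]  (0Ch+0Ch)
    MOV EAX,[wParam]     ;expected to be coded as MOV EAX,[EBP+1Ch]  (10h+0Ch)
    MOV EAX,[lParam]     ;expected to be coded as MOV EAX,[EBP+20h]  (14h+0Ch)
    XOR EAX,EAX
    RET
    ENDF
    ```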

    Using structures as local data in a stack frame

    In this example, local data of a size to suit the structure RECT, previously declared in your source script, is established on the stack.
    RECT STRUCT
      left   DD 0
      top    DD 0
      right  DD 0
      bottom DD 0
    ENDS
    ;
    WndProc FRAME hwnd,uMsg,wParam,lParam
    LOCALS hDC,rc1:RECT
    ;
    ;               ;code goes here
    ;
    RET
    ENDF
    
    Each element of the RECT structure can be accessed in the same way as if it were in static data, for example (again using 32-bit code):-
    MOV EAX,[rc1.right]        ;equivalent to MOV EAX,[EBP-0Ch]
    MOV EAX,[ESI+RECT.right]   ;equivalent to MOV EAX,[ESI+8h]
    MOV EAX,SIZEOF RECT        ;equivalent to MOV EAX,10h
    MOV EAX,ADDR rc1.right     ;equivalent to LEA EAX,[EBP-0Ch]
    PUSH [rc1.right]           ;equivalent to PUSH [EBP-0Ch]
    PUSH ADDR rc1.right        ;equivalent to PUSH EBP then ADD D[ESP],-0Ch
    PUSH ADDR rc1              ;equivalent to PUSH EBP then ADD D[ESP],-14h
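
    A short sketch of how this might be used in practice, assuming the code is inside the WndProc frame above and using the standard GetClientRect API (the width and height calculation is only an illustration):-

    ```asm
    INVOKE GetClientRect,[hwnd],ADDR rc1   ;fill rc1 with the client area of the window
    MOV EAX,[rc1.right]                    ;right edge
    SUB EAX,[rc1.left]                     ;EAX = width of client area
    MOV EDX,[rc1.bottom]                   ;bottom edge
    SUB EDX,[rc1.top]                      ;EDX = height of client area
    ```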
    

    Some practical considerations

    Exactly how you want to use the facility provided by FRAME...ENDF will be a matter of personal taste.
    You may want to enclose all your window procedure code within the FRAME...ENDF frame. In that case you will need to ensure that sub-routines use a "normal" RET by using RETN.
    You may want to keep the FRAME...ENDF frame as compact as possible but yet access the parameters and local data from outside it. You can do this by specifying a USEDATA...ENDU area.
    You may want to declare message-specific local data rather than for the stack FRAME as a whole. You can do this by positioning the LOCAL statement.
    You may want to combine these methods with a message table to create a minimised window procedure.
    You may want to release local data areas and then make new local data areas. You can do this by using the LOCALFREE statement.

    Calling procedures within the stack frame - use of RETN

    You can have as many return points from the FRAME procedure using RET as you like. Each one will produce the epilogue code which is run when leaving the stack frame. However, this also means that if you have additional procedures within the stack frame you must be careful to use RETN ("normal" RET) to avoid creating epilogue code for those.
    Only the main FRAME procedure ought to be called from outside the FRAME. Calling any additional procedures can cause unpredictable results. This is because procedures within the FRAME...ENDF envelope expect to address parameters and local data using the frame pointer (EBP or RBP) and this will not have been set up if called from outside.
    Here is an example in practice:-
    WndProc FRAME hwnd,uMsg,wParam,lParam
    USES EBX,ESI,EDI
    LOCAL hDC,BUFFER[256]:B
    MOV EAX,[uMsg]        ;get message sent by Windows
    CMP EAX,0Fh           ;see if it is WM_PAINT
    JNZ >L2               ;no
    CALL WINDOW_PAINT     ;paint the window
    XOR EAX,EAX           ;return zero to show message dealt with
    RET                   ;restore stack and return to Windows
    ;
    L2:
    ARG [lParam],[wParam],[uMsg],[hwnd]
    INVOKE DefWindowProcA ;allow Windows to deal with the message
    RET                   ;restore stack and return to Windows
    ;
    WINDOW_PAINT:
    ;  code to paint the window
    RETN                  ;do ordinary return from paint procedure
    ;
    ENDF                  ;stop all FRAME action from this point onwards
    

    Calling procedures outside the stack frame - use of USEDATA...ENDU

    You may prefer to keep the frame itself compact and call outside the frame to help the readability of your code. You can do this and still maintain access to the parameters and local data within the frame by using the USEDATA statement followed by the name of the frame concerned. For example, the WM_PAINT message in the frame above might call this procedure:-
    PAINT:
    USEDATA WndProc
    INVOKE BeginPaint, [hwnd],ADDR lpPaint     ;get in eax the DC to use
    MOV [hDC],EAX
    INVOKE Ellipse, [hDC],[lpPaint.rcPaint.left],   \
                          [lpPaint.rcPaint.top] ,   \
                          [lpPaint.rcPaint.right],  \
                          [lpPaint.rcPaint.bottom]
    INVOKE EndPaint, [hwnd],ADDR lpPaint
    XOR EAX,EAX
    RET
    ENDU
    
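    Here the procedure PAINT is using the parameters and local data of the FRAME called WndProc (the example assumes that hDC and a PAINTSTRUCT called lpPaint are declared as local data in that frame). All the code above is outside the FRAME...ENDF envelope.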
    You can also use USEDATA to access local data in other USEDATA..ENDU areas.
    Just make sure that USEDATA is used later in the source script than any parameters and local data declarations that it relies on. This is because GoAsm is a one pass assembler and it needs to find the position of that earlier data.
    If a procedure which is called from a FRAME or a USEDATA area does not need to access any of the parameters, local data, or locally defined words, then it need not have its own USEDATA statement.

    More than one exit point or procedure within a USEDATA...ENDU area

    Just as when using FRAME...ENDF, you can have as many return points from the USEDATA procedure using RET as you like. Each one will produce the correct epilogue code which is run when leaving the USEDATA area. However, this also means that if you have additional procedures within the USEDATA area you must be careful to use RETN ("normal" RET) to avoid creating epilogue code for those.
    Only the first USEDATA procedure ought to be called from outside the USEDATA..ENDU area. This is because the correct code to access the stack will only be set up for this first procedure.

    Declaring message-specific local data - positioning the LOCAL statement

    In the examples so far the area of local data held on the stack has been declared for the stack FRAME overall. But you may prefer to establish some or all of the local data on a message-specific basis. Here is an example of how to do this:-
    WndProc FRAME hwnd,uMsg,wParam,lParam
    USES EBX,ESI,EDI
    LOCAL hDC             ;declare hDC for frame-wide use
    MOV EAX,[uMsg]        ;get message sent by Windows
    CMP EAX,0Fh           ;see if it is WM_PAINT
    JNZ >L2               ;no
    CALL WINDOW_PAINT     ;paint the window
    XOR EAX,EAX           ;return zero to show message dealt with
    RET                   ;restore stack and return to Windows
    ;
    L2:
    ARG [lParam],[wParam],[uMsg],[hwnd]
    INVOKE DefWindowProcA ;allow Windows to deal with the message
    RET                   ;restore stack and return to Windows
    ;
    ENDF                  ;stop all FRAME action from this point onwards
    ;
    WINDOW_PAINT:
    USEDATA WndProc       ;use parameters and local data from WndProc
    LOCAL ps:PAINTSTRUCT  ;make local data areas
    LOCAL BUFFER[1024]:B  ;specifically for this message
    ;
    ARG ADDR ps,[hwnd]
    INVOKE BeginPaint     ;get ready to paint window
    MOV [hDC],EAX         ;keep the device context in hDC local data
    ;  code to paint the window
    RET                   ;do ordinary return from paint procedure
    ENDU                  ;end use of WndProc frame data
    

    Creating a minimised window procedure

    Using the methods described you can create a minimised window procedure using a message table. The window procedure itself needs to be no larger than this:-
    WndProc FRAME hwnd,uMsg,wParam,lParam
    MOV EAX,[uMsg]
    MOV ECX,SIZEOF MESSAGES/8
    MOV EDX,OFFSET MESSAGES
    :
    DEC ECX
    JS >.notfound
    CMP [EDX+ECX*8],EAX     ;see if it's the correct message
    JNZ <                   ;no
    CALL [EDX+ECX*8+4]      ;call the correct procedure for the message
    JNC >.exit
    .notfound
    INVOKE DefWindowProcA,[hwnd],[uMsg],[wParam],[lParam]
    .exit
    RET
    ENDF
    
    Somewhere in the data or const section would be the following table for the messages (in practice there would be a lot more messages than this):-
    MESSAGES DD 1h,  CREATE     ;the message value then the code address
             DD 2h,  DESTROY
             DD 0Fh, PAINT
    NextLabel:
    
    And then in the code section (and below WndProc) you would have the code for these messages, for example:-
    CREATE:
    USEDATA WndProc    ;use stack data in the window procedure frame
    USES EBX,EDI,ESI   ;preserve the registers for Windows
    LOCALS LocalData   ;establish required local data area
    ;
    ; code to execute on the WM_CREATE message
    ;
    XOR EAX,EAX        ;return nc and eax=0 to continue creating the window
    RET                ;restore the registers and then RET
    ENDU               ;stop all automated action and access to data
    
    In the minimised window procedure DefWindowProc is not called unless either the message is not found in the message table or the message code returns with the carry flag set. Some messages must call DefWindowProc even if they are processed by the window procedure - see the Windows SDK.
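
    As a sketch of that last point, a handler can process a message and still have DefWindowProc called by returning with the carry flag set. The WM_SIZE entry (message value 5h) and the label SIZE_MSG are made up for this illustration. The DD line would go in the MESSAGES table before NextLabel, and the code would go below WndProc:-

    ```asm
             DD 5h,  SIZE_MSG   ;extra entry for the MESSAGES table (WM_SIZE)
    ;
    SIZE_MSG:
    USEDATA WndProc    ;use stack data in the window procedure frame
    ;
    ; code to execute on the WM_SIZE message
    ;
    STC                ;return carry so that WndProc goes on to call DefWindowProc
    RET
    ENDU
    ```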

    Locally defined words using #localdef or LOCALEQU

    Within a FRAME..ENDF envelope you can use locally defined words. The definition can be made either in the FRAME..ENDF envelope itself or in an associated USEDATA..ENDU area.
    Their scope is limited to the FRAME or USEDATA area. See inheritance and scope when using USEDATA..ENDU for more details about how this works in practice.
    You define such local words using #localdef (or LOCALEQU if you prefer - they do the same thing).

    For example:-

    FrameProc1 FRAME Param
    #localdef THING1 23h
    THING2 LOCALEQU 88h
    ;
    MOV EAX,THING1       ;local define 23h
    MOV EAX,THING2       ;local define 88h
    ;
    RET
    ENDF
    ;
    MyFunction44: USEDATA FrameProc1
    #localdef THING3 0CCh
    ;
    MOV EAX,THING1       ;local define should be 23h
    MOV EAX,THING2       ;local define should be 88h
    ;
    RET
    ENDU
    
    In the above example, if THING1 and THING2 are defined globally (using #define or EQU), that definition is ignored (the local definition takes priority).

    #undef has local scope priority. If the word to be undefined is found locally, then #undef applies to that. If not, #undef will apply to a global label.

    Re-usable label scope in automated stack frames

    Re-usable labels beginning with a period can be accessed anywhere within an automated stack frame (which can be established using FRAME...ENDF). Other unique labels within the frame are ignored for this purpose, so for example,
    ExampleProc FRAME Param
    CMP EDX,EAX
    JZ >.fin
    LABEL1:
    XOR EAX,EAX
    .fin
    RET
    ENDF
    LABEL2:
    
    Here the jump to .fin will still work despite the existence of LABEL1. This is because the label .fin has the scope of the whole frame, and not just the code between LABEL1 and LABEL2. In other words using FRAME...ENDF enlarges the scope of the re-usable label to the whole frame.

    Inheritance and scope when using USEDATA..ENDU

    A USEDATA..ENDU area can be associated either directly with a FRAME, or alternatively with another USEDATA..ENDU area.
    This enables advanced users to select the local data and defined words which a USEDATA area can use.

    The usual arrangement is to have each USEDATA area a child of the FRAME:-

    FrameExample FRAME Param
    LOCAL LocalLabel1
    #localdef CONSTANT 23h
    ;
    RET
    ENDF
    ;
    Usedata#1: USEDATA FrameExample
    MOV EAX,[LocalLabel1]
    MOV EAX,CONSTANT
    MOV EAX,[Param]
    RET
    ENDU
    ;
    Usedata#2: USEDATA FrameExample
    LOCAL Specific
    #localdef SPECIFIC_CONSTANT 444444h
    MOV EAX,[LocalLabel1]
    MOV EAX,CONSTANT
    MOV EAX,[Param]
    MOV EAX,[Specific]
    MOV EAX,SPECIFIC_CONSTANT
    RET
    ENDU
    
    Here each USEDATA area can access the FRAME's parameters, local data and defined words. Note, however, that Usedata#2 has its own local data and defined word. Those can only be referenced within Usedata#2. If Usedata#1 tried to access them (or the code in FrameExample for that matter), they would not be found.

    In the next arrangement, the first USEDATA area is a child of the FRAME and the second USEDATA area is its grandchild.

    FrameExample FRAME Param
    LOCAL LocalLabel1
    #localdef CONSTANT 23h
    ;
    RET
    ENDF
    ;
    Usedata#1: USEDATA FrameExample
    LOCAL Specific
    #localdef SPECIFIC_CONSTANT 444444h
    RET
    ENDU
    ;
    Usedata#2: USEDATA Usedata#1
    MOV EAX,[LocalLabel1]
    MOV EAX,CONSTANT
    MOV EAX,[Param]
    MOV EAX,[Specific]
    MOV EAX,SPECIFIC_CONSTANT
    RET
    ENDU
    
    Here, although each USEDATA area can access the frame's parameters, local data and defined words, Usedata#2 can also reference the local data and locally defined words in Usedata#1.

    Releasing local data and making new local data - use of LOCALFREE

    LOCALFREE is only available for 32-bit programs and is not allowed with x86 or x64 modes due to the prologue stack usage information recorded for exception handling.

    You can use LOCALFREE to release areas of local data ready to make new local data. This may help to conserve memory if you use the stack a lot. LOCALFREE will render existing local data declared in the FRAME or USEDATA...ENDU area in which it appears inaccessible to all later code in your source script. It will not affect local data in other FRAMES or USEDATA areas. When GoAsm sees LOCALFREE in the source script it causes the value of ESP/RSP to be restored to its value in the current FRAME or usedata area before any local data was declared. You can then declare new local data using LOCAL or LOCALS.
    Only use LOCALFREE when the stack is in equilibrium. Do not use it if there are any outstanding PUSHes which need to be POPped. This is because the change to ESP/RSP will effectively rub over any outstanding PUSHes.
    At the end of a procedure you do not need to use LOCALFREE since the stack is restored automatically on a RET anyway.
    Here is an example of how to use LOCALFREE:-

    CREATE:
    USEDATA WndProc    ;use stack data in the window procedure frame
    USES EBX,EDI,ESI   ;preserve the registers for Windows
    LOCALS BUFFER[4000]:B    ;establish large buffer on the stack
    ;
    ; first part of the code to execute on the WM_CREATE message
    ;
    LOCALFREE                ;rub out large buffer
    LOCALS BUFFER[256]:B     ;establish smaller buffer on the stack
    ;
    ; second part of the code to execute on the WM_CREATE message
    ;
    XOR EAX,EAX        ;return nc and eax=0 to continue creating the window
    RET                ;restore the registers and then RET
    ENDU               ;stop all automated action and access to data
    

    Some syntax points when using FRAME...ENDF

  • The syntax for FRAME...ENDF is as follows, with variations mentioned below:-
    CodeLabel:
    	FRAME Parameter List	;if any parameters are needed
    	USES Register List	;if registers need to be saved
    	LOCAL Local List	;if local variables are required
    	;
    	ret
    	ENDF
    

  • A FRAME statement must be preceded by a label, either immediately before it or on the line before. This is the "frame name".
  • Only one FRAME statement per frame is allowed.
  • All the parameters must be immediately after the FRAME statement, separated by commas. To continue on the next line use the continuation character "\".
  • You can automatically save and restore registers within a frame with the USES statement. ENDF stops the USES action.
  • Jumps can only be within the FRAME itself. This is because a FRAME has its own unique epilogue code which must be implemented.
  • Calls can go outside the FRAME. If you need to access the FRAME's parameters, local data or defined words, use USEDATA.
  • If you call a function within the same FRAME, that function should use RETN ("normal" RET) instead of RET. This stops the epilogue code being generated when leaving the function.
  • Local data should be declared using one or more LOCAL statements. After the first following code instruction you won't be able to use LOCAL again unless you have freed existing local data using the LOCALFREE statement.
  • LOCALFREE must be followed by a LOCAL or LOCALS statement.
  • You can only use the LOCALFREE statement if the stack is in equilibrium (no outstanding PUSHes to be POPped).
  • LOCALFREE is not allowed in x86 or x64 modes.
  • You can use LOCALS instead of LOCAL if you prefer.
  • You cannot have a USEDATA statement inside the frame.
  • Locally scoped labels (beginning with a period) will work anywhere within the FRAME...ENDF envelope. Their scope boundary is the FRAME and the ENDF statements themselves, not any other labels which happen to be within the frame.
  • Close the frame using ENDF (or ENDFRAME if you prefer), and optionally the frame name can be in front of this statement.
  • Since GoAsm is a one-pass assembler, local data must be declared in the source script before it is used.
  • Since GoAsm relies on EBP/RBP within a frame to access the parameters and local data on the stack, do not change this within the frame or any procedures called by the frame unless such access is not required within the procedure. If EBP/RBP is changed always restore its value afterwards.
  • One frame can call another frame and thereby pass parameters to it on the stack, but since EBP/RBP will be changed in this process the original stack data will not be accessible in the called frame.

    Some syntax points when using USEDATA...ENDU

  • The syntax for USEDATA...ENDU is as follows, with variations mentioned below:-
    CodeLabel:
    	USEDATA SourceData
    	USES Register List	;if registers need to be saved
    	LOCAL Local List	;if local variables are required
    	;
    	ret
    	ENDU
    

  • A USEDATA statement must be preceded by a label, either immediately before it or on the line before. This is the "USEDATA name".
  • SourceData can be a frame name or the name of a USEDATA procedure.
  • If SourceData is a frame name, then the parameters and local data established in the frame will be accessible within the USEDATA...ENDU area.
  • If SourceData is the name of a USEDATA procedure, then all parameters, local data and defined words which were accessible within that procedure can be accessed.
  • You can automatically save and restore registers within a usedata area with the USES statement. ENDU stops the USES action.
  • Jumps can only be within the USEDATA area itself. This is because a USEDATA area has its own unique epilogue code which must be implemented.
  • Calls can go outside the USEDATA area. If you need to access the USEDATA area's parameters, local data or defined words, make another USEDATA..ENDU area.
  • If you call a function within the same USEDATA area, that function should use RETN ("normal" RET) instead of RET. This stops the epilogue code being generated when leaving the function.
  • Local data should be declared using one or more LOCAL statements. After the first following code instruction you won't be able to use LOCAL again unless you have freed existing local data using the LOCALFREE statement.
  • LOCALFREE must be followed by a LOCAL or LOCALS statement.
  • You can only use the LOCALFREE statement if the stack is in equilibrium (no outstanding PUSHes to be POPped).
  • LOCALFREE is not allowed in x86 or x64 modes.
  • You can use LOCALS instead of LOCAL if you prefer.
  • One USEDATA procedure can call another USEDATA procedure without loss of data since EBP/RBP is not changed.
  • Locally scoped labels (beginning with a period) work normally within usedata areas. Code labels provide their scope boundary.
  • Instead of using ENDU to close the usedata area, you can use ENDUSEDATA if you prefer. Optionally the usedata name can be in front of this statement.
  • Since GoAsm is a one-pass assembler, local data must be declared in the source script before it is used.
  • Since GoAsm relies on EBP/RBP within a usedata area to access the parameters and local data on the stack, do not change this within the usedata area or any procedures called by the code within the usedata area unless such access is not required within the procedure. If EBP/RBP is changed always restore its value afterwards.

    What you can see in the debugger

    In order to establish and use the automated stack frames, GoAsm generates extra code. When looking at your code in the debugger this extra code may be confusing and obscure the code you are looking for. One way to see what GoAsm has inserted is to look at the list file produced on assembly (option /l in the command line).
    Here is a brief description of some extra lines of code you will see.

    In 32-bit FRAMEs, GoAsm PUSHes EBP and registers specified by USES first, then keeps the value of the stack pointer ESP in EBP using MOV EBP,ESP. On a RET this is reversed, so you will see MOV ESP,EBP followed by one or more register POPs. Space is made for the local data using SUB ESP,x where x depends on the amount of space for local data required.
    In 64-bit FRAMEs, GoAsm's first job is to store parameters #1 to #4 on the stack using an instruction like MOV [RSP+8h],RCX as described earlier. Then after PUSHing RBP and registers specified by USES, MOV RBP,RSP is used to keep the stack pointer. Coming out of the FRAME, LEA RSP,[RBP] is used to restore RSP ready for the register POPs and the RET.

    Code such as MOV EAX,[EBP-34h] or LEA EAX,[EBP-56h] or PUSH EBP, ADD D[ESP],-60h (or their 64-bit register equivalents) will be generated when local data is accessed. The offset values will be positive when accessing parameters.

    In USEDATA areas, since GoAsm does not know at assemble-time how much use of the stack there has been before the call to the USEDATA procedure, it makes a "shield" of 100h bytes (200h bytes in 64-bit assembly) to ensure that such stack use is protected from over-write. For this reason the offset numbers used when local data is accessed may be larger than expected.
    Advanced users may like to adjust the size of the shield. This can be done using this syntax, for example:-

    USEDATA WndProc SHIELDSIZE:20h      ;in 32-bit assembly
    USEDATA WndProc SHIELDSIZE:40h      ;in 64-bit assembly
    
    This sets the shield to only 8 push values (8 dwords in 32-bit assembly, 8 qwords in 64-bit assembly), which would be suitable if you were sure that at run-time there would never be more than seven pushes and one call prior to the local data declaration in the USEDATA procedure. (Remember you must count all PUSHes before the CALL, the CALL itself and any sub-CALLs, and also all PUSHes caused by the USES statement within the USEDATA procedure). Once SHIELDSIZE is set it remains at that value for the remainder of the source script until changed.
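
    Here is a sketch of how the count might be made in 32-bit assembly. The labels PAINT and Temp and the particular PUSHes are made up for this illustration. Four stack places are used before the local data declaration, which is within the eight allowed by a shield of 20h:-

    ```asm
    ;inside the WndProc frame:
    PUSH ESI                 ;stack place 1
    PUSH EDI                 ;stack place 2
    CALL PAINT               ;stack place 3 (the return address)
    POP EDI
    POP ESI
    ;
    ;outside the frame:
    PAINT:
    USEDATA WndProc SHIELDSIZE:20h   ;room for 8 dwords
    USES EBX                 ;stack place 4
    LOCAL Temp               ;declared after 4 of the 8 places have been used
    ;
    RET
    ENDU
    ```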

    In USEDATA areas the value of the stack pointer is not kept in the EBP or RBP register because this already holds the stack pointer on entry into the FRAME. Instead, GoAsm keeps the value of the stack pointer at a convenient place on the stack. This is done when the first local data in the USEDATA area is declared. In order to do this safely GoAsm adds several lines of code culminating in MOV EAX,[EAX-4h] (or MOV RAX,[RAX-8h] in 64-bit assembly). GoAsm uses EAX/RAX during this process but restores its value afterwards, so you can still use it to pass information to the procedure. On a RET you will see the value of ESP/RSP being restored using POP ESP/RSP.

    LOCALFREE will also cause a restoration of the stack pointer.

    USES statements will cause PUSHes of the registers concerned before ESP/RSP is saved and POPs of the registers after it is restored.
     


    Conditional assembly

    What is conditional assembly and why use it?

    Using conditional assembly you can select at assemble-time which part of your source script you want to have assembled. This may be useful if, for example, you want to make different versions of your program from the same source script.

    The conditional directives

    GoAsm simply uses the "C" syntax which is based on the #if, #ifdef, #else, #elif (or #elseif) and #endif directives. The syntax of the basic structure of a conditional directive in its simplest form is as follows:-

    #if condition
    text A
    #endif

    Here if the condition is TRUE, text A will be assembled. If, however, the condition is FALSE, the assembler will jump over text A and will continue assembling from the #endif.

    You can add something to do if the condition is FALSE as follows:-

    #if condition
    text A
    #else
    text B
    #endif

    Here if the condition is TRUE, text A will be assembled, but text B will not be assembled.

    If, however, the condition is FALSE, text A will be jumped over but text B will be assembled.

    The #endif indicates the end of the conditional frame, so that all text after that will be assembled.

    The #else statement, if used, must always be the last directive before the #endif.

    You can add a further condition to the frame:-

    #if condition1
    text A
    #elif condition2
    text B
    #endif

    Here if condition1 is TRUE, text A will be assembled, but text B will be jumped over and assembly will continue from the #endif. If, however, condition1 is FALSE, text A will be jumped over to the #elif, where condition2 will be tested. If condition2 is then TRUE, text B will be assembled. "#elif" is the same as "#elseif".

    Adding the #else to the above conditional frame produces:-
    #if condition1
    text A
    #elif condition2
    text B
    #else
    text C
    #endif

    Here if condition1 is TRUE, text A will be assembled, but text B and text C will be jumped over and assembly will continue from the #endif. If, however, condition1 is FALSE, text A will be jumped over to the #elif, where condition2 will be tested. If condition2 is then TRUE, text B will be assembled, and text C will be ignored; if, however, condition2 is FALSE, text B will be jumped over to the #else and text C will be assembled.

    You can have as many #elifs as you like in each conditional frame, but there can only be one #else per frame, and each #if must have a corresponding #endif. Some programmers nest the conditional frames, but this can become very confusing and may not be good programming practice. If this is done it is recommended that you label each #endif with a comment so that you can see to which #if it refers.
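    If you do nest conditional frames, the recommendation above might look something like the following sketch. The words UNICODE_BUILD and VERSTRING are hypothetical, and WINVER is assumed to be defined elsewhere (for example in the command line).

```asm
DATA SECTION
#ifdef UNICODE_BUILD
#if WINVER>=500h
VERSTRING DB 'Unicode, newer Windows',0
#else
VERSTRING DB 'Unicode, older Windows',0
#endif                  ;end of #if WINVER>=500h
#else
VERSTRING DB 'ANSI version',0
#endif                  ;end of #ifdef UNICODE_BUILD
```

    Whichever route is taken through the frames, exactly one VERSTRING declaration is assembled.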

    Types of #if statements

    #ifdef identifier
    where identifier is a word which may be defined in the source script or in an include file. This statement returns TRUE if the identifier is defined and FALSE if it is not defined. identifier must be a word and not a number nor a quoted string.

    #ifndef identifier
    as above, but this statement returns FALSE if the identifier is defined, and TRUE if it is not defined.

    #if expression
    where expression can be a number, with the statement returning FALSE for 0, and TRUE for non-zero values.

    where expression can be an identifier which evaluates to a number.

    where expression can be the defined operator used as follows:-
    defined identifier
    defined(identifier)
    which returns TRUE (1) if the identifier is defined, and FALSE (0) if it is not defined, similar to #ifdef.

    The ! operator can be used in front of these simple expressions to reverse the result of the condition, so that for example:-
    #if !0 would return TRUE, and
    #if !defined identifier would return FALSE (0) if the identifier is defined, and TRUE (1) if it is not defined, similar to #ifndef.

    where expression can be more complex:-
    identifier relational-operator value
    identifier must be a word which is defined elsewhere in the file, in an include file or in the command line. It cannot be a number.

    the relational operator can be one of the following:-
    >= greater than or equals
    <= less than or equals
    == equals
    = equals
    != not equal
    > greater than
    < less than

    value can be a number or a word which is defined elsewhere in the file, in an include file or in the command line, which evaluates to a number.

    where expression can be more complex combining two or more expressions with the && conditional-AND operator or the || conditional-OR operator:-
    expression1 && expression2
    expression1 || expression2
    where expression1 and expression2 are one of the previous types of expressions.

    The && statement returns TRUE if both expression1 and expression2 are TRUE.
    If expression1 is FALSE, then expression2 is not evaluated.

    The || statement returns TRUE if either expression1 or expression2 is TRUE.
    If expression1 is TRUE, then expression2 is not evaluated.

    Note that where a statement combines several expressions using several conditional operators, it is currently evaluated by simple processing from left to right. In "C", && has precedence over ||, so if you place the expressions joined by && first you should get similar results.
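    The following sketch puts several of these expression types together. The words DEBUG and TRACE and the data labels are hypothetical, and WINVER is assumed to be defined elsewhere in the file, in an include file or in the command line.

```asm
DATA SECTION
#if defined(DEBUG) && WINVER>=400h
LOGLEVEL DD 2h          ;DEBUG is defined and WINVER is at least 400h
#elif !defined DEBUG
LOGLEVEL DD 0h          ;DEBUG is not defined at all
#else
LOGLEVEL DD 1h          ;DEBUG is defined but WINVER is below 400h
#endif
;
;with more than one operator, the && pair is written first so that
;left-to-right processing matches the usual precedence
#if defined(DEBUG) && WINVER>=400h || defined(TRACE)
TRACEFLAG DD 1h
#endif
```

    In the first frame, if DEBUG is not defined the WINVER test is not evaluated at all, because the first expression of the && statement is already FALSE.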

    Examples of conditional assembly

    #define HELLO
    ;
    #ifdef HELLO
    BSWAP EAX          ;swap the bytes in eax if HELLO is defined
    #endif
    
    #if HELLO==3
    OUTPUT DD 3h       ;if HELLO is defined as 3 declare data label OUTPUT as 3
    #elif WINVER>=400h
    OUTPUT DD 4h       ;alternative data if WINVER is equal to or greater than 400h
    #else
    OUTPUT DD 5h       ;alternative data if neither of the above apply
    #endif
    
    You can define a word in the command line in order to trigger the correct parts of your source script for assembly, for example
    GoASM /l /d WINVER=401h MyProg.ASM
    
    or simply
    GoASM /l /d VERSIONA MyProg.ASM
    
    This means that the word VERSIONA will be defined, and it can be tested using #ifdef.
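    A minimal sketch of source script which responds to that command line might be as follows. The label EDITION and the strings are hypothetical.

```asm
;assemble with:  GoASM /l /d VERSIONA MyProg.ASM
DATA SECTION
#ifdef VERSIONA
EDITION DB 'Version A',0    ;assembled only when VERSIONA is defined
#else
EDITION DB 'Standard',0     ;assembled when /d VERSIONA is left out
#endif
```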

    See also:-
    conditional assembly in macros
    conditional assembly in structures
     


    Include files - using #include and INCBIN

    Using include files containing assembler code or merely structures and definitions can be another good way to obscure your source script and make it difficult to read and follow. This is because it takes time for the reader to refer to file number 2 to understand file number 1. Nevertheless include files containing Windows structures and definitions are popular and GoAsm does give full support for include files.

    As far as GoAsm is concerned there are two types of include file as follows:-

    Type: Files with an "a" or "A" extension, for example MyInclude.asm, or simply MyInclude.a
    Effect: With this type of file, at the time the include is declared in your source script assembly is diverted into the include file.
    If you are making a list file using the /l option, a full list of the contents of the file will be made.
    Do not use this type of file if your include file contains only definitions, structures and the like. This will slow GoAsm down unnecessarily, because it will look for mnemonics and assembler instructions in the file.

    Type: Files without an "a" or "A" extension, for example MyInclude.inc, or simply MyInclude
    Effect: With this type of file, no assembly is carried out in the include file and only the definitions and structures in the include file are examined and recorded.
    If you are making a list file using the /l option, the contents of the file will not be listed.
    Use this extension if your include file contains only definitions, structures and the like (commonly called header files). GoAsm will make a record of all these in case they are referred to later in the main source script. For this reason a large include file will slow GoAsm down.

    Normally GoAsm does not permit any other programs to open include files without an "a" or "A" extension that it has opened. This is to help error checking. But if you want to allow this for example to permit the same header files to be available in a parallel compilation environment, you can specify the /sh ("share header" files) switch in the command line.

    Syntax for #include

    #include path\filename

    path\filename can be either:-

    a quoted string
    a non-quoted string
    a string in less-than and greater-than characters
    GoAsm will look for the file using the path specified. If no path is specified it will look in the current directory. If the file is not found it will look for it in the path given by the INCLUDE environment string. You can set the INCLUDE environment string using the DOS SET command in the MS-DOS (command prompt) window, or by calling the SetEnvironmentVariable API. You can also use control panel, system, advanced, environment if your operating system will permit this, followed by a reboot. Note: there may be a different environment string for each folder or sub-folder. Ensure the environment string you wish to use is in the current directory.
    You can nest #include files but it is good practice to avoid this as far as possible.
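    The three forms of path\filename might be written as in the sketch below. All the file names and the path are hypothetical.

```asm
;a quoted string
#include "MyHeader.inc"
;a non-quoted string
#include MyHeader2.inc
;a string in less-than and greater-than characters
#include <MyHeader3.inc>
;a path is given here, and the file has an "a" extension,
;so assembly is diverted into it
#include C:\Source\Shared\MyCode.asm
```

    The first three files have no path, so GoAsm would look for them in the current directory and then in the path given by the INCLUDE environment string.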

    Loading a file using INCBIN

    INCBIN allows you to load chunks of material from a file directly into a data or code section without further processing. You can choose how many bytes to jump over from the start of the file and/or how many to load from the file. Here are examples of how to use INCBIN:-
    DATA SECTION
    BULKDATA INCBIN MyFile.txt        ;load the whole of MyFile.txt into data section with label BULKDATA
        INCBIN MyFile.txt, 100        ;miss out the first 100 bytes but load rest of file
        INCBIN MyFile.txt, 100, 300   ;miss out the first 100 bytes but load 300 bytes
    
    See also inserting blocks of data.
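    If you need to know at run-time how much material was loaded, one possible approach is to place a second label immediately after the INCBIN line and take the difference between the two addresses. This is only a sketch: BULKEND is a hypothetical label, and MyFile.txt is assumed to be at least 400 bytes long.

```asm
DATA SECTION
BULKDATA INCBIN MyFile.txt, 100, 300  ;miss out 100 bytes, load 300
BULKEND:                        ;label just after the loaded bytes
;
CODE SECTION
MOV ESI,ADDR BULKDATA           ;ESI = start of the loaded material
MOV ECX,ADDR BULKEND            ;ECX = first byte after it
SUB ECX,ESI                     ;ECX = number of bytes between the labels
```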

    Merging: using static code libraries

    What are static code libraries?

    Static code libraries are files with the ".lib" extension containing one or more COFF object files. The object files contain code and data for ready-made functions. The material inside the library file is in binary format (machine code) not source code. The library file contains an index with a list of the functions and the code and data labels they use. Static code libraries should be distinguished from dynamically linked libraries (DLLs) and from import libraries which merely contain a list of functions exported by DLLs.

    How to use static code libraries

    In GoAsm, you can use the ready-made code and data in static code libraries by just calling the required function in your source script, giving the name of the library containing the function, for example
    CALL zlibstat.lib:compress
    
    You can also use equates to shorten the call for example:-
    LIB1=c:\prog\libs\zlibstat.lib
    CALL LIB1:compress
    
    Examples using INVOKE are:-
    INVOKE zlibstat.lib:compress,[pCompHeap],ADDR ComprSize,[pHeap],[DataSize]
    INVOKE LIB1:compress,[pCompHeap],ADDR ComprSize,[pHeap],[DataSize]
    
    If your path contains spaces you must put the path and filename in quotes.

    What happens when you call a library function

    The above coding causes GoAsm to load the associated code and data and merge it with GoAsm's output file (object file) during assembly. You can then send the output file to the linker in the usual way, but the linker is not interested in the source of the code and data (the library file) at all: all the code and data associated with the function has already been loaded by GoAsm. The only additional work which you may need to do at link-time is to ensure that the linker is aware of any additional DLLs needed by the ready-made functions. For example, a ready-made function may call a Windows API in OLEAUT32.dll. In order to avoid a "symbol not found" error you need to add that DLL to GoLink's command line. If you are using another linker you would add the import library for OLEAUT32.dll to the list of import libraries given to the linker.

    Not the Microsoft method

    GoAsm's method of using static code libraries is quite different from the method used by the Microsoft tools. The MS linker adds code and data at link-time. If you prefer you can still use that method with GoAsm. To do so you need to feed the GoAsm output files to the Microsoft linker and tell the linker to look for the functions in the appropriate static code library.
    See using GoAsm with various linkers.

    How GoAsm finds the correct lib file

    If the path of the lib file is not given in the call, GoAsm will look in the current directory and in the path given in the LIB environment. You can set the LIB environment string using the DOS SET command in the MS-DOS (command prompt) window, by calling the SetEnvironmentVariable API or using control panel, system, advanced, environment if your operating system will permit this. Note: there may be a different environment string for each folder or sub-folder. Ensure the environment string you wish to use is in the current directory.
    You only need to specify the path of the lib file once in your source script (in a call to a lib file) and GoAsm will automatically use that path for all specified lib files of the same name.
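    For example, using the path from the earlier equate example (the choice of uncompress as a second function in the same library is an assumption made for illustration):

```asm
;the first call gives the full path of the lib file
CALL c:\prog\libs\zlibstat.lib:compress
;
;later calls to a lib file of the same name can leave the path out
CALL zlibstat.lib:uncompress
```

    The parameters for each function would of course need to be pushed before each call in the usual way.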

    Using JMP instead of CALL/INVOKE

    You can jump to a function using for example:-
    JMP MyLib.lib:MainProc
    
    This can be used if, for example the library code contains a call to ExitProcess to end the program.

    How do you tell what is in a lib file?

    You will find it useful to check what functions are available in the static code libraries. There are a number of tools available but one of the most useful is Wayne J. Radburn's PEview which enables you to view the internals of various files including lib files. You can see from this tool that the individual members of static code libraries are always ".obj" files, but that the individual members of import libraries are always ".dll" files.
    DUMPBIN is a Microsoft utility which comes with MASM and the "C" compilers. LINK -dump (using the MS linker) is functionally the same and also provides a list of options if used without any further parameters. There are various options but for example, DUMPBIN /LINKERMEMBER:1 MyLib.lib >MyLib.dmp will give (in MyLib.dmp) information about the first member of the library file and DUMPBIN /ALL MyLib.lib will give information about all members.
    PEDUMP is a utility written by Matt Pietrek which can dump the contents of any PE file (including a .lib file) in an ordered way (see also LIBDUMP by the same author).

    Making your own lib files

    The Microsoft utility LIB.EXE, which relies on LINK.EXE and also MsPDB50.dll, can make static code library files. Suppose you have an object file called calculate.obj which contains a function which you wish to re-use. Then you can make a library out of this with the following command line syntax, for example:-
    LIB calculate.obj
    
    This will make calculate.lib. To add another object file to the library you can then use:-
    LIB calculate.lib added.obj
    
    This will add added.obj to the library calculate.lib. This is useful if you want to keep your functions in libraries, so that they can be re-used without having to insert the source code into your source scripts. They are also useful to distribute your functions whilst keeping your source code to yourself. LIB.EXE and its components are part of the MSDN tools which can be downloaded free from the Microsoft MSDN site (part of the SDK). The exact download keeps changing so it may be trial and error getting these files. It is most likely also part of various main compilation tools such as VC++ or MASM.

    Static code library bloat

    Calling a function in a static code library will enlarge GoAsm's output file with the associated code and data of the function. Often the function relies on other functions within the same library file, resulting in yet more code and data being loaded. You can follow what is loaded by using the /l (list) switch in GoAsm's command line and looking at the resulting list file. Unfortunately some code and data may be loaded which is not actually used. You can see the unused code and data labels by using the /unused switch in GoLink.

    Callbacks or data reliance

    Some functions in library files expect to find specific code and data labels in the executable which must be part of your source script. For example the callback procedure RegisterDialogClasses and its associated data variables must be provided in your source script in order to use SCRNSAVE.LIB to make a screensaver.

    Getting data only, not code

    Libraries are intended to give access to functions at compile time rather than just to data from the library. However if you want to cause a particular library to load at compile-time so that you can access its data, you can call a suitable code label in do-nothing code in your source script (the code label is never called). For example:-
    CALCULATE1:
                            ;some code here
    RET
    CALL Lib1:DUMMY         ;ensure Lib1 is loaded at compile-time
    CALCULATE2:
                            ;some code here
    RET
    

    Integration of the code and data, priority of same-name labels

    If you make your own lib files, it will help if you understand how library code and data is integrated with the code and data produced from the main source script. When using code libraries it is common for the names of labels to be the same, and if this does happen when loading a library GoAsm uses information about the very first label only and ignores the rest.
    For example, suppose you have a data area called "BUFFER", either declared in the main source script or in the library. There may be several functions in the main source script or in the various library components or even in other libraries which might use BUFFER.

    So where should it be declared and what if it is declared more than once? A similar question arises with functions. The code for these may be in the main source script or in a library. The code may be duplicated in several places. The answer to the question lies in the priority rules. They are as follows:-
    1. The GoAsm main source script and any "a" include files always have priority. In other words, any code label or data declaration in the scripts will always find their way into GoAsm's output file together with the code and data that they are labelling.
    2. Subject to 1 above, the formal library calls (using the format library:functionname) have priority in the order in which they are called.

    These rules mean that code libraries are able to call functions and to use data within the source scripts directly (without any assistance from the linker). They also mean that any label in a library which has already been used in the source script or in a library which has already been called will be ignored.

    For example suppose BUFFER is declared in the source script to be 256 bytes. If library1 declares it again as 128 bytes, the label is ignored. BUFFER will be 256 bytes in the output file. And further, although the data area reserved for BUFFER in library1 will be loaded in the output file, the label BUFFER will not point to that area but to the area declared in the source script. Now if in a later call, library2 declares BUFFER as 1024 bytes, again this data area will go into the output file, but BUFFER will point to the original data area.

    The reason GoAsm deals with same-name labels in this way is twofold. Firstly, it would be impossible for any assembler (or linker for that matter) to work out which label has priority from its size. This is because in GoAsm at assemble time and certainly at link time the size of a particular area pointed to by a label is not known for certain. This is because sometimes areas are enlarged by areas of unlabelled data or code, or sometimes other labels are used as pointers to an intermediate place within the area.
    The rules also have significance for the ordering of data. Suppose your library relies on data being held in BUFFER and also upon data sometimes overflowing into an enlarged area labelled BUFFER_EXT, declared immediately after BUFFER in the library. Now in the above example BUFFER would not actually point to the expected place (just before BUFFER_EXT). Instead it would point to the first BUFFER declaration somewhere else in the data section.

    So I would suggest the following rules are followed when making lib files:-
    1. Only use same-name labels in the source script and in the library if the labels are intended to point to the same thing at the same place, ie. to the first declared such label.
    2. If data labels of the same-name are to be used in different functions in the lib file, be aware that the size of the data area which they identify will be set by the first declared such label. Also don't expect the data area which the label identifies to be placed in any particular position in the executable.
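    The BUFFER example discussed above might be sketched as follows. The library file names and the functions FillBuffer and ReadBuffer are hypothetical; each library is assumed to contain its own declaration of BUFFER.

```asm
DATA SECTION
BUFFER DB 256 DUP 0             ;declared in the source script, so it has priority
;
CODE SECTION
START:
CALL library1.lib:FillBuffer    ;the BUFFER label inside library1.lib is ignored
CALL library2.lib:ReadBuffer    ;likewise the BUFFER label inside library2.lib
MOV AL,[BUFFER]                 ;reads from the 256-byte area declared above
RET
```

    The data areas which the libraries reserved for their own BUFFER declarations would still be loaded into the output file, but nothing would point to them by that name.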

    Use single object files only

    Currently GoAsm does not support calling the same library functions from more than one module (source script). If you do this you will see "duplicate symbol" errors from the linker. Instead you need to concentrate all library calls to the same function in a library in one source script. In later versions of GoAsm and GoLink support will be added for calls to the same library function in more than one source script if this would be helpful to users.
     

    Unicode support

    GoAsm and its sister program GoRC (resource compiler) read Unicode UTF-16 and UTF-8 files, can take their commands in Unicode and give their output in Unicode. This means that if you use a Unicode editor it is possible to use Unicode filenames, comments, code and data labels, defined words (equates, macros and structs), exports and also strings in data. GoAsm has a number of features to help if you want to write Unicode programs or if you want to have one source script able to make both Unicode and ANSI versions of your program. See writing Unicode programs for more information about all these topics.
     

    64-bit assembly

    Using /x64 in the command line switches GoAsm into 64-bit mode, and it will produce a 64-bit COFF object file in PE+ format. GoRC (resource compiler) and GoLink can also work in 64-bits producing executables for the AMD64 and EM64T processors, to be run under Windows 64-bit versions. Although 32-bit executable code differs from 64-bit code, since the basic principles used in writing the source code remain the same, it is possible to use the same source code for both platforms. And existing 32-bit source code can be ported to 64-bits. AdaptAsm.exe can help with this conversion.

    See
    Calling Windows APIs in 32-bits and 64-bits
    Callback stack frames in 32-bits and 64-bits
    Writing 64-bit programs.
     


    x86 compatible mode (32-bit assembly using 64-bit source)

    Using /x86 in the command line switches GoAsm into x86 compatibility mode and allows you to use source code using the extended general purpose registers RAX,RBX,RCX,RDX,RDI,RSI,RBP and RSP. In this mode these registers are read by GoAsm as if they were EAX,EBX,ECX,EDX,EDI,ESI,EBP and ESP. This enables you to use instructions like:-
    MOV RSI,ADDR String
    MOV [RDI],AL
    MOV RAX,[hInst]
    
    These and similar instructions will work in both x64 and x86 compatibility mode.

    In addition to this, in x86 compatibility mode,

    Note that /x86 should not be used in the command line for Win32 source code (use it only for 32/64-bit switchable source code).

    Even in x86 compatibility mode, you cannot use the new AMD64 registers, R8 to R15, XMM8 to XMM15, nor the new register addressing formats SIL,DIL,BPL,SPL,R8W to R15W, or R8D to R15D. This is because they are not available for use by a 32-bit executable.

    Any source code which is incompatible with a 32-bit executable should be switched at assemble time using conditional assembly.
    See Calling Windows APIs in 32-bits and 64-bits in the GoAsm manual for more information about ARG and INVOKE.
    See the file Hello64World3 for example source code which can make either a simple Win32 "Hello World" Window program or a Win64 one.
    See also writing 64-bit programs for the detailed differences between 32-bit and 64-bit programs.
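    One way to arrange such switching is to define a word of your own in the command line for the 64-bit build only, and test it with #ifdef. In this sketch the word WIN64BUILD and the label String are hypothetical, and the command lines shown in the comments are assumptions based on the switches described in this manual.

```asm
;64-bit build:  GoAsm /x64 /d WIN64BUILD MyProg.asm
;32-bit build:  GoAsm /x86 MyProg.asm
DATA SECTION
String DB 'Hello',0
;
CODE SECTION
START:
MOV RSI,ADDR String     ;accepted in both modes (RSI is read as ESI under /x86)
#ifdef WIN64BUILD
MOV R8,RSI              ;R8 is available only to a 64-bit executable
#else
MOV EDX,ESI             ;32-bit alternative
#endif
RET
```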
     


    Sections - some advanced use

    Naming sections

    The object file expects a name for each section and GoAsm provides these by default. When using GoAsm there is no need to name sections at all. The default names used by GoAsm are "code" for a code section, "data" for a data section and "const" for a const (or constant) section.
    You can supply your own name, if you wish, by following the section declaration with a name eg.
    DATA SECTION MySect or
    DATA SECTION "Hello are you well?"
    Each section with a name will be unique. Hence you can make as many sections as you like by giving each section a different name. In normal use, however, you would need only one data section, one code section and one const section. In larger programs you may wish to have more than one section. It is possible this might make debugging slightly easier. Although GoAsm allows you to use a name of more than eight characters, the linker will abbreviate the name to only eight characters when it makes the executable.
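    A small sketch of a source script with more than one data section might look like this. The section names and labels are hypothetical.

```asm
;a data section with its own name
DATA SECTION Strings
Greeting DB 'Hello',0
;
;a second, separate data section
DATA SECTION Tables
Squares DD 0,1,4,9
;
;no name is given here, so the default name "code" is used
CODE SECTION
START:
MOV EAX,[Squares+8]     ;third dword in the table
RET
```

    Both section names here are shorter than eight characters, so the linker would not need to abbreviate them.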

    Adding the shared attribute

    The shared attribute (flag 10000000h) when used in a Dll's data section causes the Windows loader to provide only one copy of the data contained in that section to each executable using the Dll. Without this flag, each executable would get a separate copy of the data. This could be a way for one executable to send data to another, without having to use file mapping. To set this attribute add the word "shared" just after the section is declared eg.
    DATA SECTION "MyData" SHARED
    
    Note that a shared section must have a unique name specified in the way shown above. You will probably not want to use one of the default section names, "data", "code" or "const", since these would normally be reserved for non-shared sections.
    GoAsm treats any uninitialised data in a shared section as initialised to zero. This is to avoid problems which would arise if unshared uninitialised data is also required in a module (there can only be one uninitialised data section in an object file); or problems which would arise at link-time if there is no suitable section to which to attach such shared uninitialised data.
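    As an illustration of the two points above, a shared section in a Dll might be declared as in this sketch. The section name and the labels are hypothetical.

```asm
DATA SECTION "ShrData" SHARED
hOwnerWnd DD 0          ;one copy, seen by every executable using the Dll
MsgCount  DD ?          ;uninitialised, but treated as initialised to zero
```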

    Order of sections

    GoAsm inserts the sections in the object file in the same order as they are encountered in the source script. It is normally the linker's responsibility to order the sections in the final executable. If the linker is set to follow the same sequence as in the object files, you can adjust the order of the sections in the final executable by changing the order of declaration in the source script. Within the sections themselves you can instruct the linker to order the individual raw data components in a certain way by using a $ suffix in the name of the section. For example:-
    CODE SECTION 'Asm$b'
    ;
    CODE SECTION 'Asm$a'
    ;
    
    Here the linker will ensure that the code in the section called Asm$a will appear in the executable earlier than the code in the section called Asm$b. In fact, the linker will combine the code into one section (called Asm) in this correct order. Material after the dollar sign is used only to provide correct ordering and when comparing the section names the linker will only look at the characters in front of the dollar sign. Because, in the executable, sections cannot have names of more than eight characters, in practice you ought to limit the number of characters in front of the dollar sign to eight.

    Setting the section alignment

    Sections have a default alignment of 16 bytes. This can be changed for an object file to control how the linker aligns the content for that section with the content of that section from other object files. To specify a different section alignment, add ALIGN value just after the section is declared, where value is the size of the required alignment in bytes and can be 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, or 8192. This can be useful for projects with multiple source files that need to build various tables of information, for example the SEH tables in 64-bit projects:
    CONST SECTION '.xdata' ALIGN 8
    ;
    ;UNWIND_INFO
    ;
    CONST SECTION '.pdata' ALIGN 4
    ;
    ;RUNTIME_FUNCTION
    ;
    

     

    Adapting existing source files for GoAsm

    I wrote AdaptAsm.exe to take on some of the work which might otherwise be involved in adapting your existing source files to GoAsm 32-bit syntax. It can now also help towards adapting GoAsm 32-bit syntax (or maybe another assembler's syntax) to 64-bit syntax. For details of this see using AdaptAsm to help convert to 64-bit programs.
    You use AdaptAsm from the command line using the following:-
    AdaptAsm [command line switches] inputfile[.ext]
    
    If no input extension is specified, .asm is assumed.
    If no output extension is specified, .adt is assumed.
    The command line switches are:-
    /h=this help
    /a=adapt a386 file
    /m=adapt masm file
    /n=adapt nasm file
    /fo=specify output path/file eg. /fo GoAsm\adapted.asm
    /l=create log output file
    /o=don't ask before overwriting input file
    /x64=adapt file for 64-bits
    
    When adapting a TASM file, you can regard it as a MASM file if it is written for masm mode. I have not included a version for TASM's ideal mode.
    AdaptAsm creates the output file using the same name and path of the input file but with the extension .adt (unless a different name and path is specified). The output file will show at its head how many changes have been made.
    AdaptAsm cannot write over the source file unless it is also specified as the output file. Even then you will be asked whether you want to overwrite the file unless the switch /o is specified. I suggest you only overwrite the original file if you have a copy of it somewhere. I have not written an antidote!
    If the switch /l is specified AdaptAsm makes a log output file of the same name and path as the adapted output file but with the extension .log. This shows the changes which have been made. The line numbers provided refer to the line numbers of the input file.

    What AdaptAsm does not do

    Despite passing your source script through AdaptAsm you will need to:-

    Although AdaptAsm looks into "include" files in order to get details of equates, macros and structure templates, it does not adapt these files. If there is any code or data in the include file you will need to run AdaptAsm again specifying the include file as the input file.

    What AdaptAsm does when adapting various input files.
    For what happens using the /x64 switch see using AdaptAsm to help convert to 64-bit programs
    Each action below is followed by whether it applies to a386 files (using /a), masm files (using /m) and nasm files (using /n), in that order.

    Unless the word is defined (eg. using an equate) square brackets are added to all memory references where they don't already have them eg.
    MOV EBX,MEM_REFERENCE      becomes
    MOV EBX,[MEM_REFERENCE]

    but
    MOV EBX,OFFSET MEM_REFERENCE is left alone
    a386: yes, masm: yes, nasm: no

    Puts memory references combined with square brackets into correct form eg.
    MOV DX,sHEXw[ECX*2]     becomes
    MOV DX,[sHEXw+ECX*2]
    a386: yes, masm: yes, nasm: no

    Adds ADDR to NASM memory references which do not have square brackets
    a386: no, masm: no, nasm: yes

    Type indicators and overrides BYTE, BYTE PTR, WORD, WORD PTR, DWORD, DWORD PTR, QWORD, QWORD PTR, and TWORD, TWORD PTR are replaced by the shortened equivalents, B,W,D,Q and T
    a386: yes, masm: yes, nasm: yes

    Swaps all direct quote immediates and word and dword character based data declarations so they read the correct way round eg.
    MOV [ESI],'exe.' becomes MOV [ESI],'.exe'
    MOV EAX,'morf'   becomes MOV EAX,'from'
    DW 'GJ'          becomes DW 'JG'
    DD 'dcba'        becomes DD 'abcd'
    a386: yes, masm: yes, nasm: no

    Changes HELLO LABEL to HELLO:
    a386: yes, masm: yes, nasm: no

    The MASM-type @@: local labels and @F and @B jumps are converted to GoAsm format. They are given numbers in sequence through the file and where necessary the > indicator is added in the jump instructions.
    a386: no, masm: yes, nasm: no

    The NASM-type local labels preceded by a period eg. (".23") are converted to GoAsm format. The numbers remain unchanged but where necessary the > indicator is added in the jump instruction.
    a386: no, masm: no, nasm: yes

    In jump instructions NEAR and SHORT are removed (no longer used)
    a386: yes, masm: yes, nasm: yes

    Changes FPU registers from just a number to ST0 to ST7 eg.
    FDIV 0,1      becomes FDIV ST0,ST1
    a386: yes, masm: no, nasm: no

    Changes FPU registers from ST(0) to ST(7) to read ST0 to ST7 eg.
    FDIV ST(0),ST(1)      becomes FDIV ST0,ST1
    a386: no, masm: yes, nasm: no

    Data declarations made using BYTE, ACHAR, SBYTE, are changed to DB.
    Data declarations made using WORD, SWORD, SHORTINT are changed to DW.
    Data declarations made using DWORD, HDC, ATOM, BOOL, HDWP, HPEN, HRGN, HSTR, HWND, LONG, LPFN, UINT, HFILE, HFONT, HICON, HHOOK, HMENU, HRSRC, HTASK, LPINT, LPSTR, LPVOID, WCHAR, HACCEL, HANDLE, HBRUSH, HLOCAL, LPARAM, LPBOOL, LPCSTR, LPLONG, LPTSTR, LPVOID, LPWORD, SDWORD, WPARAM, HBITMAP, HCURSOR, HGDIOBJ, HGLOBAL, INTEGER, LONGINT, LPBYTE, LPCTSTR, LPCVOID, LPDWORD, LRESULT, POINTER, WNDPROC, COLORREF, HPALETTE, HINSTANCE, HINTERNET, HMETAFILE, HTREEITEM, HCOLORSPACE, LOCALHANDLE, GLOBALHANDLE, HENHMETAFILE are all changed to DD.
    Data declarations made using QWORD and DWORDLONG are changed to DQ.
    Data declarations made using TWORD are changed to DT.
    no yes no
    TIMES duplicate data syntax used in NASM changed to DUP method of declaring duplicate data. Also RESB/RESW/RESD used in NASM to reserve uninitialised data changed to DUP ? method of declaring uninitialised data. no no yes
    TEXTEQU changed to its "C" type #define version. The equates EQU and = are not changed since GoAsm supports these. no yes no
    INCLUDE directive changed to #INCLUDE yes yes yes
    %INCLUDE directive changed to #INCLUDE no no yes
    Changes the IF/ELSE/ELSEIF/ENDIF/IFDEF series of directives (conditional assembly) to #IF/#ELSE/#ELSEIF/#ENDIF/#IFDEF The .IF/.ELSE/.ELSEIF/.ENDIF/.IFDEF and .WHILE and .BREAK directives are left untouched. These will have to be changed back to "pure" assembler to hand. no yes no
    Changes the %IF/%ELSE/%ELSEIF/%ENDIF/%IFDEF series of directives (conditional assembly) to #IF/#ELSE/#ELSEIF/#ENDIF/#IFDEF Changes the %DEFINE directive to #DEFINE no no yes
    Comments out all lines beginning with EXTRN or EXTERN, GLOBAL, or PUBLIC. yes yes yes
    PROC is changed to FRAME and ENDP to ENDF. In masm code, the parameters and the USES statement are adjusted to GoAsm syntax. yes yes no
    The size of LOCAL data in an automated stack frame is changed to the GoAsm shorter versions (B,W,Q,T) and D is removed altogether since this is the GoAsm default. no yes no
    Changes EVEN to ALIGN yes yes yes
    Various lines which GoAsm does not support commented out eg. NAME, TITLE, SUBTITLE, SUBTTL, PROTO lines etc. yes yes yes
    Various lines which GoAsm does not support are removed altogether eg. .ERR, .EXIT, .LIST, .286 etc. yes yes yes
    The word COMMENT is replaced by a semi-colon yes yes yes


    Miscellaneous


    Special push instructions

    Half stack operations

    This applies only to 32-bit programming.

    In Win32 data on the stack is held in dwords, and the value of ESP is always on a dword boundary after a PUSH or POP operation. GoAsm does support half stack operations, however, which push onto and pop from the stack only two bytes at a time instead of four. When using these instructions you must push or pop a second time to restore ESP to a dword boundary. To make the syntax obvious, GoAsm requires the use of PUSHW and POPW for these half-stack operations. PUSH and POP cannot be used - they always perform a dword stack operation. As an example, half stack instructions can be used in response to the WM_LBUTTONDOWN message:-

    MOUSEX_POS DD 0
    MOUSEY_POS DD 0
    PUSH [EBP+14h]          ;push lParam onto the stack
    POPW [MOUSEX_POS]       ;take the loword first
    POPW [MOUSEY_POS]       ;then the hiword
    
    or when using some APIs which receive data from the stack in both the loword and hiword:-
    PUSH ADDR lpFileTime
    PUSHW [wFatTime]
    PUSHW [wFatDate]
    CALL DosDateTimeToFileTime
    
    GoAsm also supports the PUSHAW, PUSHFW, POPFW and POPAW instructions, although you would not normally use these because GoAsm is not used for 16-bit programming.

    Pushing and popping flags

    Instead of using PUSHF and POPF to push and pop the flags, you can if you prefer use:-
    PUSH FLAGS
    
    and
    POP FLAGS
    
    This feature can also be used with invoke and with uses.
    FLAGS is therefore a reserved word in GoAsm and cannot be used as a label.

    See also:-
    PUSH or ARG pointers to strings and data,
    callback stack frames in 32-bits and 64-bits.
     


    Segment overrides

    In Windows programming segment overrides will be used very rarely, and are usually limited to the FS segment register.

    In GoAsm segment overrides can be either before or after the mnemonic, for example:-

    FS OR D[24h],100h
    OR FS D[24h],100h
    FS MOV [ESI],EAX
    MOV FS[ESI],EAX
    
    But segment overrides cannot be in a position where they can be confused with a segment register, nor can they be inside square brackets, so this is not allowed:-
    PUSH FS[0]          ;use FS PUSH [0] instead
    POP FS[0]           ;use FS POP [0] instead
    MOV [FS:0],EAX      ;use FS MOV [0],EAX instead
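
    As a sketch of where an FS override typically appears in 32-bit Windows code: the dword at offset 0 in the FS segment heads the thread's chain of structured exception handlers, so a handler is usually installed and removed with the override placed before the mnemonic. The label MyHandler is hypothetical and the handler itself is not shown; this is an illustration of the syntax rather than a complete exception handling scheme.

    ```asm
    PUSH ADDR MyHandler     ;address of the handler (hypothetical label)
    FS PUSH [0]             ;save the current head of the handler chain
    FS MOV [0],ESP          ;the new record on the stack becomes the head
                            ;protected code goes here
    FS POP [0]              ;restore the previous head of the chain
    ADD ESP,4               ;discard the handler address
    ```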
    

    Using source information

    The following source information is available at compile-time:-
    @line         current line being assembled
    @filename     main source script being assembled
    @filecur      current file being assembled
    
    These words are case insensitive. @filecur shows the current file following assembly into "a" include files, whereas @filename shows the name of the very first source script given to GoAsm at start-up.

    @line provides a 32-bit integer and can be used as follows:-

    MOV EAX,@line         ;eax given the line number
    PUSH @line            ;line number pushed on the stack
    DD @line              ;line number declared in memory
    
    @filename and @filecur provide pointers to a string containing the name and can be used as follows:-
    PUSH @filename        ;pointer to null terminated string
    DB @filename          ;not null terminated string
    DB @filename,0        ;null terminated string
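
    A minimal sketch of how these might be combined to record where a piece of code came from, for example in a debug report. The labels SrcName and ReportHere are hypothetical, and what is done with the values afterwards is left open:

    ```asm
    DATA SECTION
    SrcName DB @filecur,0       ;name of the file being assembled, null terminated
    CODE SECTION
    ReportHere:                 ;hypothetical label
    MOV EAX,@line               ;eax = line number of this instruction
    MOV EDX,ADDR SrcName        ;edx = pointer to the file name string
    RET
    ```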
    

    Using the location counters $ and $$

    Meaning of the location counters

    $ - means the position at the point of use in memory in the executable as loaded by Windows
    $$ - means the position at the start of the current section in memory in the executable as loaded by Windows

    Because both of these operators give positions in memory in the executable as loaded by Windows their values are not known to GoAsm at assemble-time, nor to the linker at link-time. In this respect they act like a code or data label. When you use them they do not have a value but they can be subtracted from each other or from memory references within the same section to produce a value. This is because their relative values are known.

    Use of the location counters

    The $ location counter is useful as a device to get the size of a string, for example:-
    HELLO DB 'He finally got the courage to talk to her',0
    LENGTHOF_HELLO DB $-HELLO
    
    Note that the $ location counter refers to the position of the label LENGTHOF_HELLO, so that the length of the string will be contained in data at that place and will be exact.
    Here is an example which gets the size of a table of dwords and uses a slightly different method of calculation, so that the size is contained in the first dword of the table itself:-
    MESSAGES DD ENDOF_MESSAGES-$
             DD MESS1,MESS2,MESS3,MESS4,MESS5
    ENDOF_MESSAGES:
    
    Which is the same as this:-
    MESSAGES DD ENDOF_MESSAGES-MESSAGES
             DD MESS1,MESS2,MESS3,MESS4,MESS5
    ENDOF_MESSAGES:
    
    See that the first dword of the table of values contains the value of 24 since the first dword is counted too. With a little bit of arithmetic you can get the number of values in the table, in this case five:-
    MESSAGES DD (ENDOF_MESSAGES-$-4)/4
             DD MESS1,MESS2,MESS3,MESS4,MESS5
    ENDOF_MESSAGES:
    
    In this instruction
    LABEL400:
    JMP $+20h       ;continue execution 20h bytes ahead
    
    The location counter refers to the position of LABEL400, which is the same position as the beginning of the JMP instruction. Therefore the five bytes in the JMP instruction itself (relative jump using the opcode E9) must be allowed for in the calculation.

    Here are some other examples of use in the code section:-

    CALL $$       ;a call to the start of the current section
    MOV EAX,$-$$  ;get distance to current location from start of current section
    
    Here are some other examples of use of $ and $$ in the data section:-
    HELLOZ DD $$            ;HELLOZ to hold position at top of section
    DB 100-($-$$) DUP 0     ;pad with zeroes to offset 100
    
    When used inside a definition the location counter refers to the location when the definition is used rather than when it is declared. For example, here the $$ refers to the start of the code section since the definition globule is used within the code section:-
    #define globule $$+2+3
    CODE SECTION
    MOV EAX,globule
    

    Alignment and the use of ALIGN

    What is alignment?

    All areas of memory have certain boundaries and points of alignment, even when considering the virtual addresses used by Windows. Take the virtual addresses 400000h and 410000h where typically you might find the code and data sections of your program loaded by Windows. Both these addresses can be regarded as starting on word, dword, and paragraph (16 byte) boundaries as well as 400h (which is 1K) or even 10000h boundaries (which is 64K). The addresses 400001h and 410001h are not on any of these boundaries whereas 400002h and 410002h hit a word boundary. They are therefore said to be "word aligned". 400004h and 410004h hit word and dword boundaries. 400010h and 410010h hit word, dword and paragraph boundaries. These latter addresses are therefore properly described as being word, dword and paragraph aligned.

    In assembler, you can control both data and code alignment. We shall look first at data alignment and then briefly look at code alignment.

    Need for data alignment

    There are two reasons why you might need to align your data.
    One is to satisfy Windows.
    The second is to try to get some extra speed into your programs.

    Alignment to satisfy Windows

    For 32-bit Windows (NT/2000/XP and Vista running as Win32) the destinations of many pointers to data given to the APIs need to be dword aligned, and often this is undocumented.
    Even under Windows 9x there are several pointer destinations which must be dword aligned, for example the structures DLGITEMTEMPLATE and DLGITEMTEMPLATEEX. Also the menu, class, title and font data in a DLGTEMPLATE must be word aligned and the structures used in the Network Management APIs must be dword aligned.
    Certain members of bitmap structures must be aligned internally. XP requires that the height and width of compatible bitmaps is always divisible by four to ensure that each line is aligned properly.
    There are certain SSE and SSE2 instructions which require 16-byte alignment of the memory area which they are dealing with, for example FXSAVE, FXRSTOR, MOVAPD, MOVAPS and MOVDQA.

    For 64-bit Windows the alignment requirements are even stricter. It is essential to ensure that structure members are aligned on their "natural boundary". So a word should be on a word boundary, a dword on a dword boundary, a qword on a qword boundary etc. This only works if the structure itself is properly aligned on the correct boundary. Basically the structure should be aligned on the natural boundary of its largest member. It is also important for the structure to end on the natural boundary of its largest member, if necessary by adding padding. Also in 64-bit Windows, the stack pointer RSP should always be aligned to a 16-byte boundary on making an API call. See 64-bit assembly for more information about alignment requirements in 64-bit programming.

    For both 32-bits and 64-bits, if the alignment is wrong the results are unpredictable, varying from mere non-appearance of controls, to program exit.

    Alignment for speed

    The alignment which achieves the greatest speed varies from processor to processor but generally it is a good idea to ensure that data is aligned in memory to suit the size in which it operates. For example a table of dwords would best be on a dword boundary. In theory both qwords and twords ought to be qword aligned for best performance.

    Achieving correct data alignment

    For Win32, GoAsm automatically aligns structures on a dword boundary, both when they are declared as local data and in the data section. So you can be sure that all your structures will work. However you may wish to add additional alignment to those structures which are declared in the data section by using the ALIGN operator. You might also wish to use ALIGN on ordinary data too (not structures) to achieve best performance.
    Good alignment can usually be achieved automatically by declaring data in size sequence in the data section. So you would declare all qwords first, then dwords, then words, then bytes and strings. Twords, being 10 bytes, would upset the sequence - you could do them all first then correct the alignment using ALIGN.
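    As a rough illustration of declaring data in size sequence (all the labels here are hypothetical), each group below starts on its natural boundary without any ALIGN being needed, assuming the section itself starts on at least a qword boundary:

    ```asm
    DATA SECTION
    Total     DQ 0              ;qwords first
    hInst     DD 0              ;then dwords
    Count     DD 0
    Mode16    DW 0              ;then words
    DoneFlag  DB 0              ;then bytes and strings
    AppName   DB 'Demo',0
    ```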

    For Win64 GoAsm automatically aligns structures and structure members to suit the natural boundary of the structure and its members. GoAsm also pads the size of the structure to suit. GoAsm also automatically aligns the stack pointer ready for an API call.
    See writing 64-bit programs for more information about how this works in practice.

    Code alignment

    Good alignment of code is more of an imponderable and depends on the processor running the code. Rick Booth touches on the subject in his book "Inner Loops" published by Addison Wesley.
    I have included some speed tests in TestBug which show what difference correct alignment can make when reading from, writing to or comparing the contents of, memory.

    Use of ALIGN

    GoAsm recognises ALIGN value, where value is the size of the required alignment in bytes. GoAsm will align the next data or code on the correct boundary to ensure the alignment. For example:-
    ALIGN 4         ;the next data will be dword aligned
    ALIGN 16        ;(or ALIGN 10h) will align on the next 16 byte boundary
    
    In order to achieve the alignment, in a code section GoAsm pads with instruction NOP (opcode 90h), which performs no operation. In data or const sections GoAsm pads using zeroes to the correct place.
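
    Here is a short sketch of ALIGN being used to satisfy an instruction which needs 16-byte alignment. The labels Marker, Vec and LoadVec are hypothetical, and the sketch assumes that the data section itself is loaded on at least a 16-byte boundary:

    ```asm
    DATA SECTION
    Marker  DB 1                ;one byte, so the next address is unaligned
    ALIGN 16                    ;GoAsm pads with zeroes to a 16-byte boundary
    Vec     DD 4 DUP 0          ;16 bytes of data for the XMM register
    CODE SECTION
    LoadVec:                    ;hypothetical label
    MOVAPS XMM0,[Vec]           ;MOVAPS requires a 16-byte aligned address
    RET
    ```

    Without the ALIGN line, Vec would sit one byte into the section and the MOVAPS instruction would cause an exception.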

    See also sections - some advanced use on section alignment.
     


    Use of SIZEOF

    The operator SIZEOF can be used like ADDR or OFFSET, but instead of giving the address of a code or data label, it gives its size in bytes. Since GoAsm is a one-pass assembler, SIZEOF only returns the proper value for a label which starts and ends earlier in the source script. SIZEOF can either be used for ordinary data and code or for local data.

    Using SIZEOF when referring to data labels

    Here are examples of how SIZEOF can be used (where Hello is a data label):-
    MOV EAX,SIZEOF Hello
    MOV EAX,SIZEOF(Hello)
    MOV EAX,[ESI+SIZEOF Hello]
    SUB ESP,SIZEOF Hello
    DD SIZEOF Hello
    Label DB SIZEOF Hello DUP 0
    MOV EAX,SIZEOF Hello+4
    MOV EAX,SIZEOF Hello-4
    MOV EAX,SIZEOF Hello/2
    MOV EAX,SIZEOF Hello*2
    
    When referring to a data label, SIZEOF finds the distance in the raw data from the specified label to the next label in the section in which the label was declared, or to the end of the section, whichever is the earlier. So in:-
    Hello DB 0
          DD 0
    Hello2 DB 0
    
    then
    MOV EAX,SIZEOF Hello
    
    loads into eax the value five.

    Using SIZEOF with strings

    You can use SIZEOF with strings (replacing the masm LENGTHOF). For example:-
    WrongB DB 'You pressed the wrong button!',0
    
    then
    MOV EAX,SIZEOF WrongB
    
    returns the length of the string including the null terminator.
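
    If the terminator is not wanted in the count, the arithmetic shown earlier can be applied to the result. The following is a small sketch using a hypothetical string NotFound; the second label is there only so that the size is measured to the next label:

    ```asm
    DATA SECTION
    NotFound DB 'File not found',0
    NextItem DD 0                  ;next label, marks the end of NotFound
    CODE SECTION
    MOV ECX,SIZEOF NotFound        ;15 (14 characters plus the null terminator)
    MOV ECX,SIZEOF NotFound-1      ;14 (characters only)
    ```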

    Using SIZEOF when referring to code labels

    In this example:-
    START:
    XOR EAX,EAX
    XOR EAX,EAX
    XOR EAX,EAX
    XOR EAX,EAX
    LABEL:
    MOV EAX,SIZEOF START
    
    The value eight will be loaded into eax. This is the size of the four xor eax,eax instructions.

    Using SIZEOF with structures

    You can use SIZEOF with structures to return the size of the structure. For example if you have
    Rect STRUCT
       left   DD
       top    DD
       right  DD
       bottom DD
    ENDS
    rc Rect
    
    then both
    MOV EAX,SIZEOF Rect
    and
    MOV EAX,SIZEOF rc
    
    load into eax the value 16.

    Using SIZEOF with structure members

    You can use SIZEOF to return the size of structure members calculated to the next named label. For example if you have
    Rect STRUCT
       left   DD
              DD
       right  DD
       bottom DD
    ENDS
    rc Rect
    
    then both
    MOV EAX,SIZEOF rc.left
    and
    MOV EAX,SIZEOF Rect.left
    
    return a value of 8.

    Using SIZEOF with unions and union members

    You can use SIZEOF to return the size of the union or its members but bear in mind that each member will return the same size (the largest member size). So, for example:-
    Sleep STRUCT
         DW 2222h
         DB 0h
    ENDS
    Ness UNION
         Possums DB L'Balance'
         Koalas Sleep
         Devils  DB 'Roar'
    ENDS
    Happy Ness
    SizeLabel DD SIZEOF Happy
              DD SIZEOF Ness
              DD SIZEOF Happy.Possums
              DD SIZEOF Happy.Koalas
              DD SIZEOF Happy.Devils
    
    Each dword in SizeLabel contains 14, which is the size of the largest union member, Happy.Possums, which contains a Unicode string which is 14 bytes long.

    How structure initialisation may affect the size

    When getting the size, any arguments which might otherwise be used to change the size of the structure are ignored, for example:-

    StringStruct STRUCT
       DB ?
    ENDS
    
    then
    MOV EAX,SIZEOF StringStruct
    
    would return a size of 1 byte even if the structure has been implemented using LongString StringStruct <'I hate structures'> which enlarged the structure to 17 bytes in this implementation. Similarly, if the length of the string relies on resolution of a definition, for example
    StringStruct STRUCT
       DB LONGSTRING
    ENDS
    
    then
    MOV EAX,SIZEOF StringStruct
    
    would return 1 byte regardless of the value of LONGSTRING.
    Structure sizes which change with definitions in other ways are resolved, so that
    Rect STRUCT
       DB TWELVE DUP 0
    ENDS
    
    will be resolved properly.

    Initialising a structure with SIZEOF

    You can initialise a structure with its own size or with the size of something else, for example:-
    PARAM_STRUCT STRUCT
         DD 0
         DD 0
         DD 0
    ENDS
    ps1 PARAM_STRUCT <SIZEOF PARAM_STRUCT,,>
    

    Using SIZEOF with local data

    The size of the local data is returned, for example:-
    MyWndProc FRAME
    LOCALS hDC,DemonFlag:B,Buffer[256]:B,MyRect:RECT
    MOV EAX,SIZEOF hDC         ;4 in 32-bits, 8 in 64-bits (default size)
    MOV EAX,SIZEOF DemonFlag   ;1
    MOV EAX,SIZEOF Buffer      ;256
    MOV EAX,SIZEOF MyRect      ;16
    RET
    ENDF
    
    The size which is returned ignores any padding added by GoAsm to align the local data properly on the stack (all local data is dword aligned in 32-bit assembly, and qword aligned in 64-bit assembly).
    See also using the LOCALS statement.

    Using branch hints

    What is branch prediction?

    Modern processors (for example the Pentium upwards) are able to run very fast by predicting in advance the next instruction which will be executed after a conditional jump instruction. For example:-
    CMP EDX,EAX       ;compare edx and eax
    JZ >L1            ;branch to L1 if edx=eax
                      ;lots of other instructions
    L1:
    
    The processor cannot be 100% sure in advance whether edx will equal eax when line 1 is executed. It might predict this to be unlikely, however, in which case it will set up the instructions just after the conditional jump to be executed next. If this prediction is right, those instructions will be executed immediately without any time loss. If the prediction is wrong, there will be some time loss in switching to the correct instruction (just after L1) instead. Various branch prediction algorithms are used by the processor, including ones which learn from their mistakes. One of the most basic predictions used as a starting point assumes that:-
    All forward conditional jumps will not take place, and
    All backwards conditional jumps (loops back) will take place.
    
    It is documented that in normal code, backward conditional jumps tend to take place 80% of the time, whereas forward conditional jumps will be likely not to take place. It is said that overall, the default prediction is correct 65% of the time. So predicting whether or not the conditional jump will take place can speed up the code particularly on a series of loops back. As an assembler programmer, you can produce fast code knowing the default prediction for the processor that you are programming for since you are in complete control over your code.

    What are branch hints?

    In the P4 (and possibly in some earlier processors) you can give the processor a "hint" as to whether or not the branch is likely to occur.
    This is done by using 2Eh and 3Eh branch hint bytes which are inserted immediately before the conditional jump instruction. Respectively they mean as follows:-
    2Eh - hint that the branch will not occur most of the time.
    3Eh - hint that the branch will occur most of the time.
    
    Using branch hint 2Eh would be useful (for example) if you have a backwards branch in a conditional jump which occurs only in the case of an error. The processor would usually predict that the backwards branch is likely to happen, thereby slowing down the code at that particular point. Inserting 2Eh will stop the processor making that prediction.
    Branch hint 3Eh would be useful if (unusually) you want to create a loop where the conditional jump is at the beginning of a code fragment and the destination of the jump is forward in the code. This loop would normally run more slowly than a normal type of loop because of the default prediction that the forward branch is unlikely to occur, unless the processor is capable of learning from its prediction mistakes.

    Inserting branch hints

    Intel does not recommend any particular mnemonic to insert the branch hint bytes. You could do this as follows:-
    DB 2Eh  ;or
    DB 3Eh
    
    If you prefer you can insert the branch hint bytes automatically.
    GoAsm uses the following syntax to insert the branch hint bytes suitable for the P4 processor:-
    hint.nobranch           ;insert 2Eh
    hint.branch             ;insert 3Eh
    
    Going back to the first example, if edx normally does equal eax, then this code will run faster on the P4 by adding a branch hint as follows:-
    CMP EDX,EAX             ;compare edx and eax
    hint.branch JZ >L1      ;normally does branch to L1
                            ;lots of other instructions
    L1:
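
    The opposite case, a backward jump which is taken only on an error, might be sketched as follows. ReadNextItem is a hypothetical procedure assumed to set the carry flag on failure, and the re-usable labels L2 and L3 are chosen only for illustration:

    ```asm
    JMP >L3                     ;jump over the error handler
    L2:                         ;error handler, rarely reached
    XOR EAX,EAX                 ;return zero to signal failure
    RET
    L3:
    CALL ReadNextItem           ;hypothetical procedure, sets carry on error
    hint.nobranch JC L2         ;backward jump, but hinted as unlikely
                                ;normal processing continues here
    ```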
    
    I have included a speed test in TestBug which proves the speed improvement which can be obtained using branch hints. I found for example that some code ran some 1.5 times faster on a P4 processor using the correct branch hint.
     

    Syntax to use FPU, MMX, and XMM registers

    The x87 FPU (floating point) registers can be addressed using
    ST0
    ST1
    ST2
    ST3
    ST4
    ST5
    ST6
    ST7
    
    the MMX and 3DNow! registers can be addressed using
    MM0
    MM1
    MM2
    MM3
    MM4
    MM5
    MM6
    MM7
    
    and the XMM registers can be addressed using
    XMM0
    XMM1
    XMM2
    XMM3
    XMM4
    XMM5
    XMM6
    XMM7
    
    In 64-bit processors there are eight new XMM registers, addressable using
    XMM8
    XMM9
    XMM10
    XMM11
    XMM12
    XMM13
    XMM14
    XMM15
    
    See writing 64-bit programs for the other new registers and register addressing methods.

    The "Wait" opcode (9Bh)

    Since GoAsm works only on modern processors with integrated floating point processors, it only emits the Wait opcode for the following instructions:-
    FCLEX
    FINIT
    FWAIT
    FSAVE
    FSETPM
    FSTCW
    FSTENV
    FSTSW
    

    Reporting assemble time


    GoAsm will give you a report of the time it took to assemble parts of your source script if you add the word
    GOASM_REPORTTIME
    
    one or more times in your source script. The time which is reported is the time to assemble from the last GOASM_REPORTTIME (or start of assembly) up to that line, and the time to assemble to the next GOASM_REPORTTIME (or end of assembly). Times exclude set-up and clean-up times.
    GOASM_REPORTTIME is useful if you want to see if different configurations of your source will speed up assembly, or to see if there is a particular part of your source which is slowing down assembly.

    Other GoAsm interrupts


    You can write a message to the console upon the happening of an event during assembly (which might be switched by conditional assembly) using (for example):-
    GOASM_ECHO Assembly has reached new heights
    
    The material to write to the console does not have to be in quotes, although it can be.

    You can force GoAsm to stop assembly and exit using:-

    GOASM_EXIT
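
    The two can be combined with conditional assembly to stop a build which is configured in a way you no longer support. In this sketch OLD_BUILD is a hypothetical word which might be defined in the command line or in the source script:

    ```asm
    #IFDEF OLD_BUILD
    GOASM_ECHO OLD_BUILD is no longer supported
    GOASM_EXIT
    #ENDIF
    ```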
    

    GoAsm list file

    GoAsm produces a list file if asked to do so with the switch /l in the command line. The file uses the same name and path as the object output file but it has the extension .lst. It will be in the same format as your first source file (a Unicode source file will result in a Unicode list file). See Unicode support for more about Unicode.
    The list output is in two parts, showing the opcodes which were generated as a result of each instruction on the left hand side of the page, then the instruction and comments on the right hand side of the page. If any line is too long to be given in full it is truncated. In the case of the opcodes an ellipsis shows that this has happened. Memory references which require relocation (ie. a value will be inserted either by GoAsm or by the linker) are shown in square brackets. However, the relocation which is inserted will not be shown.
    Definitions are shown as defined or expanded.
    Include files with an "a" or "A" extension are treated as part of the source script and their contents will be included in the list. The contents of include files with other extensions will not be included in the list.
    Let me know if there is anything else you would like the list file to show or if you would like to change the format.
     

    GoAsm's error and warning messages

    GoAsm gives all error and warning information on the command line. If GoAsm finds an error in the source script it will stop assembling and exit with return code TRUE (eax=1). GoAsm then displays in the MS-DOS (command prompt) window the line and origin (the source script) where the error occurred, a description of the error, and it may also show the offending word or line of text. If the word or line was defined it may show the origin of the definition. Some errors in non "a" include files (those which are not treated as part of the source script) are simply ignored, others are shown as warnings, depending on the nature of the error.
    If you use the /b switch you will also hear a beep on an error.
    I decided against permitting assembly to continue after an error since an error often results in other errors which are also reported and this tends to obscure the original cause. I have found this single-error response to be quite adequate in the light of GoAsm's simplified syntax which hopefully will reduce errors in your source script anyway. In addition to this, if you write your code incrementally then hopefully errors will be few.
    I also decided against writing errors to a separate file, since it is time consuming to have to open another file to read the error rather than seeing it on the command line. You can redirect all GoAsm's output to a file which you can read later, however, using the DOS redirect command, for example:-
    GoAsm Test.asm >output.fil
    
    Another way to control GoAsm's error and warning output is by using these switches on the command line:-
    /b  beep on error
    /ne no error messages
    /ni no information messages
    /nw no warning messages
    /no no output messages at all
    
    A warning only will also be given if a word has been defined more than once in the command line or in the source script, but assembly is allowed to continue. This is because it would be unusual to define a word more than once and it may be that this is a programming error. It is perfectly permissible to cancel a previous definition using #undef so that the word can be defined again. In that case no warning is given.
    You will also get a warning if:-
    (a) you try to use the same include file twice unless it is the type of include file which itself contains source script, in which case there will be an error instead;
    (b) you try to declare more than 1MB of duplicate data;
    (c) (in some cases) if you use a type indicator when this is not necessary.

    In a batch file, you can use the error return with ERRORLEVEL, for example the following will pause if there is an error return:-

    GoAsm MyFile.asm
    IF ERRORLEVEL 1 PAUSE
    

    Using GoAsm with various linkers

    Using GoLink

    GoLink is a free linker written by me and available from my web site www.GoDevTool.com. It accepts the PE file input from OBJ files produced by GoAsm (or other assemblers and compilers) and the RES or OBJ input from GoRC and produces an executable file.
    GoLink has certain advantages over other linkers, in particular it is tuned to work closely with GoAsm on imports and exports, and it can report on redundant data and code labels when linking files produced by GoAsm. But its main advantages are that it works fast, and has reduced "baggage". Files are kept to a minimum. LIB files have been abolished. Instead GoLink looks at the executables actually stored on the computer at link-time to get its information about imports required by the executable it is making. Since most of these will be in memory anyway if they are system Dlls, this can be done fast.

    See the GoLink help file for full information about GoLink and how to use it. A typical batch file (with an extension .bat) to create a simple executable might be:-

    GoAsm MyProg.asm
    GoLink MyProg.obj Kernel32.dll User32.dll
    
    This will create a Windows PE file MyProg.exe with imports from the mentioned Dlls. The entry address START is assumed but this may be specified.

    You can use a command file with GoLink for example:-

    GoLink @command.fil
    
    Instead of (or in addition to) specifying the Dlls in GoLink's command line or file you can use #DYNAMICLINKFILE in GoAsm source code. The syntax is:-
    #dynamiclinkfile path/filename, path/filename
    
    The comma is optional. The path/filename can be in quotes. One or more path/filenames can be specified. You don't need to provide the path when specifying system files since GoLink looks inside the system folders automatically. The filename must have its extension which can be .dll, .ocx, .exe or .drv.
    The filename is sent to GoLink in the .drectve section in the object file created by GoAsm. This is used to give information and directives to the linker but does not find its way into the final executable.
    You need GoLink version 26.5 and above for this to work.
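
    For instance, a source file which calls APIs in the two Dlls used in the batch file above might name them itself. This is only a sketch following the syntax just given. MyProg.asm is the same example file name used earlier:-

    ```asm
    ;near the top of MyProg.asm
    ;system Dlls need no path, but the extension must be given
    #dynamiclinkfile Kernel32.dll, User32.dll
    ```

    With this line in the source, the link step in the batch file could probably be reduced to GoLink MyProg.obj, since the Dll names reach GoLink through the object file.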

    There are several other command line switches and options when using GoLink. For more details please see the GoLink help file, GoLink.htm.

    Using ALINK

    ALINK is a free linker from Anthony Williams. ALINK's output is rather verbose, so it's best to divert its output to a file which you can look at later. I would suggest you make a batch file with an extension .bat containing the following lines:-
    GoAsm MyProg.asm
    ALINK @Respons.fil >link.opt
    
    You run the batch file by entering its name on the command line in an MS-DOS (command prompt) window and pressing enter.
    Respons.fil is a file with the instructions to the linker which might be as follows (as an example):-
    -m                            ;produces map file
    -oPE                          ;makes a PE file
    -o MyProg.Exe                 ;gives the output file name
    -entry START                  ;signifies the starting address
    -debug                        ;signifies debug symbols to be made
    kernel32.lib                  ;
    COMCTL32.lib                  ;  a lib file for each API
    COMDLG32.lib                  ;  which is called in your program,
    user32.lib                    ;  made using ALIB which is
    gdi32.lib                     ;  part of the ALINK package
    shell32.lib                   ;
    MyProg.obj                    ;the input file
    
    You can organise your work by keeping the files in various folders in which case you would need to include the paths in the instructions given to GoAsm and ALINK.

    Using the Microsoft linker

    The Microsoft linker is available free in the Microsoft SDK or by downloading MASM itself. See www.GoDevTool.com for more information or try the various links to other sites. The Microsoft linker is more difficult to use since it expects that the start address label and external calls are "decorated" in the way they are emitted by "C" compilers and by MASM. The expected decoration is an underline character before the label and (in the case of a function call) an @ sign followed by the number of bytes pushed on the stack when the function is called. Normally GoAsm does not use decoration because my aim has been to simplify everything as far as possible.

    Automatic decoration with /ms switch

    You can instruct GoAsm automatically to provide the necessary decoration using the /ms switch in the command line (32-bit only, disabled with use of /x64). This will cause GoAsm to decorate all code labels, calls and invokes.

    In every such case the decoration is in this form:-

    _CodeLabel@x
    
    where the label is declared as CodeLabel and where x is the number of bytes used by CodeLabel's parameters. Decorated in this way, CodeLabel is available to other object files being linked by the MS linker (and therefore can be called from those other object files). And if CodeLabel resides in a DLL, the MS Linker will recognise it as such from a lib file made from the DLL and given to it at link-time.

    At link-time the MS linker expects the value of "@x" in both the caller and the callee to match exactly. This is therefore a limited form of parameter checking. When the /ms switch is used, GoAsm therefore needs to count the number of parameters used by the code label in order to get the value of "@x" correct. To achieve this, in the case of labels to FRAMEs, GoAsm counts the number of parameters declared in the FRAME and adjusts the decoration accordingly. In the case of a call using INVOKE, GoAsm again counts the number of parameters used.

    However, GoAsm cannot count the number of parameters to a call using CALL and assumes there are none. For this reason if there are any parameters you must use INVOKE for the decoration to work properly (and not an ordinary PUSH xxx, then CALL). Also note that if using ARG before INVOKE, each argument needs to be on its own line (not ARG 1,2,3).
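
    The ARG restriction just mentioned might look like this in practice. This is a sketch only. It assumes that ARG takes the parameters in the same order as they would be PUSHed (last parameter first), and that hwnd is a FRAME parameter or data label holding the window handle:-

    ```asm
    ;one ARG per line so that GoAsm can count the parameters
    ARG 40h
    ARG 'Hello'
    ARG 'Click OK'
    ARG [hwnd]
    INVOKE MessageBoxA        ;four parameters, so expect _MessageBoxA@16
    ```

    Writing the same parameters on a single line (ARG 40h,'Hello','Click OK',[hwnd]) is the form the text above warns against when the /ms switch is in use.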

    So, for example if you use the following code with the /ms switch in GoAsm's command line:-

    HelloProc FRAME hwnd,arg1,arg2
    INVOKE MessageBoxA, [hwnd],'Click OK','Hello',40h
    
    Then GoAsm will insert the symbol HelloProc in the object file as
    _HelloProc@12
    
    and the called function in the object file as
    _MessageBoxA@16
    
    This is because GoAsm knows that 12 bytes are on the stack in the case of HelloProc and 16 bytes are pushed on the stack before MessageBoxA is called.

    From GoAsm Version 0.49, ordinary code labels without parameters are also decorated. This is to enable such code labels to be recognised externally at link-time so that they can be called by other object files created by MS tools. They are given a parameter byte count of zero. This includes the label giving the starting address itself. So suppose your starting address in your source script is START: (no leading underline character). This is now decorated as:-

    _START@0
    
    and to link properly using the MS Linker you would include this line in the linker's command line or file:-
    -ENTRY:START@0
    

    Manual decoration

    You can also use the MS linker by making the following manual changes to your source script.
  • Ensure that the label giving the start address starts with an underline character. But omit this underline character when telling the MS linker what the label is.
  • If you have more than one object file to send to the MS linker, and there are functions in one object file called from another object file, decorate the labels for those functions. Ordinary code labels without parameters will need a leading underline character and @0 at the end of the name. Code labels with parameters need a leading underline character and @x after the name where x is the number of bytes of parameters (each parameter is four bytes).
  • Decorate all API calls with a leading underline character and @x after the name where x is the number of bytes pushed on the stack when the API is called (each PUSH is four bytes).

    For example:-

    PUSH 12h
    CALL _GetKeyState@4       ;check for alt-key pressed
    
    There is only one parameter to the API GetKeyState so four bytes are put on the stack.

    If you are making a dll, you will need to use a label for the starting address decorated in the same way, for example

    _DLLENTRY@12:
    
    This indicates that the label DLLENTRY is called with 12 bytes pushed on the stack ie. 3 dwords.
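
    A minimal body for such an entry point might be as follows. This is a sketch which assumes the usual convention for a Dll entry procedure, that is, returning a non-zero value in EAX to signal success and removing the three dword parameters from the stack on return:-

    ```asm
    _DLLENTRY@12:
    MOV EAX,1                 ;non-zero return signals success
    RET 0Ch                   ;remove the 12 bytes of parameters from the stack
    ```

    The 0Ch in the RET instruction is the same 12 bytes that appear after the @ sign in the label.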

    After you have made those changes to your source script you are ready to make a batch file with the extension .bat to assemble the source and run the linker. These lines might be in the batch file:-

    GoAsm MyProg.asm
    LINK @Respons.fil
    
    You run the batch file by entering its name on the command line in an MS-DOS (command prompt) window and pressing enter. The file Respons.fil might contain the following lines:-
    /OUT:MyProg.Exe                ;gives name of output file
    /MAP                           ;produces map file
    /SUBSYSTEM:WINDOWS             ;makes a Windows GUI executable
    /ENTRY:START                   ;you have it as _START in the source!
    /DEBUG:FULL                    ;do a debug output
    /DEBUGTYPE:COFF                ;do embedded COFF symbols
    MyProg.obj                     ;the input file
    comctl32.lib                   ;
    user32.lib                     ; lib files for each API which
    gdi32.lib                      ; your program calls, these
    kernel32.lib                   ; files come with the linker
    

    Using definitions to help with the Microsoft linker

    To make your source code look better when using the Microsoft linker you can cause the API calls to be appropriately decorated by defining the API name for use throughout your code, for example:-
    GetKeyState=_GetKeyState@4
    CALL GetKeyState
    
    or, to call the API more directly you can use:-
    GetKeyState=__imp__GetKeyState@4
    CALL [GetKeyState]
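
    The same technique applies to an API with several parameters, where the number after the @ sign is four bytes for each PUSH. The following sketch uses the MessageBoxA call from the earlier example. It assumes that hwnd is a data label holding the window handle:-

    ```asm
    MessageBoxA=_MessageBoxA@16   ;four parameters of four bytes each
    PUSH 40h
    PUSH 'Hello'
    PUSH 'Click OK'
    PUSH [hwnd]
    CALL MessageBoxA
    ```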
    

    Retaining leading underscores in library calls using the /gl switch

    Some linkers, such as the MinGW linker, expect at least one leading underscore for calls to "C" functions. If in GoAsm you call a function in a static code library, that library might make calls to "C" functions. GoAsm will normally strip out the leading underscore on the assumption that you will be using GoLink. If you are using a linker which expects the underscore, you will want to ensure that it is retained. To do so, use the /gl switch in the command line.

    Alphabetical Index


    A

    a386 files, adapting for GoAsm
    accessing labels
    acknowledgements
    adapting files for GoAsm
    ADDR same as OFFSET
    ALIGN
    alignment of data and code
    alignment of structures and members
    Alink linker
    AND logical operator
    ANSI API versions
    Unicode/ANSI switching
    APIs, calls to
    ARG, to send parameters to APIs
    ARG, to send pointers to strings and data
    ARGCOUNT macro operator
    arguments, using in macros
    arithmetic
    assembler - beginners advice
    automated stack frames

    B

    beginner's section
    bits, binary and bytes
    branch hints
    bss section

    C

    callback stack frames
    calls, different types
    calls to procedures
    case sensitivity
    case, which to use
    characters
    character immediates in code
    character immediates in data
    code
    code section
    command line switches
    comments
    conditional assembly
    conditional assembly in macros
    conditional assembly in structures
    conditional jumps to labels
    conditional jumps to unique labels
    conditional jumps, tutorial
    C runtime library, using

    D

    32-bit compatibility mode
    3DNow! instructions
    data access [square brackets]
    data declaration
    data, run-time importing by pointer
    data, compile-time importing by file using INCBIN
    data, inserting blocks of data using DATABLOCK
    data labels, accessing
    data labels, getting addresses
    data section
    DB, DW, DD, DQ, and DT
    debugging - beginners advice
    DEC, repeat instructions
    decoration by ms linker
    decoration, automatic with /ms switch
    decoration, manual
    #define
    definitions
    design features
    discussion forum
    distribution
    DUP (duplicate data)
    DUS (declare unicode sequence)
    #dynamiclinkfile

    E

    ECHO interrupt
    entry into program
    environment (lib file)
    environment (include)
    equates
    error and warning messages
    EVEN see ALIGN
    execution, diverting
    EXIT interrupt
    exporting procedures and data
    exporting in Unicode

    F

    filename at compile-time
    files, include
    files, input and output
    files, raw data
    flags, automatically saving
    flags, pushing and popping
    flags, setting
    flags, tutorial
    floating point numbers, declaring
    floating point registers, x87
    forum
    FRAME..ENDF stack frame

    G

    Using the /gl switch
    GoLink linker

    H

    half stack operations
    hashes, using double in definitions
    "Hello World" examples - see appendices
    hex numbers, tutorial
    hinting branches

    I

    "if" commands, conditional assembly
    "if" commands, run-time
    importing by ordinal
    importing by specific Dll
    importing data
    importing procedures by calls
    importing libraries (at compile-time)
    INC, repeat instructions
    INCBIN to insert raw data
    #include, using to include files
    incremental coding
    initialised data, declaring
    initialised structures, overriding
    initialising unions
    Integrated Development Environments (IDEs)
    interrupts (ECHO, EXIT see also REPORTTIME)
    INVOKE to call an API

    J

    jumps, conditional to labels
    jumps, conditional to unique labels
    jumps, conditional, tutorial
    jumps, different types
    jumps to labels
    jumps to unique labels

    K

    L

    labels
    labels in Unicode
    LEA - examples
    legal stuff
    libraries, using
    licence
    libraries, merging at compile-time
    line number at compile-time
    linking the output files
    list file
    local data, message specific
    locally defined words (#localdef or LOCALEQU)
    LOCALFREE - free local data
    locally scoped re-usable labels
    LOCAL(S) local data
    location counters
    loops, speeding up

    M

    MACRO...ENDM
    macros
    masm files, adapting for GoAsm
    memory access [square brackets]
    memory, accessing data labels
    memory, getting label addresses
    merging using code libraries
    Microsoft linker
    mmx registers
    mnemonics
    MOV pointers to strings or data

    N

    nasm files, adapting for GoAsm
    nested unions
    nested structures
    NOT logical operator
    numbers and arithmetic

    O

    object orientated programming
    OFFSET same as ADDR
    operators
    OR logical operator
    ordinal, importing by
    output paths

    P

    parameter checking, no
    predicting branches
    PROC..ENDP see FRAME..ENDF
    procedures and calls
    programming - beginners advice
    programming - hints and tips
    protective coding
    PUSH or ARG pointers to strings or data
    PUSH & POP flags
    PUSH & POP, repeat instructions
    PUSHW & POPW (half stack operations)

    Q

    quick start to making a simple Windows program

    R

    real numbers, declaring
    registers, automatically saving
    registers - fpu, mmx and xmm
    registers - preserved by Windows
    registers - traditional use
    registers, tutorial
    repeat instructions
    repeat (duplicate) data
    repeat structures
    REPORTTIME interrupt
    RETN - normal return
    return values from functions
    reverse store, no, in code
    reverse store, no, in data
    reverse storage, tutorial
    re-usable labels

    S

    sections
    segment overrides
    shared sections
    SHIELD and SHIELDSIZE, in frames
    signed numbers, tutorial
    64-bit assembly
    SIZEOF
    source information at compile-time
    speed of assembly, report
    square brackets
    SSE instructions
    stack frames, automated
    stack frames, callback
    stack frames, tutorial
    stack, tutorial
    starting address
    starting GoAsm
    static code libraries, using
    strings, declaring in data
    the STRINGS directive
    strings, pushing pointers on stack
    strings, short, in code
    strings, the actual characters
    strings, unicode
    strings, using in structures
    strings, using SIZEOF
    structures, general
    structures, formalised
    structures, nested
    symbols
    syntax aims

    T

    3DNow! instructions
    time of assembly, report
    tips for programming
    tutorials
    two's complement numbers
    type checking, no
    type indicators

    U

    uninitialised data, declaring
    uninitialised data section
    Unicode/ANSI switching
    Unicode API versions
    Unicode strings
    Unicode support
    unicows.dll
    unions
    unions, nested
    unique labels
    unscoped re-usable labels
    updates
    USEDATA statement
    USES statement

    V

    versions

    W

    warning messages
    window procedure, 32/64-bits
    window procedure, automated
    window procedure, manual
    window procedure, minimised
    Windows - beginners advice

    X

    x64 64-bit assembly
    x86 compatibility mode
    xmm registers

    Y

    Z

    Copyright © Jeremy Gordon 2001-2016