The "Go" tools
The RSDS pdb format
by Jeremy Gordon
This file describes the format of the pdb (Program Database) files
of the "RSDS" or "DS" type which are emitted by Microsoft's link.exe
version 7 and above.
For a description of the earlier "JG" format see Sven B. Schreiber's
excellent book:
Undocumented Windows 2000 Secrets.
If anyone wishes to add to or correct the information in this file
let me know and I'll include your contribution with an appropriate
acknowledgement.
What are PDB files?
The latest Microsoft linkers create a Program Database (pdb) file
when linking if the /DEBUG or /DEBUG:FULL option is chosen. The
pdb file contains information about the creation of the executable, and
also contains the symbol information in the latest CodeView format. The
executable contains a path and filename for the pdb file on the local
machine, together with an identification code, so that the correct pdb
file can be located. Neither the format of the pdb file itself nor the
latest CodeView format is documented. To my knowledge, the format has
changed twice already and it is likely to change again. Microsoft provides APIs
to analyse and report the contents of the pdb files in its Debug Information
Access (DIA) SDK, but unfortunately this is available only if you
subscribe to the Enterprise edition of MSDN or purchase Visual Studio
.NET.
PDB file information in the executable
The linker puts the filename of the pdb file made at link-time, and
its path on the local machine, in the "CODEVIEW" debug directory in the
executable. If this is missing, it's most likely because a dbg file was
made instead. This might occur for example if the REBASE program was
run after linking. In that case the path and filename of the pdb file
will be contained in the dbg file. The filename of the dbg file will
then appear in the "MISC" debug directory of the executable.
Where the "CODEVIEW" debug directory does contain the pdb file
information it will be in the following format:-
+0h dword "RSDS" signature
+4h GUID 16-byte Globally Unique Identifier
+14h dword "age"
+18h byte string zero-terminated UTF-8 path and file name
Here the "RSDS" signature identifies the format. The Globally Unique Identifier is a
machine-specific unique value. It is written both into the executable
and into the pdb file so that the two can be identified as a matched pair.
The "age" is a value which is incremented each time the executable and
its associated pdb file are remade by the linker.
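As a minimal sketch in Python (standard library only), the record in the table above can be decoded as follows. The function name parse_rsds is hypothetical. The sketch assumes the GUID is stored in the usual Windows byte order, with the first three fields little-endian and the last 8 bytes as-is; that is how the stored bytes 91 22 DB B2 in the stream 2 dump further down become the B2DB2291 at the start of the displayed GUID.

```python
import struct
import uuid

def parse_rsds(record):
    """Decode an RSDS CodeView record laid out as in the table above."""
    if record[:4] != b"RSDS":
        raise ValueError("not an RSDS record")
    # Windows GUID byte order: Data1/Data2/Data3 little-endian, Data4 as-is.
    # uuid.UUID(bytes_le=...) applies exactly this conversion.
    guid = uuid.UUID(bytes_le=bytes(record[4:20]))
    (age,) = struct.unpack_from("<I", record, 0x14)
    end = record.index(b"\x00", 0x18)          # the path is zero-terminated
    path = bytes(record[0x18:end]).decode("utf-8")
    return {"guid": str(guid).upper(), "age": age, "path": path}
```

Given the raw bytes of a "CODEVIEW" debug directory, parse_rsds returns the GUID in the usual 8-4-4-4-12 form, together with the age and the pdb path.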
Viewing the "CODEVIEW" debug directory
One of the easiest ways to do this is to use a tool which is capable of
displaying the contents of the executable visually. One such tool is
Wayne J. Radburn's PEview.
With this tool, open the executable and expand IMAGE_NT_HEADERS in the left
pane. Click on IMAGE_OPTIONAL_HEADER and scroll down until you reach the
DEBUG directory entry. This gives the RVA (Relative Virtual Address) of
the information you are interested in. Make sure the toolbar is switched
to RVA values so that you can proceed (an RVA is an address relative to
the start of the image, which would apply if the executable were loaded
into memory ready to run). You are looking for the DEBUG directory in the
executable, and it will most likely be buried inside the data of one of
the sections, most often the ".rdata" section. Click on that section in the
left-hand pane and check that the DEBUG directory
now appears. If not, try the other sections looking for the RVA given
in the IMAGE_OPTIONAL_HEADER. The DEBUG directory contains pointers
to the debug information. If there is a "CODEVIEW" debug directory
in the file it will appear in the DEBUG directory, and it will also
appear in PEview's left pane. Click on this in the left pane to view
its contents.
Here is a typical example of the contents of the "CODEVIEW" debug
directory:-
[Figure: typical contents of the "CODEVIEW" debug directory]
Here the GUID is "B2DB2291-8FE8-4502-A205-56A28496D442" and the "age" is 7.
Then the path and filename of the pdb file follows. Note that this is
supposedly in the UTF-8 format, which means that filenames in non-Roman
characters can be used.
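The walk through PEview described above can also be done in code. The following is a minimal Python sketch, not a complete PE parser, and its function names are hypothetical. It assumes the standard PE/COFF layout, which is not spelled out in this file: e_lfanew at +3Ch, the DEBUG data directory as entry 6, 28-byte IMAGE_DEBUG_DIRECTORY entries, and debug type 2 for CODEVIEW. Like hunting through the sections in PEview, it maps the DEBUG directory's RVA to a file offset using the section table.

```python
import struct

IMAGE_DEBUG_TYPE_CODEVIEW = 2      # assumption: value from the PE/COFF specification
IMAGE_DIRECTORY_ENTRY_DEBUG = 6    # assumption: index of the DEBUG data directory

def rva_to_offset(sections, rva):
    """Map an RVA to a file offset using (va, vsize, raw_ptr, raw_size) tuples."""
    for va, vsize, raw_ptr, raw_size in sections:
        if va <= rva < va + max(vsize, raw_size):
            return raw_ptr + (rva - va)
    raise ValueError(f"RVA {rva:#x} is not inside any section")

def find_codeview(pe):
    """Return the raw bytes of the first CODEVIEW debug entry, or None."""
    (nt,) = struct.unpack_from("<I", pe, 0x3C)             # e_lfanew
    if pe[nt:nt + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    (nsections,) = struct.unpack_from("<H", pe, nt + 6)
    (opt_size,) = struct.unpack_from("<H", pe, nt + 20)    # SizeOfOptionalHeader
    opt = nt + 24                                          # IMAGE_OPTIONAL_HEADER
    (magic,) = struct.unpack_from("<H", pe, opt)
    dirs = opt + (96 if magic == 0x10B else 112)           # PE32 vs PE32+
    debug_rva, debug_size = struct.unpack_from(
        "<II", pe, dirs + 8 * IMAGE_DIRECTORY_ENTRY_DEBUG)
    if debug_rva == 0:
        return None
    sections = []
    for i in range(nsections):                             # 40-byte section headers
        vsize, va, raw_size, raw_ptr = struct.unpack_from(
            "<IIII", pe, opt + opt_size + 40 * i + 8)
        sections.append((va, vsize, raw_ptr, raw_size))
    start = rva_to_offset(sections, debug_rva)
    for pos in range(start, start + debug_size, 28):       # IMAGE_DEBUG_DIRECTORY
        dbg_type, size, _rva, raw_ptr = struct.unpack_from("<IIII", pe, pos + 12)
        if dbg_type == IMAGE_DEBUG_TYPE_CODEVIEW:
            return bytes(pe[raw_ptr:raw_ptr + size])
    return None
```

The bytes returned can be decoded with the RSDS layout given earlier. If there is no CODEVIEW entry, the "MISC" entry (type 4 in the PE/COFF specification) may name a dbg file instead, as described above.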
Viewing the PDB file
Since the PDB format keeps changing, you can't expect visual tools to
keep up with it, so the files are best viewed using a hex editor
such as Paws, or dumped to
a file or printed using a hex filedump program such as Borland's tdump.
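If neither tool is to hand, a dump in the same layout as those shown below can be produced with a few lines of Python. This is a hypothetical helper that only imitates the tdump display format; it is not Borland's program.

```python
def hexdump(data, start=0):
    """Format bytes as 'offset: 16 hex bytes ASCII', imitating the dumps here."""
    lines = []
    for off in range(0, len(data), 16):
        chunk = data[off:off + 16]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Printable ASCII is shown as-is, everything else as "."
        text = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        lines.append(f"{start + off:06X}: {hex_part:<47} {text}")
    return "\n".join(lines)
```

Reading the first 40h bytes of a pdb file and passing them to hexdump should give output in the same form as the header dump below.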
Nature of the PDB file
As Sven B. Schreiber worked out, the pdb file format is
similar to that used by a disk file system. A disk file system is
divided into fixed-size blocks of data called "sectors". When a file is
written to disk, its data is placed in whichever sectors are free at the
time, so those sectors are not necessarily contiguous on the disk. A
file directory keeps track of where the data is on the disk. In pdb
files, it might be more appropriate to call the blocks of data "pages",
the data from a file a "stream" and the file directory the "stream
directory".
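The analogy can be made concrete with a small Python sketch. Assuming you already know a stream's size in bytes and the list of page numbers it occupies (how these are found is described below), its data is those pages joined in the order listed and trimmed to the stream's size. The helper name read_stream is hypothetical.

```python
def read_stream(pdb, pages, size, page_size):
    """Join a stream's pages in directory order and trim to its byte size."""
    data = b"".join(pdb[p * page_size:(p + 1) * page_size] for p in pages)
    return data[:size]
```

The pages need not be in ascending order: a stream listed as pages 2 then 0 takes page 2's data first.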
Inside the PDB file - header
At the top of the pdb file is the header which appears in this dump:-
Turbo Dump Version 4.2.16.1 Copyright (c) 1988, 1996 Borland International
Display of File TESTGOBUG.PDB
000000: 4D 69 63 72 6F 73 6F 66 74 20 43 2F 43 2B 2B 20 Microsoft C/C++
000010: 4D 53 46 20 37 2E 30 30 0D 0A 1A 44 53 00 00 00 MSF 7.00...DS...
000020: 00 04 00 00 02 00 00 00 E3 00 00 00 B4 04 00 00 ................
000030: 00 00 00 00 E1 00 00 00 00 00 00 00 00 00 00 00 ................
000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
The character to look for here is 1Ah, which in ASCII is the
"end-of-file" character. In this case it appears (coincidentally) at
offset +1Ah in the file and marks the end of the string
"Microsoft C/C++ MSF 7.00" and the carriage return and line feed
(0Dh 0Ah) which follow it. Note that the length of this string differs
between pdb versions. The end-of-file character is immediately followed
by the signature, which in this case is "DS", then a null terminator and
then sufficient padding to bring the header to the next dword, which in
this case is at +20h.
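The approach just described, searching for 1Ah rather than assuming a fixed string length, might be sketched like this in Python. The function name is hypothetical, and the sketch assumes that 1Ah does not occur earlier in the header string.

```python
def find_signature(header):
    """Return the signature after the 1Ah marker and the offset of the first field."""
    eof = header.index(b"\x1a")               # ends "Microsoft C/C++ MSF 7.00\r\n"
    nul = header.index(b"\x00", eof + 1)      # the signature is null-terminated
    signature = header[eof + 1:nul].decode("ascii")
    first_field = (nul + 1 + 3) & ~3          # round up to the next dword boundary
    return signature, first_field
```

For the header dump above this returns the signature "DS" and a first field offset of 20h.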
At +20h we find the dword value 400h. This is the size of each
block of data which we might call the "page size". In other words the
pdb file is divided into blocks of 400h bytes (1,024 bytes in decimal).
At +24h there is the value 2h. I am not yet sure what
this represents.
At +28h there is the value 0E3h. This indicates how many pages there
are in the whole file. If this is multiplied by the page size of 400h
it produces 38C00h or 232,448 which is the size of the pdb file in bytes.
At +2Ch there is the value 4B4h (1,204 decimal). This is the total
size of the stream directory in bytes. Since each page is 1,024 bytes,
we now know that the stream directory covers one complete page plus 180
bytes of a second page. This is important because, as we shall see, the
stream directory is not necessarily contiguous in the file either.
At +30h there is the value zero. I have not yet discovered what
this represents.
At +34h is the value 0E1h. This is a pointer to the stream directory
pointers. Multiplied by the page size of 400h the value 0E1h becomes
38400h. So at 38400h in the file we would expect to find the stream
directory pointers.
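Once the offset of the first field is known (+20h here), the header values described above can be read as six little-endian dwords. This Python sketch assumes that layout; the function name is hypothetical, and the two fields whose meaning is unknown are read but not interpreted. Applied to the header dump above, it gives a page size of 400h, 0E3h pages (232,448 bytes), a stream directory of 4B4h bytes spanning two pages, and the stream directory pointers at 38400h.

```python
import struct

def parse_msf_header(pdb, first_field=0x20):
    """Decode the header dwords that follow the signature."""
    # With the default first_field these are the fields at +20h to +34h.
    (page_size, _unknown_24, page_count, dir_size,
     _unknown_30, dir_ptr_page) = struct.unpack_from("<6I", pdb, first_field)
    return {
        "page_size": page_size,
        "page_count": page_count,
        "file_size": page_count * page_size,         # should equal the file length
        "dir_size": dir_size,
        "dir_pages": -(-dir_size // page_size),      # pages needed, rounded up
        "dir_ptr_offset": dir_ptr_page * page_size,  # where the directory pointers are
    }
```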
Inside the PDB file - stream directory pointers
Here is a dump of the file at 38400h holding the stream directory pointers:-
038400: DF 00 00 00 E0 00 00 00 00 00 00 00 00 00 00 00 ................
038410: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
038420: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
The stream directory pointers are in a very simple structure. We know
from the pdb header that the stream directory is in two pages so we would
expect two pointers. And we can see that the pointers are 0DFh and 0E0h.
Pointers are needed because the stream directory is not necessarily
contiguous in the file. To get the correct address each pointer needs
to be multiplied by the page size of 400h. So we can see that the first
page of the stream directory is at 0DFh * 400h = 37C00h, and then it
continues at 0E0h * 400h = 38000h.
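Following the pointers to reassemble the stream directory might look like this in Python. It is a sketch with hypothetical names: it reads as many pointers as the directory size requires (two in this example), copies each page in pointer order and trims the result to the directory size given in the header.

```python
import struct

def read_stream_directory(pdb, page_size, dir_size, dir_ptr_page):
    """Reassemble the stream directory from its page pointers."""
    count = -(-dir_size // page_size)            # pointers needed, rounded up
    pointers = struct.unpack_from(f"<{count}I", pdb, dir_ptr_page * page_size)
    pages = (pdb[p * page_size:(p + 1) * page_size] for p in pointers)
    return b"".join(pages)[:dir_size]
```

For the file above, the call would be read_stream_directory(data, 0x400, 0x4B4, 0xE1).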
Inside the PDB file - stream directory
The stream directory is a structure in the following form:-
+0h dword number of streams
+4h a dword for each stream giving the size in bytes of the stream
    (0 or -1 means that there is no stream)
+?h page pointers for each stream in turn
Here is a dump of the file at 37C00h:-
037C00: 15 00 00 00 48 03 00 00 59 00 00 00 98 F2 02 00 ....H...Y.......
037C10: D7 07 00 00 00 00 00 00 D0 0A 00 00 6C 03 00 00 ............l...
037C20: 18 11 00 00 AA 14 00 00 FF FF FF FF 19 00 00 00 ................
037C30: 70 00 00 00 B4 05 00 00 68 01 00 00 1C 00 00 00 p.......h.......
037C40: FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF ................
037C50: FF FF FF FF C8 00 00 00 D9 00 00 00 DE 00 00 00 ................
037C60: DC 00 00 00 18 00 00 00 19 00 00 00 1A 00 00 00 ................
037C70: 1B 00 00 00 1C 00 00 00 1D 00 00 00 1E 00 00 00 ................
037C80: 1F 00 00 00 20 00 00 00 21 00 00 00 22 00 00 00 .... ...!..."...
037C90: 23 00 00 00 24 00 00 00 25 00 00 00 26 00 00 00 #...$...%...&...
037CA0: 27 00 00 00 28 00 00 00 29 00 00 00 2A 00 00 00 '...(...)...*...
037CB0: 2B 00 00 00 2C 00 00 00 2D 00 00 00 2E 00 00 00 +...,...-.......
037CC0: 2F 00 00 00 30 00 00 00 31 00 00 00 32 00 00 00 /...0...1...2...
037CD0: 33 00 00 00 34 00 00 00 35 00 00 00 36 00 00 00 3...4...5...6...
037CE0: 37 00 00 00 38 00 00 00 39 00 00 00 3A 00 00 00 7...8...9...:...
The first dword contains the value 15h. This indicates that there are
21 streams of data in the file. It also means that there are 21 dwords
following it, giving the stream sizes. The page pointers therefore start
at +4h + (21 * 4) = +58h, which is at 37C58h in the file.
The stream sizes indicate how many page pointers there are for each
stream: the size divided by the page size, rounded up. This is the same
system as is used to indicate how many pointers there are to the stream
directory itself.
So for example, we can see that stream 1 is 348h bytes long. This
can be fitted into one page, so we would expect to find only one pointer
to stream 1. This pointer (at 37C58h) is 0D9h, which multiplied by the
page size of 400h is 36400h.
Stream 2 is 59h bytes long and its pointer is 0DEh * 400h = 37800h.
Stream 3 is 2F298h bytes (193,176 decimal) long. It therefore covers
189 pages and has 189 pointers starting at 37C60h. Its first page is
at 0DCh (37000h), its second page is at 18h (6000h), its third page is
at 19h (6400h) and so on.
Some stream sizes are either zero or -1 and these can be ignored.
There will be no page pointer at all for these streams.
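The whole procedure (the count, then the sizes, then each stream's page pointers in turn) can be sketched in Python as below. The function name is hypothetical. Sizes are read as signed values so that FFFFFFFFh appears as -1, and streams of size 0 or -1 consume no pointers. For the directory above, this would give stream 1 the single page 0D9h and stream 2 the single page 0DEh.

```python
import struct

def parse_stream_directory(directory, page_size):
    """Return a (size, [page numbers]) pair for each stream, in order."""
    (count,) = struct.unpack_from("<I", directory, 0)
    # Signed dwords, so that FFFFFFFFh reads as -1.
    sizes = struct.unpack_from(f"<{count}i", directory, 4)
    pos = 4 + 4 * count                     # first page pointer
    streams = []
    for size in sizes:
        if size <= 0:                       # 0 or -1: no page pointers at all
            streams.append((size, []))
            continue
        n = -(-size // page_size)           # pages needed, rounded up
        pages = list(struct.unpack_from(f"<{n}I", directory, pos))
        pos += 4 * n
        streams.append((size, pages))
    return streams
```

Each page list can then be combined with the stream's size to gather the stream's data page by page.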
The streams
I have not tried very hard to identify the contents of the streams
since it is reasonably easy to find the main one of interest (symbols).
Like in the "JG" type pdb files, the symbols stream is either the eighth
or the ninth stream. Streams 1 to 4 always seem to contain the same
type of information. Above stream 4 the contents of the streams tend to
vary. Sometimes streams are missing altogether or other streams are
added. So far I have not found an index indicating what the streams
contain.
The ones I have identified so far are:-
- Stream 1 - (possibly) previous stream directory.
- Stream 2 - pdb file authenticity.
- Stream 3 - material from the .debug$S and .debug$T sections
in the object file. This can be voluminous, since
it will contain a lot of unused material, for example
structures and structure members from include files
referred to in the source script.
- Stream 4 - files used in the build process.
- Stream 8 or 9 - symbols.
- Above stream 8 you will find section data, other debug symbols,
the linker's own file information and linked import information.
Stream 2 - pdb file authenticity
This stream is important because it allows a check to be made to ensure
that the pdb file matches the executable concerned.
Here is a dump of the file at 37800h holding the pdb file authenticity:-
037800: 94 2E 31 01 25 55 1A 40 07 00 00 00 91 22 DB B2 ..1.%U.@....."..
037810: E8 8F 02 45 A2 05 56 A2 84 96 D4 42 11 00 00 00 ...E..V....B....
037820: 2F 4C 69 6E 6B 49 6E 66 6F 00 2F 6E 61 6D 65 73 /LinkInfo./names
037830: 00 02 00 00 00 04 00 00 00 01 00 00 00 06 00 00 ................
037840: 00 00 00 00 00 0A 00 00 00 0A 00 00 00 00 00 00 ................
037850: 00 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
If you compare this with the "age" and GUID in the "CODEVIEW" debug
directory which we saw in the executable, you can see that there is
an exact match. Here the age is at +8h, and the GUID is at +0Ch.
There is also a timedate here at +4h, but this will not necessarily
match that in the executable.
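The comparison can be automated. This Python sketch, with hypothetical function names, reads the fields at the offsets just given and compares them with the GUID and age taken from the executable's "CODEVIEW" debug directory. The dword at +0h is not described here and is ignored. Applied to the dump above, it yields age 7 and GUID B2DB2291-8FE8-4502-A205-56A28496D442.

```python
import struct
import uuid

def parse_authenticity(stream):
    """Read the timedate (+4h), age (+8h) and GUID (+0Ch) from stream 2."""
    timedate, age = struct.unpack_from("<II", stream, 4)
    # Same Windows GUID byte order as in the executable's RSDS record.
    guid = uuid.UUID(bytes_le=bytes(stream[0x0C:0x1C]))
    return {"timedate": timedate, "age": age, "guid": str(guid).upper()}

def matches_executable(stream, exe_guid, exe_age):
    """True when stream 2 carries the same GUID and age as the executable."""
    info = parse_authenticity(stream)
    return info["guid"] == exe_guid.upper() and info["age"] == exe_age
```

The timedate is returned but not compared, since, as noted above, it need not match the one in the executable.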
Symbol stream
In "DS" files each symbol is in the following structure which is similar
to that found in the earlier "JG" files, except that the symbol type
numbers have changed and the string containing the symbol name is no
longer preceded by a size byte (i.e. it is no longer a Pascal string):-
+0h word - size of structure not including this word but
including the padding after the string
+2h word - type of symbol. So far the following are known:-
1108h = data type (from h or inc file)
110Ch = symbol marked as "static" in the object file
110Eh = global data variables, function names, imported functions, local variables
1125h = function prototype
+4h dword - reserved
+8h dword - offset value
+0Ch word - section number
+0Eh bytes - null terminated string containing symbol name
+?h bytes - padding to next dword
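A symbol stream could then be walked record by record. The Python sketch below is hypothetical and simply applies the layout above to every record, as the text describes. The size word excludes itself, so each record occupies size + 2 bytes. Records of a type whose real layout differs from this one would be decoded incorrectly.

```python
import struct

def iter_symbols(stream):
    """Yield (type, section, offset, name) using the record layout above."""
    pos = 0
    while pos + 4 <= len(stream):
        size, sym_type = struct.unpack_from("<HH", stream, pos)
        if size < 2:                          # malformed record or trailing padding
            break
        record = stream[pos:pos + 2 + size]   # size excludes the size word itself
        if len(record) > 0x0E:
            offset, section = struct.unpack_from("<IH", record, 0x08)
            end = record.find(b"\x00", 0x0E)  # name is null-terminated
            if end == -1:
                end = len(record)
            name = record[0x0E:end].decode("utf-8", errors="replace")
            yield sym_type, section, offset, name
        pos += 2 + size                       # skips the padding as well
```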
Copyright © Jeremy Gordon 2004