Monday, May 16, 2011

GDB scripting example: reloading symbols for UEFI target

As part of helping the GSoC student with debugging, I got tired of manually grokking output and loading module symbols in GDB, because it's a long and painful process. I knew GDB had scripting facilities, so I figured I'd use them. Never in my wildest dreams did I expect Python support (whoo, whoo!).

To be honest, I fully expect every serious place working with UEFI and using GDB already has something like this. But no one has been sharing...and I really don't mind. My Python is not too 1337, but I'm pretty happy with the end result.

The Python bindings expose the whole type/value concept pretty well, so there is no excuse to be dealing with hardcoded reads from memory offsets. That gets extremely tiring to implement (in the UEFI case, manually locating the loaded image table and then reading the symbol file name involves about 10-12 different structures), is completely unmaintainable, and is not portable across architectures.
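
To make the contrast concrete, here is a minimal sketch of a typed read. It is not taken from the actual script: `read_field` is a hypothetical helper, and the `gdb` module is passed in as a parameter purely so the function can also be exercised outside GDB. It assumes debug info for the named type has already been loaded.

```python
def read_field(gdb, address, type_name, field_name):
    """Read one field of the structure at 'address' using debug info.

    'gdb' is the GDB Python module (or a stand-in with the same calls).
    No offsets or field widths are hardcoded; they come from the symbols.
    """
    pointer_type = gdb.lookup_type(type_name).pointer()
    structure = gdb.Value(address).cast(pointer_type).dereference()
    return structure[field_name]
```

Inside GDB, a call such as `read_field(gdb, system_table_address, 'EFI_SYSTEM_TABLE', 'NumberOfTableEntries')` (where `system_table_address` is a placeholder) should behave the same on IA32 and X64, because the layout is looked up rather than written into the script.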

I decided to avoid such a mess. Since I can't describe the types manually (and I don't want to), I ended up using a dummy binary (GdbSyms.dll) that just contains the symbols I care about. This binary is loaded using symbol-file, then my logic builds a list of files to add-symbol-file, the dummy binary is unloaded, and the real symbols are then loaded. The dummy binary is built in the EDK2 TianoCore environment for the architecture you want to debug.
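
The sequence above can be sketched as a list of GDB commands. This is only an illustration: `symbol_reload_commands` and the `(symbol_file, text_address)` pair layout are hypothetical, and the step that actually walks the UEFI tables to find the loaded modules is left out.

```python
def symbol_reload_commands(dummy_path, modules):
    """Build the GDB command sequence for the dummy-binary approach.

    'modules' is a list of (symbol_file, text_address) pairs, assumed to
    have been discovered while the dummy binary's types were loaded.
    """
    # Step 1: load the dummy binary, which only provides type information.
    commands = ["symbol-file %s" % dummy_path]
    # Step 2: with no argument, symbol-file discards the current symbols.
    commands.append("symbol-file")
    # Step 3: load the real symbols for every module that was found.
    for symbol_file, text_address in modules:
        commands.append("add-symbol-file %s 0x%x" % (symbol_file, text_address))
    return commands
```

In a GDB script each string would be handed to `gdb.execute()`; keeping the command construction separate makes it easy to check without a target attached.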

A debugging session looks something like this (the -o flag is used if the PE files were converted from Mach-O or ELF):
$ gdb
(gdb) source gdb_uefi.py
(gdb) reload-uefi -o /path/to/GdbSyms.dll
(gdb) ... now have symbols, can do backtrace, etc ...
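
For readers who have not written a GDB command in Python before, here is a rough sketch of how a command like this might be registered. It is not the actual gdb_uefi.py: the class, the `reload-uefi-sketch` command name, and `parse_reload_args` are hypothetical, and the body only loads the dummy binary. The `import gdb` is guarded so the argument parsing can be tried outside GDB.

```python
import shlex

try:
    import gdb  # only importable inside a GDB process
except ImportError:
    gdb = None


def parse_reload_args(arg):
    """Split '[-o] /path/to/GdbSyms.dll' into (converted, path)."""
    converted = False
    path = None
    for token in shlex.split(arg):
        if token == "-o":
            converted = True
        else:
            path = token
    if path is None:
        raise ValueError("usage: reload-uefi-sketch [-o] /path/to/GdbSyms.dll")
    return converted, path


if gdb is not None:
    class ReloadUefiSketch(gdb.Command):
        """Hypothetical stand-in for a symbol reloading command."""

        def __init__(self):
            super(ReloadUefiSketch, self).__init__("reload-uefi-sketch",
                                                   gdb.COMMAND_OBSCURE)

        def invoke(self, arg, from_tty):
            converted, path = parse_reload_args(arg)
            gdb.execute("symbol-file %s" % path)
            # A real script would now locate the loaded images and
            # add-symbol-file each of them.

    ReloadUefiSketch()
```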

One minor annoyance is that the Python bindings are new enough that it has taken a few releases to stabilize them. In my case my GDB was old enough that certain operations were kind of awkward, but I figured the end result being more portable was worth it ;-). Some examples:

    #
    # Sets a field in a struct to a value, i.e.
    #      value->field_name = data.
    #
    # Newer Py bindings to Gdb provide access to the inferior
    # memory, but not all, so have to do it this awkward way.
    #

    def set_field (self, value, field_name, data):
        gdb.execute ("set *(%s *) 0x%x = 0x%x" % \
                         (str (value[field_name].type), \
                              long (value[field_name].address), \
                              data))

    #
    # Returns data backing a gdb.Value as an array.
    # Same comment as above regarding newer Py bindings...
    #

    def value_data (self, value, bytes=0):
        value_address = gdb.Value (value.address)
        array_t = self.ptype ('UINT8')
        value_array = value_address.cast (array_t)
        if bytes == 0:
            bytes = value.type.sizeof
        data = array.array ('B')
        for i in range (0, bytes):
            data.append (value_array[i])
        return data

    #
    # Returns a UTF16 string corresponding to a (CHAR16 *) value in EFI.
    #

    def parse_utf16 (self, value):
        index = 0
        data = array.array ('H')
        while value[index] != 0:
            data.append (value[index])
            index = index + 1
        return data.tostring ().decode ('utf-16')
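
A note for anyone running this on a current GDB built against Python 3: `array.tostring()` was removed in Python 3.9. A possible variant is sketched below. The name `parse_utf16_py3` is hypothetical, and it assumes `array('H')` items are 2 bytes wide, which holds on common platforms.

```python
import array
import sys


def parse_utf16_py3(value):
    """Decode a NUL-terminated sequence of 16-bit code units.

    'value' only needs to support indexing, so a gdb.Value of type
    (CHAR16 *) and a plain Python list both work.
    """
    data = array.array('H')
    index = 0
    while int(value[index]) != 0:
        data.append(int(value[index]))
        index += 1
    # array('H') stores items in host byte order, so name that order
    # explicitly when decoding.
    codec = 'utf-16-le' if sys.byteorder == 'little' else 'utf-16-be'
    return data.tobytes().decode(codec)
```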

Using something like set_field was necessary, as I had to compute CRC32 of a structure at some point, with the CRC field naturally nulled-out. Overall, because I stuck to symbols instead of memory reads/writes, the whole affair took probably about 3-4 hours, mostly spent getting adjusted to the Python bindings and their limitations.
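
The CRC step can be illustrated off-target. The sketch below works on a copy of the structure's bytes rather than patching the inferior's memory the way set_field does. The helper name, the offset argument, and the assumption that `zlib.crc32` computes the same CRC-32 variant that UEFI uses are assumptions, not taken from the script.

```python
import zlib


def crc32_with_field_zeroed(raw, crc_offset, crc_size=4):
    """Return the CRC-32 of 'raw' with its own CRC field treated as zero.

    'raw' is the structure's bytes; 'crc_offset' is where the CRC field
    starts inside it.
    """
    buf = bytearray(raw)
    buf[crc_offset:crc_offset + crc_size] = bytes(crc_size)  # null the field
    return zlib.crc32(bytes(buf)) & 0xFFFFFFFF
```

Because the field is zeroed before hashing, the result does not depend on whatever value the field currently holds, which is what makes it usable for validating a structure found in memory.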

Because I always deal with abstract symbols, even parsing PE/COFF files is much simplified. For example, dealing with 32- and 64-bit differences is a breeze:

    #
    # Returns True if pe_headers refer to a PE32+ image.
    #

    def pe_is_64 (self, pe_headers):
        return pe_headers['Pe32']['OptionalHeader']['Magic'] == self.PE32PLUS_MAGIC

    #
    # Returns the PE (not so) optional header.
    #

    def pe_optional (self, pe):
        if self.pe_is_64 (pe):
            return pe['Pe32Plus']['OptionalHeader']
        else:
            return pe['Pe32']['OptionalHeader']

    #
    # Returns the symbol file name for a PE image.
    #

    def pe_parse_debug (self, pe):
        opt = self.pe_optional (pe)
        debug_dir_entry = opt['DataDirectory'][6]
        dep = debug_dir_entry['VirtualAddress'] + opt['ImageBase']
        dep = dep.cast (self.ptype ('EFI_IMAGE_DEBUG_DIRECTORY_ENTRY'))
        cvp = dep.dereference ()['RVA'] + opt['ImageBase']
        cvv = cvp.cast(self.ptype ('UINT32')).dereference ()
        if cvv == self.CV_NB10:
            return cvp + self.sizeof('EFI_IMAGE_DEBUG_CODEVIEW_NB10_ENTRY')
        elif cvv == self.CV_RSDS:
            return cvp + self.sizeof('EFI_IMAGE_DEBUG_CODEVIEW_RSDS_ENTRY')
        elif cvv == self.CV_MTOC:
            return cvp + self.sizeof('EFI_IMAGE_DEBUG_CODEVIEW_MTOC_ENTRY')
        return gdb.Value(self.EINVAL)
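
The same idea, applied to raw bytes instead of gdb.Value objects, looks roughly like this. The sketch is mine rather than part of the script: `codeview_file_name` is a hypothetical helper, and the header sizes are assumptions based on the `EFI_IMAGE_DEBUG_CODEVIEW_*_ENTRY` definitions in the EDK2 headers. This is exactly the kind of hardcoding that `self.sizeof` avoids in the version above.

```python
# Bytes preceding the file name in each CodeView entry (assumed sizes).
CODEVIEW_HEADER_SIZE = {
    b"NB10": 16,  # Signature + 3 x UINT32
    b"RSDS": 24,  # Signature + 5 x UINT32
    b"MTOC": 20,  # Signature + 16-byte Mach-O UUID
}


def codeview_file_name(entry):
    """Return the symbol file name stored after a CodeView header.

    'entry' is the raw bytes starting at the CodeView signature.
    Returns None when the signature is not recognised.
    """
    header_size = CODEVIEW_HEADER_SIZE.get(bytes(entry[:4]))
    if header_size is None:
        return None
    # The name is a NUL-terminated string right after the header.
    name = bytes(entry[header_size:]).split(b"\0", 1)[0]
    return name.decode("ascii")
```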

The actual script is at https://github.com/andreiw/andreiw-wip/blob/master/uefi/DebugPkg/Scripts/gdb_uefi.py. Prebuilts of the dummy GdbSyms for X64 and IA32, as well as sources for it (buildable in EDK2) are at https://github.com/andreiw/andreiw-wip/tree/master/uefi/DebugPkg. I'll upload an ARM prebuilt soon as well.

Here are some links on the GDB scripting facilities. This is a blog in French, but the text isn't exactly Le Rouge et le Noir, so it's pretty easy to comprehend...
http://blog.nibbles.fr/1138

Pretty printers for GDB -
http://tromey.com/blog/?p=524
http://blog.rethinkdb.com/make-debugging-easier-with-custom-pretty-prin

Official documentation (note, some of the stuff here isn't in gdb 7.1, which is what I use...)
http://sourceware.org/gdb/onlinedocs/gdb/Python-API.html#Python-API

Pretty slides highlighting use -
http://people.fedoraproject.org/~dmalcolm/presentations/PyCon-US-2011/GdbPythonPresentation/GdbPython.html#1
