Adding debug metadata
To allow source-level debugging, we have to add debug information. Support for debug information in LLVM uses debug metadata to describe the types of the source language and other static information, and intrinsics to track variable values. The LLVM core libraries generate debug information in the DWARF format on Unix systems and in PDB format for Windows. We’ll look at the general structure in the next section.
Understanding the general structure of debug metadata
To describe the general structure, LLVM uses metadata similar to the metadata for type-based alias analysis (TBAA). The static structure describes the file, the compilation unit, functions and lexical blocks, and the data types that are used.
The main class we use is llvm::DIBuilder, and we need to include the llvm/IR/DIBuilder.h header file to get the class declaration. This builder class provides an easy-to-use interface to create the debug metadata. Later, the metadata is either added to LLVM objects such as global variables, or is used in calls to debug intrinsics. Here’s some important metadata that the builder class can create:
- llvm::DIFile: This describes a file using the filename and the absolute path of the directory containing the file. You use the createFile() method to create it. A file can contain the main compilation unit or it could contain imported declarations.
- llvm::DICompileUnit: This is used to describe the current compilation unit. Among other things, you specify the source language, a compiler-specific producer string, whether optimizations are enabled or not, and, of course, the DIFile in which the compilation unit resides. You create it with a call to createCompileUnit().
- llvm::DISubprogram: This describes a function. The most important information here is the scope (usually a DICompileUnit, or a DISubprogram for a nested function), the name of the function, the mangled name of the function, and the function type. It is created with a call to createFunction().
- llvm::DILexicalBlock: This describes a lexical block and models the block scoping found in many high-level languages. You can create this with a call to createLexicalBlock().
LLVM makes no assumptions about the language your compiler translates. As a consequence, it has no information about the data types of the language. To support source-level debugging, especially displaying variable values in a debugger, type information must be added too. Here are some important constructs:
- The createBasicType() function, which returns a pointer to the llvm::DIBasicType class, creates the metadata to describe a basic type such as INTEGER in tinylang or int in C++. Besides the name of the type, the required parameters are the size in bits and the encoding, for example, whether it is a signed or an unsigned type.
- There are several ways to construct the metadata for composite data types, as represented by the llvm::DICompositeType class. You can use the createArrayType(), createStructType(), createUnionType(), and createVectorType() functions to instantiate the metadata for array, struct, union, and vector data types, respectively. These functions require the parameters you would expect, such as the base type and the subscripts for an array type, or a list of the field members for a struct type.
- There are also methods to support enumerations, templates, classes, and so on.
The list of functions shows you that you have to add every detail of the source language to the debug information. Let’s assume your instance of the llvm::DIBuilder class is called DBuilder. Let’s also assume that you have some tinylang source in a file called File.mod in the /home/llvmuser folder. Inside this file is the Func():INTEGER function at line 5, which contains a local VAR i:INTEGER declaration at line 7. Let’s create the metadata for this, beginning with the information for the file. You need to specify the filename and the absolute path of the folder in which the file resides:
llvm::DIFile *DbgFile = DBuilder.createFile("File.mod",
                                            "/home/llvmuser");