This is an introductory article about ELF object files for people who want to build a back-end code generator for a compiler from scratch.

Copyright(c) September 2008, Roberto García López

Revision 2: October 2014, Roberto García López

Acknowledgments

This article would not have been possible without the collaboration of all those people out there in cyberspace.

I want to thank the people who helped me on the public IRC server irc.node.net, in the channels #ubuntu and ##c.

I want to give special thanks to Dave Poirier from IEEE and Michael Matz from Suse Linux, because they always spent a lot of their time explaining everything I needed and answered my questions almost immediately. Special thanks also go to the open source movement.

Notes about revision 2

This revision is needed because the code I wrote in September 2008 doesn’t compile on modern Linux distributions, and some readers have sent me emails notifying me about that.

I’ve tested the whole source code on Ubuntu 14.04, but it should work on any Linux distribution.

More importantly, the original article was hosted at http://knol.google.com; that service was discontinued, so the article was lost.

I keep the original source code here in case you want to run it on a Linux distribution from 2008 or earlier.

Introduction

In this article we discuss the construction of the skeleton of an ELF relocatable object file, ready to be linked and executed on an Intel i386 processor.

Prerequisites

This article assumes that you are familiar with the C language and with assembly language.

If you are not familiar with the structure of ELF object files, you should read the TIS ELF documentation first.[1]

The sources in this tutorial were written using the Netwide Assembler (NASM) version 2.10.09 and the GNU C compiler version 4.8.2. To compile the tutorial’s source code, the packages libelfg0 and libelfg0-dev must be installed on your system.

All sources were tested on Ubuntu 14.04.

Be sure to install libelfg0-dev, because it is not installed by default in Ubuntu 14.04. You can install it using the Ubuntu Software Center.

Getting started

The goal of this article is to create a program, written in C, capable of generating the simplest static skeleton of an ELF relocatable object file for Linux. By static I mean that the code generator produces the same object file, containing the same code, every time the user invokes it.

When you run the code generator, it creates an object file that can then be linked with the GNU linker (ld) or any other linker.

The code generator uses the libelf library to reduce the amount of code we have to write. I’ll try to explain most of the functions needed to create the object files. If you want to learn how to obtain information from an ELF file, read the article written by Joseph Koshy[2].

All the structures that the libelf library uses are explained in detail in the TIS (Committee) ELF documentation. This committee is in charge of the maintenance of the ELF file format, which is why the ELF format is considered an open standard.

The code generator built in this article is specific to the Intel 80386 architecture. Of course, by changing some constants and a few lines of code you can get a code generator for any other Unix-like operating system and any other architecture.

All the source code used in each tutorial is available here. The files inside the package are available under the GPL v2.

Building the object file’s skeleton

This section is dedicated to the creation of the skeleton of the object file. Skeleton means the structure of an object file; in this case it is a file that has only the minimum content needed to be linked and executed.

In this section we will go through some tutorials that show how to build the skeleton of an object file that is ready to be linked. Each tutorial has a C source program ready to be compiled and executed. The files are named skeleton.x.c, where x corresponds to the number of the tutorial.

For example, the file skeleton.1.c creates an object file that does nothing but return control to the operating system, while skeleton.5.c creates an object file that calls a function in the libc.so library.

To create the skeleton we need to write some code in the .text section of the object file. The fastest way to do it is to write an assembler file with the same functionality as the skeleton that our C program creates, assemble it with NASM, and then copy the contents of each section of the resulting object file into our C program.

Here is the code in the .text section of the file skeleton.1.asm

section .text
_start:
    mov eax, 1 ; system call number (sys_exit)
    int 0x80

When we assemble it with NASM it produces the opcodes {0xb8, 1, 0, 0, 0, 0xcd, 0x80}, which we will use as the code of our .text section in the skeleton.x.c file with a statement like this

const unsigned char assemblerInstructions[] = { 0xb8, 1, 0, 0, 0, 0xcd, 0x80 };

We will do the same with the .data section and so on, because the algorithm needed to produce the opcodes for the .text section is beyond the scope of this article.
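To see how those seven bytes map onto the two instructions, here is a minimal sketch. The helper emit_syscall_stub() is hypothetical and is not part of the tutorial sources; it only assumes the usual i386 encodings, where 0xb8 is "mov eax, imm32" with the immediate stored lowest byte first, and 0xcd is "int imm8".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the tutorial sources.
 * Writes "mov eax, imm32" followed by "int 0x80" into out, which must
 * have room for at least 7 bytes. Returns the number of bytes written. */
size_t emit_syscall_stub(unsigned char *out, uint32_t syscall_number)
{
    size_t n = 0;

    assert(out != NULL);

    out[n++] = 0xb8;                                   /* mov eax, imm32 */
    out[n++] = (unsigned char)(syscall_number & 0xff); /* imm32, lowest byte first */
    out[n++] = (unsigned char)((syscall_number >> 8) & 0xff);
    out[n++] = (unsigned char)((syscall_number >> 16) & 0xff);
    out[n++] = (unsigned char)((syscall_number >> 24) & 0xff);
    out[n++] = 0xcd;                                   /* int imm8 */
    out[n++] = 0x80;                                   /* interrupt vector 0x80 */

    return n;
}
```

Calling emit_syscall_stub(buf, 1) fills buf with the same bytes that NASM produced for skeleton.1.asm.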

Once we run the program created in each tutorial, it creates an object file named generated.o. This file is ready to be linked with the GNU linker ld, which is the goal of this article.

Tutorial #1. Returning the control to the operating system

This tutorial shows how to return control to the operating system.

The source file for this tutorial is skeleton.1.c.

The first thing we need to do is initialize the libelf library by setting the ELF version to the current version with this statement

if (elf_version(EV_CURRENT) == EV_NONE)
    errx(EX_SOFTWARE, "ELF library initialization failed: %s", elf_errmsg(-1));

Important: The call to the function elf_version() must appear before the call to the function elf_begin(), otherwise the call to the function elf_begin() will fail.

Now it’s time to create the file where the code will be written. It’s done through the open() function in this way

int FileDes = open("generated.o", O_CREAT | O_WRONLY | O_TRUNC, 0777);

At this point we need to create an ELF handler with this statement

Elf *pElf = elf_begin(FileDes, ELF_C_WRITE, NULL);

The third argument is ignored when the second is ELF_C_WRITE. ELF_C_WRITE indicates that the ELF file will be created.

Creating the ELF header

The next step is to create the ELF header

Elf32_Ehdr *pEhdr = elf32_newehdr(pElf);

The elf32_newehdr() function creates an ELF header structure and initializes its members to their default values. Because of that, we need to set the values that are specific to the i386.

pEhdr->e_ident[EI_CLASS] = ELFCLASS32; // Defined by Intel architecture
pEhdr->e_ident[EI_DATA] = ELFDATA2LSB; // Defined by Intel architecture
pEhdr->e_machine = EM_386; // Intel architecture
pEhdr->e_type = ET_REL; // Relocatable file (object file)
pEhdr->e_shstrndx = 1; // Point to the .shstrtab section

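e_shstrndx is equal to 1 because in this code generator .shstrtab is always the first section created (index 0 is reserved for the null section). The order of the sections is not important, as long as e_shstrndx holds the index of the .shstrtab section.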

Creating the sections

Once the ELF header is done it’s time to create the sections of the file.

Creating the section .shstrtab

We will start by creating the section header string table (the .shstrtab section). This section contains only the names (strings) of the sections in the object file.

Elf_Scn *pScn = elf_newscn(pElf); // (1)

The elf_newscn() function creates a new section, so the next step is to create a data descriptor for that section with the following statement

Elf_Data *pData = elf_newdata(pScn); // (2)

Now we have to assign values to the data of that section. The first field is the alignment of the section. This section is always aligned to one

pData->d_align = 1;

The offset of the data within the section (d_off) is computed by the libelf library, so we don’t have to worry about it.

// pData->d_off = 0;

Now we will indicate what will be the content of the section through the d_buf field. For this section the content is an array that contains the names of each section.

const char defaultStrTable[] = {
    /* offset 00 */ '\0', // The NULL section
    /* offset 01 */ '.', 's', 'h', 's', 't', 'r', 't', 'a', 'b', '\0',
    /* offset 11 */ '.', 's', 't', 'r', 't', 'a', 'b', '\0',
    /* offset 19 */ '.', 's', 'y', 'm', 't', 'a', 'b', '\0',
    /* offset 27 */ '.', 'c', 'o', 'm', 'm', 'e', 'n', 't', '\0',
    /* offset 36 */ '.', 'b', 's', 's', '\0',
    /* offset 41 */ '.', 'd', 'a', 't', 'a', '\0',
    /* offset 47 */ '.', 'r', 'e', 'l', '.', 't', 'e', 'x', 't', '\0',
    /* offset 57 */ '.', 't', 'e', 'x', 't', '\0'
};
pData->d_buf = (void *) defaultStrTable;

The ELF type for all sections is ELF_T_BYTE so

pData->d_type = ELF_T_BYTE;

Now we must specify the size of the section’s content, that is, the size of the buffer assigned to the d_buf field.

pData->d_size = sizeof(defaultStrTable);

The version is always the current version and it’s filled by libelf so we don’t have to specify it.

// pData->d_version = EV_CURRENT;

Now it’s time to get the section’s header with the function elf32_getshdr() to configure the rest of the section.

Elf32_Shdr *pShdr = elf32_getshdr(pScn);

The first step here is to set the name of the section through the sh_name field. This field contains an integer value that indicates the offset at which the name appears in the section header string table (defaultStrTable[]). In our case it’s the first name, at offset 1.

pShdr->sh_name = 1; // Point to the name of the section
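Instead of counting the offsets by hand, they can be looked up. The following is a minimal sketch, not taken from the tutorial sources: the function strtab_offset() and the array section_names[] are hypothetical names, and the array holds the same bytes as defaultStrTable[] written as a string literal.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same 63 bytes as defaultStrTable[]; the literal's implicit
 * terminator closes the last name (".text"). */
const char section_names[] =
    "\0.shstrtab\0.strtab\0.symtab\0.comment\0.bss\0.data\0.rel.text\0.text";

/* Hypothetical helper: returns the offset of name inside a string
 * table made of NUL-terminated entries, or -1 if it is not present.
 * The returned value is what goes into sh_name (or st_name). */
long strtab_offset(const char *table, size_t size, const char *name)
{
    size_t off = 0;

    assert(table != NULL && name != NULL);

    while (off < size) {
        const char *end = memchr(table + off, '\0', size - off);
        if (end == NULL)
            break;                      /* unterminated trailing entry */
        if (strcmp(table + off, name) == 0)
            return (long)off;           /* whole entry matches */
        off = (size_t)(end - table) + 1; /* skip to the next entry */
    }
    return -1;
}
```

With this table, strtab_offset(section_names, sizeof section_names, ".shstrtab") gives the offset 1 used above. Entries are compared whole, so ".text" is found at its own offset and not inside ".rel.text".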

The fields below take the values that the TIS ELF documentation specifies

pShdr->sh_type = SHT_STRTAB;
pShdr->sh_flags = 0;

The rest of the fields of the structure are configured by the libelf library.

Creating the section .strtab

Now it’s time to create the .strtab section. This section contains all the strings of the object file except those of the .shstrtab section.

To create the section .strtab we need to repeat the steps above.

We need to create a new section in the same way we did in (1).

The next step is to create a new data descriptor for the section like we did in (2).

The alignment for this section is 1.

The content of this section is all the strings that the object file will use; in the source it’s an array named strtab[] (see skeleton.1.c for details). The first string is the null string, as the TIS documentation requires.

The type of the data is ELF_T_BYTE.

The size of the data is the sizeof the strtab[] array (the array that contains all the strings).

The type of this section is SHT_STRTAB.

The section has no flags, as the TIS documentation specifies.

Creating the section .symtab

The .symtab section contains the symbol table of the object file. Every symbol that appears in the object file must be declared in this section. Some sections must be declared here.

To create the section we need to repeat the same steps above, (1) and (2).

The alignment is 4 for this section.

The buffer is an array of Elf32_Sym structures (see skeleton.1.c for details). Here is the description of each entry in the array for skeleton.1.c.

The first entry is always the null symbol. See the TIS documentation.

The second entry is always the definition of the source file (for example generated.pas or generated.c).

The third entry is the definition of the .text section. It is not strictly necessary here, because a section is declared mainly so that relocations can refer to it, and this example needs no relocation. Section symbols usually have STB_LOCAL binding.

The last entry is the declaration of the _start symbol. This is the default entry point for the GNU linker. Its value is zero because it is located before the first instruction in the .text section. Its binding is STB_GLOBAL, and it must be defined in only one object file if you want to link several object files into one binary executable or shared object. This symbol is like the main function in a C program.

The type for the data is ELF_T_BYTE.

The type for the section is SHT_SYMTAB.

There are no flags for this section.

The member sh_link points to the section that contains the strings; in most cases it is the .strtab section.

The member sh_info must hold a value one greater than the symbol table index of the last local symbol, that is, the last one with STB_LOCAL binding. This is what the TIS documentation says (see book III page 1-2 figure 1-1). We assign the value through the macro ELF32_ST_INFO(b, t), where b comes from bind and t from type: b is STB_LOCAL and t is that index. Since STB_LOCAL is 0, the macro evaluates to t itself as long as t is below 16. We must be careful with this value, because if it is not correct unpredictable errors will appear.
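The four entries described above could be filled in as in the following sketch. It is not copied from skeleton.1.c: the function name, the string table contents and its offsets are assumptions made for illustration, and the .text section is assumed to have index 4 because the sections are created in the order .shstrtab, .strtab, .symtab, .text. STT_NOTYPE for _start is also an assumption. Only <elf.h> is needed for the types and macros.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Assumed section index of .text: 1 = .shstrtab, 2 = .strtab,
 * 3 = .symtab, 4 = .text (creation order in this tutorial). */
#define TEXT_SECTION_INDEX 4

/* Hypothetical .strtab contents; the real strtab[] is in skeleton.1.c.
 * Offsets: 1 = "generated.c", 13 = "_start". */
const char example_strtab[] = "\0generated.c\0_start";

#define EXAMPLE_SYMBOL_COUNT 4

/* Value for sh_info: one greater than the index of the last local
 * symbol, which here is the index of _start. */
#define EXAMPLE_SH_INFO 3

/* Fills syms with the four entries and returns how many were written. */
size_t fill_example_symtab(Elf32_Sym *syms, size_t capacity)
{
    assert(syms != NULL && capacity >= EXAMPLE_SYMBOL_COUNT);
    memset(syms, 0, EXAMPLE_SYMBOL_COUNT * sizeof *syms);

    /* [0] null symbol: every field stays zero (SHN_UNDEF is 0) */

    /* [1] name of the source file */
    syms[1].st_name  = 1;
    syms[1].st_info  = ELF32_ST_INFO(STB_LOCAL, STT_FILE);
    syms[1].st_shndx = SHN_ABS;

    /* [2] the .text section itself */
    syms[2].st_info  = ELF32_ST_INFO(STB_LOCAL, STT_SECTION);
    syms[2].st_shndx = TEXT_SECTION_INDEX;

    /* [3] _start: global, at offset 0 of .text */
    syms[3].st_name  = 13;
    syms[3].st_value = 0;
    syms[3].st_info  = ELF32_ST_INFO(STB_GLOBAL, STT_NOTYPE);
    syms[3].st_shndx = TEXT_SECTION_INDEX;

    return EXAMPLE_SYMBOL_COUNT;
}
```

The local symbols come first and the only global one comes last, which is what makes a single sh_info value enough to separate the two groups.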

Creating the section .text

This is the last section for the first demonstration of this tutorial. This is the section that contains the executable code.

To create the section we need to repeat the same steps above, (1) and (2). The alignment is 4. The buffer is the array of opcodes produced by NASM (assemblerInstructions[]). The type for the data is ELF_T_BYTE. The type for this section is SHT_PROGBITS. The flags for this section are SHF_ALLOC and SHF_EXECINSTR. See TIS documentation book I page 1-15 figure 1-13.

Writing the generated file to the disk

At this point all the information needed for the relocatable object file has been supplied, so we can finally write it to disk. Before writing, many things need to be adjusted, such as the offset of each section within the object file. Fortunately libelf does that for us. The only thing we need to do is ask it to update them with the following statement

elf_update(pElf, ELF_C_NULL);

Now we are ready to write the file to disk. To do so we use the following statement

elf_update(pElf, ELF_C_WRITE);
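To give an idea of the kind of bookkeeping that elf_update() spares us, here is a simplified sketch that assigns file offsets to consecutive sections while respecting their alignment. The functions align_up() and layout_sections() are hypothetical and the layout is only an assumption for illustration; the offsets that libelf actually chooses may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Rounds offset up to the next multiple of align.
 * align must be a power of two; 0 and 1 mean "no constraint". */
size_t align_up(size_t offset, size_t align)
{
    if (align <= 1)
        return offset;
    assert((align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}

/* Hypothetical layout pass: places count sections one after another,
 * starting at start, and stores each section's offset in offsets[].
 * Returns the offset just past the last section. */
size_t layout_sections(size_t start, const size_t *sizes,
                       const size_t *aligns, size_t *offsets, size_t count)
{
    size_t pos = start;
    size_t i;

    for (i = 0; i < count; i++) {
        pos = align_up(pos, aligns[i]); /* insert padding if needed */
        offsets[i] = pos;
        pos += sizes[i];
    }
    return pos;
}
```

For example, starting right after a 52-byte ELF header with sections of sizes 63, 20, 64 and 7 bytes and alignments 1, 1, 4 and 4, the third section is moved from offset 135 to 136 so that it starts on a 4-byte boundary.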

Closing handlers

At this point we don’t need to do anything else, so we will close all the handlers used so far, starting with the following statement

elf_end(pElf);

The above statement frees the memory and all internal data used by the library, so we don’t have to worry about resource leaks.

The last handler we need to free is the file descriptor

close(FileDes);

Tutorial #2. Printing a string.

This tutorial shows how to print a string by calling the kernel, and introduces the .data and .rel.text sections.

The source file for this tutorial is skeleton.2.c.

As we can see skeleton.2.c is skeleton.1.c with a few more lines.

Creating the sections

Now we will see what goes in the .data and .rel.text sections.

Creating the section .data

We will start by creating the .data section. This section contains all the data that can be modified during the execution of the program. To create it we repeat the same steps as in the last tutorial.

The values of each initialized symbol (variables, constants) are stored in this section.

Creating the section .rel.text

Every time an instruction refers to a symbol located in the data segment, we need to declare a relocation for the place in the instruction where the symbol’s address is stored. All the relocations for the .text section are stored in this section.

Here is a portion of the source code skeleton.2.asm

_start:
    mov ecx, msg ; pointer to string
    mov edx, 12 ; length of string to print
    mov ebx, 1 ; where to write, stdout
    mov eax, 4 ; write sysout command to int 80 hex
    int 0x80 ; interrupt 80 hex, call kernel

This program needs only one relocation, for msg, because msg is defined in the .data section.

The relocation section consists of an array of Elf32_Rel structures for the Intel architecture. The structure contains the fields

r_offset. The offset, within the section being relocated, of the place where the symbol’s address must be written.

r_info. This field is filled with the macro ELF32_R_INFO. The first argument is the index in the symbol table of the symbol referred to (.bss section, .data section, etc.). The second argument is the relocation type. In these tutorials it is the constant R_386_32 for references to internal symbols, where the instruction stores the absolute address of the symbol. For the call to an external function in tutorial #5 we must use the constant R_386_PC32, because the operand of a call is relative to the program counter. An external symbol is one that resides outside the object file we’re creating.

The alignment for this section is 4.

The type of the section is SHT_REL.

This section has no flags.

The sh_link member points to the symbol table and sh_info points to the section where the relocation will be applied.
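As an illustration, the single relocation needed by skeleton.2.asm could be built as in the sketch below. The function name is hypothetical and the symbol table index 3 used in the example is only an assumption; the real value depends on where the .data section is declared in the symbol table of skeleton.2.c. The offset 1 follows from the encoding of "mov ecx, imm32", which is one opcode byte (0xb9) followed by the 32-bit operand.

```c
#include <assert.h>
#include <elf.h>

/* Hypothetical helper: builds a relocation entry for a 32-bit operand
 * that must receive the absolute address of a symbol.
 * operand_offset is the offset of the operand inside .text and
 * symtab_index is the index of the symbol in .symtab. */
Elf32_Rel make_abs32_reloc(Elf32_Addr operand_offset, Elf32_Word symtab_index)
{
    Elf32_Rel rel;

    /* r_info keeps the symbol index in its upper 24 bits */
    assert(symtab_index < (1u << 24));

    rel.r_offset = operand_offset;
    rel.r_info   = ELF32_R_INFO(symtab_index, R_386_32);
    return rel;
}
```

For the program above, make_abs32_reloc(1, 3) describes the operand of "mov ecx, msg" at the very start of .text, assuming that the .data section is the symbol with index 3.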

Modifying the section .strtab



In this section the only modification needed is to add the symbol “msg” to the buffer.

Once we have finished building the file skeleton.1.c we have to add the .data section with the string “Hello world”, using an alignment of 4 bytes.

Add the symbol “msg” to the .strtab section.

Now we have to update the .symtab section to declare the .data section, as the TIS ELF documentation indicates.

Add the “msg” symbol. Its st_name field points to its name in the .strtab section, while st_shndx and st_value connect it with its physical location in the .data section.

Update the sh_info field of the .symtab section. Remember that it holds one more than the index of the last local symbol. Because we added the msg symbol and the .data section to the .symtab section, this field must be updated too.

Update the .text section with the new code.

Create the .rel.text section. The name comes from the convention described in TIS ELF v1.2 on page 1-4 of book III, in the entry for “.relname and .relaname”: “Conventionally, name is supplied by the section to which the relocations apply. Thus a relocation section for .text normally would have the name .rel.text or .rela.text.”

If we add all this to skeleton.1.c we will get skeleton.2.c.
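The sh_info update mentioned in the steps above can also be computed instead of being maintained by hand. The following is a minimal sketch with a hypothetical function name; it assumes that all the local symbols are placed before the global ones, as the TIS documentation requires.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* Hypothetical helper: returns the value for the sh_info field of
 * .symtab, that is, the index of the first symbol whose binding is
 * not STB_LOCAL (or count if every symbol is local). */
Elf32_Word symtab_sh_info(const Elf32_Sym *syms, size_t count)
{
    size_t i;

    assert(syms != NULL);

    for (i = 0; i < count; i++) {
        if (ELF32_ST_BIND(syms[i].st_info) != STB_LOCAL)
            break;  /* first global or weak symbol found */
    }
    return (Elf32_Word)i;
}
```

Assuming that the .data section and msg are both local and are inserted before _start, the result goes from 3 in the first tutorial to 5 in this one.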

Tutorial #3. Introducing the “.rodata” section.

This tutorial is the same as tutorial #2. The only difference is that the section .data has been replaced by the section .rodata. This section is used to store the global constants of the object file. The data stored there can’t be modified.

Compare the sources skeleton.2.c and skeleton.3.c to see the differences.

Tutorial #4. Calling internal procedures.

This tutorial shows how to call an internal procedure.

There is no C program for this tutorial, because it would not differ in any significant way from those of tutorial #2 and tutorial #3.

_start:
    ; prepare to call the procedure "_printstr"
    push msg
    push dword [len]
    ; call the procedure
    call _printstr

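This example needs two relocations, one against msg and the other against len.

There is no need for a relocation for the procedure _printstr, because it is local to the file and resides in the .text section: the operand of the call is a displacement relative to the call itself, and it is already known when the code is generated.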
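The following sketch shows how that displacement could be computed by a code generator. The helper name and the offsets in the example are hypothetical; the only assumption about the processor is the usual encoding of a near call, opcode 0xe8 followed by a 32-bit displacement measured from the end of the 5-byte instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encodes "call rel32" into out (5 bytes).
 * call_offset is the offset of the call instruction inside .text and
 * target_offset is the offset of the procedure in the same section. */
void emit_call_rel32(unsigned char *out, uint32_t call_offset,
                     uint32_t target_offset)
{
    /* The displacement is relative to the next instruction.
     * Unsigned arithmetic wraps around, which gives the
     * two's complement encoding for backward calls. */
    uint32_t disp = target_offset - (call_offset + 5);

    assert(out != NULL);

    out[0] = 0xe8;                               /* call rel32 */
    out[1] = (unsigned char)(disp & 0xff);       /* lowest byte first */
    out[2] = (unsigned char)((disp >> 8) & 0xff);
    out[3] = (unsigned char)((disp >> 16) & 0xff);
    out[4] = (unsigned char)((disp >> 24) & 0xff);
}
```

Since both offsets belong to the same section, the result does not depend on where the linker finally places .text, so nothing is left for the linker to fix.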

No other explanation is needed for this tutorial, so we can go on to the next one.

Tutorial #5. Calling external functions.

This tutorial shows how to invoke a function that resides in a shared object from our binary executable file. We will invoke the function printf() that resides in libc.so.

For more examples in this topic read the tutorial from the University of Maryland at Baltimore County[3].

The only thing to take into account when calling an external function is the order in which the arguments are pushed onto the stack before the call. Most of these functions are written in C, so the last argument is pushed first and the first argument is pushed last (for more information about calling functions and their arguments, read the Dragon Book[4]).

In the .strtab section we need to add the printf symbol.

In the .symtab section we need to add the reference to the function printf(). Remember that the field st_info must be filled with this statement

st_info = ELF32_ST_INFO(STB_GLOBAL, STT_NOTYPE);

The section index of any external symbol is undefined, so it must be filled with this statement

st_shndx = SHN_UNDEF;

And finally we need to create a relocation against the printf() function in the .rel.text section with this statement

r_info = ELF32_R_INFO(x, R_386_PC32);

Here x is the index in the .symtab section where the symbol resides, and the constant R_386_PC32 is used because the operand of the call to the external symbol is relative to the program counter.
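One detail is easy to miss when generating the call itself. Elf32_Rel entries carry no addend, so the linker takes it from the bytes already stored in the operand. For a call, those bytes hold -4 (fc ff ff ff), because the processor measures the displacement from the end of the 4-byte operand while the relocation is computed from its start. The sketch below illustrates this; the helper name and the indices used in the example are hypothetical.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* Hypothetical helper: writes "call <external symbol>" at
 * text[call_offset] (5 bytes) and fills in the matching relocation.
 * symtab_index is the index of the external symbol in .symtab. */
void emit_external_call(unsigned char *text, Elf32_Addr call_offset,
                        Elf32_Word symtab_index, Elf32_Rel *rel)
{
    assert(text != NULL && rel != NULL);

    text[call_offset]     = 0xe8;  /* call rel32 */
    text[call_offset + 1] = 0xfc;  /* implicit addend -4, lowest byte first */
    text[call_offset + 2] = 0xff;
    text[call_offset + 3] = 0xff;
    text[call_offset + 4] = 0xff;

    /* The relocation applies to the operand, one byte after the opcode */
    rel->r_offset = call_offset + 1;
    rel->r_info   = ELF32_R_INFO(symtab_index, R_386_PC32);
}
```

For example, emit_external_call(text, 10, 5, &rel) would describe a call placed at offset 10 of .text to the symbol with index 5, assuming that this is where printf sits in the symbol table.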

See the file skeleton.5.c for a complete example.

Conclusion

As we have seen the structure of an ELF file is simple.

To create it we need to create the header of the file and the sections needed. Some sections are optional and some are required. The required ones are listed below

.shstrtab. This section contains the names of each section.

.strtab. This section contains all the strings.

.symtab. This section contains the symbol table.

.text. This section contains the executable code.

The other sections can be added as needed.

Your contribution to the source code or the article in any way will be appreciated!