ELF Files

The content of this section is derived from the ELF 1.2 standard, with some modifications and reorganization. The main references are as follows:

  1. ELF File Format Analysis, Peking University, Teng Qiming
  2. ELF - Destroying Christmas

Introduction

ELF (Executable and Linkable Format) files, also known as object files on Linux, come in three main types:

  • Relocatable File: Contains code and data generated by the compiler. The linker will link it with other object files to create executable files or shared object files. In Linux systems, such files generally have the .o suffix.
  • Executable File: The programs we typically execute in Linux.

  • Shared Object File: Contains code and data; these are what we commonly call library files, usually ending with .so. There are two typical usage scenarios:

    • The linker (Link eDitor, ld) may process it along with other relocatable files and shared object files to generate another object file.
    • The Dynamic Linker combines it with an executable file and other shared objects to create a process image.

For the origin of the name Link eDitor, see https://en.wikipedia.org/wiki/GNU_linker

Object files are created by the assembler and the linker. They are binary representations of programs that are intended to execute directly on a processor. Programs that require a virtual machine to execute (such as Java) do not fall within this scope.

Here, we mainly focus on the ELF file format.

File Format

Object files participate in both program linking and program execution. For convenience and efficiency, the object file format provides two parallel views of a file's contents, one for each of these activities, as shown below:

First, let's focus on the linking view.

At the beginning of the file is the ELF Header, which provides the overall organization of the entire file.

If a Program Header Table exists, it tells the system how to create a process. Object files used to generate a process must have a program header table, but relocatable files do not need this table.

The section part contains most of the information used in the linking view: instructions, data, symbol tables, relocation information, and so on.

The Section Header Table contains information describing the file's sections. Each section has an entry in the table, providing the section name, section size, and other information. Object files used for linking must have a section header table; other object files may or may not have one.

Here is a more visual representation of the linking view:

For the execution view, the main difference is that there are no sections, but rather multiple segments. In fact, these segments mostly originate from sections in the linking view.

Note:

Although the diagram arranges things in the order of ELF header, program header table, sections, and section header table, in reality, except for the ELF header, the other parts do not have a strict order.

Data Representation

The ELF file format supports a variety of processors with 8-bit bytes and 32-bit architectures. The format is nevertheless extensible and can also support processors with smaller or larger word sizes. Object files therefore contain some control data that records the architecture the file targets, so that the file can be identified and its contents interpreted in a machine-independent way. The remaining data in the object file uses the encoding of the target processor, regardless of the machine on which the file was created. In practice this means object files can be cross-compiled: for example, ARM executable code can be generated on an x86 machine.

All data structures in an object file follow "natural" size and alignment rules, as shown below:

Name Size Alignment Purpose
Elf32_Addr 4 4 Unsigned program address
Elf32_Half 2 2 Unsigned half integer
Elf32_Off 4 4 Unsigned file offset
Elf32_Sword 4 4 Signed large integer
Elf32_Word 4 4 Unsigned large integer
unsigned char 1 1 Unsigned small integer

If necessary, data structures can contain explicit padding to ensure that 4-byte objects are 4-byte aligned, to force the size of data structures to be a multiple of 4, and so on. Data is also aligned accordingly. Thus, a structure containing an Elf32_Addr member will be aligned on a 4-byte boundary in the file.

For portability, ELF files do not use bit fields.

Character Representation

To be determined.

Note: In the following discussion, we primarily focus on 32-bit as our basis for explanation.

ELF Header

The ELF Header summarizes the organization of the ELF file. Starting from this data structure, all other information in the file can be located. The data structure is as follows:

#define EI_NIDENT   16

typedef struct {
    unsigned char   e_ident[EI_NIDENT];
    Elf32_Half      e_type;
    Elf32_Half      e_machine;
    Elf32_Word      e_version;
    Elf32_Addr      e_entry;
    Elf32_Off       e_phoff;
    Elf32_Off       e_shoff;
    Elf32_Word      e_flags;
    Elf32_Half      e_ehsize;
    Elf32_Half      e_phentsize;
    Elf32_Half      e_phnum;
    Elf32_Half      e_shentsize;
    Elf32_Half      e_shnum;
    Elf32_Half      e_shstrndx;
} Elf32_Ehdr;

Each member starts with the prefix e, which presumably stands for ELF. The members are described below.

e_ident

As mentioned earlier, ELF provides an object file framework to support multiple processors and multiple encoding formats. This variable specifies how to decode and interpret the machine-independent data in the file. The meanings of different indices in this array are as follows:

Macro Name Index Purpose
EI_MAG0 0 File identification
EI_MAG1 1 File identification
EI_MAG2 2 File identification
EI_MAG3 3 File identification
EI_CLASS 4 File class
EI_DATA 5 Data encoding
EI_VERSION 6 File version
EI_PAD 7 Start of padding bytes

Among these,

e_ident[EI_MAG0] through e_ident[EI_MAG3], the first 4 bytes of the file, are called the "magic number" and identify the file as an ELF object file. Why the first byte is 0x7f is not covered here.

Name Value Position
ELFMAG0 0x7f e_ident[EI_MAG0]
ELFMAG1 'E' e_ident[EI_MAG1]
ELFMAG2 'L' e_ident[EI_MAG2]
ELFMAG3 'F' e_ident[EI_MAG3]
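
The magic-number check described above can be sketched in a few lines of C. This is a minimal illustration rather than a complete validator; the function name has_elf_magic is made up for this example, and the index macros simply repeat the values from the table.

```c
#include <assert.h>
#include <stddef.h>

/* Index values copied from the e_ident table above. */
#define EI_MAG0 0
#define EI_MAG1 1
#define EI_MAG2 2
#define EI_MAG3 3

/* Hypothetical helper: returns 1 if buf begins with the four ELF magic
 * bytes (0x7f, 'E', 'L', 'F'), and 0 otherwise. */
int has_elf_magic(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    if (len < 4)
        return 0;   /* too short to hold the magic number */
    return buf[EI_MAG0] == 0x7f && buf[EI_MAG1] == 'E' &&
           buf[EI_MAG2] == 'L'  && buf[EI_MAG3] == 'F';
}
```

A caller would normally pass the first bytes read from the file and reject it before looking at any other field.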

e_ident[EI_CLASS] is the byte following e_ident[EI_MAG3] and identifies the file's class or capacity.

Name Value Meaning
ELFCLASSNONE 0 Invalid class
ELFCLASS32 1 32-bit file
ELFCLASS64 2 64-bit file

The ELF format is designed to be portable among machines of various sizes, without imposing the sizes of the largest machine on the smallest. ELFCLASS32 supports machines with files and virtual address spaces of up to 4 GB; it uses the basic types defined above.

ELFCLASS64 is used for 64-bit architectures.

The e_ident[EI_DATA] byte specifies the encoding of processor-specific data in the object file. The currently defined encodings are:

Name Value Meaning
ELFDATANONE 0 Invalid data encoding
ELFDATA2LSB 1 Little-endian
ELFDATA2MSB 2 Big-endian

Other values are reserved and will be assigned to new encodings as necessary in the future.

The data encoding determines how the basic objects in the file are interpreted. As mentioned earlier, ELFCLASS32 files use objects of 1, 2, and 4 bytes. Their representations under each defined encoding are shown below, with the byte number in the upper-left corner.

ELFDATA2LSB encoding uses two's complement, with the Least Significant Byte occupying the lowest address.

ELFDATA2MSB encoding uses two's complement, with the Most Significant Byte occupying the lowest address.

e_ident[EI_VERSION] specifies the version number of the ELF header. Currently, this value must be EV_CURRENT, the same constant used for the e_version field described below.

e_ident[EI_PAD] marks the index of the first unused byte in e_ident. These bytes are reserved and set to 0; programs that read object files should ignore them. If these bytes are given a meaning in the future, the value of EI_PAD will change.

e_type

e_type identifies the object file type.

Name Value Meaning
ET_NONE 0 No file type
ET_REL 1 Relocatable file
ET_EXEC 2 Executable file
ET_DYN 3 Shared object file
ET_CORE 4 Core dump file
ET_LOPROC 0xff00 Processor-specific lower bound
ET_HIPROC 0xffff Processor-specific upper bound

Although the content of core dump files is not specified in detail, ET_CORE is still reserved to mark such files. From ET_LOPROC to ET_HIPROC (inclusive) is reserved for processor-specific scenarios. Other values may be assigned to new object file types as necessary in the future.

e_machine

This field specifies the machine architecture on which the current file can run.

Name Value Meaning
EM_NONE 0 No machine type
EM_M32 1 AT&T WE 32100
EM_SPARC 2 SPARC
EM_386 3 Intel 80386
EM_68K 4 Motorola 68000
EM_88K 5 Motorola 88000
EM_860 7 Intel 80860
EM_MIPS 8 MIPS RS3000

Here, EM should be an abbreviation for ELF Machine.

Other values are reserved for new machines as necessary in the future. Additionally, processor-specific ELF names use the machine name for differentiation, and flags generally have a prefix of EF_ (ELF Flag). For example, a flag called WIDGET on the EM_XYZ machine would be called EF_XYZ_WIDGET.

e_version

Identifies the version of the object file.

Name Value Meaning
EV_NONE 0 Invalid version
EV_CURRENT 1 Current version

A value of 1 denotes the original file format; future extensions will use larger numbers. Although EV_CURRENT is shown as 1 above, it will change as needed to reflect the current version number. Note that it is independent of the specification's own revision number: the specification is at version 1.2, yet EV_CURRENT is still 1.

e_entry

This field gives the virtual address to which the system first transfers control, thus starting the process. If the file has no associated entry point, this field is 0.

e_phoff

This field gives the byte offset of the Program Header Table from the beginning of the file (Program Header table OFFset). If the file has no program header table, this value is 0.

e_shoff

This field gives the byte offset of the Section Header Table from the beginning of the file (Section Header table OFFset). If the file has no section header table, this value is 0.

e_flags

This field gives processor-specific flags associated with the file. These flags are named in the format EF_machine_flag.

e_ehsize

This field gives the byte size of the ELF file header (ELF Header Size).

e_phentsize

This field gives the byte size of each entry in the program header table (Program Header ENTry SIZE). All entries are the same size.

e_phnum

This field gives the number of entries in the program header table (Program Header entry NUMber). Therefore, the product of e_phnum and e_phentsize gives the total byte size of the program header table. If the file has no program header table, this value is 0.

e_shentsize

This field gives the byte size of a section header (Section Header ENTry SIZE). A section header is one entry in the section header table; all entries in the section header table occupy the same amount of space.

e_shnum

This field gives the number of entries in the section header table (Section Header NUMber). Therefore, the product of e_shnum and e_shentsize gives the total byte size of the section header table. If the file has no section header table, this value is 0.

e_shstrndx

This field gives the index of the section header table entry associated with the section name string table (Section Header table InDeX related with section name STRing table). If the file has no section name string table, this value is SHN_UNDEF. For more details, please refer to the "Sections" and "String Table" sections later.

Program Header Table

Overview

The Program Header Table is an array of structures, where each element is of type Elf32_Phdr, describing a segment or other information the system needs when preparing the program for execution. In the ELF header, e_phentsize and e_phnum specify the size of each element and the number of elements in this array. A segment in an object file contains one or more sections. Program headers are only meaningful for executable files and shared object files.

In other words, the Program Header Table is specifically designed for the segments used during ELF file runtime.

The data structure of Elf32_Phdr is as follows:

typedef struct {
    Elf32_Word  p_type;
    Elf32_Off   p_offset;
    Elf32_Addr  p_vaddr;
    Elf32_Addr  p_paddr;
    Elf32_Word  p_filesz;
    Elf32_Word  p_memsz;
    Elf32_Word  p_flags;
    Elf32_Word  p_align;
} Elf32_Phdr;

The description of each field is as follows:

Field Description
p_type This field indicates the type of the segment, or provides related information about the structure.
p_offset This field gives the offset from the beginning of the file to the first byte of the segment.
p_vaddr This field gives the virtual address of the first byte of the segment in memory.
p_paddr This field is only used in systems related to physical address addressing. Since "System V" ignores physical addressing for applications, the content of this field is not constrained for executable files and shared object files.
p_filesz This field gives the size of the segment in the file image, which may be 0.
p_memsz This field gives the size of the segment in the memory image, which may be 0.
p_flags This field gives flags associated with the segment.
p_align Loadable segments must have p_vaddr and p_offset values that are congruent modulo the page size. This member gives the alignment of the segment in both the file and memory. If this value is 0 or 1, no alignment is required. Otherwise, p_align should be a positive integral power of 2, and p_vaddr should equal p_offset modulo p_align.

Segment Types

The segment types in an executable file are as follows:

Name Value Description
PT_NULL 0 Indicates the segment is unused, and other members of the structure are undefined.
PT_LOAD 1 This type of segment is a loadable segment, with its size described by p_filesz and p_memsz. Bytes from the file are mapped to the beginning of the corresponding memory segment. If p_memsz is greater than p_filesz, the "remaining" bytes are all set to 0. p_filesz cannot be greater than p_memsz. Loadable segments are sorted in ascending order by p_vaddr in the program header.
PT_DYNAMIC 2 This type of segment provides dynamic linking information.
PT_INTERP 3 This type of segment gives the location and size of a NUL-terminated path name to invoke as the interpreter. This segment type is meaningful only for executable files (though it may also appear in shared object files). Furthermore, this segment may appear at most once in a file. If present, it must precede all loadable segment entries.
PT_NOTE 4 This type of segment gives the location and size of auxiliary information.
PT_SHLIB 5 This segment type is reserved, but its semantics are unspecified. Programs containing this type of segment do not conform to the ABI standard.
PT_PHDR 6 This segment type, if present, specifies the size and location of the program header table itself, both in the file and in memory. This type of segment may appear at most once in a file. Furthermore, it only appears if the program header table is part of the program's memory image. If present, it must precede all loadable segment entries.
PT_LOPROC~PT_HIPROC 0x70000000~0x7fffffff This range of types is reserved for processor-specific semantics.
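
The PT_LOAD rule (copy p_filesz bytes from the file, then zero-fill up to p_memsz) can be illustrated with plain buffers. This is a simplified sketch that copies into caller-provided memory instead of mapping pages; load_segment and its parameters are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies filesz bytes found at file + offset into
 * dst, then sets the remaining memsz - filesz bytes of dst to 0.
 * dst must have room for memsz bytes.
 * Returns 0 on success, or -1 if filesz exceeds memsz or the segment
 * data does not lie inside the file. */
int load_segment(unsigned char *dst, size_t memsz,
                 const unsigned char *file, size_t file_len,
                 size_t offset, size_t filesz)
{
    assert(dst != NULL && file != NULL);
    if (filesz > memsz)
        return -1;              /* p_filesz cannot exceed p_memsz */
    if (offset > file_len || filesz > file_len - offset)
        return -1;              /* segment data runs past end of file */
    memcpy(dst, file + offset, filesz);
    memset(dst + filesz, 0, memsz - filesz);   /* the "remaining" bytes */
    return 0;
}
```

A real loader works on page-aligned mappings and applies p_flags as well; the sketch only shows the size relationship between the file image and the memory image.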

Base Address

The virtual addresses in the program header may not be the actual virtual addresses in the program's memory image. Executable files typically contain code with absolute addresses, so for the program to execute correctly, its segments must reside at the virtual addresses used to build the file. Shared object files, on the other hand, usually contain position-independent code, which lets the same shared object be loaded at different addresses in different processes while still executing correctly.

Although the system may choose different virtual addresses for different processes, it preserves the relative positions of the segments: position-independent code uses relative addressing between segments, so the differences between virtual addresses in memory must match the differences between virtual addresses in the file. Consequently, the difference between the virtual address of any segment in memory and the corresponding virtual address in the file is a single constant value for any given executable or shared object. This difference is the base address; one of its uses is to relocate the program during dynamic linking.

The base address of an executable or shared object file is computed during execution from the following three values:

  • The virtual memory load address
  • The maximum page size
  • The lowest virtual address of the program's loadable segments

To compute the base address, first determine the memory address at which the loadable (PT_LOAD) segment with the lowest p_vaddr resides, then truncate that address to the nearest multiple of the maximum page size; the result is the base address. Depending on the type of file being loaded, that memory address may or may not equal the segment's p_vaddr.
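
The two steps of this computation can be sketched as follows. The struct load_seg is a deliberately reduced, hypothetical stand-in for Elf32_Phdr, and the function names are invented for this example; the memory address at which the system actually placed the lowest segment has to come from the loader and is not derived here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PT_LOAD 1

/* Hypothetical reduced program header holding only the fields used here. */
struct load_seg {
    uint32_t p_type;
    uint32_t p_vaddr;
};

/* Step 1: returns the index of the PT_LOAD entry with the lowest p_vaddr,
 * or -1 if the table contains no loadable segment. */
int lowest_load_segment(const struct load_seg *segs, size_t n)
{
    assert(segs != NULL || n == 0);
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (segs[i].p_type != PT_LOAD)
            continue;           /* only loadable segments count */
        if (best < 0 || segs[i].p_vaddr < segs[best].p_vaddr)
            best = (int)i;
    }
    return best;
}

/* Step 2: truncates addr down to a multiple of page, which must be a
 * power of two (as page sizes are). */
uint32_t page_truncate(uint32_t addr, uint32_t page)
{
    assert(page != 0 && (page & (page - 1)) == 0);
    return addr & ~(page - 1);
}
```

For instance, if the lowest loadable segment ended up at the made-up memory address 0x40001040 and the maximum page size is 0x1000, truncation yields 0x40001000.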

Segment Permissions - p_flags

A program loaded into memory has at least one loadable segment. When the system creates the memory image for a loadable segment, it sets the segment permissions according to p_flags. The possible segment permission bits are:

Name Value Meaning
PF_X 0x1 Execute
PF_W 0x2 Write
PF_R 0x4 Read
PF_MASKPROC 0xf0000000 Unspecified

Among these, all bits in PF_MASKPROC are reserved for processor-specific semantic information.

If a permission bit is 0, that type of access is denied. The actual memory permissions depend on the memory management unit, and different systems may behave differently. Although all permission combinations are valid, the system may grant more access than requested. In no case, however, will a segment have write permission unless it is explicitly requested. Below are all possible combinations.

For example, the text segment usually has read and execute permissions but not write permission, while the data segment typically has read, write, and execute permissions.

Segment Contents

A segment may contain one or more sections, but this does not affect program loading. Nevertheless, we still need various data to enable program execution, dynamic linking, and so on. Below we describe the typical contents of segments. For different segments, the order of sections and the number of sections they contain may differ. Additionally, processor-related constraints may alter the structure of the corresponding segment.

As shown below, the code segment contains only read-only instructions and data. Of course, this example does not cover all possible segments.

The data segment contains writable data and instructions. Typically, it includes the following:

The PT_DYNAMIC type element in the program header points to the .dynamic section. The GOT (Global Offset Table) and PLT (Procedure Linkage Table) contain information related to position-independent code. Although in the example given here the PLT section appears in the code segment, this may vary for different processors.

The .bss section has the type SHT_NOBITS, which means it does not occupy space in the ELF file, but it does occupy space in the executable file's memory image. Typically, uninitialized data is at the end of the segment, which is why p_memsz is larger than p_filesz.

Note:

  • Different segments may overlap, meaning different segments can contain the same sections.

Section Header Table

The section header table is typically located near the end of the ELF file (the reason for this placement is not discussed here), but for convenience of explanation we discuss it at this point.

This structure is used to locate the specific position of each section in the ELF file.

First, the e_shoff field in the ELF header gives the byte offset from the beginning of the file to the section header table's location. e_shnum tells us the number of entries in the section header table; e_shentsize gives the byte size of each entry.

Second, the section header table is an array where each element is of type Elf32_Shdr, and each element summarizes one section.

Elf32_Shdr

Each section header can be described using the following data structure:

typedef struct {
    Elf32_Word      sh_name;
    Elf32_Word      sh_type;
    Elf32_Word      sh_flags;
    Elf32_Addr      sh_addr;
    Elf32_Off       sh_offset;
    Elf32_Word      sh_size;
    Elf32_Word      sh_link;
    Elf32_Word      sh_info;
    Elf32_Word      sh_addralign;
    Elf32_Word      sh_entsize;
} Elf32_Shdr;

The meaning of each field is as follows:

Member Description
sh_name Section name. This is an index into the section header string table section, so the field itself is a numeric value. The name stored at that position in the string table is a NUL-terminated string.
sh_type Categorizes the section based on its content and semantics. The specific types are described below.
sh_flags Each bit represents a different flag, describing whether the section is writable, executable, requires memory allocation, and other attributes.
sh_addr If the section will appear in the process's memory image, this member gives the address where the section's first byte should reside in the process image. Otherwise, this field is 0.
sh_offset Gives the offset from the beginning of the file to the first byte of the section. Sections of type SHT_NOBITS do not occupy file space, so their sh_offset member gives a conceptual offset.
sh_size This member gives the byte size of the section. Unless the section type is SHT_NOBITS, the section occupies sh_size bytes in the file. A section of type SHT_NOBITS may have a non-zero length but does not occupy space in the file.
sh_link This member gives a section header table index link, whose specific interpretation depends on the section type.
sh_info This member gives additional information, whose interpretation depends on the section type.
sh_addralign Some sections have address alignment requirements. For example, if a section contains a doubleword variable, the system must ensure the entire section is doubleword aligned. In other words, sh_addr % sh_addralign == 0. Currently, only values of 0 and positive integral powers of 2 are allowed. Values of 0 and 1 indicate no alignment constraints.
sh_entsize Some sections contain tables of fixed-size entries, such as the symbol table. For such sections, this member gives the byte size of each entry. Otherwise, this member is 0.
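
Because sh_name is only an index, turning it into a readable name requires the contents of the section name string table. The lookup might be sketched like this, assuming the string table has already been read into memory; section_name is a hypothetical helper, not a library function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns a pointer to the NUL-terminated name that
 * starts at index sh_name in the string table, or NULL if the index is
 * out of range or the name is not terminated inside the table. */
const char *section_name(const char *strtab, size_t strtab_size,
                         size_t sh_name)
{
    assert(strtab != NULL);
    if (sh_name >= strtab_size)
        return NULL;
    /* Make sure a terminator exists before the end of the table. */
    if (memchr(strtab + sh_name, '\0', strtab_size - sh_name) == NULL)
        return NULL;
    return strtab + sh_name;
}
```

With a table holding the bytes "\0.text\0.data\0", index 1 yields ".text" and index 7 yields ".data"; index 0 yields the empty name.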

The section header at index zero (SHN_UNDEF) also exists, even though that index marks undefined section references. Its contents are as follows:

Field Name Value Description
sh_name 0 No name
sh_type SHT_NULL Inactive
sh_flags 0 No flags
sh_addr 0 No address
sh_offset 0 No file offset
sh_size 0 No size
sh_link SHN_UNDEF No link information
sh_info 0 No auxiliary info
sh_addralign 0 No alignment requirement
sh_entsize 0 No entries

Special Indices

Several special indices in the section header table are as follows:

Name Value Meaning
SHN_UNDEF 0 Marks undefined, missing, irrelevant, or otherwise meaningless section references. For example, a "defined" symbol associated with section number SHN_UNDEF is an undefined symbol. Note: Although index 0 is reserved for undefined values, the section header table still contains an entry for index 0. That is, if the ELF header's e_shnum is 6, the indices should be 0 through 5. More details will be explained later.
SHN_LORESERVE 0xff00 Lower bound of the reserved index value range.
SHN_LOPROC 0xff00 Processor-specific lower bound
SHN_HIPROC 0xff1f Processor-specific upper bound
SHN_ABS 0xfff1 Absolute value for the associated reference. For example, symbols associated with section number SHN_ABS have absolute values and are not affected by relocation.
SHN_COMMON 0xfff2 Symbols defined relative to this section are common symbols, such as FORTRAN COMMON or unallocated external variables in C.
SHN_HIRESERVE 0xffff Upper bound of the reserved index value range.

The system reserves index values between SHN_LORESERVE and SHN_HIRESERVE (inclusive), and these values do not refer to entries in the section header table. That is, the section header table contains no entries for the reserved indices; a reserved value such as SHN_ABS or SHN_COMMON carries a special meaning wherever a section index is expected, rather than selecting a real section.

Selected Section Header Fields

sh_type

Section types currently take the following values. The SHT prefix stands for Section Header Type.

Name Value Description
SHT_NULL 0 This section type is inactive; other members in this section header have undefined values.
SHT_PROGBITS 1 This section type contains program-defined information; its format and meaning are determined entirely by the program.
SHT_SYMTAB 2 This section type contains a symbol table (SYMbol TABle). Currently, an object file may only contain one section of each type, though this restriction may be relaxed in the future. Generally, SHT_SYMTAB sections provide symbols for link editing (i.e., ld), although they can also be used for dynamic linking.
SHT_STRTAB 3 This section type contains a string table (STRing TABle).
SHT_RELA 4 This section type contains relocation entries with explicit addends (RELocation entry with Addends), such as Elf32_Rela for 32-bit object files. An object file may have multiple relocation sections.
SHT_HASH 5 This section type contains a symbol hash table (HASH table).
SHT_DYNAMIC 6 This section type contains dynamic linking information (DYNAMIC linking).
SHT_NOTE 7 This section type contains information that marks the file in some way (NOTE).
SHT_NOBITS 8 This section type does not occupy file space but is otherwise similar to SHT_PROGBITS. Although this section type contains no bytes, its corresponding section header's sh_offset member still contains a conceptual file offset.
SHT_REL 9 This section type contains relocation entries without explicit addends (RELocation entry without Addends), such as Elf32_Rel for 32-bit object files. An object file may have multiple relocation sections.
SHT_SHLIB 10 This section type is reserved, but its semantics are not yet defined.
SHT_DYNSYM 11 This section type also contains a symbol table. Because a complete symbol table (SHT_SYMTAB) may contain many symbols unnecessary for dynamic linking, an object file may also contain an SHT_DYNSYM section that holds a minimal set of dynamic linking symbols to save space.
SHT_LOPROC 0x70000000 This value specifies the lower bound reserved for processor-specific semantics (LOw PROCessor-specific semantics).
SHT_HIPROC 0x7fffffff This value specifies the upper bound reserved for processor-specific semantics (HIgh PROCessor-specific semantics).
SHT_LOUSER 0x80000000 This value specifies the lower bound of the range of types reserved for applications.
SHT_HIUSER 0x8fffffff This value specifies the upper bound of the range of types reserved for applications.

sh_flags

Each bit in the sh_flags field of a section header provides corresponding flag information, defining whether the content of the corresponding section can be modified, executed, and so on. If a flag bit is set, its value is 1; undefined bits are all 0. The currently defined values are as follows; other values are reserved.

Name Value Description
SHF_WRITE 0x1 This section contains data that is writable during process execution.
SHF_ALLOC 0x2 This section occupies memory during process execution. For certain control sections that do not occupy space in the object file's memory image, this attribute is off.
SHF_EXECINSTR 0x4 This section contains executable machine instructions (EXECutable INSTRuction).
SHF_MASKPROC 0xf0000000 All bits in this mask are reserved for processor-specific semantics.

The meanings of sh_link and sh_info depend on the section type:

sh_type | sh_link | sh_info
SHT_DYNAMIC | Section header index of the string table used by the section | 0
SHT_HASH | Section header index of the symbol table used by this hash table | 0
SHT_REL/SHT_RELA | Section header index of the associated symbol table | Section header index of the section to which relocation applies
SHT_SYMTAB/SHT_DYNSYM | OS-specific information. In ELF files on Linux, this is the section header table index of the string table section that holds the names of the symbols in the symbol table. | OS-specific information
other | SHN_UNDEF | 0

Example

Here is a classic example of an ELF file.

When time permits, a better example will be provided with a concrete program.

References