Based on an article originally written to explain relocs for the ArmLinux system. This rewrite is oriented more towards the i386 user, although the principles are the same.
Later, at link time, the linker would merge all the .o files, building a table of where symbols are ultimately located. Then the linker would run through the set of relocs, filling them in.
A reloc consists of three parts:
These relocs are scattered through the .o files, and are used at link time to create the correct binary executable file. Once all the relocs are resolved, the linker has pretty well done its job.
At least this is the way things used to work, in the days of static linking.
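The static pass described above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not how any real linker is written: `struct reloc`, `R_ABS32` and `resolve_all` are hypothetical names, and the sketch assumes a reloc carries a target offset, an algorithm number and a symbol.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reloc record: where to patch, how, and against which symbol. */
enum { R_ABS32 = 1 };      /* illustrative algorithm number, not a real ELF value */

struct reloc {
    uint32_t offset;       /* where in the image the fixup is made */
    int      type;         /* which algorithm to apply */
    int      symbol;       /* index into the linker's symbol table */
};

/* Run through the relocs and fill in each target from the symbol table. */
void resolve_all(uint8_t *image, size_t image_len,
                 const struct reloc *relocs, size_t nrelocs,
                 const uint32_t *symbol_addr, size_t nsyms)
{
    for (size_t i = 0; i < nrelocs; i++) {
        const struct reloc *r = &relocs[i];
        assert(r->offset + sizeof(uint32_t) <= image_len);
        assert(r->symbol >= 0 && (size_t)r->symbol < nsyms);
        if (r->type == R_ABS32) {
            /* deposit the symbol's final address into the target dword */
            uint32_t value = symbol_addr[r->symbol];
            memcpy(image + r->offset, &value, sizeof value);
        }
    }
}
```

A caller would build `symbol_addr` while merging the .o files, then call `resolve_all` once per output section.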
With the introduction of run-time linking, the designers of the ELF format decided that relocs are a suitable entity to hold run-time resolution information. So now we have executable files which still have relocs in them, even after linking.
However, new algorithms are required to signal how these fixups are to be done. Hence the introduction of a new, extended family of reloc numbers (i.e. algorithms).
It sounds easy. To fix up a program, all we have to do is run through the relocs, calculating the new data and writing it into the reloc target address. NOT!
One of the targets of the ELF binary system is a separation of code and data. The code of apps and libraries is marked read-only and executable. The data is marked read-write, and not-executable.
The code is read-only so that multiple processes can use the code, having loaded the code into memory only once. Each process has its own page tables, mapping the code into its own memory. The code is never modified, and appears identical in each process space. Naturally, the code must be position independent; each process can load the app into a different address.
The code segment is allowed to contain constant pointers and strings (.rodata).
The data segment is read-write and is mapped into each process space differently. [In Linux, each data segment is loaded from the same base mmap, but it is marked copy-on-write; after the first write, each process has its own copy of the data.] Therefore, relocs can only point to the data segment.
This half-and-half nature of ELF binaries leads us to an interesting design point. Some of the relocs that we wish to make are in the data segment. These are easy to do: we can add relative offsets, or write absolute addresses with no problem. But the fixups in the code area are more difficult. The ELF reloc design forces us to make the code relocs "bounce off" an entry in the data area, known as the GOT (global offset table).
In other words, if code needs to refer to a global object, it instead refers to an entry in the GOT[], and at run-time, the GOT entry is fixed-up to point to the intended data. In this manner, the code space need never be fixed-up at run time. If the code needs to refer to a local object, it refers to it "relative to the &GOT[0]"; this too is position independent. NOTE 1
If the code needs to jump to a subroutine in a different module, the linker creates an array of jump-stubs, called the PLT (procedure linkage table). These jump-stubs execute indirectly, using an entry in the GOT[] to implement the far call.
Finally, ELF implements run time linking by deferring function resolution until the function is called. This means that calls to library functions go through a fixup process the first time that they are called.
The rest of this paper explains the operation of these concepts.
NOTE 1: Relative (GOTOFF) code is made "relative to the start of the GOT table". Instead, it could have been made "relative to the load address of the module", which would have been cleaner in my opinion. But there are reasons that some architectures chose the former, so we'll stick with it.
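The two kinds of data reference (through a GOT entry for globals, relative to &GOT[0] for locals) can be modelled in C. This is a sketch under assumptions: `struct module`, `load_global_int` and `load_local_int` are hypothetical names, and real code does this with %ebx-relative addressing rather than function calls.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one loaded module's data segment. */
struct module {
    uintptr_t *got;    /* &GOT[0]; slots are filled in by the run-time linker */
    int        ngot;   /* number of GOT slots */
};

/* Global object: the code only knows a GOT slot index; the slot holds
 * the object's absolute address. */
int load_global_int(const struct module *m, int slot)
{
    assert(slot >= 0 && slot < m->ngot);
    return *(const int *)m->got[slot];
}

/* Local object: the code only knows a link-time constant distance
 * from &GOT[0], so no run-time fixup is needed. */
int load_local_int(const struct module *m, intptr_t gotoff)
{
    const char *base = (const char *)m->got;
    return *(const int *)(base + gotoff);
}
```

In both cases the code itself holds only constants (a slot index or a distance), which is why it can stay read-only wherever the module is loaded.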
2) Object files which are going to be part of a library are a little different. For one thing, they must be compiled as PIC (Position Independent Code), using the -fpic flag. This triggers the compiler to build the code in a non-standard manner. The compiler makes a distinction between local data/functions and global data/functions. Relocs in the code/.rodata sections must use GOT-based relocs, because the code/.rodata area itself cannot be modified at run time. The relocs are:
in code:
2-i) reference to local symbol: use the relative distance from the
GOT to the local symbol (R_*_GOTOFF); these relocs can exist in
the code area, because they will be fully resolved at link time
2-ii) reference to a global symbol: create an entry in the GOT and
let the run-time system deposit the symbol's address into the GOT for us
(R_*_GOT32)
2-iii) In addition, relative calls to subroutines (R_*_PC32)
can be used.
2-iv) there is another reloc used by the compiler to implement function
prolog (R_*_GOTPC)
in data:
2-v) reference to symbol (R_*_32) [NOTE: symbols which are global
have a reloc that references the symbol by name; symbols which are local
can have a reloc that simply references the section number, and have a
section-offset contained in the reloc. See NOTE 2]
3) Executables need to be able to refer to global data (such as errno)
as if there is only one copy. ELF systems do this by copying global symbols
down into the application .bss space. Then the executable and all the libraries
point to this single copy. To realize this, we need relocs:
3-i) reach into a library to a symbol and copy down the data into our
own .bss space (R_*_COPY)
3-ii) pointer to global data (R_*_GLOB_DAT; only used by the
profiler)
3-iii) pointer to library function (R_*_JMP_SLOT)
Notice that all of these relocs must modify only the data section
of the executable; the code section is read-only! All the relocs from the
.o file have either been resolved, or mutated into one of the above 3.
4) Shared libraries are the most complex. By the time the library is
linked, all the R_*_GOTOFF relocs (from the .o files) are resolved.
4-i) All the R_*_GOT32 relocs are resolved, pointing at GOT
entries. At link time, these GOT entries get relocs of their own, pointing
to the global data/function. (R_*_GLOB_DAT/R_*_JMP_SLOT respectively).
4-ii) There will be times when local data structures need to hold absolute
pointers to local data. Put the module-relative address of the symbol in
the library; at run-time, add the module-load address to it (R_*_RELATIVE)
NOTE 3
Again, notice that all of these relocs must modify only the data section of the executable; the code section is read-only!
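The R_*_RELATIVE fixup from 4-ii is simple enough to show directly. A minimal sketch, assuming the data segment is viewed as an array of 32-bit words; `apply_relative` and its parameters are hypothetical names chosen for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Each entry of `targets` is the word index of a slot that currently
 * holds a module-relative address. */
void apply_relative(uint32_t *data, size_t nwords,
                    const size_t *targets, size_t ntargets,
                    uint32_t load_base)
{
    for (size_t i = 0; i < ntargets; i++) {
        assert(targets[i] < nwords);
        data[targets[i]] += load_base;   /* module-relative -> absolute */
    }
}
```

Words that are not listed in `targets` are left alone, which is the point: only the slots that hold pointers need a reloc.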
When the linker creates 3) and 4) above, the linker actually creates code and data that was not explicit in the .o files. There is a .plt section created in the code segment, which is an array of function stubs used to handle the run-time resolution of library calls. There is a .got section created in the data segment, which holds pointers to global symbols. Both of these synthetic sections are "helpers" to the code segment, since the code segment cannot be modified at run-time.
To make all this happen, the object files must contain information about whether a symbol is global or local, function or data, and the object size. (The old a.out scheme did not require all this extra info)
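For illustration, here is roughly how that per-symbol information can be read back out of a 32-bit ELF symbol entry. The field layout and the numeric values follow the ELF symbol table format; the struct and function names are local to this sketch (the system header uses different ones).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One 32-bit ELF symbol table entry (16 bytes). */
struct sym32 {
    uint32_t name;    /* offset into the string table */
    uint32_t value;   /* address or section offset */
    uint32_t size;    /* object size in bytes */
    uint8_t  info;    /* binding in the high nibble, type in the low nibble */
    uint8_t  other;
    uint16_t shndx;   /* index of the defining section */
};

enum { BIND_LOCAL = 0, BIND_GLOBAL = 1 };   /* STB_LOCAL, STB_GLOBAL */
enum { TYPE_OBJECT = 1, TYPE_FUNC = 2 };    /* STT_OBJECT, STT_FUNC */

/* global or local? */
int sym_binding(const struct sym32 *s)
{
    assert(s != NULL);
    return s->info >> 4;
}

/* function or data? */
int sym_type(const struct sym32 *s)
{
    assert(s != NULL);
    return s->info & 0x0f;
}

int sym_is_global_function(const struct sym32 *s)
{
    return sym_binding(s) == BIND_GLOBAL && sym_type(s) == TYPE_FUNC;
}
```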
A detailed description of all the relocs is in Appendix A
NOTE 2 At this point, I'll mention that global relocs must necessarily involve the three aspects of a reloc:
For instance, in this i386 code
.section .text
xorl %eax,
%eax ; sample code
.L1: call _do_something_local
movl %eax,
(%ebx)
.section .data
.L4: .word Lextern
.L3: .word .L1
; this
The code on the 3rd line (the call) needs to be fixed up, but that's easy, since it's a PC relative fixup.
If the .o file has no idea where Lextern is, it must necessarily create a reloc which refers to the symbol Lextern.
    .L4  .word 0    reloc_type=R_ARM_32  reloc_symbol=Lextern
The word at .L3 needs a fixup as well. If the .o file can determine the location of a local symbol, such as .L1, then it is allowed to replace the symbol with a section-plus-offset. The offset is stored in the reloc target address, and the section is an entry in the reloc symbol table.
    .L3  .word 4    reloc_type=R_ARM_32  reloc_symbol=[.text]
This reduces the number of symbols in the symbol table, making run-time linking easier.
NOTE 3 Notice that the R_*_GOTOFF and R_*_GOT32
relocs include an offset from &GOT[0], which is usually about halfway
through the module. The R_*_RELATIVE relocs, on the other hand, contain
an offset from the beginning of the module. Why? Tradition. See Note 1
above.
1) In the code:
    call function_call_n
This is typical code using the relative jump or call. In static executables, the function call goes directly to the target. In dynamically-linked executables, the target is an entry in the PLT that is part of the app.
2) In the PLT: The PLT is a synthetic area, created by the linker. It exists in both executables and libraries. It is an array of stubs, one per imported function call.
On i386 architecture, this code looks like:
    PLT[n+1]: jmp *GOT[n+3]
              push #n          @push n as a signal to the resolver
              jmp PLT[0]
A subroutine call to PLT[n+1] results in an indirect jump through GOT[n+3]. The first PLT entry is special, and the first 3 entries of the GOT are special, hence these offsets.
Once everything is running properly, the GOT[n+3] is the address of the real subroutine, and execution continues.
However, when first invoked, GOT[n+3] points back to PLT[n+1]+6, which is the push/jmp sequence. Going through the PLT[0], the execution is directed to the resolver, a part of the dynamic loader. Resolver() uses the argument on the stack to determine 'n' and resolves the symbol 'n'. The resolver() then repairs GOT[n+3] to point directly at the target subroutine.
The first PLT entry is slightly different, and is used to form a trampoline to the fixup code.
    PLT[0]: push &GOT[1]
            jmp GOT[2]       @points to resolver()
Flow is directed to the resolver routine. 'n' is already on the stack, and &GOT[1] gets added on the stack. This way the resolver (located in ld-linux.so.2) can determine which library is asking for its service.
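The deferred resolution above can be imitated in portable C with function pointers. This is a toy model, not the loader's code: all names are hypothetical, and an empty (NULL) slot stands in for "GOT[n+3] still points back at the push/jmp sequence in the PLT".

```c
#include <assert.h>
#include <stddef.h>

typedef int (*func_t)(int);

#define NFUNCS 2

/* Stand-ins for library routines; hypothetical, for this sketch only. */
int lib_double(int x) { return 2 * x; }
int lib_negate(int x) { return -x; }

func_t real_target[NFUNCS] = { lib_double, lib_negate }; /* what a symbol lookup would find */
func_t got_slot[NFUNCS];    /* plays the role of GOT[n+3]; NULL means "not yet resolved" */
int resolver_calls;         /* counts trips through the resolver */

/* Plays the role of PLT[0] plus the resolver: look up symbol n,
 * then repair the GOT slot so later calls go direct. */
func_t resolve(int n)
{
    assert(n >= 0 && n < NFUNCS);
    resolver_calls++;
    got_slot[n] = real_target[n];
    return got_slot[n];
}

/* Plays the role of PLT[n+1]: jump through the GOT slot; an
 * unresolved slot falls into the resolver first. */
int call_via_plt(int n, int arg)
{
    assert(n >= 0 && n < NFUNCS);
    func_t target = got_slot[n] ? got_slot[n] : resolve(n);
    return target(arg);
}
```

Calling `call_via_plt(0, 21)` twice goes through `resolve` only the first time; the second call finds the repaired slot and jumps straight to `lib_double`.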
3) In the GOT: The GOT (global offset table) contains helper pointers for both PLT fixups and GOT fixup. The first 3 entries are special/reserved. The next M entries belong to the PLT fixups. The next D entries belong to various data fixups.
The GOT is a synthetic area, created by the linker. It exists in both executables and libraries.
When the GOT is first set up, all the GOT entries relating to PLT fixups are pointing to code back in their respective PLT entries.
The special entries in the GOT are
    GOT[0] = linked list pointer used by the dyn-loader
    GOT[1] = pointer to the reloc table for this module
    GOT[2] = pointer to the fixup/resolver code, located in the ld-linux.so.2 library
followed by
    GOT[3] .... GOT[3+M] = indirect function call helpers, one per imported function
    GOT[3+M+1] ...... GOT[end] = indirect pointers for global data references, one per imported global
Remember that each library and executable gets its own PLT and GOT array.
Executable binary files include header information that indicates a load address. Libraries, because they are position-independent, don't need a load address, but contain a 0 in this field.
i386
Start | Len | Usage |
0 | 4k | zero page |
0000.1000 | 128M | not used |
0800.0000 | 896M | app code/data space, followed by small-malloc() space |
4000.0000 | 1G | mmap space: library load space, large-malloc() space |
8000.0000 | 1G | stack space, working back from BFFF.FFE0 |
The kernel has a preferred location for mmap data objects, at 0x4000.0000. Since the libraries are loaded by mmap, and they have a don't-care load address, they end up here.
The library that most of us are using for malloc (GLIBC) handles small mallocs by calling sys_brk(), which extends the data area after the app, at 0x0800.0000+sizeof(app). Large mallocs are realized by creating a mmap, so these end up in the pool at 0x4000.0000.
As the mmap pool grows upward, the stack grows downward. Between them, they share 2G bytes.
The shared library design usually has the kernel load the app first, then
the kernel-loader notices that it needs support, and loads the dyn-loader
library (usually /lib/ld-linux.so.2) at 0x4000.0000. Execution is given
over to ld-linux.so.2, and it loads the rest of the libraries after itself.
You can see where libraries will load by using the utility ldd
ldd foo_app
There is a diagnostic case where the app is invoked by
    /lib/ld-linux.so.2 foo_app foo_arg ....
In this case, ld-linux.so.2 is loaded as an app. Since
it was built as a library, it tries to load at 0. [In ArmLinux, this is
forbidden, so the kernel pushes it up to 0x1000.] Once ld-linux.so.2
loads, it reads its argv[1] and loads foo_app at its
preferred location (0x0800.0000). Other libraries are loaded up at the mmap
area. So, in this case, the user memory map appears as
Start | Len | Usage |
0 | 128M | ld-linux.so.2, followed by small-malloc() space |
0800.0000 | 896M | app code/data space |
4000.0000 | 1G | mmap space: lib space, large-malloc() space |
8000.0000 | 1G | stack space, working backward from BFFF.FFE0 |
Notice that the small malloc space is much smaller in this case, but this is supposed to be for load-testing and diagnostics, so it's not too bad.
Please, if you need more text, let me know: patb@corel.ca
Here is some analysis of the i386 design:
Intel
in .o files; these are the old relocs......
Reloc | Meaning |
R_386_32 | simply deposit the absolute memory address of "symbol" into a dword |
R_386_PC32 | determine the distance from this memory location to the "symbol", then add it to the value currently at this dword; deposit the result back into the dword |
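The two old relocs can be written out as a small C routine. This is a sketch rather than linker source: `apply_reloc` and its parameters are hypothetical, and it assumes the usual Rel convention that the value already stored at the target is added in for both reloc types.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reloc numbers as used for i386 ELF. */
enum { R_386_32 = 1, R_386_PC32 = 2 };

/* image:      bytes of the section being fixed up
 * image_addr: address at which image[0] will live
 * offset:     position of the dword to patch
 * sym_addr:   resolved address of "symbol" */
void apply_reloc(uint8_t *image, uint32_t image_addr, uint32_t offset,
                 int type, uint32_t sym_addr)
{
    uint32_t addend, value;
    uint32_t place = image_addr + offset;    /* address of the dword itself */

    assert(type == R_386_32 || type == R_386_PC32);
    memcpy(&addend, image + offset, sizeof addend);  /* value currently there */
    if (type == R_386_32)
        value = sym_addr + addend;           /* absolute address */
    else
        value = sym_addr + addend - place;   /* distance from here to symbol */
    memcpy(image + offset, &value, sizeof value);
}
```

For a call instruction the assembler typically leaves -4 in the dword, so that the result is measured from the end of the instruction, which is what the CPU expects.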
These four were introduced with dynamic libraries; they are found only in .o files which are going to be part of a library (pic code):
R_386_GOT32 | this reloc is going to persist through the link stage; the linker should mutate this into a R_386_GLOB_DAT in the library |
R_386_GOTPC | determine the distance from here to the GLOBAL_OFFSET_TABLE (&GOT[0]) and deposit the difference as a dword into this location (does not involve a symbol!); used in function prolog to calculate &GOT[0] |
R_386_GOTOFF | determine the distance from the GLOBAL_OFFSET_TABLE to the (local) "symbol"; store said distance in the dword at this location; create an entry in the GOT[]; change this reloc into a R_386_RELATIVE and point it at the GOT[] entry |
R_386_PLT32 | create a new entry in the PLT[] and GOT[]; determine the distance from here to the PLT[] entry and store that distance as a dword at this location; at final link, rename the reloc to a R_386_JMP_SLOT, keeping the same "symbol", and point it at the GOT[] entry |
Executable files that are built "static" have no relocs in them. They run standalone.
In executable files which are intended to run with shared libraries......
R_386_JMP_SLOT | at dynamic link time, deposit the address of "symbol" (a subroutine) into this dword |
R_386_COPY | read a string of bytes from the "symbol" address and deposit a copy into this location; the "symbol" object has an intrinsic length; i.e. move initialized data from a library down into the app data space |
Dynamic library files also have R_386_JMP_SLOT relocs, plus
R_386_GLOB_DAT | at load time, deposit the address of "symbol" into this dword; the "symbol" is in another module; this reloc is, in a sense, the complement of the R_386_COPY above |
R_386_RELATIVE | at dynamic link time, read the dword at this location, add it to the run-time start address of this module; deposit the result back into this dword |
Note that R_386_32 relocs can appear in libraries as well. These must be executed carefully!
R_386_COPY and R_386_GLOB_DAT can be considered complements of each other. Suppose you have a global data object defined/initialized in a dynamic library. The library will have the binary version of the object in its .data space. When the application is built, the linker puts a R_386_COPY reloc in there to copy the data down to the application's .bss space. In turn, the library never references the original global object; it references the copy that is in the application data space, through a corresponding R_386_GLOB_DAT. Weird, huh? After loading and copying, the original data (from the library) is never used; only the copy (in the app data space).
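The copy-down arrangement can be modelled with ordinary pointers. This is a toy sketch: `struct exported_object` and both helper functions are hypothetical names, and a real dyn-linker finds these addresses through symbol lookup rather than being handed them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: one exported data object seen from both sides. */
struct exported_object {
    const void *lib_initial;   /* initialized bytes in the library's .data */
    size_t      size;          /* the symbol's intrinsic length */
    void       *app_copy;      /* space reserved in the application's .bss */
    void      **lib_got_slot;  /* library GOT entry used to reach the object */
};

/* R_386_COPY step: move the initialized data down into the app. */
void do_copy_reloc(struct exported_object *o)
{
    assert(o->app_copy != NULL && o->lib_initial != NULL);
    memcpy(o->app_copy, o->lib_initial, o->size);
}

/* R_386_GLOB_DAT step: point the library's GOT slot at the app's copy. */
void do_glob_dat_reloc(struct exported_object *o)
{
    assert(o->lib_got_slot != NULL);
    *o->lib_got_slot = o->app_copy;
}
```

After both steps, a write made by the library through its GOT slot lands in the application's copy, and the library's original bytes are never touched again.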
To make the whole dynamic linking operation happen, the linker introduces
several "synthetic" constructs into the target when you build an app or
a library:
.got == &GOT[0] | Global Offset Table: a small section of data memory where run-time fixups are made; there is only one of these per-app or per-library |
GLOBAL_OFFSET_TABLE | a pointer to the .got |
.plt == &PLT[0] | Procedure Linkage Table: a small section of code which helps the run-time resolution work properly |
The compiler can signal to the assembler that it wants to trigger one
of the above constructs by:
implicit func | i386 syntax | ARM syntax |
.got pointer | var@GOT(%ebx) | var(GOT) |
.got data | var@GOTOFF(%ebx) | var(GOTOFF) |
GLOBAL_OFFSET_TABLE | same | same |
.plt jump | func@PLT | func(PLT) |
Note that the C/C++ programmer does not allocate this memory; it is created by, and used by the linker.
To make the job of the linker a bit easier, the relocs are clustered
together in the app-file or the library-file.
.rel.bss section | contains all the R_386_COPY relocs |
.rel.plt section | contains all the R_386_JMP_SLOT relocs; these modify the first half of the GOT elements |
.rel.got section | contains all the R_386_GLOB_DAT relocs; these modify the second half of the GOT elements |
.rel.data section | contains all the R_386_32 and R_386_RELATIVE relocs |
Here is an excerpt from the reloc list for my version of /usr/bin/dir
Relocation section '.rel.got' at offset 0xb6c contains 1 entries:
  Offset    Info   Type             Symbol's Value  Symbol's Name
  08054748  00106  R_386_GLOB_DAT   00000000        __gmon_start__
Relocation section '.rel.bss' at offset 0xb74 contains 8 entries:
  Offset    Info   Type             Symbol's Value  Symbol's Name
  08054800  04405  R_386_COPY       08054800        __ctype_tolower
  08054804  00605  R_386_COPY       08054804        stdout
  08054808  03505  R_386_COPY       08054808        stderr
  0805480c  01905  R_386_COPY       0805480c        __ctype_toupper
  08054810  01105  R_386_COPY       08054810        _nl_msg_cat_cntr
  08054814  00905  R_386_COPY       08054814        __ctype_b
  08054818  01405  R_386_COPY       08054818        optarg
  0805481c  02205  R_386_COPY       0805481c        optind
Relocation section '.rel.plt' at offset 0xbb4 contains 58 entries:
  Offset    Info   Type             Symbol's Value  Symbol's Name
  08054660  00e07  R_386_JUMP_SLOT  08048dc4        readlink
  08054664  03c07  R_386_JUMP_SLOT  08048dd4        getgrnam
  08054668  02407  R_386_JUMP_SLOT  08048de4        ferror
  0805466c  04107  R_386_JUMP_SLOT  08048df4        strchr
  08054670  01007  R_386_JUMP_SLOT  08048e04        __overflow
  08054674  04507  R_386_JUMP_SLOT  08048e14        __register_frame_info
  08054678  01f07  R_386_JUMP_SLOT  08048e24        _obstack_begin
  0805467c  02b07  R_386_JUMP_SLOT  08048e34        fnmatch
  08054680  02907  R_386_JUMP_SLOT  08048e44        localtime
  08054684  02f07  R_386_JUMP_SLOT  08048e54        strcmp
When the reloc algorithm is invoked, it has direct access to:
Some architectures, like the M68k use a different reloc, called Rela, which has one extra parameter, called an addend. This makes the relocs 12 bytes each, instead of 8. The Rel is just as flexible (in my opinion :)
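To make the 8-byte versus 12-byte comparison concrete, here are the two record layouts in C, plus the decoding of the "Info" column shown in the listings above. The struct and function names are local to this sketch; the layouts and the i386 type numbers follow the ELF format.

```c
#include <assert.h>
#include <stdint.h>

struct rel32  { uint32_t offset; uint32_t info; };                  /* 8 bytes  */
struct rela32 { uint32_t offset; uint32_t info; int32_t addend; };  /* 12 bytes */

/* The info word packs the symbol index (upper 24 bits) and the
 * reloc type (low 8 bits). */
uint32_t rel_sym(uint32_t info)  { return info >> 8; }
uint32_t rel_type(uint32_t info) { return info & 0xffu; }

/* Names for the i386 reloc types discussed in this article. */
const char *i386_reloc_name(uint32_t type)
{
    static const char *const names[] = {
        "R_386_NONE", "R_386_32", "R_386_PC32", "R_386_GOT32", "R_386_PLT32",
        "R_386_COPY", "R_386_GLOB_DAT", "R_386_JMP_SLOT", "R_386_RELATIVE",
        "R_386_GOTOFF", "R_386_GOTPC"
    };
    assert(type < sizeof names / sizeof names[0]);
    return names[type];
}
```

For example, the Info value 04405 from the .rel.bss listing splits into symbol index 0x44 and type 5, which is R_386_COPY.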
Example code
This appendix will show C code which triggers the new relocs.
Suppose we have this library code:
    typedef struct {
        char* p;
        char (*f)(int);
    } _st;

    char fPub(int a) {return 'a';}
    static char fLocal(int a) {return 'b';}
    static char cLocal;
    char cPub;

    _st a[] = { {&cLocal,      // 1
                 fLocal},      // 2
                {&cPub,        // 3
                 fPub} };      // 4

    int foo(int a) {           // 5
        return fPub(a)         // 6
          + fLocal(a)          // 7
          + (int) &cPub        // 8
          + cPub               // 9
          + (int) &cLocal      // 10
          + cLocal;            // 11
    }
When the compiler builds the .o files, lines 1 and 2 are marked as needing a full 32 bit address; R_386_32 relocs are generated. But the address can be determined locally, so the symbols can be dropped and offsets used instead. Lines 3 & 4 will also generate a R_386_32 reloc, requesting a full absolute address, to be associated with the symbols "cPub" and "fPub".
Line 5 publishes a function foo() as a public symbol. Since it can be
called from outside, and it needs to be position independent (-fpic),
it needs to generate a local reference to the GOT. Early in the prolog
in foo(), the compiler will generate something like:
mov &GOT[0], %ebx
so that the rest of the subroutine has a reference to &GOT[0] for
further processing. Note that line 5 requires &GOT[0], which itself
requires a reloc: R_386_GOTPC, meaning "the distance from here to
the GOT". This extra reloc is the overhead of each public function that
is compiled -fpic.
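The arithmetic behind that prolog can be checked with a short C model. This is only a sketch of the address calculation, with a hypothetical `struct layout` describing where the prolog and the GOT sit inside the module; it assumes the GOT follows the code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical module layout, as offsets from the module's load address. */
struct layout {
    uint32_t prolog_off;   /* where the prolog reads its own address */
    uint32_t got_off;      /* where GOT[0] sits */
};

/* Link-time constant stored by the R_386_GOTPC fixup:
 * "the distance from here to the GOT". */
int32_t gotpc_distance(const struct layout *l)
{
    assert(l->got_off >= l->prolog_off);   /* sketch assumes the GOT follows the code */
    return (int32_t)(l->got_off - l->prolog_off);
}

/* Run-time computation done by the prolog: &GOT[0] = own address + distance. */
uint32_t got_at_runtime(uint32_t load_base, const struct layout *l)
{
    uint32_t here = load_base + l->prolog_off;
    return here + (uint32_t)gotpc_distance(l);
}
```

The distance is the same wherever the module is loaded, so the constant can live in the read-only code while the result still tracks the load address.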
Line 6 will trigger a R_386_PLT32 reloc, using the symbol "fPub". Line 7 also generates the same reloc, against the symbol "fLocal". (This reloc will disappear at the final link.)
Line 8 requires the address of a public object. This object location has to be flexible at run time, so a R_386_GOT32 reloc is used. Later, at link time, this will create an address slot in the GOT[].
Line 9 requires the "contents of" the same object. The object contents are fetched by using the address contained in the GOT[] entry. [Note that compiler is smart enough to use a single reloc to realize both line 8 and line 9]
Lines 10 & 11 require the address and contents of a local object. We don't know exactly where this object is going to be at run time, but we do know that it is local, and we can state its position relative to the &GOT[0]. So a pair of R_386_GOTOFF relocs are generated. [Again, the compiler may merge lines 10 and 11 into a single reloc, but it didn't when I built this example.]
[Because of the structure of the linker, full name resolution isn't checked until a link is made with an executable. In other words, if your library has unresolved references, you won't find out about it until you try to make an app using your library.]
To summarize, the .o file contains the following relocs:
in the data section:
Relocation section '.rel.data' at offset 0x470 contains 4 entries:
  Offset    Info   Type             Symbol's Value  Symbol's Name
  00000000  00401  R_386_32         00000000        .bss
  00000004  00201  R_386_32         00000000        .text
  00000008  00c01  R_386_32         00000001        cPub
  0000000c  00a01  R_386_32         00000000        fPub
and in the code section:
Relocation section '.rel.text' at offset 0x440 contains 6 entries:
  Offset    Info   Type             Symbol's Value  Symbol's Name
  00000028  00e0a  R_386_GOTPC      00000000        _GLOBAL_OFFSET_TABLE_
  00000031  00a04  R_386_PLT32      00000000        fPub
  0000003a  00604  R_386_PLT32      0000000c        fLocal
  00000049  00c03  R_386_GOT32      00000001        cPub
  00000057  00409  R_386_GOTOFF     00000000        .bss
  0000005e  00409  R_386_GOTOFF     00000000        .bss
Lines 3 & 4 remain as R_386_32 relocs, and will ask the dyn-linker for the full 32 bit absolute address to be deposited into the reloc target.
The reloc triggered by line 5 is fixed up fully, and does not appear in the library.
The reloc in line 6 will cause the linker to add a PLT entry, and a corresponding GOT entry. The latter gets a R_386_JUMP_SLOT reloc, using the symbol "fPub". [The code generated at line 6 appears to be a subroutine call into the PLT entry.]
The reloc at line 7 can be fully resolved by the final linker stage, so it is transformed into a direct call to fLocal().
The relocs at lines 8 and 9 will cause the linker to add a GOT entry, which will hold &cPub. The GOT entry gets marked with a R_386_GLOB_DAT reloc, asking the dyn-linker for the full 32 bit absolute address.
The relocs at line 10 & 11 can be fully resolved at final link time. They turn into "find the data at &GOT[0] plus this offset", so no reloc is required.
As you can see, the 10 relocs in the .o file turn into 4 in the library.
Also, the PLT gets a new entry, and the GOT gets two new entries.
    extern char fPub(int);
    extern char cPub;

    int main() {
        return fPub(123)    // 1
          + cPub;           // 2
    }
When the .o file is created, there is a R_386_PC32 generated for "fPub" and a R_386_32 generated for "cPub".
When the executable is created, the R_386_PC32 from line 1 will cause an entry in the PLT, and the code will call into the PLT. At the same time, the linker will create an entry in the GOT, which the PLT will jump through. The GOT entry will get a R_386_JUMP_SLOT reloc, using the symbol "fPub".
The data reference in line 2 will cause a local copy of the global cPub
to be created in the data space of the app. The data reference at line
2 is changed to point to this new global data, and the reloc is resolved.
This new global gets a R_386_COPY reloc, using the symbol "cPub". The symbol
has certain properties, including the fact that it references data, and that
it has a length of 1 byte. At run time, the dyn-linker will find the symbol
cPub in one of the libraries and copy the 1 byte down from the library
into the app data space. The dyn-linker will then publish that latter address
as the address of "cPub".