---
layout: post
title: Hooking Internal Functions of ELF Binaries with LD_PRELOAD via Redirection to the PLT
tags: [hooking, reverse-engineering, instrumentation, debugging, ELF, LD_PRELOAD, redirect-to-PLT, AMD64, x86-64, PIE]
author-id: julian
---

It is well known that `LD_PRELOAD` can be used to override shared library
functions loaded at runtime by the dynamic linker [1]. What is less well known
is that *internal functions* (functions whose code lies within the `.text`
section of the binary) can also be hooked indirectly using a simple trick that
relies on `LD_PRELOAD`, even though these functions are obviously not imported
from dynamically linked libraries (shared objects).

### Overview

The following will be discussed:
 - a description of redirect-to-PLT
 - use cases
 - why redirect-to-PLT is not GOT/PLT hooking or infection
 - a demonstration of the technique with a toy program

Prerequisites:
 - basic familiarity with the following:
   - the ELF format
   - dynamic linking in Linux
   - `LD_PRELOAD`: what it is and how to use it

Tools:
 - Keystone Engine
 - Python 3
 - GCC

# Introduction

When a function in the `.text` section is called, the instruction pointer jumps
to the address of the first instruction of that function. To hook such a
function, the call can instead be redirected to the Procedure Linkage Table
(PLT) entry of a shared library function, which is then called in place of the
internal function. This shared library function can in turn be overridden via
`LD_PRELOAD` with a custom shared library function containing the code to be
executed in place of the hooked internal function. Crucially, even though the
overridden library function may also be called elsewhere in the program, under
conditions unrelated to the hooked internal function, it is possible to control
when the replacement code executes.

Put simply, this technique extends `LD_PRELOAD` so that it can be used to
override internal functions as well: the flow of execution detours from code
resident in the binary's `.text` section to code imported from an injected
shared library. It consists of a redirect and an override:

 1. First, the call to the target internal function is patched into a call to
    a shared library function's entry in the PLT.
 2. Next, that shared library function is overridden with code from a custom
    shared library, which is loaded via `LD_PRELOAD`.

### Use Cases

Redirect-to-PLT may be useful when we need to insert debugging instrumentation
into internal functions, or want to override an internal function, but adding
code to the binary itself is not desirable.
 - Code may be added to a binary by adding a new segment or via segment padding
   infection techniques [3][4], but this is quite cumbersome for a few reasons:
   1. Adding code this way usually requires re-engineering the binary file to
      some extent: extending or adding segments, changing flags, updating
      information in the ELF header and the program header table to reflect
      changes made to the binary image, and so forth.
   2. Calling shared library functions from code added to the binary is rather
      complex, so system calls are typically made directly instead. This often
      necessitates writing code in assembly rather than C, or using both
      together.

   As a result of these restrictions, this approach is not very flexible, and
   writing code for it tends to be comparatively slow and error-prone.

 - With the redirect-to-PLT trick, however, if we want to analyze the behavior
   of an internal function (via debug `printf()` statements, for example), we
   can recreate the logic of the function in a shared library, add the desired
   modifications, patch the binary to call that shared library function
   instead of the chosen internal function, and then inject the shared library
   with `LD_PRELOAD`. The instrumented code in the shared library is then
   executed instead of the original internal function code.

### Hooking with redirect-to-PLT vs GOT/PLT hooking

It should be noted that even though this method relies on the PLT for
redirection, it is distinct from GOT/PLT hooking [2], in which the GOT or PLT
is overwritten in order to override imported shared library functions, in a
similar vein to `LD_PRELOAD`. The *redirect-to-PLT* trick is a hack to override
*internal functions* specifically: no changes are made to the GOT or the PLT,
and only the `CALL` instruction in `.text` is patched.

# Overriding an Internal Function in a Toy Program

For the following program (`example_program1`), we want to hook the
`detour_me()` function:

```c
/*
detour the detour_me(void) function via redirect-to-PLT to print
a string of our choosing. For now, let us choose "I <3 LD_PRELOAD"
*/

#include <stdio.h>

void detour_me(void) {
    printf("Can ");
    printf("you ");
    printf("detour ");
    printf("this ");
    printf("function?\n");
}

int main(void) {
    printf("In main(), before detour_me()\n");
    detour_me();
    printf("In main(), after detour_me()\n");
}
```

The approach is as follows:
 1. Select a suitable shared library function to override.
 2. Patch the `CALL` to `detour_me()` so that it targets the PLT entry of the
    chosen shared library function.
 3. Design the custom shared library to inject.
 4. Use `LD_PRELOAD` to inject the shared library. In this case the hook will
    print "I <3 LD_PRELOAD".

**Before beginning, make a copy of the original binary. Here the copy will be
called `copy_to_patch`. Subsequent steps involve this copy, not the original
binary.**

To select a suitable shared library function to override, we can examine which
shared library functions have entries in the PLT. One way of doing this is to
use `grep` to search the `objdump` disassembly of the binary's `.text` section
for calls into the PLT:

```shell
$ objdump -dj .text copy_to_patch | grep plt
 65e: e8 0d ff ff ff callq 570 <__cxa_finalize@plt>
 69a: e8 c1 fe ff ff callq 560 <printf@plt>
 6ab: e8 b0 fe ff ff callq 560 <printf@plt>
 6bc: e8 9f fe ff ff callq 560 <printf@plt>
 6cd: e8 8e fe ff ff callq 560 <printf@plt>
 6d9: e8 72 fe ff ff callq 550 <puts@plt>
 6ec: e8 5f fe ff ff callq 550 <puts@plt>
 6fd: e8 4e fe ff ff callq 550 <puts@plt>
```
Since this example program is trivial, we could override any of these, but here
`__cxa_finalize()` will be chosen, since it illustrates the flexibility of this
approach and also introduces an interesting challenge associated with this
technique. (The `puts@plt` calls appear because GCC replaces `printf()` calls
whose format string contains no conversion specifiers and ends in a newline
with calls to `puts()`.)

Next, the call to `detour_me()` needs to be patched to point to the PLT entry
for `__cxa_finalize()`. From the output above, the PLT entry for
`__cxa_finalize()` is at file offset 0x570. According to the disassembly of
`main()`, `detour_me()` is called at file offset 0x6f1 (for this small PIE
binary, the addresses printed by `objdump` coincide with file offsets):

```shell
00000000000006e1 <main>:
 6e1: 55 push %rbp
 6e2: 48 89 e5 mov %rsp,%rbp
 6e5: 48 8d 3d ca 00 00 00 lea 0xca(%rip),%rdi # 7b6 <_IO_stdin_used+0x26>
 6ec: e8 5f fe ff ff callq 550 <puts@plt>
 6f1: e8 94 ff ff ff callq 68a <detour_me> <----------------
 6f6: 48 8d 3d d7 00 00 00 lea 0xd7(%rip),%rdi # 7d4 <_IO_stdin_used+0x44>
 6fd: e8 4e fe ff ff callq 550 <puts@plt>
 702: b8 00 00 00 00 mov $0x0,%eax
 707: 5d pop %rbp
 708: c3 retq
 709: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
```

Key pieces of information for patching:

 - `main()` calls `detour_me()` at offset 0x6f1.
 - The PLT entry for `__cxa_finalize()` is at offset 0x570.
 - The `CALL` at 0x6f1 is a 5-byte `e8` instruction whose 32-bit displacement
   is relative to the next instruction at 0x6f6, so the new displacement is
   0x570 - 0x6f6 = -0x186, encoded in little-endian as `7a fe ff ff`.

The following Python script patches the copy of the example program:

<script src="https://gist.github.com/BinaryResearch/e70d29e2d3e36f9967fe7d0c64cb1841.js"></script>
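
For readers who would rather not depend on Keystone, the same patch can be sketched in C by computing the displacement by hand. This is not the author's script: `patch_call_rel32()` is a hypothetical helper, and it assumes, as the post does, that the offsets shown by `objdump` are also file offsets in `copy_to_patch`.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not from the original gist): overwrite the 5-byte
 * `call rel32` (opcode 0xE8) at `call_off` so that it targets `target`.
 * ASSUMPTION: file offsets equal the addresses shown by objdump, which
 * holds for copy_to_patch but not for every ELF file.
 * Returns 0 on success, -1 on failure.
 */
static int patch_call_rel32(FILE *f, long call_off, uint64_t target)
{
    /* rel32 is measured from the end of the 5-byte instruction. */
    int64_t disp = (int64_t)target - ((int64_t)call_off + 5);
    if (disp < INT32_MIN || disp > INT32_MAX)
        return -1;                          /* target out of rel32 range */

    /* Refuse to patch anything that is not already an E8 call. */
    if (fseek(f, call_off, SEEK_SET) != 0 || fgetc(f) != 0xE8)
        return -1;

    uint32_t u = (uint32_t)disp;            /* two's-complement bit pattern */
    uint8_t insn[5] = {
        0xE8,
        (uint8_t)(u & 0xff), (uint8_t)((u >> 8) & 0xff),
        (uint8_t)((u >> 16) & 0xff), (uint8_t)((u >> 24) & 0xff),
    };

    /* A seek is required when switching from reading to writing. */
    if (fseek(f, call_off, SEEK_SET) != 0)
        return -1;
    if (fwrite(insn, 1, sizeof insn, f) != sizeof insn)
        return -1;
    return fflush(f) == 0 ? 0 : -1;
}
```

With the copy opened via `fopen("copy_to_patch", "r+b")`, calling `patch_call_rel32(f, 0x6f1, 0x570)` should write `e8 7a fe ff ff` at offset 0x6f1.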

After the patch is applied, `__cxa_finalize()` is called from `main()` instead of `detour_me()`:

```shell
00000000000006e1 <main>:
 6e1: 55 push %rbp
 6e2: 48 89 e5 mov %rsp,%rbp
 6e5: 48 8d 3d ca 00 00 00 lea 0xca(%rip),%rdi # 7b6 <_IO_stdin_used+0x26>
 6ec: e8 5f fe ff ff callq 550 <puts@plt>
 6f1: e8 7a fe ff ff callq 570 <__cxa_finalize@plt> <--------------
 6f6: 48 8d 3d d7 00 00 00 lea 0xd7(%rip),%rdi # 7d4 <_IO_stdin_used+0x44>
 6fd: e8 4e fe ff ff callq 550 <puts@plt>
 702: b8 00 00 00 00 mov $0x0,%eax
 707: 5d pop %rbp
 708: c3 retq
 709: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
```
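
The patch can also be checked by hand by decoding the raw `CALL` bytes back into an absolute target. The C sketch below does this with a hypothetical helper, `call_rel32_target()`; it only handles the 5-byte `e8` form of `CALL` that appears in this binary.

```c
#include <stdint.h>

/*
 * Hypothetical helper: decode a 5-byte `call rel32` (e8 xx xx xx xx)
 * located at `call_addr` and store its absolute target in *target.
 * Returns 0 on success, -1 if the bytes are not an e8 call.
 */
static int call_rel32_target(uint64_t call_addr, const uint8_t insn[5],
                             uint64_t *target)
{
    if (insn[0] != 0xE8)
        return -1;

    /* Little-endian 32-bit displacement. */
    uint32_t u = (uint32_t)insn[1] | ((uint32_t)insn[2] << 8) |
                 ((uint32_t)insn[3] << 16) | ((uint32_t)insn[4] << 24);

    /* Sign-extend without relying on implementation-defined conversions. */
    int64_t disp = (u & 0x80000000u) ? (int64_t)u - 0x100000000LL : (int64_t)u;

    /* The displacement is relative to the next instruction (call_addr + 5). */
    *target = (uint64_t)((int64_t)call_addr + 5 + disp);
    return 0;
}
```

Applied at 0x6f1, the original bytes `e8 94 ff ff ff` decode to 0x68a (`detour_me`) and the patched bytes `e8 7a fe ff ff` decode to 0x570 (`__cxa_finalize@plt`), matching the two disassembly listings.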

Now that the binary has been patched, it is time to write a shared library to
inject. Fortunately, the logic of this program is very simple, and the library
function chosen to be overridden can simply be replaced outright; we need not
concern ourselves with wrapping it (i.e. calling the original
`__cxa_finalize()` from the override).

Here is the code of the custom shared library to inject:

<script src="https://gist.github.com/BinaryResearch/14348015a7e62bd6619f68e04ef172ed.js"></script>
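
A minimal sketch of what such a library can look like is shown below. It is an assumption about the contents of `override_cxa_finalize.c`, not a copy of the gist: it simply defines `__cxa_finalize()` with the prototype given in the Itanium C++ ABI and prints the message, without ever calling the real implementation.

```c
#include <stdio.h>

/*
 * Sketch of an override for __cxa_finalize(). The prototype follows the
 * Itanium C++ ABI: void __cxa_finalize(void *d). Because this library is
 * loaded first via LD_PRELOAD, its definition takes precedence over the
 * one in libc for calls that go through the PLT.
 *
 * NOTE: this version never calls the real __cxa_finalize(). That is
 * acceptable for this toy program but should not be assumed safe in general.
 */
void __cxa_finalize(void *d)
{
    (void)d;                     /* argument unused in this version */
    puts("I <3 LD_PRELOAD");     /* code executed in place of detour_me() */
}
```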

This is compiled with:

```shell
$ gcc -shared -fPIC -o override_cxa_finalize.so override_cxa_finalize.c
```

Now we are ready to inject the code!

```shell
$ LD_PRELOAD=$PWD/override_cxa_finalize.so ./copy_to_patch
In main(), before detour_me()
I <3 LD_PRELOAD
In main(), after detour_me()
I <3 LD_PRELOAD
I <3 LD_PRELOAD
```

It works, but there is a problem: `__cxa_finalize()` is called 3 times, whereas
in the original binary the function we want to hook, `detour_me()`, is called
only once. The two extra calls most likely come from exit-time cleanup code:
the executable itself calls `__cxa_finalize()` at offset 0x65e (see the
`objdump` output above), and the injected shared library contains equivalent
cleanup code of its own. How can we ensure that the detour for `detour_me()`
is executed **only** when `__cxa_finalize()` is called from `main()`?

This is one of the main challenges of using a library function to hook an
internal function: depending on which library function is chosen, it may be
called an arbitrary number of times and under a variety of circumstances that
may be hard or impossible to predict or account for.

In this case, one possible solution takes advantage of the fact that, according
to its prototype, `__cxa_finalize()` takes an argument, and the value of this
argument varies across calls. The code overriding `detour_me()` can be set to
execute only for a particular value of the argument. Note that the patched
`CALL` in `main()` does not set up an argument, so in that call
`__cxa_finalize()` receives whatever value happens to be left in `%rdi`; the
trigger value therefore has to be determined empirically.

<script src="https://gist.github.com/BinaryResearch/4ebb6c2dca3f725ba05414e30c44598e.js"></script>
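
A sketch of such an argument-based trigger follows. It is not the gist's code: the helper `is_detour_call()` is hypothetical, and the trigger value 0x1 is copied from the output shown below, where it is the argument received from the patched call in `main()`. It is an assumption that this value stays the same for other builds or environments.

```c
#include <stdio.h>

/*
 * ASSUMPTION: 0x1 is the value observed as the argument of the redirected
 * call in main() for this particular build; no ABI guarantees it.
 */
#define DETOUR_TRIGGER_ARG ((void *)0x1)

/* Hypothetical helper: does this call come from the patched site in main()? */
static int is_detour_call(void *d)
{
    return d == DETOUR_TRIGGER_ARG;
}

void __cxa_finalize(void *d)
{
    printf("Argument to __cxa_finalize(): %p\n", d);
    if (is_detour_call(d))
        puts("I <3 LD_PRELOAD");    /* replacement for detour_me() */
}
```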

This produces the desired behavior:

```shell
$ LD_PRELOAD=$PWD/override_cxa_finalize_A.so ./copy_to_patch
In main(), before detour_me()
Argument to __cxa_finalize(): 0x1
I <3 LD_PRELOAD
In main(), after detour_me()
Argument to __cxa_finalize(): 0x556fdf12d008
Argument to __cxa_finalize(): 0x7f22c14cb028
```

Another option is to count the number of times `__cxa_finalize()` is called, so
that "I <3 LD_PRELOAD" is printed only when `detour_me()` is being hooked.
Apart from the very first call to `__cxa_finalize()`, we do not want our code
for `detour_me()` to execute. Therefore, if the number of times
`__cxa_finalize()` has been called can be checked *within* `__cxa_finalize()`,
the code overriding `detour_me()` can be made to execute *only* upon the first
call to `__cxa_finalize()` and otherwise not.

This can be accomplished by using `setenv()` and `getenv()` within the injected
shared library to create, update, and read an environment variable that keeps
track of the number of times `__cxa_finalize()` is called during the program's
runtime:

<script src="https://gist.github.com/BinaryResearch/6721ab6e867e8837752000a64fa23dce.js"></script>
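
A minimal sketch of this counter is given below; it is not the gist's code, and the variable name `CXA_FINALIZE_COUNT` is a hypothetical choice. One caveat of this design: if a variable with that name is already present in the inherited environment, the count starts from its value, and the variable is also passed on to any child processes.

```c
#define _POSIX_C_SOURCE 200112L   /* for setenv() */
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical name for the environment variable holding the call count. */
#define COUNT_VAR "CXA_FINALIZE_COUNT"

void __cxa_finalize(void *d)
{
    (void)d;

    /* Read the previous count (0 if the variable does not exist yet). */
    const char *prev = getenv(COUNT_VAR);
    unsigned long count = prev ? strtoul(prev, NULL, 10) : 0;
    count++;

    /* Store the updated count; the final argument 1 overwrites any old value. */
    char buf[32];
    snprintf(buf, sizeof buf, "%lu", count);
    setenv(COUNT_VAR, buf, 1);

    printf("__cxa_finalize() called %lu time%s!\n",
           count, count == 1 ? "" : "s");

    /* Only the first call comes from the patched site in main(). */
    if (count == 1)
        puts("I <3 LD_PRELOAD");
}
```

Since a library injected via `LD_PRELOAD` stays loaded for the lifetime of the process, a `static` counter variable inside the library would be a simpler alternative to the environment variable.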

Then inject the new library:

```shell
$ LD_PRELOAD=$PWD/override_cxa_finalize_B.so ./copy_to_patch
In main(), before detour_me()
__cxa_finalize() called 1 time!
I <3 LD_PRELOAD
In main(), after detour_me()
__cxa_finalize() called 2 times!
__cxa_finalize() called 3 times!
```

Once again, the code for `detour_me()` in the injected library is executed only
when `__cxa_finalize()` is called from `main()` in place of `detour_me()`.

# Conclusion

 - By patching a call to an internal function so that it targets the PLT entry
   of a shared library function, that shared library function is called
   instead of the internal function. The internal function is thus hooked by a
   shared library function.
 - The shared library function that hooks the internal function can be
   overridden with a custom library via `LD_PRELOAD`.
 - Since execution detours to a shared library function, there are few
   constraints on what can be executed in place of the internal function's
   code. For example, unlike when inserting code into the binary itself,
   library calls can be made easily, and space is a non-factor. There is no
   need to use code caves, look for `00` padding, extend segments, update
   relocations manually, etc.
 - However, ensuring that the code overriding the internal function executes
   only when that internal function would have been called may require coding
   triggers in the custom shared library, depending on which library function
   was chosen to stand in for the internal function; the program- and
   runtime-specific conditions involved may be very particular.

In this post, a toy example was used to introduce this technique. The next part
will demonstrate how to use redirect-to-PLT to insert debugging
instrumentation into the internal functions of crackme programs.

### Links and References

 1. [Dynamic linker tricks: Using LD_PRELOAD to cheat, inject features and investigate programs](https://rafalcieslak.wordpress.com/2013/04/02/dynamic-linker-tricks-using-ld_preload-to-cheat-inject-features-and-investigate-programs/)
 2. [Shared Library Call Redirection via ELF PLT Infection](http://phrack.org/issues/56/7.html)
 3. [Infecting the plt/got](https://lief.quarkslab.com/doc/latest/tutorials/05_elf_infect_plt_got.html)
 4. [UNIX Viruses](https://www.win.tue.nl/~aeb/linux/hh/virus/unix-viruses.txt)