GDB and Core Dump
25 Jul 2024
This is a tutorial on dynamically analyzing and exploiting objects with GDB (the GNU Debugger). Unless stated otherwise, the object type is Core Dump. The terms object and executable are used interchangeably in this post.
Please play around with the code in the Appendix section.
- Limitations
- Extensions
- Configuration
- Objects with Debug Info
- Core Dump
- Quickstart
- GDB Commands
- More Tools
- Memory
- Nginx
- Appendix
Limitations
GDB currently does not support Apple Arm chipsets (e.g. M1). To work around this, we can run GDB inside a Docker container.
The command gdb --configuration shows how GDB was built.
(kong-dev) kong@kong-ee:/kong-ee$ gdb --configuration
This GDB was configured as follows:
configure --host=aarch64-linux-gnu --target=aarch64-linux-gnu
--with-auto-load-dir=$debugdir:$datadir/auto-load
--with-auto-load-safe-path=$debugdir:$datadir/auto-load
--with-expat
--with-gdb-datadir=/usr/share/gdb (relocatable)
--with-jit-reader-dir=/usr/lib/gdb (relocatable)
--without-libunwind-ia64
--with-lzma
--with-babeltrace
--without-intel-pt
--with-xxhash
--with-python=/usr (relocatable)
--with-python-libdir=/usr/lib (relocatable)
--with-debuginfod
--with-curses
--without-guile
--without-amd-dbgapi
--enable-source-highlight
--enable-threading
--enable-tui
--with-system-readline
--with-separate-debug-dir=/usr/lib/debug (relocatable)
--with-system-gdbinit=/etc/gdb/gdbinit
--with-system-gdbinit-dir=/etc/gdb/gdbinit.d
("Relocatable" means the directory can be moved with the GDB installation
tree, and GDB will still find it.)
Extensions
gef
gef (GDB Enhanced Features) is a Python plugin for vanilla GDB, supporting both x86 (32/64) and Arm (AArch32/64). It is as actively maintained as its counterpart pwndbg, ships as a single Python script file, and requires no dependencies except Python. However, some gef commands depend on a few other tools like file, readelf, nm, and ps.
There are alternatives like pwndbg and peda. pwndbg is as actively maintained as gef, but its installation script introduces customizations and spreads files everywhere. The script might conflict with my local setup (e.g. bashrc), and I do not quite like this style. peda, on the other hand, is almost deprecated; its last commit is from 4 years ago.
In this post, we elect gef!
Before using gef, ensure the following tools are available in your system.
file
readelf
nm
ps
python3
Additionally, ensure the locale is set to UTF-8, otherwise gef reports "UnicodeEncodeError". See https://github.com/hugsy/gef/issues/195.
We can download gef from the official repo.
ubuntu@ip-172-31-9-194:~/workspace$ git clone https://github.com/hugsy/gef
ubuntu@ip-172-31-9-194:~/workspace$ cd gef/
ubuntu@ip-172-31-9-194:~/workspace/gef$ ls
LICENSE README.md docs gef.py mkdocs.yml scripts tests
The only file required is the gef.py script. We can load gef.py on the fly.
ubuntu@ip-172-31-9-194:~/misc$ gdb -q
(gdb) source ~/workspace/gef/gef.py
GEF for linux ready, type `gef' to start, `gef config' to configure
93 commands loaded and 5 functions added for GDB 12.1 in 0.00ms using Python engine 3.10
gef➤
Alternatively, load gef automatically on GDB startup.
ubuntu@ip-172-31-9-194:~/misc$ echo 'source /path/to/gef.py' >> ~/.config/gdb/gdbinit
ubuntu@ip-172-31-9-194:~/misc$ gdb -q
GEF for linux ready, type `gef' to start, `gef config' to configure
93 commands loaded and 5 functions added for GDB 12.1 in 0.00ms using Python engine 3.10
gef➤
At any moment, we can use the gef context command to show registers, stack, code (disassembly), source, frames, etc.
gef➤ context
In the figure above, the stack grows from high addresses to low addresses.
In addition to standard GDB commands, gef offers enhanced commands. Use the gef command to list all of them.
openresty-gdb-utils
openresty-gdb-utils is another GDB extension to debug OpenResty programs, including Nginx, ngx_lua, LuaJIT, etc.
Regarding installation, configuration and commands, please refer to the official page.
nginx.gdb
nginx.gdb is a personal GDB script with a few functionalities.
Configuration
- ~/.config/gdb/gdbearlyinit, which is checked before any other GDB configuration file.
# similar to option '-q'
set startup-quietly on
- ~/.config/gdb/gdbinit, which is the primary configuration file.
# gef
source /home/ubuntu/workspace/gef/gef.py
# openresty-gdb-utils
directory /home/ubuntu/workspace/openresty-gdb-utils
py import sys
py sys.path.append("/home/ubuntu/workspace/openresty-gdb-utils")
source luajit20.gdb
source ngx-lua.gdb
source luajit21.py
source ngx-raw-req.py
set python print-stack full
Objects with Debug Info
When compiling an executable, we can pass the option -g to gcc to include debug information, and the option -Og to optimize for the debugging experience.
ubuntu@ip-172-31-9-194:~/misc$ cat test.c
#include <stdio.h>
int main()
{
    printf("%d\n", 100/0);
}
# '-g' includes debug info
# '-Og' optimizes for the debugging experience
ubuntu@ip-172-31-9-194:~/misc$ gcc -Wall -Wextra -Og -g -std=c11 -o test.out test.c
ubuntu@ip-172-31-9-194:~/misc$ file test.out
test.out: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=06b7264bd3f05cfc7ea928d4cc9b257a4c83c8cd, for GNU/Linux 3.2.0, with debug_info, not stripped
The -Og option is designed for debugging, so the program may be less optimized and less performant. Do not use it for production.
Without debug information, gdb cannot map addresses back to source lines or show local variables, and backtraces carry far less detail.
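Before opening an object in gdb, it can save time to check whether it carries debug information at all. The following is a minimal sketch that looks for the markers printed by file in the example above ("with debug_info", "not stripped"). The function names are hypothetical, and the exact wording of file's output may differ between versions, so treat the string matching as an assumption.

```python
import subprocess


def has_debug_info(file_output: str) -> bool:
    """True if the `file` output mentions DWARF debug info."""
    return "with debug_info" in file_output


def is_stripped(file_output: str) -> bool:
    """True if the `file` output says the symbol table was removed."""
    return ", stripped" in file_output


def describe(path: str) -> str:
    """Run `file` on the object and return its one-line description."""
    result = subprocess.run(["file", path], capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    sample = ("test.out: ELF 64-bit LSB pie executable, x86-64, "
              "with debug_info, not stripped")
    print(has_debug_info(sample), is_stripped(sample))
```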
Core Dump
This post focuses on the Core Dump object, which is a snapshot of the memory of a running process.
There are multiple ways to get a Core Dump. For example, we can use the command gcore to dump the memory. Alternatively, we can configure the system to automatically generate a Core Dump when a program crashes. Please read "biji" for details.
Please also read the tutorial on how to generate Core Dump within K8s environment.
Verify Core Dump
After receiving a Core Dump file, we must validate that it is legitimate.
ubuntu@ip-172-31-9-194:~/misc$ file core_new.2436
core_new.2436: ELF 64-bit LSB core file, x86-64, version 1 (SYSV), SVR4-style, from 'nginx: worker prnginx: worker process', real uid: 1000, effective uid: 1000, real gid: 1000, effective gid: 1000, execfn: '/usr/local/openresty/nginx/sbin/nginx', platform: 'x86_64'
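The same check can be scripted when many files arrive at once. The sketch below reads the ELF header and tests whether e_type equals ET_CORE (4), which is what file reports as "core file". The function names are hypothetical, and the code only looks at the header; it does not prove the dump is complete.

```python
import struct

ELF_MAGIC = b"\x7fELF"
ET_CORE = 4  # e_type value for core files in the ELF specification


def elf_type(header: bytes) -> int:
    """Return e_type from the first 18 bytes of an ELF file."""
    if len(header) < 18 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    # e_ident[5] (EI_DATA): 1 = little-endian, 2 = big-endian
    endian = "<" if header[5] == 1 else ">"
    (e_type,) = struct.unpack_from(endian + "H", header, 16)
    return e_type


def is_core_dump(path: str) -> bool:
    """True if the file at `path` has an ELF header of type ET_CORE."""
    with open(path, "rb") as f:
        return elf_type(f.read(18)) == ET_CORE
```

For example, is_core_dump("core_new.2436") should agree with the file output above.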
Additionally, we can compare the logical size with the physical size on disk to see whether it is a sparse file.
ubuntu@ip-172-31-9-194:~/misc$ ll -hs core_new.2436
747M -rw-r--r-- 1 ubuntu ubuntu 3.0G Jul 31 03:44 core_new.2436
ubuntu@ip-172-31-9-194:~/misc$ du -hsc -- core_new.2436
747M core_new.2436
747M total
ubuntu@ip-172-31-9-194:~/misc$ du -hsc --apparent-size -- core_new.2436
3.0G core_new.2436
3.0G total
~ $ stat core_new.2436
File: core_new.2436
Size: 3114838104 Blocks: 1528400 IO Block: 4096 regular file
ubuntu@ip-172-31-9-194:~/misc$ bc <<< '(1528400 * 512)/1024/1024'
746
Why is a Core Dump file a sparse file? A Core Dump is a dump of the virtual memory space, but a process usually does not use the entire virtual space. There are unmapped ranges for two reasons.
- Reserved but unmapped space.
  - Gaps between different segments (e.g. stack, heap) are never mapped.
  - Memory fragmentation.
- Modern 64-bit systems have enormous theoretical address spaces. A process would not use all of it.
GDB understands sparse Core Dumps. A Core Dump is often sparse, but this is not guaranteed.
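The stat arithmetic above can be wrapped in a small helper. This is a minimal sketch with hypothetical function names; it assumes Linux, where st_blocks is counted in 512-byte units regardless of the IO block size. Note that a compressing filesystem could also report fewer allocated blocks, so the result is a hint rather than proof.

```python
import os

BLOCK_UNIT = 512  # st_blocks is counted in 512-byte units on Linux


def allocated_bytes(blocks: int) -> int:
    """Bytes physically allocated on disk for a given block count."""
    return blocks * BLOCK_UNIT


def is_sparse(size: int, blocks: int) -> bool:
    """A file is (likely) sparse when less is allocated than its logical size."""
    return allocated_bytes(blocks) < size


def check_file(path: str) -> bool:
    """Apply the same test to a real file via os.stat."""
    st = os.stat(path)
    return is_sparse(st.st_size, st.st_blocks)


if __name__ == "__main__":
    # Values taken from the stat output above.
    print(allocated_bytes(1528400) // 1024 // 1024, "MiB allocated")
    print(is_sparse(3114838104, 1528400))
```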
Quickstart
The gdb command is an interactive shell that can inspect the execution details of a process at a given point, such as the moment it dereferences an uninitialized or NULL pointer (address 0x0).
With the option --nx, gdb skips its initialization files.
~ $ gdb --nx
GNU gdb (Ubuntu 12.1-0ubuntu1~22.04.2) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
(gdb)
In this section, we describe multiple ways to use GDB.
Launch a Process
With the option --args, you can pass arguments to the program at the end of the command line.
~ $ gdb [--args] /path/to/program [arglist]
# -or- interactively
~ $ gdb
(gdb) file /path/to/program
(gdb) run [arglist]
Attach to a Process
With the option -p, you can omit the binary. We can generate a Core Dump file on the fly with gcore.
# with binary
~ $ gdb /path/to/program <pid>
# no binary; must have '-p'
~ $ gdb -p pid
# generate core dump
(gdb) help gcore
generate-core-file, gcore
Save a core file with the current state of the debugged process.
Usage: generate-core-file [FILENAME]
Argument is optional filename. Default filename is 'core.PROCESS_ID'.
(gdb) gcore
warning: Memory read failed for corefile section, 4096 bytes at 0xffffffffff600000.
Saved corefile core.1231557
(gdb) shell ls -lhs core.1231557
9.9M -rw-rw-r-- 1 ubuntu ubuntu 9.9M Aug 15 14:15 core.1231557
Load a Core Dump
With the option -c, you can omit the binary.
# binary available
~ $ gdb /path/to/program /path/to/core.pid
# no binary; must have '-c'
~ $ gdb -c /path/to/core.pid
# -or- interactively
(gdb) file /path/to/program
(gdb) core-file /path/to/core.pid
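When triaging many Core Dumps, it can be handy to fetch a backtrace without entering the interactive shell. The sketch below builds a gdb command line with the --batch and -ex options and runs it. The function names are hypothetical, and it assumes gdb is on the PATH and that the binary matches the Core Dump.

```python
import subprocess


def backtrace_command(program: str, core: str) -> list:
    """Build a non-interactive gdb invocation that prints the backtrace."""
    # --nx skips init files, --batch exits after the -ex commands have run.
    return ["gdb", "--nx", "--batch", "-ex", "bt", program, core]


def backtrace(program: str, core: str) -> str:
    """Run gdb and return whatever it printed on stdout."""
    result = subprocess.run(
        backtrace_command(program, core),
        capture_output=True,
        text=True,
        check=False,  # gdb may exit non-zero on warnings; keep the output anyway
    )
    return result.stdout


if __name__ == "__main__":
    print(" ".join(backtrace_command("/path/to/program", "/path/to/core.pid")))
```

Dropping --nx lets extensions such as gef load as usual, at the cost of noisier output.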
GDB Commands
At any moment, we can type the special command help to get help. For the complete manual, please refer to the official documentation.
(gdb) help
(gdb) help internals
(gdb) help disassemble
The following is a list of commonly used gdb commands.
- The ENTER key repeats the last command.
- run starts or restarts the program. The argument list can be provided immediately, like run -a -x y.
A program either runs successfully or runs into issues. While the program is running, we can interrupt it with the shortcut Ctrl-C or the gdb command signal SIGINT.
- backtrace shows the call stack. With the option -full, it also prints local variables.
(gdb) backtrace
#0 0x0000ffffa7a7bd74 in __GI_epoll_pwait (epfd=32, events=0xaaab04fa8760, maxevents=512, timeout=492, set=0x0) at ../sysdeps/unix/sysv/linux/epoll_pwait.c:40
#1 0x0000aaaadbfe4ac0 in ngx_epoll_process_events (cycle=0xaaab04fa10e0, timer=492, flags=1) at src/event/modules/ngx_epoll_module.c:800
#2 0x0000aaaadbfcf65c in ngx_process_events_and_timers (cycle=0xaaab04fa10e0) at src/event/ngx_event.c:258
#3 0x0000aaaadbfe10b4 in ngx_worker_process_cycle (cycle=0xaaab04fa10e0, data=0x0) at src/os/unix/ngx_process_cycle.c:793
#4 0x0000aaaadbfdccb8 in ngx_spawn_process (cycle=0xaaab04fa10e0, proc=0xaaaadbfe0fc8 <ngx_worker_process_cycle>, data=0x0, name=0xaaaadc2ba400 "worker process", respawn=-3) at src/os/unix/ngx_process.c:199
#5 0x0000aaaadbfdfa98 in ngx_start_worker_processes (cycle=0xaaab04fa10e0, n=1, type=-3) at src/os/unix/ngx_process_cycle.c:382
#6 0x0000aaaadbfdefc4 in ngx_master_process_cycle (cycle=0xaaab04fa10e0) at src/os/unix/ngx_process_cycle.c:135
#7 0x0000aaaadbf91b2c in main (argc=5, argv=0xffffd729e868) at src/core/nginx.c:387
gef has a more powerful context command to show more information.
- frame. The stack frame is an important concept in GDB. It represents a function call and occupies an entry in the call stack.
Taking the backtrace output above as an example, #0 is the innermost frame (the most recent function), while #7 is the outermost frame (the main function). Frame #0 is the current frame by default, and many GDB commands operate relative to the current frame.
A frame contains the function name, arguments, local variables, source code line, etc. In particular, it includes the address at which the function is executing, namely the address where the code is mapped, recorded in the register $pc. Note that this is not a stack address.
frame N # Select frame number N
up # Move to the caller frame (up the call stack)
down # Move to the callee frame (down the call stack)
info frame # Show detailed info about current frame
info locals # Show local variables in current frame
info args # Show function arguments of current frame
print var # Print value of variable in current frame
set variable var=value # Set value of variable in current frame
- disassemble shows the assembly code. By default, it shows the code surrounding the register $pc, namely the current frame. With the option /s, it also shows the source code.
(gdb) disassemble /s 0x0000aaaadbfe4ac0
Dump of assembler code for function ngx_epoll_process_events:
src/event/modules/ngx_epoll_module.c:
785 {
   0x0000aaaadbfe4a38 <+0>: stp x29, x30, [sp, #-128]!
   0x0000aaaadbfe4a3c <+4>: mov x29, sp
   0x0000aaaadbfe4a40 <+8>: str x0, [sp, #40]
   0x0000aaaadbfe4a44 <+12>: str x1, [sp, #32]
   0x0000aaaadbfe4a48 <+16>: str x2, [sp, #24]
786 int events;
787 uint32_t revents;
788 ngx_int_t instance, i;
789 ngx_uint_t level;
790 ngx_err_t err;
791 ngx_event_t *rev, *wev;
792 ngx_queue_t *queue;
793 ngx_connection_t *c;
794
795 /* NGX_TIMER_INFINITE == INFTIM */
796
797 ngx_log_debug1(NGX_LOG_DEBUG_EVENT, cycle->log, 0,
   0x0000aaaadbfe4a4c <+20>: ldr x0, [sp, #40]
   0x0000aaaadbfe4a50 <+24>: ldr x0, [x0, #16]
   0x0000aaaadbfe4a54 <+28>: ldr x0, [x0]
   0x0000aaaadbfe4a58 <+32>: bl 0xaaaadc195650 <ngx_http_lua_kong_get_dynamic_log_level>
--Type <RET> for more, q to quit, c to continue without paging--
   0x0000aaaadbfe4a5c <+36>: and x0, x0, #0x80
   0x0000aaaadbfe4a60 <+40>: cmp x0, #0x0
   0x0000aaaadbfe4a64 <+44>: b.eq 0xaaaadbfe4a88 <ngx_epoll_process_events+80> // b.none
   0x0000aaaadbfe4a68 <+48>: ldr x0, [sp, #40]
   0x0000aaaadbfe4a6c <+52>: ldr x1, [x0, #16]
   0x0000aaaadbfe4a70 <+56>: ldr x4, [sp, #32]
   0x0000aaaadbfe4a74 <+60>: adrp x0, 0xaaaadc2ba000
   0x0000aaaadbfe4a78 <+64>: add x3, x0, #0xac0
   0x0000aaaadbfe4a7c <+68>: mov w2, #0x0 // #0
   0x0000aaaadbfe4a80 <+72>: mov x0, #0x8 // #8
   0x0000aaaadbfe4a84 <+76>: bl 0xaaaadbf946fc <ngx_log_error_core>
798 "epoll timer: %M", timer);
799
800 events = epoll_wait(ep, event_list, (int) nevents, timer);
   0x0000aaaadbfe4a88 <+80>: adrp x0, 0xaaaadc661000 <week+16>
   0x0000aaaadbfe4a8c <+84>: add x0, x0, #0xb90
   0x0000aaaadbfe4a90 <+88>: ldr w4, [x0]
   0x0000aaaadbfe4a94 <+92>: adrp x0, 0xaaaadc6a0000 <ngx_processes+47632>
   0x0000aaaadbfe4a98 <+96>: add x0, x0, #0xa48
   0x0000aaaadbfe4a9c <+100>: ldr x1, [x0]
   0x0000aaaadbfe4aa0 <+104>: adrp x0, 0xaaaadc6a0000 <ngx_processes+47632>
   0x0000aaaadbfe4aa4 <+108>: add x0, x0, #0xa50
   0x0000aaaadbfe4aa8 <+112>: ldr x0, [x0]
   0x0000aaaadbfe4aac <+116>: mov w2, w0
--Type <RET> for more, q to quit, c to continue without paging--q
Quit
The objdump tool has similar capabilities.
- list shows only source code. With the argument ., it shows the code surrounding the current frame.
gef➤ frame 1
#1 0x0000aaaaad984ac0 in ngx_epoll_process_events (cycle=0xaaaab6b09130, timer=0x134, flags=0x1) at src/event/modules/ngx_epoll_module.c:800
800 events = epoll_wait(ep, event_list, (int) nevents, timer);
gef➤ list .
795 /* NGX_TIMER_INFINITE == INFTIM */
796
797 ngx_log_debug1(NGX_LOG_DEBUG_EVENT, cycle->log, 0,
798 "epoll timer: %M", timer);
799
800 events = epoll_wait(ep, event_list, (int) nevents, timer);
801
802 err = (events == -1) ? ngx_errno : 0;
803
804 if (flags & NGX_UPDATE_TIME || ngx_event_timer_alarm) {
To show the source code pathnames, use info sources.
(gdb) info sources
/home/kong/.cache/bazel/_bazel_kong/ee741459d483c56c5059256f49a7a414/execroot/_main/bazel-out/aarch64-fastbuild/bin/build/kong-dev/openresty/nginx/sbin/nginx:
(Full debug information has not yet been read for this file.)
/home/kong/.cache/bazel/_bazel_kong/ee741459d483c56c5059256f49a7a414/execroot/_main/bazel-out/aarch64-fastbuild/bin/external/openresty/openresty.build_tmpdir/build/nginx-1.25.3/src/core/nginx.c,
/home/kong/.cache/bazel/_bazel_kong/ee741459d483c56c5059256f49a7a414/execroot/_main/bazel-out/aarch64-fastbuild/bin/external/openresty/openresty.build_tmpdir/build/nginx-1.25.3/src/os/unix/ngx_files.h,
/home/kong/.cache/bazel/_bazel_kong/ee741459d483c56c5059256f49a7a414/execroot/_main/bazel-out/aarch64-fastbuild/bin/external/openresty/openresty.build_tmpdir/build/nginx-1.25.3/src/core/ngx_log.h,
- break sets a breakpoint, which pauses the program at the specified line.
Use list or disassemble to identify the code line you are interested in.
(gdb) break ngx_epoll_module.c:800
Breakpoint 2 at 0xaaaadbfe4a88: file src/event/modules/ngx_epoll_module.c, line 800.
(gdb) info breakpoints
Num Type Disp Enb Address What
2 breakpoint keep y 0x0000aaaadbfe4a88 in ngx_epoll_process_events at src/event/modules/ngx_epoll_module.c:800
(gdb) delete breakpoints 2
(gdb) info breakpoints
No breakpoints, watchpoints, tracepoints, or catchpoints.
You can define a conditional breakpoint like break 15 if var > 10.
- watch monitors a variable and pauses the program when the variable is modified.
(gdb) info variables -t ngx_cycle_t
All defined variables with type matching regular expression "ngx_cycle_t" :
File ../echo-nginx-module-0.63/src/ngx_http_echo_filter.c:
28: static volatile ngx_cycle_t *ngx_http_echo_prev_cycle;
File ../headers-more-nginx-module-0.37/src/ngx_http_headers_more_filter_module.c:
111: static volatile ngx_cycle_t *ngx_http_headers_more_prev_cycle;
File ../ngx_lua-0.10.26/src/ngx_http_lua_module.c:
71: static volatile ngx_cycle_t *ngx_http_lua_prev_cycle;
File src/core/ngx_cycle.c:
21: volatile ngx_cycle_t *ngx_cycle;
File src/os/unix/ngx_process_cycle.c:
73: static ngx_cycle_t ngx_exit_cycle;
(gdb) watch ngx_cycle
Hardware watchpoint 5: ngx_cycle
(gdb) info watchpoints
Num Type Disp Enb Address What
5 hw watchpoint keep y ngx_cycle
- continue resumes the program until it crashes, hits the next breakpoint, or exits.
- next executes the next statement.
- step steps into a function call while next does not.
stepi advances by one machine instruction instead of one source code statement.
- finish completes the current function call and then pauses.
- print shows values of variables.
(gdb) print ngx_cycle
$12 = (volatile ngx_cycle_t *) 0xaaab04fa10e0
(gdb) print * $12
$13 = {conf_ctx = 0xaaab04fa2580, pool = 0xaaab04fa10a0, log = 0xaaab04fa10f8, new_log = {log_level = 6, file = 0xaaab04fa15f0, connection = 0, disk_full_time = 0, handler = 0x0, data = 0x0, writer = 0x0, wdata = 0x0, action = 0x0, next = 0x0},
log_use_stderr = 0, files = 0x0, free_connections = 0xffff705cbb78, free_connection_n = 15840, modules = 0xaaab04fa2cb0, modules_n = 89, modules_used = 1, reusable_connections_queue = {prev = 0xaaab04fa1180, next = 0xaaab04fa1180},
reusable_connections_n = 0, connections_reuse_time = 0, listening = {elts = 0xaaab055dbf30, nelts = 16, size = 296, nalloc = 20, pool = 0xaaab04fa10a0, old_elts = 0xaaab055d45d0},
paths = {elts = 0xaaab04fa1530, nelts = 5, size = 8, nalloc = 10, pool = 0xaaab04fa10a0, old_elts = 0x0}, config_dump = {elts = 0xaaab04ffdd40, nelts = 7, size = 24, nalloc = 8, pool = 0xaaab04fa10a0, old_elts = 0xaaab04ffde30},
config_dump_rbtree = {root = 0xaaab04fc3850, sentinel = 0xaaab04fa1248, insert = 0xaaaadbfa0258 <ngx_str_rbtree_insert_value>}, config_dump_sentinel = {key = 0, left = 0x0, right = 0x0, parent = 0x0, color = 0 '\000', data = 0 '\000'},
open_files = {last = 0xaaab04fa1278, part = {elts = 0xaaab04fa15f0, nelts = 7, next = 0x0}, size = 40, nalloc = 20, pool = 0xaaab04fa10a0}, shared_memory = {last = 0xaaab051b0240, part = {elts = 0xaaab04fa1940, nelts = 1, next = 0xaaab04fc4a60}, size = 88, nalloc = 1, pool = 0xaaab04fa10a0},
connection_n = 16384, files_n = 0, connections = 0xffff7057f010, read_events = 0xffff703fe010, write_events = 0xffff7027d010, old_cycle = 0x0,
conf_file = {len = 58, data = 0xaaab04fa1480 "/kong-ee/bazel-bin/build/kong-dev/kong/servroot/nginx.conf"}, conf_param = {len = 0, data = 0xaaab04fa14f0 ""},
conf_prefix = {len = 48, data = 0xaaab04fa13a0 "/kong-ee/bazel-bin/build/kong-dev/kong/servroot/"}, prefix = {len = 48, data = 0xaaab04fa13e0 "/kong-ee/bazel-bin/build/kong-dev/kong/servroot/"},
error_log = {len = 14, data = 0xaaab04fa1440 "logs/error.log"}, lock_file = {len = 64, data = 0xaaab055b4330 "/kong-ee/bazel-bin/build/kong-dev/kong/servroot/logs/nginx.lock.accept"},
hostname = {len = 7, data = 0xaaab04fa2c70 "kong-ee"}, intercept_error_log_handler = 0x0, intercept_error_log_data = 0x0, entered_logger = 0}
- x examines the memory at a given address. In the example below, we follow the pointers shown in the gef stack view.
0x0000ffffe2a53890│+0x0000: 0x0000ffffe2a538d0 → 0x0000ffffe2a53950 → 0x0000ffffe2a539a0 → 0x0000ffffe2a539d0 → 0x0000ffffe2a53a40 → 0x0000ffffe2a53a80 → 0x0000ffffe2a53b90 ← $x29, $sp
0x0000ffffe2a53898│+0x0008: 0x0000aaaaad984ac0 → <ngx_epoll_process_events+0088> str w0, [sp, #64]
0x0000ffffe2a538a0│+0x0010: 0x0000ffffe2a54028 → 0x0000ffffe2a54dcf → "nginx: worker process"
0x0000ffffe2a538a8│+0x0018: 0x0000000000000005
0x0000ffffe2a538b0│+0x0020: 0x0000aaaaadffde88 → 0x0000aaaaad931400 → <__do_global_dtors_aux+0000> stp x29, x30, [sp, #-32]!
0x0000ffffe2a538b8│+0x0028: 0x0000aaaaad931574 → <main+0000> sub sp, sp, #0x320
gef➤ x/a 0x0000ffffe2a538a0
0xffffe2a538a0: 0xffffe2a54028
gef➤ x/a 0xffffe2a54028
0xffffe2a54028: 0xffffe2a54dcf
gef➤ x/s 0xffffe2a54dcf
0xffffe2a54dcf: "nginx: worker process"
gef➤ print (char *) 0x0000ffffe2a54dcf
$5 = 0xffffe2a54dcf "nginx: worker process"
- info is very useful. Please see the Informational Commands section.
- record combined with the reverse-* commands (e.g. reverse-step) can execute the program in reverse order. Therefore, there is no need to re-run the program to inspect previous execution contexts.
- shell invokes a shell command, like shell ls.
Similar to Bash, you can check historical gdb commands in ~/.gdb_history, provided history saving is enabled and the history file is configured to live there (by default GDB uses .gdb_history in the current directory).
Informational Commands
- Process info
(gdb) info inferiors
  Num Description Connection Executable
* 1 process 2436 1 (core) /usr/local/openresty/nginx/sbin/nginx
- Architecture info
gef➤ arch get
Arch: Architecture(X86, 64, LITTLE_ENDIAN)
Reason: The architecture has been detected via the ELF headers
gef➤ arch list
Available architectures:
Architecture(ARM, ARM, LITTLE_ENDIAN)
  ARM
Architecture(ARM64, None, LITTLE_ENDIAN)
  ARM64
  AARCH64
Architecture(MIPS, MIPS64, LITTLE_ENDIAN)
  MIPS64
Architecture(MIPS, MIPS32, LITTLE_ENDIAN)
  MIPS
Architecture(PPC, PPC64, LITTLE_ENDIAN)
  PowerPC64
  PPC64
Architecture(PPC, PPC32, LITTLE_ENDIAN)
  PowerPC
  PPC
Architecture(RISCV, RISCV, LITTLE_ENDIAN)
  RISCV
Architecture(SPARC, V9, LITTLE_ENDIAN)
  SPARC64
Architecture(SPARC, None, LITTLE_ENDIAN)
  SPARC
Architecture(X86, 64, LITTLE_ENDIAN)
  X86_64
  i386:x86-64
Architecture(X86, 32, LITTLE_ENDIAN)
  X86
- Registers info
gef➤ registers $r12
$r12 : 0x00007fffffffdd38 → 0x00007fffffffe07e → "/home/ubuntu/misc/test-gdb.out"
(gdb) info registers $r12
r12 0x7fffffffdd38 0x7fffffffdd38
(gdb) x/a 0x7fffffffdd38
0x7fffffffdd38: 0x7fffffffe07e
(gdb) x/s 0x7fffffffe07e
0x7fffffffe07e: "/home/ubuntu/misc/test-gdb.out"
- ELF info
gef➤ elf-info
Magic : 7f 45 4c 46
Class : 0x2 - ELF_64_BITS
Endianness : 0x1 - LITTLE_ENDIAN
...
──────────────── Program Header ────────────────
[ #] Type Offset Virtaddr Physaddr FileSiz MemSiz Flags Align
[ 0] PT_PHDR 0x40 0x40 0x40 0x2d8 0x2d8 PF_R 0x8
[ 1] PT_INTERP 0x318 0x318 0x318 0x1c 0x1c PF_R 0x1
...
──────────────── Section Header ────────────────
[ #] Name Type Address Offset Size EntSiz Flags Link Info Align
[ 0] UNKN 0x0 0x0 0x0 0x0 UNKNOWN_FLAG 0x0 0x0 0x0
[ 1] .interp SHT_PROGBITS 0x318 0x318 0x1c 0x0 ALLOC 0x0 0x0 0x1
[ 2] .note.gnu.property SHT_NOTE 0x338 0x338 0x30 0x0 ALLOC 0x0 0x0 0x8
[ 3] .note.gnu.build-id SHT_NOTE 0x368 0x368 0x24 0x0 ALLOC 0x0 0x0 0x4
[ 4] .note.ABI-tag SHT_NOTE 0x38c 0x38c 0x20 0x0 ALLOC 0x0 0x0 0x4
[ 5] .gnu.hash SHT_GNU_HASH 0x3b0 0x3b0 0x33a0 0x0 ALLOC 0x6 0x0 0x8
...
More Tools
Apart from GDB, there are many other tools to inspect objects.
To keep it simple, we can even use vim to inspect the file.
hexdump
hexdump can dump file contents in various formats (e.g. hexadecimal), including customized formats with the option --format.
The following example dumps the contents in canonical format (option -C), with hexadecimal bytes on the left and an ASCII column on the right.
~ $ hexdump -C core.pid | less
000140b0 f0 b8 1d 00 00 00 00 00 0a 00 00 00 00 00 00 00 |................|
000140c0 00 5f 49 54 4d 5f 64 65 72 65 67 69 73 74 65 72 |._ITM_deregister|
000140d0 54 4d 43 6c 6f 6e 65 54 61 62 6c 65 00 5f 5f 67 |TMCloneTable.__g|
000140e0 6d 6f 6e 5f 73 74 61 72 74 5f 5f 00 5f 49 54 4d |mon_start__._ITM|
000140f0 5f 72 65 67 69 73 74 65 72 54 4d 43 6c 6f 6e 65 |_registerTMClone|
00014100 54 61 62 6c 65 00 63 72 79 70 74 5f 72 00 6c 75 |Table.crypt_r.lu|
00014110 61 5f 70 75 73 68 66 73 74 72 69 6e 67 00 6c 75 |a_pushfstring.lu|
00014120 61 5f 73 65 74 65 78 64 61 74 61 00 6c 75 61 5f |a_setexdata.lua_|
00014130 67 65 74 66 69 65 6c 64 00 6c 75 61 4c 5f 70 75 |getfield.luaL_pu|
00014140 73 68 72 65 73 75 6c 74 00 6c 75 61 5f 78 6d 6f |shresult.lua_xmo|
00014150 76 65 00 6c 75 61 5f 72 65 73 75 6d 65 00 6c 75 |ve.lua_resume.lu|
00014160 61 4c 5f 61 64 64 6c 73 74 72 69 6e 67 00 6c 75 |aL_addlstring.lu|
00014170 61 4c 5f 6f 70 74 6e 75 6d 62 65 72 00 6c 75 61 |aL_optnumber.lua|
00014180 5f 6e 65 77 75 73 65 72 64 61 74 61 00 6c 75 61 |_newuserdata.lua|
00014190 5f 73 65 74 66 69 65 6c 64 00 6c 75 61 5f 73 65 |_setfield.lua_se|
000141a0 74 74 6f 70 00 6c 75 61 5f 70 75 73 68 73 74 72 |ttop.lua_pushstr|
000141b0 69 6e 67 00 6c 75 61 6f 70 65 6e 5f 66 66 69 00 |ing.luaopen_ffi.|
000141c0 6c 75 61 5f 67 65 74 69 6e 66 6f 00 6c 75 61 5f |lua_getinfo.lua_|
000141d0 69 73 6e 75 6d 62 65 72 00 6c 75 61 4c 5f 63 68 |isnumber.luaL_ch|
000141e0 65 63 6b 6c 73 74 72 69 6e 67 00 6c 75 61 5f 6e |ecklstring.lua_n|
000141f0 65 78 74 00 6c 75 61 5f 74 6f 74 68 72 65 61 64 |ext.lua_tothread|
00014200 00 6c 75 61 5f 67 63 00 6c 75 61 5f 63 72 65 61 |.lua_gc.lua_crea|
However, all the non-printable characters are shown as dots in the ASCII column.
~ $ for ((i=0; i<=255; ++i)); do printf "\x$(printf %x $i)"; done | hexdump -C
00000000 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f |................|
00000010 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f |................|
00000020 20 21 22 23 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f | !"#$%&'()*+,-./|
00000030 30 31 32 33 34 35 36 37 38 39 3a 3b 3c 3d 3e 3f |0123456789:;<=>?|
00000040 40 41 42 43 44 45 46 47 48 49 4a 4b 4c 4d 4e 4f |@ABCDEFGHIJKLMNO|
00000050 50 51 52 53 54 55 56 57 58 59 5a 5b 5c 5d 5e 5f |PQRSTUVWXYZ[\]^_|
00000060 60 61 62 63 64 65 66 67 68 69 6a 6b 6c 6d 6e 6f |`abcdefghijklmno|
00000070 70 71 72 73 74 75 76 77 78 79 7a 7b 7c 7d 7e 7f |pqrstuvwxyz{|}~.|
00000080 80 81 82 83 84 85 86 87 88 89 8a 8b 8c 8d 8e 8f |................|
00000090 90 91 92 93 94 95 96 97 98 99 9a 9b 9c 9d 9e 9f |................|
000000a0 a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 aa ab ac ad ae af |................|
000000b0 b0 b1 b2 b3 b4 b5 b6 b7 b8 b9 ba bb bc bd be bf |................|
000000c0 c0 c1 c2 c3 c4 c5 c6 c7 c8 c9 ca cb cc cd ce cf |................|
000000d0 d0 d1 d2 d3 d4 d5 d6 d7 d8 d9 da db dc dd de df |................|
000000e0 e0 e1 e2 e3 e4 e5 e6 e7 e8 e9 ea eb ec ed ee ef |................|
000000f0 f0 f1 f2 f3 f4 f5 f6 f7 f8 f9 fa fb fc fd fe ff |................|
00000100
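To make the rule for the ASCII column concrete, here is a minimal sketch that approximates the layout of hexdump -C. The function names are hypothetical, and unlike the real tool it does not collapse repeated lines into a single "*". Bytes in the range 0x20 to 0x7e are shown as is, and everything else becomes a dot.

```python
def ascii_column(chunk: bytes) -> str:
    """Printable ASCII (0x20..0x7e) is kept, anything else becomes '.'."""
    return "".join(chr(b) if 0x20 <= b <= 0x7e else "." for b in chunk)


def hexdump_c(data: bytes) -> str:
    """Render bytes roughly the way `hexdump -C` does, 16 bytes per line."""
    lines = []
    for offset in range(0, len(data), 16):
        chunk = data[offset:offset + 16]
        left = " ".join(f"{b:02x}" for b in chunk[:8])
        right = " ".join(f"{b:02x}" for b in chunk[8:])
        # Each half is padded to 23 characters (8 bytes * 2 digits + 7 spaces).
        lines.append(f"{offset:08x}  {left:<23}  {right:<23}  |{ascii_column(chunk)}|")
    lines.append(f"{len(data):08x}")  # trailing line holds the total length
    return "\n".join(lines)


if __name__ == "__main__":
    print(hexdump_c(bytes(range(256))))
```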
The following example shows that the object contains many zero bytes.
~ $ hexdump -C 0x577af3685000-0x577af8547000.bin | less
00003b90 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00003ba0 00 00 00 00 00 00 00 00 91 00 00 00 00 00 00 00 |................|
00003bb0 30 8c 68 f3 7a 57 00 00 30 8c 68 f3 7a 57 00 00 |0.h.zW..0.h.zW..|
00003bc0 a0 8c 68 f3 7a 57 00 00 06 00 00 00 00 00 00 00 |..h.zW..........|
00003bd0 31 37 32 2e 31 36 2e 30 2e 31 30 3a 35 33 00 00 |172.16.0.10:53..|
00003be0 00 00 00 00 00 00 00 00 98 8a 68 f3 7a 57 00 00 |..........h.zW..|
00003bf0 40 8c 68 f3 7a 57 00 00 00 00 00 00 00 00 00 00 |@.h.zW..........|
00003c00 40 72 9b 6b 0b 73 00 00 40 8b 68 f3 7a 57 00 00 |@r.k.s..@.h.zW..|
00003c10 31 37 32 2e 31 36 2e 30 2e 31 30 00 00 00 00 00 |172.16.0.10.....|
00003c20 02 00 00 35 ac 10 00 0a 00 00 00 00 00 00 00 00 |...5............|
00003c30 00 00 00 00 00 00 00 00 61 00 00 00 00 00 00 00 |........a.......|
00003c40 0e 00 00 00 00 00 00 00 d0 8b 68 f3 7a 57 00 00 |..........h.zW..|
00003c50 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00003c60 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00003c70 88 8b 68 f3 7a 57 00 00 10 00 00 00 00 00 00 00 |..h.zW..........|
00003c80 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00003c90 00 00 00 00 00 00 00 00 91 00 00 00 00 00 00 00 |................|
00003ca0 20 8d 68 f3 7a 57 00 00 20 8d 68 f3 7a 57 00 00 | .h.zW.. .h.zW..|
00003cb0 90 8d 68 f3 7a 57 00 00 06 00 00 00 00 00 00 00 |..h.zW..........|
00003cc0 20 8c 68 f3 7a 57 00 00 10 00 00 00 00 00 00 00 | .h.zW..........|
00003cd0 0e 00 00 00 00 00 00 00 e0 8c 68 f3 7a 57 00 00 |..........h.zW..|
00003ce0 31 37 32 2e 31 36 2e 30 2e 31 30 3a 35 33 00 00 |172.16.0.10:53..|
00003cf0 00 00 00 00 00 00 00 00 e8 8b 68 f3 7a 57 00 00 |..........h.zW..|
00003d00 30 8d 68 f3 7a 57 00 00 00 00 00 00 00 00 00 00 |0.h.zW..........|
00003d10 f8 31 98 69 0b 73 00 00 f8 8b 68 f3 7a 57 00 00 |.1.i.s....h.zW..|
00003d20 00 00 00 00 00 00 00 00 61 00 00 00 00 00 00 00 |........a.......|
strings
Though hexdump shows ASCII characters, it is not easy to extract the ASCII column. GNU strings, on the other hand, dumps printable character sequences and helps identify large strings in the Core Dump.
The option -t prefixes each string with its offset in the object file. It is useful for showing the distribution and density of strings.
~ $ strings -t d /path/to/core.pid > core.pid.density
~ $ strings [-n 4] /path/to/core.pid > core.pid.ascii
~ $ sort -o core.pid.ascii.sorted core.pid.ascii
~ $ uniq -c core.pid.ascii.sorted > core.pid.ascii.sorted.uniq
~ $ sort -nrk1,1 -o core.pid.ascii.sorted.uniq.sorted core.pid.ascii.sorted.uniq
However, non-printable characters are not included in the output.
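The pipeline above (strings, sort, uniq -c, sort -nr) can also be sketched in a few lines of Python, which helps when the offsets and the counts are needed together. This is an approximation rather than a reimplementation of GNU strings: the function names are hypothetical, and only printable ASCII plus tab is treated as string content.

```python
import re
from collections import Counter


def printable_strings(data: bytes, min_len: int = 4):
    """Return (offset, text) for each run of at least min_len printable bytes."""
    pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(data)]


def most_frequent(data: bytes, min_len: int = 4, top: int = 10):
    """Count duplicate strings, most frequent first (like sort | uniq -c | sort -nr)."""
    counts = Counter(text for _, text in printable_strings(data, min_len))
    return counts.most_common(top)


if __name__ == "__main__":
    sample = b"\x00\x00172.16.0.10:53\x00\x00abc\x00172.16.0.10:53\x00"
    for offset, text in printable_strings(sample):
        print(offset, text)       # decimal offsets, like strings -t d
    print(most_frequent(sample))
```

For a multi-gigabyte Core Dump, reading the whole file into memory is not practical; a real script would scan it in chunks or through mmap.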
readelf
readelf displays information about ELF files and is architecture independent. Unlike objdump, it cannot dump source code.
gef can also print ELF info.
gef➤ elf-info
For more examples of readelf, please see "biji".
objdump
readelf cannot dump source code as objdump does. Here is an example.
~ $ objdump --source --source-comment=txt test-gdb.out
...
0000000000001149 <f>:
txt#include <stdio.h>
txt
txtint f(int a, int b)
txt{
1149: f3 0f 1e fa endbr64
txt int sum;
txt sum = a + b;
114d: 8d 04 37 lea (%rdi,%rsi,1),%eax
txt return sum;
txt}
1150: c3 ret
...
The GDB commands disassemble and list can also show assembly code and source code.
Memory
We can use top to show the general memory utilization.
~ $ top -o +RES -p $(pgrep -d',' nginx)
Tasks: 2 total, 0 running, 2 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.9 us, 1.8 sy, 0.0 ni, 97.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 7841.2 total, 1885.2 free, 4823.7 used, 1392.9 buff/cache
MiB Swap: 1024.0 total, 105.4 free, 918.6 used. 3017.5 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
37937 kong 20 0 1393704 414628 11904 S 0.0 5.2 14:36.72 nginx
37936 kong 20 0 1023736 108848 896 S 0.0 1.4 0:00.00 nginx
Memory mappings describe how the virtual memory space is mapped to RAM, and tell us how the memory is used. Both GDB and gef can show memory mappings. Of the three commands below, maintenance info sections only shows the mappings of the different memory sections of the ELF. It corresponds to only the first few lines of the output of vmmap and info proc mappings, which also include dynamically allocated memory (e.g. heap/stack).
(gdb) maintenance info sections
gef➤ vmmap
(gdb) info proc mappings
Without GDB, we can show the memory mappings of a running process with pmap.
~ $ pmap -X $(pgrep nginx) > all-$(date -Iseconds).pmap
Core Dumps captured at different timestamps reveal newly allocated chunks. Moreover, we can use hexdump, strings and even vim to see what is inside the new chunks. How do we identify newly allocated chunks?
The following are two excerpts from two Core Dumps. We will use this data as an example to show the process.
# Core Dump 1
[14] 0x577ae51df000->0x577ae521a000 at 0x00002a40: load1 ALLOC LOAD READONLY HAS_CONTENTS
[15] 0x577ae54fc000->0x577ae54ff000 at 0x0003da40: load2 ALLOC LOAD READONLY HAS_CONTENTS
[16] 0x577ae54ff000->0x577ae551e000 at 0x00040a40: load3 ALLOC LOAD HAS_CONTENTS
[17] 0x577ae551e000->0x577ae5550000 at 0x0005fa40: load4 ALLOC LOAD HAS_CONTENTS
[18] 0x577ae63cd000->0x577ae66a0000 at 0x00091a40: load5 ALLOC LOAD HAS_CONTENTS
[19] 0x577ae66a0000->0x577af3685000 at 0x00364a40: load6 ALLOC LOAD HAS_CONTENTS
[20] 0x730b65ac0000->0x730b671c1000 at 0x0d349a40: load7 ALLOC LOAD HAS_CONTENTS
# Core Dump 2
[14] 0x577ae51df000->0x577ae521a000 at 0x00002a78: load1 ALLOC LOAD READONLY HAS_CONTENTS
[15] 0x577ae54fc000->0x577ae54ff000 at 0x0003da78: load2 ALLOC LOAD READONLY HAS_CONTENTS
[16] 0x577ae54ff000->0x577ae551e000 at 0x00040a78: load3 ALLOC LOAD HAS_CONTENTS
[17] 0x577ae551e000->0x577ae5550000 at 0x0005fa78: load4 ALLOC LOAD HAS_CONTENTS
[18] 0x577ae63cd000->0x577ae66a0000 at 0x00091a78: load5 ALLOC LOAD HAS_CONTENTS
[19] 0x577ae66a0000->0x577af8547000 at 0x00364a78: load6 ALLOC LOAD HAS_CONTENTS
[20] 0x730b64d1d000->0x730b671c1000 at 0x1220ba78: load7 ALLOC LOAD HAS_CONTENTS
Mapping [19] in Core Dump 2 has a higher ending address than in Core Dump 1. We can calculate the increase in bytes.
~ $ bc <<< 'obase=16; ibase = 16; 0X577AF8547000 - 0X577AF3685000'
4EC2000
~ $ bc <<< 'obase=10; ibase = 16; 0X577AF8547000 - 0X577AF3685000'
82583552
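Comparing the two excerpts by eye does not scale to hundreds of mappings. The sketch below parses lines in the format shown above and reports how much each mapping grew between two Core Dumps. The function names are hypothetical, and matching mappings by their name (e.g. load6) is an assumption: names and indexes can shift when mappings are added or removed, in which case matching by start address may work better.

```python
import re

# Matches e.g. "[19] 0x577ae66a0000->0x577af3685000 at 0x00364a40: load6 ALLOC LOAD"
SECTION = re.compile(r"\[\d+\]\s+0x([0-9a-f]+)->0x([0-9a-f]+) at 0x[0-9a-f]+: (\S+)")


def parse_sections(text: str) -> dict:
    """Map section name to (start, end) addresses."""
    return {
        m.group(3): (int(m.group(1), 16), int(m.group(2), 16))
        for m in SECTION.finditer(text)
    }


def growth(before: dict, after: dict) -> dict:
    """Size difference in bytes for every section whose size changed."""
    result = {}
    for name, (start, end) in after.items():
        if name in before:
            old_start, old_end = before[name]
            delta = (end - start) - (old_end - old_start)
            if delta:
                result[name] = delta
    return result


if __name__ == "__main__":
    dump1 = "[19] 0x577ae66a0000->0x577af3685000 at 0x00364a40: load6 ALLOC LOAD HAS_CONTENTS"
    dump2 = "[19] 0x577ae66a0000->0x577af8547000 at 0x00364a78: load6 ALLOC LOAD HAS_CONTENTS"
    print(growth(parse_sections(dump1), parse_sections(dump2)))
```

With the full excerpts above, load7 also shows up, because its starting address moved down between the two dumps.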
Now let us extract the newly allocated memory from Core Dump 2 and save it to the local disk.
(gdb) dump memory ~/misc/0x577af3685000-0x577af8547000.bin 0x577af3685000 0x577af8547000
With the extracted memory range, we can turn to the tools described in More Tools.
Please check "whatever" and "kong-dev" for more details. You are strongly recommended to read Memory Leak (and Growth) Flame Graphs.
Nginx
(gdb) p ngx_cycle->pool
$1 = (ngx_pool_t *) 0x577ae63f3ff0
(gdb) p * ngx_cycle->pool
$2 = {d = {last = 0x577ae63f7ff0 "", end = 0x577ae63f7ff0 "", next = 0x577ae63fd020, failed = 6}, max = 4095, current = 0x577ae643f5f0, chain = 0x0, large = 0x577ae64c52d8, cleanup = 0x577ae64c5980, log = 0x577ae63f4058}
(gdb)
src/ngx_http_lua_socket_udp.c
-> ngx_http_lua_socket_udp_setpeername
-> r = ngx_http_lua_get_req(L);
-> host.data = ngx_palloc(r->pool, len + 1);
(gdb) p * (ngx_http_upstream_resolved_t *) 0x577af3685060
$4 = {host = {len = 3471771947605571377, data = 0x33353a30312e <error: Cannot access memory at address 0x33353a30312e>}, port = 0, no_port = 96185581326184, naddrs = 96185581326496, addrs = 0x0, sockaddr = 0x0, socklen = 97, name = {len = 14,
data = 0x577af3685060 "172.16.0.10:53"}, ctx = 0x0}
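One possible reading of the output above: host.len and host.data do not look like a length and a pointer. Decoded as little-endian bytes (the Core Dump is x86-64), they spell out ASCII text, which suggests that the address 0x577af3685060 holds the string itself rather than an ngx_http_upstream_resolved_t. This matches name.data, which points to the same address. The helper name below is hypothetical.

```python
def as_bytes(value: int, width: int = 8) -> bytes:
    """Reinterpret an integer field as the raw little-endian bytes behind it."""
    return value.to_bytes(width, "little")


if __name__ == "__main__":
    host_len = 3471771947605571377   # host.len from the print output
    host_data = 0x33353A30312E       # host.data from the print output
    raw = as_bytes(host_len) + as_bytes(host_data)
    print(raw)                        # raw 16 bytes at the start of the struct
    print(raw.rstrip(b"\x00").decode("ascii"))
```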
Appendix
Please refer to How to look at the stack with gdb.
#include <stdio.h>
#include <stdlib.h>
int main() {
    char stack_string[10] = "stack";
    int x = 10;
    char *heap_string;
    heap_string = malloc(50);
    printf("Enter a string for the stack: ");
    gets(stack_string);
    printf("Enter a string for the heap: ");
    gets(heap_string);
    printf("Stack string is: %s\n", stack_string);
    printf("Heap string is: %s\n", heap_string);
    printf("x is: %d\n", x);
}