gdb / coredump Debugging
Coredump
A core dump is essentially a dump of all memory in your program’s virtual address space: stack, heap, code and everything else.
Core dumps are large because they contain all of the process's memory. This holds even for simple programs, because a large number of shared libraries are usually loaded.
Generally a crash can come from
a segfault (or any other signal),
a call to abort() (such as via a failed assertion),
a call to terminate() (such as via throwing an uncaught exception), or
other similar avenues.
The Problem
When core dumps are not configured, a crash prints only "Segmentation fault".
Once they are configured, a crash prints "Segmentation fault (core dumped)".
# ./test_bin
Segmentation fault
A core dump is a copy of process memory and can be investigated using a debugger. It is named from the era of magnetic core memory.
Core dump analysis is one approach to debugging a crash.
In another approach, one could run the program live in gdb to inspect the issue.
One could also use an external tracer to grab data and stack traces on segfault events.
Core dump configuration
Core dumps are disabled by default.
# ulimit -c ← This command shows the maximum size of core dumps created.
0 ← set to zero (default): Disabling core dumps
# cat /proc/sys/kernel/core_pattern ← stores a core dump file name with or without location
core ← here named as "core" in the current directory
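The same two checks can also be made from inside a program. The following is a minimal sketch, assuming a Linux system with /proc mounted; the function names core_limit_enabled() and read_core_pattern() are hypothetical helpers invented for this illustration.

```c
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>

/* Returns 1 if the soft core-size limit is non-zero, 0 if the limit
 * disables core dumps, and -1 if the limit could not be read.
 * This mirrors what "ulimit -c" reports. */
int core_limit_enabled(void)
{
    struct rlimit lim;
    if (getrlimit(RLIMIT_CORE, &lim) < 0)
        return -1;
    return lim.rlim_cur != 0;
}

/* Copies the kernel's core_pattern into buf, without the trailing newline.
 * Returns 0 on success, -1 if the file cannot be read (for example on a
 * system without /proc) or if buf has no room. */
int read_core_pattern(char *buf, size_t len)
{
    if (len == 0)
        return -1;
    FILE *f = fopen("/proc/sys/kernel/core_pattern", "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```

A program could call both at startup and log a warning when core_limit_enabled() returns 0, so that a missing core file after a crash does not come as a surprise.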
To enable core dump,
# ulimit -c unlimited ← non-zero value or unlimited to enable coredump
# mkdir /var/cores
# echo "/var/cores/core.%e.%p.%i.%s" > /proc/sys/kernel/core_pattern ← here sets global location for core dump file, ONLY within a single boot cycle
OR
# echo "/var/cores/core.%e.%p" > /proc/sys/kernel/core_pattern
The core_pattern can be customized further;
e.g., %p: pid, %i: tid, %s: number of the signal causing the dump, %t: time of dump, %e: executable filename, %h: hostname, etc.
The options are documented in the Linux kernel source, under Documentation/sysctl/kernel.txt (https://www.kernel.org/doc/Documentation/sysctl/kernel.txt)
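To make the specifiers concrete, here is a small sketch that expands a pattern the way the examples above suggest. It is an illustration only, not the kernel's implementation: expand_core_pattern() is a hypothetical helper, it handles just %e, %p and %%, and it ignores other kernel behaviour such as patterns that begin with "|".

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Expands a subset of core_pattern specifiers into out:
 *   %e -> executable name, %p -> pid, %% -> a literal '%'.
 * Any other specifier is copied through unchanged (a choice made for this
 * sketch, not kernel behaviour).
 * Returns 0 on success, -1 if out is too small. */
int expand_core_pattern(const char *pattern, const char *exe, pid_t pid,
                        char *out, size_t len)
{
    size_t used = 0;
    if (len == 0)
        return -1;
    for (const char *p = pattern; *p != '\0'; p++) {
        char piece[64];
        const char *src = piece;
        if (p[0] == '%' && p[1] == 'e') {
            src = exe;
            p++;
        } else if (p[0] == '%' && p[1] == 'p') {
            snprintf(piece, sizeof piece, "%ld", (long)pid);
            p++;
        } else if (p[0] == '%' && p[1] == '%') {
            piece[0] = '%';
            piece[1] = '\0';
            p++;
        } else {
            piece[0] = *p;
            piece[1] = '\0';
        }
        size_t n = strlen(src);
        if (used + n + 1 > len)   /* keep room for the terminating NUL */
            return -1;
        memcpy(out + used, src, n);
        used += n;
    }
    out[used] = '\0';
    return 0;
}
```

With the pattern "/var/cores/core.%e.%p", the name "test_bin" and pid 1234, this yields "/var/cores/core.test_bin.1234", which is the kind of path a monitoring process would look for after a crash.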
To set the global location for core dump files so that it survives reboots,
add a line such as "kernel.core_pattern = /var/cores/core.%e.%p" to /etc/sysctl.conf ← To make core_pattern permanent across boot cycles
To turn on core/crash dumps programmatically
The core limit can also be set from within a C program, by calling setrlimit().
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>
/* inside a function, e.g. at the start of main(): */
struct rlimit core_limit;
core_limit.rlim_cur = RLIM_INFINITY;
core_limit.rlim_max = RLIM_INFINITY;
if (setrlimit(RLIMIT_CORE, &core_limit) < 0)
    fprintf(stderr, "setrlimit: %s\nWarning: core dumps may be truncated or non-existent\n", strerror(errno));
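The snippet above requests RLIM_INFINITY for both the soft and the hard limit. An unprivileged process is not allowed to raise its hard limit, so that call can fail with EPERM when the hard limit is lower. A more conservative sketch, under that assumption, raises only the soft limit up to whatever the hard limit currently allows; raise_core_limit_to_hard_max() is a hypothetical name.

```c
#include <sys/resource.h>

/* Raises the soft core-size limit to the current hard limit.
 * Returns 0 on success, -1 on failure (errno is set by getrlimit()
 * or setrlimit()). */
int raise_core_limit_to_hard_max(void)
{
    struct rlimit lim;
    if (getrlimit(RLIMIT_CORE, &lim) < 0)
        return -1;
    lim.rlim_cur = lim.rlim_max;   /* the soft limit may go up to the hard limit */
    return setrlimit(RLIMIT_CORE, &lim);
}
```

If the hard limit itself is 0, this succeeds but core dumps stay disabled, so the caller may still want to check the resulting limit.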
An example of a watchdog parent process with core dumps enabled. The core limit is set programmatically before fork(), so if the forked child process crashes it dumps core, and the parent process can detect this and collect the core dump file.
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>   // fork()
int main(int argc, char **argv)
{
    // Try to enable core dumps (the limit is inherited by the child)
    struct rlimit core_limit;
    core_limit.rlim_cur = RLIM_INFINITY;
    core_limit.rlim_max = RLIM_INFINITY;
    if (setrlimit(RLIMIT_CORE, &core_limit) < 0)
        fprintf(stderr, "setrlimit: %s\nWarning: core dumps may be truncated or non-existent\n", strerror(errno));
    int status = 0;
    switch (fork())
    {
    case 0:
        // We are the child process -- run the actual program
        *(volatile int *)1 = 2; // deliberate segfault
        break;
    case -1:
        // An error occurred, shouldn't happen
        perror("fork");
        return -1;
    default:
        // We are the parent process -- wait for the child process to exit
        if (wait(&status) < 0) {
            perror("wait");
            return -1;
        }
        printf("child exited with status %d\n", status);
        if (WIFSIGNALED(status) && WCOREDUMP(status)) {
            printf("got a core dump\n");
            // find core dump, email it to your servers, etc.
        }
    }
    return 0;
}
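The watchdog above prints the raw status integer, which is hard to read. As a sketch, the status can be decoded with the standard wait macros; describe_wait_status() is a hypothetical helper, and WCOREDUMP is guarded because it is not available on every system.

```c
#include <stdio.h>
#include <sys/wait.h>

/* Formats a status value returned by wait()/waitpid() into buf
 * as a human-readable string. */
void describe_wait_status(int status, char *buf, size_t len)
{
    if (WIFEXITED(status)) {
        snprintf(buf, len, "exited normally with code %d",
                 WEXITSTATUS(status));
    } else if (WIFSIGNALED(status)) {
        int dumped = 0;
#ifdef WCOREDUMP
        dumped = WCOREDUMP(status) ? 1 : 0;
#endif
        snprintf(buf, len, "killed by signal %d (%s)", WTERMSIG(status),
                 dumped ? "core dumped" : "no core dump");
    } else {
        snprintf(buf, len, "unrecognized status 0x%x", (unsigned)status);
    }
}
```

In the parent branch of the watchdog, the printf of the raw status could be replaced by a call to this helper followed by printing buf.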
Starting GDB
# gdb [binary full path] ← To run program live in gdb to inspect
# gdb `which [binary]` [core dump file path] ← To analyze coredump of program
# gdb [binary full path] [core dump file path] ← To analyze coredump of program
Output after running gdb:
...
...
...
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007f0a37aac40d in doupdate () from /lib/x86_64-linux-gnu/libncursesw.so.5
After the core dump file is loaded, the last two lines show the terminating signal and the location of the fault, with the function name and the binary/library name.
# dpkg -l | grep [libname] ← To get a few details of the installed library
Back Trace [backtrace OR bt]
Stack backtraces show the point of failure and often help to identify a common problem.
It is usually the first command to use in a gdb session: bt (short for backtrace)
(gdb) backtrace ← To show stack backtrace
#0 0x0804bd18 in fun3 () from /lib/libtest.so
#1 0x0804bd38 in ?? () from /lib/libtest.so
#2 0x0804bd46 in fun1 ()
#3 ...
#4 ...
#5 0x0804bd57 in main ()
Read from the bottom up to go from caller to callee: main() --> ... --> fun1() --> ?? --> fun3()
The "??" entries represent symbol translation failure.
If symbols or stacks are too badly broken to make sense of the stack trace, then there are usually ways to fix it:
installing debug info packages (giving gdb more symbols, and letting it do DWARF-based stack walks), or
recompiling the software from source with frame pointers and debugging information (-fno-omit-frame-pointer -g)
For release builds, the usual way is to ship a binary without debug information: either build the program without -g or strip off the debug information before releasing. (Without debug information one can still get a backtrace, except that inlined functions will not appear in it.)
NOTE: core dumps will be created (if ulimit is set appropriately) even if the binary doesn't contain debug information.
On a GNU (glibc) or BSD system, one can get a backtrace from inside the program by calling backtrace() and the related functions declared in <execinfo.h> (these are library functions, not system calls). The returned function addresses then need to be translated into function names, for example by running 'addr2line' (or duplicating its functionality).
Disassembly [disas]
(gdb) disas ← To disassemble the current (here: segfaulted) function
OR
(gdb) disas fun3 ← To disassemble the named function
Dump of assembler code for function doupdate:
0x00007f0a37aac2e0 <+0>: push %r15
0x00007f0a37aac2e2 <+2>: push %r14
...
...
0x00007f0a37aac401 <+289>: mov 0x20cb68(%rip),%rax # 0x7f0a37cb8f70
0x00007f0a37aac408 <+296>: mov (%rax),%rsi
0x00007f0a37aac40b <+299>: xor %eax,%eax
=> 0x00007f0a37aac40d <+301>: mov 0x10(%rsi),%rdi
0x00007f0a37aac411 <+305>: cmpb $0x0,0x1c(%rdi)
...
...
The arrow "=>" marks the instruction at which the segfault occurred.
Check Registers [info registers OR i r]
(gdb) info registers ← To print state of the registers
rax 0x0 0
rbx 0x1993060 26816608
rcx 0x19902a0 26804896
rdx 0x19ce7d0 27060176
rsi 0x0 0
rdi 0x19ce7d0 27060176
rbp 0x7f0a3848eb10 0x7f0a3848eb10 <SP>
rsp 0x7ffd33d93c00 0x7ffd33d93c00
r8 0x7f0a37cb93e0 13968186248905
...
...
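Register state can be used to identify an invalid address access, by comparing the register values with the faulting instruction from the disassembly. Here the faulting instruction is 'mov 0x10(%rsi),%rdi' and rsi is 0x0, so the program tried to read from address 0x10.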
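A fault at a small address such as 0x10 typically comes from reading a structure field through a NULL pointer: the base register holds 0 and the displacement is the field's offset. The sketch below reproduces that pattern with a made-up structure; struct screen_state and its fields are hypothetical and are not taken from libncursesw.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, chosen only so that one field sits at offset 0x10. */
struct screen_state {
    int64_t rows;      /* offset 0x00 */
    int64_t cols;      /* offset 0x08 */
    void   *window;    /* offset 0x10 */
};

/* Address that a load of sp->window touches: base pointer + field offset.
 * With sp == NULL this is 0x10, which is not a mapped address. */
uintptr_t window_field_address(const struct screen_state *sp)
{
    return (uintptr_t)sp + offsetof(struct screen_state, window);
}

/* Reads the field; the process gets SIGSEGV here when sp is NULL. */
void *load_window(const struct screen_state *sp)
{
    return sp->window;
}
```

Calling load_window(NULL) from a test program built with -g gives a core dump in which 'info registers' shows a zero base register, much like rsi in the dump above.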
Memory Mappings [info proc mappings OR i proc m]
(gdb) info proc mappings
Mapped address spaces:
Start Addr End Addr Size Offset objfile
0x400000 0x6e7000 0x2e7000 0x0 /usr/bin/python2.7
0x8e6000 0x8e8000 0x2000 0x2e6000 /usr/bin/python2.7
0x8e8000 0x95f000 0x77000 0x2e8000 /usr/bin/python2.7
0x7f0a37a8b000 0x7f0a37ab8000 0x2d000 0x0 /lib/x86_64-linux-gnu/libncursesw.so.5.9
0x7f0a37ab8000 0x7f0a37cb8000 0x200000 0x2d000 /lib/x86_64-linux-gnu/libncursesw.s
...
...
Any address below the first valid virtual address (here 0x400000) is invalid and, if referenced, will trigger a segmentation fault.
Breakpoints [break OR b]
Once gdb is started with the target binary, the binary's symbols are loaded, and breakpoints can be set in the program before it is run.
Shared-library symbols, however, are loaded only when the target binary runs. So to set a breakpoint in shared-library code, first stop at main() (for example with 'start') and set the breakpoint from there.
A breakpoint can be set on a function, an address or a source line:
(gdb) break fun3
(gdb) break *fun3 + offset
(gdb) break sourcefile:linenumber
(gdb) start ← To run to a temporary breakpoint at main()
(gdb) b exit.c:32 ← breakpoint at exit of program, similar to breakpoint at last line of main() function
Breakpoints info [info breakpoints OR i b]
(gdb) info breakpoints ← To list breakpoints with information
Conditional Breakpoints [condition OR cond]
Sometimes many ‘continue’s are needed before the interesting invocation of a breakpoint is reached. When a breakpoint is hit that often, it is better to use a conditional breakpoint.
(gdb) cond <breakpoint no> <variable>==<value>
Watchpoint [watch]
(gdb) watch *(long**)0x12345678 ← To set a data breakpoint that stops when the memory at that location changes
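Conditional breakpoints and watchpoints are easiest to try on a small loop. The following sketch is a hypothetical practice target; the names total and accumulate() are invented for this illustration.

```c
/* Running total, kept global so that "watch total" works from any frame. */
long total = 0;

/* Adds the integers 0 .. n-1 into total and returns the result. */
long accumulate(int n)
{
    total = 0;
    for (int i = 0; i < n; i++) {
        total += i;    /* a conditional breakpoint on this line can test i */
    }
    return total;
}
```

When this is built with -g and called as accumulate(1000) from a main() of your own, a breakpoint on the 'total += i' line with the condition i==500 stops once instead of a thousand times, and 'watch total' stops each time total changes.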
Run program [run OR r]
The run command takes arguments (if any), which are passed to the target binary.
(gdb) run arg1 arg2 ← This ends up running “TargetBinary arg1 arg2”
Stepping [stepi OR si]
Steps a single machine instruction during debugging. gdb stops again automatically after executing the current instruction.
Print value [print OR p]
Prints the value of an expression, typically the data held in a pointer or non-pointer variable.
(gdb) print <variable> ← To print value stored in variable
(gdb) print/a <address> or p/a ← Here ‘/a’ indicates format as an address
(gdb) print/d <variable or value> or p/d ← To print value in decimal
(gdb) print/x <variable or value> or p/x ← To print value in hexadecimal
(gdb) print/o <variable or value> or p/o ← To print value in octal
(gdb) print $pc ← To print Program Counter
$1 = (void (*)()) 0xab12cd34
(gdb) x 0xab12cd34 ← To examine memory at address
(gdb) print $sp ← To print stack pointer
$2 = (void *) 0x12345678
(gdb) print *(long**)0x12345678
$3 = (long *) 0xabcdef
Continue [continue OR c]
(gdb) continue ← To continue execution till end or next break.
Record [record] , Reverse Stepping [reverse-stepi] , Reverse Continue [reverse-continue]
Recording must be enabled first, and it adds considerable overhead. So it is better not to enable recording from main() itself, but only at the required function.
Set a breakpoint at the required function
(gdb) break function1
Run the target binary. Enable recording once the breakpoint at the required function is hit, and then continue.
(gdb) record
(gdb) continue
After the next halt (due to either a segfault or a breakpoint), one can step in reverse through the lines or instructions executed since recording was enabled.
This works by playing back the register state from the recording, so it can be used to move back one instruction at a time.
It is used for reversible debugging.
(gdb) reverse-stepi ← Go back 1 instruction
(gdb) reverse-continue ← To continue back in time
Selecting frame [frame OR f] , [up], [down OR do]
Most commands for examining the stack and other data in the program work on whichever stack frame is selected at the moment. Any particular stack frame can be selected by referring to its index from the ‘backtrace’ or ‘where’ output.
These frame selection commands finish by printing a brief two-line description of the stack frame just selected. The first line shows the frame number, the function name, the arguments, and the source file and line number of execution in that frame. The second line shows the text of that source line.
(gdb) frame n ← Select frame number n
Frame zero is the innermost (currently executing) frame, frame one is the frame that called the innermost one, and so on. The highest-numbered frame is the one for main.
(gdb) frame stack-addr [pc-addr] OR f stack-addr [pc-addr] ← Select the frame at address stack-addr.
This is useful mainly if the chaining of stack frames has been damaged by a bug, making it impossible for GDB to assign numbers properly to all frames. In addition, this can be useful when your program has multiple stacks and switches between them. The optional pc-addr can also be given to specify the value of PC (Program Counter) for the stack frame.
(gdb) up n ← Move n frames up the stack; n defaults to 1
For positive numbers n, this advances toward the outermost frame, to higher frame numbers, to frames that have existed longer.
(gdb) down n ← Move n frames down the stack; n defaults to 1
For positive numbers n, this advances toward the innermost frame, to lower frame numbers, to frames that were created more recently. You may abbreviate down as do.
Equivalent commands for GDB scripts, where the output might be unnecessary and distracting.
(gdb) select-frame
(gdb) up-silently n
(gdb) down-silently n
These commands are variants of ‘frame’, ‘up’ and ‘down’, respectively; they differ in that they do their work silently, without causing display of the new frame after selecting it.
Reference:
#) ‘Selecting a frame’ https://sourceware.org/gdb/onlinedocs/gdb/Selection.html
Listing source code [list OR l]
(gdb) list ← list source code around present statement
(gdb) list function1_name ← list few lines around mentioned function1_name
Return [return OR ret]
(gdb) return ← To return from the current function immediately, without executing the rest of it
This changes the instruction path rather than the data.
Info [info OR i]
(gdb) info ← To list ‘info’ subcommands
Help [help OR h]
(gdb) help ← To list classes of commands
(gdb) help <command class> ← To get help on particular command & full listing
Text User Interface (TUI)
--tui option is used to run gdb in TUI mode
# gdb --tui <target binary> [coredump file]
After loading source code “>” in TUI shows the next line of code after crashed statement.
In TUI mode the arrow keys scroll the source window, while commands are still typed at the gdb prompt.
(Source: https://undo.io/resources/presentations/cppcon-2015-greg-law-give-me-15-minutes-ill-change/)
# gdb --tui <target binary> [core file]
(gdb) start ← To run to a temporary breakpoint at main()
(gdb) list ← To list code
(gdb) Ctrl + X, A ← To toggle the full TUI mode with the source window
(gdb) next
(gdb) Ctrl + L ← To refresh the screen
(gdb) Ctrl + X, 2 ← Source code with assembly
(gdb) Ctrl + X, 2 ← Pressed again: general purpose registers as well
(gdb) tui reg float ← Floating point registers
(gdb) Ctrl + X, 1 ← To go back to a single window
(gdb) Ctrl + P ← To go to the previous command
(gdb) Ctrl + N ← To go to the next command
Python Interpreter built in gdb (version 7 and after)
(gdb) python ← To enter multiple lines of Python, terminated with ‘end’
(gdb) python [single line of python]
Python pretty printers can be used to print structures.
#) “TUI Key Bindings”
Search a function
(gdb) search function1_name ← To search forward in the current source file for a line matching function1_name
In TUI mode, this searches for function1_name only in the current source file and shows its definition.
Searching a symbol [info symbol OR i sym]
(gdb) info symbol <address or function name> ← To get the symbol at that address and the binary or library that contains it
Even ‘objdump -tT <binary or library> | grep <symbolname>’ can be used.
Quit gdb [quit OR q]
(gdb) quit ← To come out of gdb debugging