This is a little compilation of some useful configuration options and basic methods to debug a Linux kernel. There is good documentation in kernel.org and elsewhere but back when I was learning these things I felt that there are few places where you can get a high level overview. We will go through more advanced methods of profiling and tracing in another post.

It is assumed that we already know how to compile a kernel, use menuconfig and what kernel options are.

Basic kernel configuration

First, there are a few options that we want to activate in order to have them available in our running kernel. Some of them will have a performance toll and won’t be present in a production system. Others will just increase kernel size. You can get an overview of what is available in Kconfig.debug, and also you might find this kernel symbol reference handy.

DEBUG_KERNEL

This option exists specifically to unhide debugging options in menuconfig.

DEBUG_INFO

The resulting kernel image will include debugging info resulting in a larger kernel image. This adds debug symbols to the kernel and modules (gcc -g), and is needed if you intend to use kernel crashdump or binary object tools like crash, kgdb, LKCD, gdb, etc on the kernel.

DEBUG_BUGVERBOSE

When a program crashes due to an exception, or the kernel detects an internal error, the kernel can print a not so brief message explaining what the problem was. This debugging information is useful to developers and kernel hackers when tracking down problems, but mostly meaningless to other people. This is always helpful for debugging but serves no purpose on a production system.

BUG() panics will output the file name and line number of the BUG call as well as the EIP and oops trace. This aids debugging but costs about 70-100K of memory.

CONFIG_FRAME_POINTER

This option inserts code to into the compiled executable which saves the frame information in registers or on the stack at different points which allows a debugger such as gdb to more accurately construct stack back traces while debugging the kernel.

CONFIG_KALLSYMS

This will give us more information in stack traces from kernel oopses. The kernel will print out symbolic crash information and symbolic stack backtraces. This increases the size of the kernel somewhat, as all symbols have to be loaded into the kernel image.

If a kernel crashes, this option will allow us to get an output that includes function names and not only addresses so it will be much easier to discover what is going on. This is an example taken from the Ubuntu wiki

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000008 printing eip: c022a7b5 *pde = 00000000 Oops: 0000 [#1] SMP Modules linked in: thinkpad_acpi ppdev speedstep_lib cpufreq_conservative cpufreq_userspace cpufreq_ondemand cpufreq_stats cpufreq_powersave freq_table video bay dock ac sbs button container battery lp irtty_sir sir_dev pcmcia parport_pc parport snd_cs46xx gameport snd_ac97_codec ac97_bus snd_pcm_oss snd_mixer_oss nsc_ircc snd_pcm snd_seq_dummy irda crc_ccitt snd_seq_oss psmouse i2c_piix4 snd_seq_midi snd_rawmidi snd_seq_midi_event serio_raw pcspkr snd_seq i2c_core snd_timer snd_seq_device snd soundcore snd_page_alloc shpchp pci_hotplug intel_agp yenta_socket rsrc_nonstatic pcmcia_core agpgart evdev ext3 jbd mbcache sg sr_mod cdrom sd_mod uhci_hcd usbcore ata_piix ata_generic libata scsi_mod e100 mii thermal processor fan fuse apparmor commoncap CPU: 0 EIP: 0060:[<c022a7b5>] Not tainted VLI EFLAGS: 00010202 (2.6.22-12-generic #1) EIP is at acpi_ns_internalize_name+0xd/0x83 eax: 00000008 ebx: 00000000 ecx: 00000000 edx: c7879e54 esi: d0b980c0 edi: c7879e54 ebp: c7879e70 esp: c7879de8 ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068 Process modprobe (pid: 4467, ti=c7878000 task=ce5c94c0 task.ti=c7878000) Stack: 00000000 00000000 d0b97e60 00008080 c01c4390 d0b97e60 00000000 00000000 d0b980c0 00000000 c7879e70 c022a85c d0b97e60 c795d030 c7c604e0 c01c44ef 00000004 d0b97e60 c7acea18 c01c3884 00008080 00000004 00000004 00000080 Call Trace: [<c01c4390>] __sysfs_new_dirent+0x20/0x50 [<c022a85c>] acpi_ns_get_node+0x31/0x93 [<c01c44ef>] sysfs_make_dirent+0x2f/0x50 [<c01c3884>] sysfs_add_file+0x74/0x90 [<d0b910b7>] drv_acpi_handle_init+0x37/0x90 [thinkpad_acpi] [<c0231aef>] acpi_ut_release_mutex+0x5b/0x63 [<c0233ac0>] acpi_method_notify_enable+0x15/0x34 [<d0b5ba32>] cmos_init+0x52/0x70 [thinkpad_acpi] [<d0b5c21f>] thinkpad_acpi_module_init+0x27f/0x69a [thinkpad_acpi] [<c014a811>] sys_init_module+0x151/0x1a00 [<c01fb8cf>] prio_tree_insert+0x1f/0x250 [<c01041d2>] sysenter_past_esp+0x6b/0xa9 ======================= Code: c7 44 24 14 01 00 00 00 8b 54 24 14 8d 04 96 e9 f2 fe ff ff 83 c4 18 89 d0 5b 5e 5f 5d c3 55 57 89 d7 56 53 83 ec 1c 85 c0 74 67 <80> 38 00 74 62 85 d2 74 5e 89 04 24 89 e0 e8 b5 fb ff ff 8b 4c EIP: [<c022a7b5>] acpi_ns_internalize_name+0xd/0x83 SS:ESP 0068:c7879de8

We can manually check the symbol to address map in /proc/kallsyms. This output is in nm format.

# head -20 /proc/kallsyms 0000000000000000 A irq_stack_union 0000000000000000 A __per_cpu_start 0000000000004000 A cpu_debug_store 0000000000005000 A cpu_tss_rw 0000000000008000 A gdt_page 0000000000009000 A exception_stacks 000000000000e000 A entry_stack_storage 000000000000f000 A cpu_llc_id 000000000000f020 A cpu_llc_shared_map 000000000000f060 A cpu_core_map 000000000000f0a0 A cpu_sibling_map 000000000000f0e0 A cpu_info 000000000000f1d0 A cpu_number 000000000000f1d8 A this_cpu_off 000000000000f1e0 A x86_cpu_to_acpiid 000000000000f1e4 A x86_cpu_to_apicid 000000000000f1e6 A x86_bios_cpu_apicid 000000000000f1e8 A sched_core_priority 000000000000f200 A cpu_loops_per_jiffy 000000000000f220 A pmc_prev_left

Also, it is easy to access this information from kernel code. We can just use kallsyms_lookup_name() from kallsyms.c.

CONFIG_KALLSYMS_ALL

Normally kallsyms only contains the symbols of functions for nicer OOPS messages and backtraces (i.e., symbols from the text and inittext sections). This is sufficient for most cases. And only in very rare cases (e.g., when a debugger is used) all symbols are required (e.g., names of variables from the data sections, etc).

This option makes sure that all symbols are loaded into the kernel image (i.e., symbols from all sections) in cost of increased kernel size (depending on the kernel configuration, it may be 300KiB or something like this).

CONFIG_IKCONFIG

This option enables the complete Linux kernel .config file contents to be saved in the kernel. It provides documentation of which kernel options are used in a running kernel or in an on-disk kernel. This information can be extracted from the kernel image file with the script scripts/extract-ikconfig and used as input to rebuild the current kernel or to build another kernel.

CONFIG_IKCONFIG_PROC

Enable access to .config through /proc/config.gz in a running kernel.

$ zcat /proc/config.gz | head -20 # # Automatically generated file; DO NOT EDIT. # Linux/x86 4.16.4-1 Kernel Configuration # CONFIG_64BIT=y CONFIG_X86_64=y CONFIG_X86=y CONFIG_INSTRUCTION_DECODER=y CONFIG_OUTPUT_FORMAT="elf64-x86-64" CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig" CONFIG_LOCKDEP_SUPPORT=y CONFIG_STACKTRACE_SUPPORT=y CONFIG_MMU=y CONFIG_ARCH_MMAP_RND_BITS_MIN=28 CONFIG_ARCH_MMAP_RND_BITS_MAX=32 CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=8 CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=16 CONFIG_NEED_DMA_MAP_STATE=y CONFIG_NEED_SG_DMA_LENGTH=y CONFIG_GENERIC_ISA_DMA=y

cmdline

We can check what parameters were used to invoke our running kernel in /proc/cmdline

$ cat /proc/cmdline 8250.nr_uarts=0 bcm2708_fb.fbwidth=656 bcm2708_fb.fbheight=416 bcm2708_fb.fbswap=1 vc_mem.mem_base=0x3ec00000 vc_mem.mem_size=0x40000000 dwc_otg.lpm_enable=0 console=ttyS0,115200 console=tty1 root=PARTUUID=9f51643f-02 rootfstype=ext4 elevator=deadline fsck.repair=yes rootwait

These are normally provided by the bootloader (GRUB or uboot), but sometimes those are locked so we can use CONFIG_CMDLINE and CONFIG_CMDLINE_OVERRIDE to work around this when we are building a kernel.

CONFIG_CMDLINE. Enter arguments here that should be compiled into the kernel image and used at boot time. If the boot loader provides a command line at boot time, it is appended to this string to form the full kernel command line, when the system boots.

CONFIG_CMDLINE_OVERRIDE. have the kernel ignore the boot loader command line, and use ONLY the built-in command line. This is used to work around broken boot loaders.

Analyzing running modules

printk

Probably the most basic means of debugging is reading printk output. There are several logging levels, and not all of them are active by default. Values go from emergency which is level 0 to debug which is level 8, with the default being warning which is level 4.

We can activate maximum verbose logging with

echo 8 > /proc/sys/kernel/printk

We can check the current logging level with

$ cat /proc/sys/kernel/printk 7 4 1 7 current default minimum boot-time-default

We can only see debug messages ( level 8 ), if we compile our kernel with CONFIG_DEBUG. Note that if we are generating too much logging we can severely affect system performance, specially if we are sending kernel logs through slow UART. For this situations, it is a good idea to use printk_ratelimited() or printk_once() if we are trying to introduce printk statements in places where they can get called too often, like the network packet stack.

For instance,

printk_ratelimited(KERN_WARNING "this line will be limited. Iteration %d

", count);

will result in many iterations being suppresed with a message like

[6886.360547] __ratelimit: 190 callbacks suppressed

an example of printk_once() could be

printk_once(KERN_INFO "this will only be printed once

");

See printk.h and this article for more details.

Dynamic debug

Using pr_debug() globally will cause a tremendous amount of logging so it is not very practical. To make the debug level more manageable, dynamic debugging was introduced.

We can activate it with CONFIG_DYNAMIC_DEBUG. We can see in printk.h that dynamic_pr_debug() will take over in this case.

/* If you are writing a driver, please use dev_dbg instead */ #if defined(CONFIG_DYNAMIC_DEBUG) #include <linux/dynamic_debug.h> /* dynamic_pr_debug() uses pr_fmt() internally so we don't need it here */ #define pr_debug(fmt, ...) \ dynamic_pr_debug(fmt, ##__VA_ARGS__) #elif defined(DEBUG) #define pr_debug(fmt, ...) \ printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__) #else #define pr_debug(fmt, ...) \ no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__) #endif

In order to control what we want to print, a file will appear in dynamic_debug/control. We now have absolute control about what printk lines we want to be logged. We can hide (-p ) or show (+p ) log messages at the file level, function level or even line level.

Examples

echo file tpm_nsc.c line 346-373 +p > /sys/kernel/dynamic_debug/control echo file tpm_nsc.c function init_nsc +p > /sys/kernel/dynamic_debug/control

We can also configure this at module load time with the dyndbg option, for example

# modprobe snd_hda_intel dyndbg=+p

CONFIG_DEBUG_FS

If we have this option selected, then we can mount debugfs like so

mount -t debugfs none /sys/kernel/debug

debugfs is a virtual filesystem where kernel modules can offer debug information and sometimes even receive specific configuration options through the hooks defined in debugfs.h.

For instance, if we are compiling Zswap with DEBUG_FS we can check different status parameters such as

# cat /sys/kernel/debug/zswap/written_back_pages

kgdb

This is really powerful. We can use good old gdb to analyze our kernel. We can set breakpoints, see memory contents and all the rest. We will need gdb or some cross-gdb if we are working with a different architecture, for example an embedded ARM board.

In our kernel we will want to activate

CONFIG_KGDB. Include in-kernel hooks for kgdb, the Linux kernel source level debugger

CONFIG_KGDB_SERIAL_CONSOLE. Share a serial console with kgdb. Sysrq-g must be used to break in initially.

CONFIG_MAGIC_SYSRQ . This will activate the magic SysRq Key.

First we configure kgdboc ( kgdb over console ) with the tty and baud rate of our UART.

echo "ttyS0,115200" > /sys/module/kgdboc/parameters/kgdboc

Next, we tell our kernel to enter in debug mode. This will freeze waiting for a gdb connection.

echo g > /proc/sysrq-trigger

We will probably also want to activate kgdbcon in order to send printk statements to the gdb session.

echo 1 > /sys/module/kgdb/parameters/kgdb_use_con

Finally, we connect our (cross)gdb instance to the QEMU instance or real board. First, we run the debugger passing it the kernel that we are running, compiled with debug symbols (CONFIG_DEBUG_INFO). Then, we configure the serial port and attach with the target remote command, just like when we use gdbserver .

./toolchain-mips_34kc_gcc-5.2.0_musl-1.1.11/bin/mips-openwrt-linux-gdb ./build_dir/target-mips_34kc_musl-1.1.11/linux-ar71xx_generic/linux-4.1.13/vmlinux set serial baud 115200 target remote /dev/ttyUSB0

We can also have the kernel wait for the gdb connection at early boot, but then we have to configure kgdboc in the kernel command line, and use the kgdbwait option. In this example we also are deactivating Kernel Address Space Layout Randomization so the memory locations are not messed up.

console=ttyS0,115200 kgdboc=ttyS0,11520 nokaslr kgdbwait

The kernel will be frozen until we connect with gdb. This will allow us to debug from early in the boot process if we need to.

kdb

kdb is a shell interface that lets us connect through a keyboard or serial console and inspect memory, registers and logs. It isn’t as powerful as kgdb in terms of inspecting execution following the source code. More details can be found in the kernel docs, but I recommend just sticking to kgdb.

CONFIG_PROC_KCORE

Kcore provides a virtual ELF core file of the live kernel accessible at /proc/kcore. This can be read with gdb and other ELF tools. No modifications can be made of the kernel memory using this mechanism.

This is very cool, because we can analyze a picture of the memory at the moment of invoking gdb. We need a copy of the running kernel compiled with debug symbols (CONFIG_DEBUG_INFO).

Unlike with kgdb, this is a limited simulation of the memory contents, and we cannot do fancy stuff such as control execution. Still we can access memory contents. For instance, if your kernel with symbols is vmlinux we would invoke gdb just like we would run it to analyze a core dump from any executable, but our core dump will be /proc/kcore.

# gdb vmlinux /proc/kcore (gdb) print jiffies $1 = 16467921357 (gdb)

We can refresh memory contents with

(gdb) core-file /proc/kcore

You can read a more detailed example in this article.