We recently had an incident where we had a Redis instance blocked on writing to disk, hung inside the kernel.

Through a variety of other circumstances, this meant the only up-to-date copy of business-critical data was in memory on a single machine, multiple independent replicas and backups were unavailable, and that machine could crash at any moment.

The problem was eventually solved without data loss of any kind due to our Techops team being utterly brilliant, but during the “incident” a number of ideas that would otherwise be dismissed as absolutely crazy were discussed.

One of the ideas was based on the simple fact that redis is an in-memory database, and we had access to the machine. The question kept coming up of just copying all the memory off of the machine, or at least out of the process, and wresting our data from its grasp.

Although it was mostly not serious, at least to start with, as our options lessened the idea of taking a core dump and picking it apart looked more and more like a straw worth clutching at.

It ended up not being needed, which is a good thing, but I figured out that it was at least possible, if you are really really really desperate, and have some time on your hands. On the off-chance anyone is really that desperate in the future, I thought I’d at least list out what direction you would need to take, and wish you the best of luck.

(Past this point I’m going to assume an exceptional knowledge of C, a good knowledge of GDB, ELF loading, symbol lookup, the amd64 ABI used by Linux, memory layout and generally how the dynamic linker works. If interested check out the osdev.org wiki, especially on linking, ELF loaders etc and http://www.x86-64.org/documentation/abi.pdf – Or just fake it and read on)

Given you’re at the point of desperation, you’re probably going to be limited in either time or what you can do. At the very least you are going to want to get a core dump of the redis process, taking into account that it’s going to have to be written somewhere, and if you can’t just do a redis SAVE, then there is a chance that you can’t write to disk.

If you have to write to /tmp using tmpfs, realise that you might run out of memory at some point when writing it, and although the OOM killer probably won’t kick in, be very very sure first. You might be able to use the remote gdb stub/server to dump out the core over the network rather than locally in a pinch.
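Before picking somewhere to write the dump, a rough size check might save you from making things worse. This is only a sketch: it assumes you have the text of /proc/($pid)/maps to hand, the function names are made up for illustration, and summing the readable mappings is a crude, pessimistic estimate of the core size rather than an exact figure. On tmpfs the "free" number also says nothing about how much RAM is really left.

```python
import shutil


def mapped_bytes(maps_text):
    """Sum the sizes of all readable mappings in /proc/<pid>/maps text."""
    total = 0
    for line in maps_text.splitlines():
        fields = line.split()
        # fields[0] is "start-end", fields[1] is the permission string (e.g. "rw-p")
        if len(fields) < 2 or "r" not in fields[1]:
            continue
        start, end = (int(part, 16) for part in fields[0].split("-"))
        total += end - start
    return total


def fits(maps_text, target_dir, headroom=1.2):
    """True if the estimated dump size (times a safety factor) fits in target_dir."""
    free = shutil.disk_usage(target_dir).free
    return mapped_bytes(maps_text) * headroom <= free
```

Something like fits(open("/proc/23332/maps").read(), "/tmp") would be the call for the process above.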

GDB Helping

You can get gdb to attach to a process, dump out a core file using generate-core-file and then detach from the process and continue executing:

    root@hope:~# ps ax | grep redis
     8670 pts/1 S+ 0:00 grep redis
    23332 ? Ssl 1:53 /usr/bin/redis-server 127.0.0.1:6379
    root@hope:~# gdb -p 23332
    GNU gdb (Debian 7.7.1+dfsg-5) 7.7.1
    Copyright (C) 2014 Free Software Foundation, Inc.
    Attaching to process 23332
    Reading symbols from /usr/bin/redis-server...
    (gdb) generate-core
    warning: Memory read failed for corefile section, 8192 bytes at 0x7fff94ade000.
    Saved corefile core.23332
    (gdb) detac
    Detaching from program: /usr/bin/redis-server, process 23332
    (gdb) q

Other things you want to get at this point if you can. Some of these might not be possible, or might not be possible to get off:

A screenshot of when you took the dump! It includes every library that was loaded that gdb found (and didn’t find) symbols for, and actual annotated call frame information with symbols and everything.

A copy of /proc/($pid)/maps for the process

The exact version of redis running, including any distribution specific patches (the distribution package version should be enough; you want to get an exact copy of the source code, and in a pinch you might need to reproduce the entire distribution build environment)

A copy of the redis binary for the process running in memory. If you have upgraded the redis-server and not restarted redis, this might not be available, but knowing the distribution package version might be enough to get it from elsewhere

Get this off of the server ASAP and somewhere you can work on it.

GDB Hell

At the time of writing, gdb had a couple of really big frustrating limitations and/or bugs when writing corefiles out.

The biggest is that, although at the time of dumping gdb was quite happy loading relative symbols from the binaries and libraries and translating them to absolute addresses based on where the ELF loader had put the individual sections at runtime, the corefile had neither the symbol information nor the section load information (or at least not in a way that gdb could read itself afterwards).

If we load the corefile we just dumped:

    root@hope:~# gdb -c core.23332
    GNU gdb (Debian 7.7.1+dfsg-5) 7.7.1
    Copyright (C) 2014 Free Software Foundation, Inc.
    Core was generated by `/usr/bin/redis-server 127.0.0.1:6379'.
    #0 0x00007f3aa215f08f in ?? ()
    (gdb) bt
    #0 0x00007f3aa215f08f in ?? ()
    #1 0x0000000000000002 in ?? ()
    #2 0x00007f3aa2d84040 in ?? ()
    #3 0x00007f3aa2d840c0 in ?? ()
    #4 0x0000000000000000 in ?? ()
    (gdb) c

gdb no longer has any idea about what is going on.

If the core was dumped with a recent version of gdb, you might get some success with running gdb like this:

    gdb /usr/bin/redis-server -c <corefile>

But, this wasn’t working on the version shipping with Debian wheezy.

The reason is that gdb just wasn’t saving the section mappings, or annotating them in full, in the coredump. Here is the difference between a recent gdb coredump and the coredump I got from production by the same method:

    davidb@hope:~/scratch$ readelf -a recent.gdb.core | grep redis-server
    /usr/bin/redis-server
    /usr/bin/redis-server
    davidb@hope:~/scratch$ readelf -a production.redis.core | grep redis-server
    davidb@hope:~/scratch$
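My best guess at what readelf is showing in the first case is the core’s file-mapping note (note type NT_FILE), which lists each file-backed mapping along with its path. Assuming that is right, the following sketch checks for it directly by walking the PT_NOTE segments of a little-endian ELF64 core. The function names are invented for the example, and it only looks at note headers, not their contents.

```python
import struct

PT_NOTE = 4
NT_FILE = 0x46494C45  # note type Linux uses for the table of file-backed mappings


def note_types(core_bytes):
    """Return (owner_name, note_type) for every note in an ELF64 core image."""
    if core_bytes[:6] != b"\x7fELF\x02\x01":
        raise ValueError("expected a little-endian ELF64 file")
    (e_phoff,) = struct.unpack_from("<Q", core_bytes, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", core_bytes, 0x36)
    found = []
    for index in range(e_phnum):
        # First six program header fields: type, flags, offset, vaddr, paddr, filesz
        p_type, _, p_offset, _, _, p_filesz = struct.unpack_from(
            "<IIQQQQ", core_bytes, e_phoff + index * e_phentsize
        )
        if p_type != PT_NOTE:
            continue
        pos, end = p_offset, p_offset + p_filesz
        while pos + 12 <= end:
            namesz, descsz, ntype = struct.unpack_from("<III", core_bytes, pos)
            name = core_bytes[pos + 12:pos + 12 + namesz].rstrip(b"\0")
            found.append((name, ntype))
            # Name and descriptor are each padded to a 4-byte boundary.
            pos += 12 + (namesz + 3) // 4 * 4 + (descsz + 3) // 4 * 4
    return found


def has_file_mappings(core_bytes):
    """True if the core carries an NT_FILE note."""
    return any(ntype == NT_FILE for _, ntype in note_types(core_bytes))
```

For a multi-gigabyte core you would probably pass an mmap of the file rather than reading it all into memory; anything that supports slicing and the buffer protocol should do.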

If you can, use a very recent version of gdb. If you can’t, you’re going to need to:

get /proc/($pid)/maps from earlier

figure out which mapping holds the .text section

dump out the symbol table from /usr/bin/redis-server

rewrite the REL addresses as ABS addresses relative to where the .text section was loaded

put it in a format gdb can read

load the symbol table in after you’ve loaded the coredump.

If you don’t have the /proc/($pid)/maps file, you might have to use objdump or something similar to disassemble the original binary, disassemble part of your coredump’s text section, and find some code blocks that match.
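The first few of those manual steps (maps file, symbol table, absolute addresses) might look roughly like this in Python. It is a sketch under assumptions: the function names are mine, the input is the saved text of /proc/($pid)/maps and of readelf -s, and the binary is assumed to be position independent with its first loadable segment at virtual address 0, so that the lowest mapping of the file is the load offset. Getting the result into a format gdb can read is left out.

```python
def load_offset_from_maps(maps_text, binary_path):
    """Lowest start address of any mapping backed by binary_path.

    Assumption: position-independent binary whose first loadable segment
    has virtual address 0, so this address is also the load offset.
    """
    starts = [
        int(line.split("-", 1)[0], 16)
        for line in maps_text.splitlines()
        if line.split()[-1:] == [binary_path]
    ]
    if not starts:
        raise ValueError("no mapping found for " + binary_path)
    return min(starts)


def parse_readelf_symbols(readelf_text):
    """Yield (name, value) for defined FUNC/OBJECT rows of `readelf -s` output."""
    for line in readelf_text.splitlines():
        fields = line.split()
        # Expected columns: Num: Value Size Type Bind Vis Ndx Name
        if len(fields) != 8 or not fields[0].endswith(":"):
            continue
        if fields[3] not in ("FUNC", "OBJECT") or fields[6] == "UND":
            continue
        yield fields[7], int(fields[1], 16)


def absolute_symbols(readelf_text, offset):
    """Map each symbol name to its absolute address in the dumped process."""
    return {name: offset + value for name, value in parse_readelf_symbols(readelf_text)}
```

Feeding it the maps text and the symbol table gives a name-to-address dictionary that can be spot-checked against anything gdb printed while it was still attached.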

This isn’t quite as bad as it looks, but almost. If you have a chance to use gdb more on the original machine running redis, dump out the symbol location of a well-known function inside redis-server, and use that to figure out the .text section base. e.g.:

    (gdb) p *redisCommand
    $1 = {<text variable, no debug info>} 0x7f61c9365f80 <redisCommand>
    (gdb)

Then, you should be able to look up the same symbol in the redis-server binary:

    root@hope:/tmp# readelf -s /usr/bin/redis-server | grep redisCommand$
      1274: 000000000006af80 177 FUNC GLOBAL DEFAULT 13 redisCommand
    root@hope:/tmp#

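The difference between the two (here 0x7f61c9365f80 - 0x6af80 = 0x7f61c92fb000) gives you the address the binary was loaded at; add the .text section’s address from readelf -S to that and you have the .text section start.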
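As a worked version of that subtraction, using the two addresses shown above: what falls out is the offset the whole binary was loaded at, and the .text start is that offset plus the section’s address from readelf -S. The helper names are invented, the 4 KiB page size is an assumption, and the 0x19f50 used for .text is a placeholder I have assumed for illustration rather than a value read from this binary.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, the usual default on amd64 Linux


def load_offset(runtime_addr, symbol_value):
    """Offset between an address seen in the process and the symbol table value."""
    offset = runtime_addr - symbol_value
    if offset % PAGE_SIZE:
        # A load offset that is not page aligned suggests a mismatched binary or symbol.
        raise ValueError("offset %#x is not page aligned" % offset)
    return offset


def section_start(offset, section_addr):
    """Runtime start of a section, given its address from `readelf -S`."""
    return offset + section_addr


offset = load_offset(0x7f61c9365f80, 0x6af80)
print(hex(offset))  # 0x7f61c92fb000
print(hex(section_start(offset, 0x19f50)))  # 0x19f50 is an assumed placeholder
```

The page-alignment check is a cheap sanity test: if it fails, the binary you ran readelf on is probably not the one that was running.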

You still have to rewrite the symbol table, or otherwise tell gdb where the .text section for the binary is (although there is an option for gdb to do that when loading symbols, it didn’t actually work for me as documented. gdb is a twisty maze of mostly working code).

Another option is to use maint print symbols <filename> and the other maint print *symbols commands to dump out gdb’s internal state while attached to the original process. The output is in a human-readable, not machine-readable, format, but it will give you the information.

This is probably an hour or two’s work, just to start being able to read the corefile working around gdb’s eccentricities. This is going to be the case with almost any version of gdb, made more fun by the fact that the amd64 arch code isn’t quite as mature as the x86 code.

Getting the source

Now we just need to get the source code for that exact version of redis. If we’re using debian, this is pretty easy if you have the package and version name; Just use apt-get source .

You will probably find it much easier past this point to be running the same development environment that the package was built with, but you will absolutely need the same version of GCC; Older versions of GCC will pack memory differently and optimise differently. You will almost certainly need the same versions of libc, libjemalloc, libpthread etc as were running on the original server.

If you’re really, really lucky, you should at this point be able to build redis-server and produce close to an identical binary as was running on the server (There is actually a reproducable-build patch in debian’s redis package which I suspect helps with this as well). Moreover, you should be able to have access to both the coredump, and the underlying source code and datastructures.

Digging into the corefile

So at this point, hopefully we have a gdb at a point that it not only has read the corefile, but also knows where the symbols are in memory. This is absolutely brilliant, because redis.c has the most awesome global variable ever:

    /*================================= Globals ================================= */

    /* Global vars */
    struct redisServer server; /* server global state */

Because it’s declared as a global variable, it means that the linker has put it in the .bss segment so the ELF loader will zero out the memory for it at load time.

It also means that it has its own symbol table entry, so even from the core dump we know where all our redis server state is.

At this point though, gdb doesn’t know the structure of the data there:

    (gdb) p server
    $1 = -1589279768
    (gdb)

What we need to do is rebuild redis-server from the same version, and then use the binary without the debugging data stripped out to let gdb figure out where the source code is, and from there gdb will be able to go through the redis source and figure out how to introspect the datastructures.

I just used dpkg-buildpackage in the unpacked debian source for the same version as the redis-server I took a coredump from, and then went into the ‘src/’ directory where the built binaries and source code were sitting. Then we need to get gdb to load in the debugging data from the unstripped redis-server binary without disregarding the existing symbol information etc from the coredump.

This is possible, but gdb can’t figure out on its own where the .text section was loaded to in the process we coredumped. We can introspect gdb’s state again and get out the .text base address, and then just tell gdb when loading in the new redis-server binary to offset the .text section by that amount. “info files” will show where gdb has loaded in each file so far, and where it has mapped each file section to.

    (gdb) info files
    Symbols from "/usr/bin/redis-server".
    Local core dump file:
        `/tmp/core.23332', file type elf64-x86-64.
        0x00007f3aa03ff000 - 0x00007f3aa0bff000 is load1
        0x00007f3aa0c00000 - 0x00007f3aa1400000 is load2
        0x00007f3aa1400000 - 0x00007f3aa1c00000 is load3
        0x00007f3aa1daa000 - 0x00007f3aa1daa000 is load4
        (...)
        0xffffffffff600000 - 0xffffffffff601000 is load33
    Local exec file:
        `/usr/bin/redis-server', file type elf64-x86-64.
        Entry point: 0x7f3aa2ae6452
        0x00007f3aa2acc238 - 0x00007f3aa2acc254 is .interp
        0x00007f3aa2acc254 - 0x00007f3aa2acc274 is .note.ABI-tag
        (...)
        0x00007f3aa2ae5f50 - 0x00007f3aa2b53d62 is .text
        (...)

In this case, the .text section of /usr/bin/redis-server was loaded in the original process at a base of 0x00007f3aa2ae5f50.

We can pass this to add-symbol-file to load the recently built version of redis-server, and if we’re in the same directory then gdb will also load in all the source code.

    (gdb) add-symbol-file redis-server 0x00007f3aa2ae5f50
    add symbol table from file "redis-server" at
        .text_addr = 0x7f3aa2ae5f50
    (y or n) y
    Reading symbols from redis-server...done.
    (gdb)
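If you end up doing this more than once, pulling the .text address out of saved “info files” output is easy to script. A small sketch, assuming the output has been captured to a string with one section per line as gdb prints it; the function names are made up, and it relies on shared library sections being listed with an “ in <path>” suffix so that they don’t match.

```python
import re

# Matches lines such as "0x00007f3aa2ae5f50 - 0x00007f3aa2b53d62 is .text"
SECTION_LINE = re.compile(r"(0x[0-9a-fA-F]+) - (0x[0-9a-fA-F]+) is (\S+)")


def section_ranges(info_files_text):
    """Map section name to (start, end), keeping the first match for each name."""
    ranges = {}
    for line in info_files_text.splitlines():
        match = SECTION_LINE.fullmatch(line.strip())
        if match:
            start, end, name = match.groups()
            ranges.setdefault(name, (int(start, 16), int(end, 16)))
    return ranges


def add_symbol_file_command(info_files_text, binary="redis-server"):
    """Build the gdb command that loads `binary` at the recorded .text address."""
    start, _end = section_ranges(info_files_text)[".text"]
    return "add-symbol-file {} {:#x}".format(binary, start)
```

With the output above this should produce the same add-symbol-file line that was typed by hand.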

At this point, we can actually start looking at what was going on inside redis at the time of the core dump:

    (gdb) p server
    $1 = {configfile = 0x0, hz = 10, db = 0x7f3ffdc3c600, commands = 0x7f3ffdc130e0,
      orig_commands = 0x7f3ffdc13140, el = 0x7f3ffdc121a0, lruclock = 14263069,
      activerehashing = 1, requirepass = 0x0, pidfile = 0x7f3ffdc0f6a0 "/var/run/redis.pid",
      arch_bits = 64, cronloops = 300, runid = "96832febd9e76e1d767753f09072ffd1375e6a35"
      ...
    (gdb)

struct redisServer is defined in redis.h, but what we’re really after is the redis data itself. Database 0 is located at server.db[0] as a struct redisDb.

The main hash table for the db is in server.db[0].dict->ht, with dict.h and dict.c defining the hash table structure, operations, and a light API around it.

    (gdb) p server.db[0]
    $2 = {dict = 0x7f3ffdc13200, expires = 0x7f3ffdc13260, blocking_keys = 0x7f3ffdc132c0,
      ready_keys = 0x7f3ffdc13320, watched_keys = 0x7f3ffdc13380, id = 0, avg_ttl = 0}
    (gdb) p server.db[0].dict
    $3 = (dict *) 0x7f3ffdc13200
    (gdb) p *(server.db[0].dict)
    $4 = {type = 0x7f3fff5501a0 <dbDictType>, privdata = 0x0,
      ht = {{table = 0x7f3ffdc66400, size = 64, sizemask = 63, used = 54},
        {table = 0x0, size = 0, sizemask = 0, used = 0}},
      rehashidx = -1, iterators = 0}
    (gdb)
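From here, walking the first hash table is mostly pointer chasing. The sketch below is written against a generic read(addr, size) function so it is not tied to gdb, and it bakes in layout assumptions that you must check against the exact redis source you rebuilt: 8-byte little-endian pointers, a dictEntry laid out as key pointer, 8-byte value union, next pointer, and keys stored as sds strings with a 4-byte length and a 4-byte free count immediately before the buffer. It also assumes no rehash is in progress (rehashidx = -1, as above), so only ht[0] matters. All the function names are mine.

```python
import struct


def read_u64(read, addr):
    """Read one little-endian 64-bit word using the supplied memory reader."""
    return struct.unpack("<Q", read(addr, 8))[0]


def read_sds(read, addr):
    """Read an sds string, assuming a header of two 32-bit ints before the buffer."""
    (length,) = struct.unpack("<I", read(addr - 8, 4))
    return read(addr, length)


def walk_dict(read, table_addr, size):
    """Yield (key_bytes, value_pointer) for every entry in one hash table."""
    for slot in range(size):
        entry = read_u64(read, table_addr + 8 * slot)
        while entry:  # follow the collision chain until the NULL next pointer
            key_ptr = read_u64(read, entry)       # assumed offset 0: key
            val_ptr = read_u64(read, entry + 8)   # assumed offset 8: value union
            yield read_sds(read, key_ptr), val_ptr
            entry = read_u64(read, entry + 16)    # assumed offset 16: next
```

Inside gdb’s Python, read could be something like lambda a, n: bytes(gdb.selected_inferior().read_memory(a, n)), with table_addr and size taken from ht[0] above (0x7f3ffdc66400 and 64). The value pointers still need decoding as redis objects, which this does not attempt.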

This is about as far as I got, but it was enough to start introspecting the data in a structured way. Getting past here you have wonderful options such as:

Scripting gdb to just walk the data for you

Dumping out the memory and then writing a wrapper program that uses the redis headers / code itself as a hack to walk the datastructures (heck, you might even be able to use the redis SAVE code itself if you are very very lucky)

Using gdb to start up a new running redis process and somehow overlay all the memory referenced from server.db[0] in the coredump, and then reference it in one of the server.db[n] slots

The second is probably the best option. The third would be pretty cool, but the details would start being really annoying (like having to rewrite all the pointers in the entire db if you can’t map in the data into exactly the same addresses).
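For the second option you don’t strictly need gdb at all, since the corefile is just an ELF file whose PT_LOAD program headers say which virtual addresses live at which file offsets. Here is a minimal sketch of a reader that resolves an address from the dumped process to bytes in the corefile. It assumes a little-endian ELF64 core, the class name is invented, and it does not handle reads that straddle two segments or pages that were never written to the dump.

```python
import struct

PT_LOAD = 1


class CoreMemory:
    """Read process memory out of an ELF64 corefile by virtual address."""

    def __init__(self, path):
        self._fh = open(path, "rb")
        header = self._fh.read(64)
        if header[:6] != b"\x7fELF\x02\x01":
            raise ValueError("expected a little-endian ELF64 file")
        (e_phoff,) = struct.unpack_from("<Q", header, 0x20)
        e_phentsize, e_phnum = struct.unpack_from("<HH", header, 0x36)
        self.segments = []
        for index in range(e_phnum):
            self._fh.seek(e_phoff + index * e_phentsize)
            # Fields: type, flags, offset, vaddr, paddr, filesz, memsz, align
            p_type, _, p_offset, p_vaddr, _, p_filesz, _, _ = struct.unpack(
                "<IIQQQQQQ", self._fh.read(56)
            )
            if p_type == PT_LOAD and p_filesz:
                self.segments.append((p_vaddr, p_filesz, p_offset))

    def read(self, addr, size):
        """Return `size` bytes that were at virtual address `addr` in the process."""
        for vaddr, filesz, offset in self.segments:
            if vaddr <= addr and addr + size <= vaddr + filesz:
                self._fh.seek(offset + (addr - vaddr))
                return self._fh.read(size)
        raise KeyError("address %#x is not present in the corefile" % addr)

    def close(self):
        self._fh.close()
```

The read method has the read(addr, size) shape that a structure walker would want, so the remaining work is knowing where server lives and how the structures are laid out, which is what the symbol and source hunting earlier was for.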