June 05, 2019 posted by Michał Górny

Upstream describes LLDB as a next generation, high-performance debugger. It is built on top of the LLVM/Clang toolchain, and features great integration with it. At the moment, it primarily supports debugging C, C++ and ObjC code, and there is interest in extending it to more languages.

In February, I started working on LLDB, as contracted by the NetBSD Foundation. So far I've been working on re-enabling continuous integration, squashing bugs, improving NetBSD core file support and lately extending NetBSD's ptrace interface to cover more register types. You can read more about that in my Apr 2019 report.

In May, I primarily continued the work on the new ptrace interface. Besides that, I found and fixed a bug in the ptrace() compat32 code, pushed the LLVM buildbot to ‘green’ status and found some upstream LLVM regressions. More below.

Adding register read/write tests to ATF tests

Last month, I implemented a number of register reading/writing tests for LLDB. This month, I've introduced matching tests inside NetBSD's ATF test suite. This makes it possible to test NetBSD's ptrace implementation directly on the large variety of platforms and kernels supported by NetBSD. Given how dynamically NetBSD is developed, running the LLDB tests everywhere would not be feasible.

While porting the tests, I've made a number of improvements, some of them requested specifically by LLDB upstream. Those include:

- starting to use better input/output operands for assembly, effectively reducing the number of direct register references and redundant code: r359978,
- using more readable/predictable constants for register data, read part: r360041, write part: r360154,
- using %0 and %1 operands to reference memory portably between i386 and amd64: r360148.

The relevant NetBSD commits for added tests are (using the git mirror):

- general-purpose register reading tests: 7a58d92435a9,
- fix to the above: split tests for reading i386 gp registers so as not to require MMX: 06be77bbafa6,
- r8..r15 amd64 register reading tests: 86f6b1d4dab6,
- mm & xmm register reading tests: 3ef02e1666ae,
- general-purpose register writing tests: 95bfedcb6a89,
- mm & xmm register writing tests: 2c8335920f61.

While working on this, I also noticed that struct fpreg and struct xmmregs are not fully specified on i386. In bbc3f184d470, I added the fields needed to make using those structures convenient.
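The idea behind referencing memory through %0 and %1 operands can be illustrated with a minimal sketch, assuming GCC or Clang extended asm on x86 with SSE2. Instead of hardcoding a pointer register in the asm string (which would be %eax on i386 but %rax on amd64), the buffers are passed as "m" operands and the compiler substitutes a valid addressing mode for each target. The type and function names (vec16, xmm0_roundtrip, fill_pattern) are hypothetical and not taken from the LLDB or ATF tests; fill_pattern stands in for the "predictable constants" idea by storing byte i as value i.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte buffer type matching the size of an XMM register. */
typedef struct {
	uint8_t b[16];
} vec16;

/*
 * Round-trip 16 bytes through %xmm0. The "m" operands let the compiler
 * pick the addressing mode, so no pointer register (%eax vs. %rax) is
 * hardcoded and the same source builds for both i386 and amd64.
 */
static void xmm0_roundtrip(const vec16 *in, vec16 *out)
{
#if defined(__x86_64__) || defined(__i386__)
	__asm__ __volatile__(
	    "movdqu %1, %%xmm0\n\t"
	    "movdqu %%xmm0, %0\n\t"
	    : "=m" (*out)
	    : "m" (*in)
	    : "xmm0");
#else
	/* Non-x86 fallback so that the sketch still compiles elsewhere. */
	memcpy(out, in, sizeof(*out));
#endif
}

/* Predictable test data: byte i holds the value i. */
static void fill_pattern(vec16 *v)
{
	for (int i = 0; i < 16; i++)
		v->b[i] = (uint8_t)i;
}
```

A test would fill one buffer with the pattern, round-trip it and compare the result byte by byte; a mismatch then points directly at the offending byte.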

Fixing compat32: request mapping and debug registers

Kamil asked me to look into PR#54233, which indicated problems with debugging 32-bit applications on amd64. While the problem in question most likely combines multiple issues, one specifically related to my work was missing PT_*DBREGS support in compat32. While working on this, I found out that the functions responsible for implementing those requests were not called at all.

After investigating, I came to the following conclusion. The i386 userland code passed PT_* request codes corresponding to the i386 headers to the compat32 layer. The compat32 layer passed those codes unmodified to the common kernel code, which compared them to the PT_* constants available in kernel code, and those happened to be the amd64 constants. This worked fine for low request numbers that happened to match on both architectures. However, i386 adds two additional requests (PT_*XMMREGS) after PT_SETFPREGS, and therefore all the remaining requests are offset.

To solve this, I created a request code mapping function that converts i386 codes coming from userland to the matching amd64 values used in the kernel. For the time being, it supports only requests common to both architectures, and therefore PT_*XMMREGS can't be implemented without further hacking.

Once I managed to fix compat32, I went ahead and implemented PT_*DBREGS in compat32. Kamil had made an initial implementation in the past, but it was commented out and lacked input verification. I chose to change the implementation a bit and reuse the x86_dbregs_read() and x86_dbregs_write() functions rather than altering the pcb directly. I've also added the needed value checks for PT_SETDBREGS.

Both changes were committed to /usr/src:

- Translate userland PT_* request values into kernel codes,
- Implement PT_GETDBREGS and PT_SETDBREGS.
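The mapping described above can be sketched as a simple switch. The request numbers below are illustrative only: they mimic the layout described in the text (i386 has two extra XMM requests right after PT_SETFPREGS, shifting everything that follows), but they are not copied from the NetBSD headers, and the names I386_PT_*, AMD64_PT_* and map_i386_request() are hypothetical rather than the kernel's actual identifiers.

```c
#include <assert.h>

/* Illustrative i386 numbering: two XMM requests follow SETFPREGS. */
enum {
	I386_PT_GETFPREGS  = 35,
	I386_PT_SETFPREGS  = 36,
	I386_PT_GETXMMREGS = 37,
	I386_PT_SETXMMREGS = 38,
	I386_PT_GETDBREGS  = 39,
	I386_PT_SETDBREGS  = 40,
};

/* Illustrative amd64 numbering: no XMM requests, so DBREGS come earlier. */
enum {
	AMD64_PT_GETFPREGS = 35,
	AMD64_PT_SETFPREGS = 36,
	AMD64_PT_GETDBREGS = 37,
	AMD64_PT_SETDBREGS = 38,
};

/*
 * Translate an i386 request code into the amd64 code the kernel
 * understands. Returns -1 for requests without an amd64 counterpart.
 */
static int map_i386_request(int req)
{
	switch (req) {
	case I386_PT_GETFPREGS:
		return AMD64_PT_GETFPREGS;
	case I386_PT_SETFPREGS:
		return AMD64_PT_SETFPREGS;
	case I386_PT_GETDBREGS:
		return AMD64_PT_GETDBREGS;
	case I386_PT_SETDBREGS:
		return AMD64_PT_SETDBREGS;
	case I386_PT_GETXMMREGS:
	case I386_PT_SETXMMREGS:
		/* Only requests common to both architectures are supported. */
		return -1;
	default:
		/* Simplification: lower requests are numbered identically. */
		return req;
	}
}
```

Without the translation, the i386 value for PT_GETDBREGS would be interpreted as a different amd64 request, which is consistent with the symptom that the DBREGS handlers were never reached.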

Initial XSAVE work

In the previous report, I considered which approach to take in order to provide access to the additional FPU registers via ptrace. Eventually, the approach of exposing the raw contents of the XSAVE area got the blessing, and I started implementing it. However, this approach proved impractical.

The XSAVE area in the standard format (which we are using) consists of three parts: the FXSAVE-compatible legacy area, the XSAVE header and zero or more extended components. The offsets of those extended components turned out to be unpredictable and potentially different between various CPUs. The architecture developer's manual indicates that the relevant offsets can be obtained using CPUID calls. Apparently, neither Linux nor FreeBSD took this into consideration when implementing their API, and they effectively require the caller to issue CPUID calls directly. While such an approach would be doable in NetBSD, it would prevent core dumps from working correctly on a different CPU. Therefore, it would be necessary to perform the calls in the kernel instead, and include the results along with the XSAVE data. However, I believe that doing so would introduce unnecessary complexity for no clear gain.

Instead, I proposed two alternative solutions. They were to either:

- copy the XSAVE data into a custom structure with predictable indices, or
- implement separate PT_* requests for each component group, each with its own data structure.
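To make the offset problem concrete, here is a minimal sketch, assuming GCC or Clang on x86 (it relies on __get_cpuid_count() from <cpuid.h>). Per the Intel SDM, CPUID leaf 0xD with sub-leaf n >= 2 reports the size of component n in EAX and its offset in the standard-format XSAVE area in EBX; components 0 and 1 (x87, SSE) live in the fixed legacy area instead. The second function illustrates the first proposed solution, copying one component from a raw buffer into a structure with a fixed layout. All names (xsave_component_layout, struct fixed_ymm_hi128, copy_avx_component) are hypothetical and not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/*
 * Size and standard-format offset of XSAVE component `comp` (comp >= 2).
 * Returns false if CPUID leaf 0xD is unavailable or the CPU does not
 * support the component.
 */
static bool xsave_component_layout(unsigned comp, unsigned *offset,
    unsigned *size)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned eax, ebx, ecx, edx;

	if (comp < 2 || comp > 31)
		return false;
	if (!__get_cpuid_count(0x0d, comp, &eax, &ebx, &ecx, &edx) || eax == 0)
		return false;
	*size = eax;	/* EAX: component size in bytes */
	*offset = ebx;	/* EBX: offset from the start of the XSAVE area */
	return true;
#else
	(void)comp; (void)offset; (void)size;
	return false;
#endif
}

/* Hypothetical fixed-layout holder for the AVX component (upper YMM halves). */
struct fixed_ymm_hi128 {
	uint8_t ymm_hi[16][16];
};

/* Copy the AVX component out of a raw XSAVE buffer, with bounds checks. */
static bool copy_avx_component(const uint8_t *xsave, size_t xsave_len,
    unsigned offset, unsigned size, struct fixed_ymm_hi128 *out)
{
	if (size != sizeof(*out) || offset > xsave_len ||
	    xsave_len - offset < size)
		return false;
	memcpy(out, xsave + offset, size);
	return true;
}
```

Because the 512-byte legacy area and the 64-byte XSAVE header always come first, any offset reported for an extended component is at least 576. Doing this lookup once, where the data is produced, is what lets a consumer (or a core dump reader on another CPU) rely on fixed indices instead of repeating the CPUID queries.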

Comparison of the two proposed solutions

Both solutions are roughly equivalent. The main difference between them is that the first solution covers all extended registers (and is future-extensible) in one request call, while the second one requires a new pair of requests for each new register set. I personally prefer the former solution because it reduces the number of ptrace calls needed to perform typical operations. This is especially relevant when reading registers whose contents are split between multiple components: YMM registers (whose lower bits are in the SSE area), and the lower ZMM registers (whose lower bits are YMM registers).

Example code reading the ZMM0 register using the single-request solution would look like:

```c
struct xstate xst;
struct iovec iov;
char zmm_reg[64];

iov.iov_base = &xst;
iov.iov_len = sizeof(xst);

ptrace(PT_GETXSTATE, child_pid, &iov, 0);

// verify that all necessary components are available
assert(xst.xs_xstate_bv & XCR0_SSE);
assert(xst.xs_xstate_bv & XCR0_YMM_Hi128);
assert(xst.xs_xstate_bv & XCR0_ZMM_Hi256);

// combine the values
memcpy(&zmm_reg[0], &xst.xs_fxsave.fx_xmm[0], 16);
memcpy(&zmm_reg[16], &xst.xs_ymm_hi128.xs_ymm[0], 16);
memcpy(&zmm_reg[32], &xst.xs_zmm_hi256.xs_zmm[0], 32);
```

For comparison, the equivalent code for the other variant would roughly be:

```c
#if defined(__x86_64__)
struct fpreg fpr;
#else
struct xmmregs fpr;
#endif
struct ymmregs ymmr;
struct zmmregs zmmr;
char zmm_reg[64];

#if defined(__x86_64__)
ptrace(PT_GETFPREGS, child_pid, &fpr, 0);
#else
ptrace(PT_GETXMMREGS, child_pid, &fpr, 0);
#endif
ptrace(PT_GETYMMREGS, child_pid, &ymmr, 0);
ptrace(PT_GETZMMREGS, child_pid, &zmmr, 0);

memcpy(&zmm_reg[0], &fpr.fxstate.fx_xmm[0], 16);
memcpy(&zmm_reg[16], &ymmr.xs_ymm_hi128.xs_ymm[0], 16);
memcpy(&zmm_reg[32], &zmmr.xs_zmm_hi256.xs_zmm[0], 32);
```

I've submitted a patch set implementing the first solution, as it was easier to convert the initial approach to it.
If the feedback indicates a preference for the other solution, converting in that direction should also be easier than the other way around. The patch set is available on the tech-kern mailing list: [PATCH 0/2] PT_{GET,SET}XSTATE implementation, WIP v1. The initial implementation should support getting and setting x87, SSE, AVX and AVX-512 registers (i.e. all types currently enabled in the kernel). The tests cover all but AVX-512. I have tested it on native amd64 and i386, and via compat32.
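Which of these register sets are actually enabled on a given system is visible to userland through the XCR0 register, which a test could use, for example, to skip AVX-512 cases on hardware or kernels without them. Below is a minimal sketch, assuming GCC or Clang on x86 (<cpuid.h> and an inline XGETBV). The bit positions follow the Intel SDM, but the macro and function names are hypothetical and are not the XCR0_* constants used in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* XCR0 bit positions as defined by the Intel SDM. */
#define XCR0_BIT_X87       (UINT64_C(1) << 0)
#define XCR0_BIT_SSE       (UINT64_C(1) << 1)
#define XCR0_BIT_AVX       (UINT64_C(1) << 2)
#define XCR0_BIT_OPMASK    (UINT64_C(1) << 5)
#define XCR0_BIT_ZMM_HI256 (UINT64_C(1) << 6)
#define XCR0_BIT_HI16_ZMM  (UINT64_C(1) << 7)
#define XCR0_AVX512_MASK \
	(XCR0_BIT_OPMASK | XCR0_BIT_ZMM_HI256 | XCR0_BIT_HI16_ZMM)

/* Read XCR0 if the OS has enabled XSAVE (CPUID.1:ECX.OSXSAVE, bit 27). */
static bool read_xcr0(uint64_t *xcr0)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned eax, ebx, ecx, edx;

	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx) || !(ecx & (1u << 27)))
		return false;
	/* XGETBV with ECX = 0 returns XCR0 in EDX:EAX. */
	__asm__ __volatile__("xgetbv" : "=a" (eax), "=d" (edx) : "c" (0));
	*xcr0 = ((uint64_t)edx << 32) | eax;
	return true;
#else
	(void)xcr0;
	return false;
#endif
}

/* True if every component in `mask` is enabled in `xcr0`. */
static bool components_enabled(uint64_t xcr0, uint64_t mask)
{
	return (xcr0 & mask) == mask;
}
```

All three AVX-512 bits are checked together because the ZMM state is split across the opmask, ZMM_Hi256 and Hi16_ZMM components.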