May 02, 2019 posted by Michał Górny

Upstream describes LLDB as a next generation, high-performance debugger . It is built on top of LLVM/Clang toolchain, and features great integration with it. At the moment, it primarily supports debugging C, C++ and ObjC code, and there is interest in extending it to more languages.

In February, I have started working on LLDB, as contracted by the NetBSD Foundation. So far I've been working on reenabling continuous integration, squashing bugs, improving NetBSD core file support and updating NetBSD distribution to LLVM 8 (which is still stalled by unresolved regressions in inline assembly syntax). You can read more about that in my Mar 2019 report.

In April, my main focus was on fixing and enhancing the support for reading and writing CPU registers. In this report, I'd like to shortly summarize what I have done, what I have learned in the process and what I still need to do.

Fixing MM register support The first task in my main TODO was to fix a bug in reading/writing MM registers that was identified earlier. The MM registers were introduced as part of MMX extensions to x86, and they were designed as overlapping with the earlier ST registers used by x87 FPU. For this reason, they are returned by the ptrace() call as a single fx_87_ac array whose elements are afterwards to work on both kinds of registers. The bug in question turned out to be mistaken use of fx_xmm instead of fx_87_ac . As a result, the values of mm0..mm7 registers were mapped to subsets of xmm0..xmm7 registers, rather than the correct set of st(0)..st(7) registers. The fix for the problem in question landed as r358178. However, the fix itself was the easier part. The natural consequence of identifying a problem with the register was to add a regression test for it. This in turn triggered a whole set of events that deserve a section of their own.

Adding tests for register operations Initially, the test for MM and XMM registers consisted of a simple program written in pure amd64 assembler that wrote known patterns to the registers in question, then triggered SIGTRAP via int3 , and a lit test that run LLDB in order to execute the program, read registers and compare their values to expected patterns. However, following upstream recommendations it quickly evolved. Firstly, upstream suggested replacing the assembly file with inline assembly in C or C++ program, in order to improve portability between platforms. As a result, I ended up learning how to use GCC extended inline assembly syntax (whose documentation is not exactly the most straightfoward to use) and created a test case that works fine both for i386 and amd64, and in a wide range of platforms supported by LLDB. Secondly, it was necessary to restrict the tests into native runs on i386 and amd64 hardware. I discovered that lit partially provides for this, by defining native feature whenever LLDB is being built as native executable (vs cross-compiling). It also defined a few platform-related features, so it seemed only natural to extend them to provide explicit target-x86 and target-x86_64 features, corresponding to i386 and amd64 targets. This was done in r358177. Thirdly, upstream asked me to add tests also for other register types, as well as for writing registers. This overlapped with our need to test new register routines for NetBSD, so I've focused on them. The main problem in adding more tests was that I needed to verify whether the processor supported specific instruction sets. For the time being, it seemed reasonable to assume that every possible user of LLDB would have at least SSE, and to filter tests specific to long mode on amd64 platform. However, adding tests for registers introduced by AVX extension required explicit check. I have discussed the problem with Pavel Labath of LLDB upstream, and considered multiple options. His suggestion was to make the test program itself run cpuid instruction, and exit with a specific status if the needed registers are not supported. Then I could catch this status from dotest.py test and mark the test as unsupported. However, I really preferred using plain lit over dotest.py (mostly because it naturally resembled LLDB usage), and wanted to avoid duplicating cpuid code in multiple tests. However, lit does not seem to support translating a specific exit status into 'unsupported'. The 'lit way' of solving this is to determine whether the necessary feature is available up front, and make the test depend on it. Of course, the problem was how to check supported CPU extensions from within lit. Firstly, I've considered the possibility of determining cpuinfo from within Python. This would be the most trivial option, however Python stdlib does not seem to provide appropriate functions and I wanted to avoid relying on external modules. Secondly, I've considered the possibility of running clang from within lit in order to build a simple test program running cpuid, and using it to fill the supported features. Finally, I've arrived at the simpler idea of making lit-cpuid , an additional utility program built as part of LLDB. This program uses very nice cpuid API exposed by LLVM libraries in order to determine the available extensions and print them for lit's use. This landed as r359303 and opened the way for more register tests. To this moment, I've implemented the following tests: tests for mm0..mm7 64-bit MMX registers and xmm0..xmm7 128-bit SSE registers mentioned above, common to i386 and amd64; read: r358178, write: r359681.

tests for the 8 general purpose registers: *AX .. *DX , *SP , *BP , *SI , *DI , in separate versions for i386 (32-bit registers) and amd64 (64-bit registers); read: r359438, write: r359441.

tests for the 8 additional 64-bit general purpose registers r8..r15, and 8 additional 128-bit xmm8..xmm15 registers introduced in amd64; read: r359210, write: r359682.

tests for the 256-bit ymm0..ymm15 registers introduced by AVX, in separate versions for i386 (where only ymm0..ymm7 are available) and amd64; read: r359304, write: r359783.

tests for the 512-bit zmm0..zmm31 registers introduced by AVX-512, in separate versions for i386 (where only zmm0..zmm7 are available) and amd64; read: r359439, write: r359797.

tests for the xmm16..xmm31 and ymm16..ymm31 registers that were implicitly added by AVX-512 (xmm, ymm and zmm registers overlap/extend their predecessors); read: r359780, write: r359797.

Fixing memory reading and writing routine The general-purpose register tests were initially failing on NetBSD. More specifically, the test worked correctly to the point of reading registers but afterwards lldb indicated a timeout and terminated the program instead of resuming it. While investigating this, I've discovered that it is caused by overwriting RBP. Curious enough, it happened only when large values were written to it. I've 'bisected' it to an approximate max value that still worked fine, and Kamil has identified it to be close to vm.maxaddress . GDB did not suffer from this issue. I've discussed it with Pavel Labath and he suggested it might be related to unwinding. Upon debugging it further, I've noticed that lldb-server is apparently calling ptrace() in an infinite loop, and this is causing communications with the CLI process (LLDB is using client-server model internally) to timeout. Ultimately, I've pinpointed it to memory reading routine not expecting read to set piod_len to 0 bytes (EOF). Apparently, this is exactly what happens when you try to read past max virtual memory address. I've made a patch for this. While reviewing it, Kamil also noticed that the routines are not summing up results of multiple split read/write calls as well. I've addressed both issues in r359572

Necessary extension of ptrace interface At the moment, NetBSD implements 4 requests related to i386/amd64 registers: PT_[GS]ETREGS which covers general-purpose registers, IP, flags and segment registers,

PT_[GS]ETFPREGS which covers FPU registers (and xmm0..xmm15 registers on amd64),

PT_[GS]ETDBREGS which covers debug registers,

PT_[GS]ETXMMREGS which covers xmm0..xmm15 registers on i386 (not present on amd64). The interface is missing methods to get AVX and AVX-512 registers, namely ymm0..ymm15 and zmm0..zmm31. Apparently there's struct xsave_ymm for the former in kernel headers but it is not used anywhere. I am considering different options for extending this. Important points worth noting are that: YMM registers extend XMM registers, and therefore overlap with them. The existing struct xsave_ymm seems to use that, and expect only the upper half of YMM register to be stored there, with the lower half being accessible via XMM. Similar fact holds for ZMM vs YMM. AVX-512 increased the register count from 16 to 32. This means that there are 16 new XMM registers that are not accessible via current API. This also opens questions about future extensibility of the interface. After all, we are not only seeing new register types added but also an increase in number of registers of existing types. What I'd really like to avoid is having an increasingly cluttered interface. How are other systems solving it? Linux introduced PT_[GS]ETREGSET request that accepts a NT_* constant identifying register set to operate on, and iovec structure containing buffer location and size. For x86, the constants equivalent to older PT_* requests are available, and a NT_X86_XSTATE that uses full XSAVE area. The interface supports operating on complete XSAVE area only, and requires the caller to identify the correct size for the CPU used beforehand. FreeBSD introduced PT_[GS]ETXSTATE request that operates on full or partial XSAVE data. If the buffer provided is smaller than necessary, it is partially filled. Additionally, PT_GETXSTATE_INFO is provided to get the buffer size for the CPU used. A similar solution would be convenient for future extensions, as the caller would be able to implement them without having the kernel updated. Its main disadvantage is that it requires all callers to implement XSAVE area format parsing. Pavel Labath also suggested that we could further optimize it by supplying an offset argument, in order to support partial XSAVE area transfer. An alternative is to keep adding new requests for new register types, i.e. PT_[GS]ETYMMREGS for YMM, and PT_[GS]ETZMMREGS for ZMM. In this case, it is necessary to further discuss the data format used. It could either be the 'native' XSAVE format (i.e. YMM would contain only upper halves of the registers, ZMM would contain upper halves of zmm0..zmm15 and complete data of zmm16..zmm31), or more conveniently to clients (at the cost of data duplication) whole registers. If the latter, another question is posed: should we provide a dedicated interface for xmm16..xmm31 (and ymm16..ymm31) then, or infer them from zmm16..zmm31 registers?