Welcome to the final installment of how to write a primitive debugger. This post covers some miscellaneous topics that were not addressed in the previous articles and that add missing core functionality: how to display a disassembly listing, how to step over code (i.e. step past a branch without following it), and how to dump and modify arbitrary memory of a process.

Disassembly

In order to display a disassembly dump on x86 and x64, this debugger will take advantage of the BeaEngine disassembly library. This is a very handy library that supports the 16/32/64-bit Intel instruction sets as well as floating point and vector extensions. The project is open source for those interested in looking at the internals of the disassembler. In the example code, it is distributed as DLLs that the debugger loads at runtime. This is a convenience that avoids having to recompile static libraries.

The disassembler code will be pretty straightforward to work with. BeaEngine has a DISASM structure that needs to be initialized with the architecture type and an address. This is then passed along to a Disasm function, which fills the structure with information about the instruction at the address. Since the disassembler is dynamically loaded, and is used for x86/x64 in the same code, the function pointer to Disasm needs to be retrieved. All of this initialization code can be handled in the constructor.

    Disassembler::Disassembler(HANDLE hProcess) : m_hProcess{ hProcess }
    {
        memset(&m_disassembler, 0, sizeof(DISASM));

    #ifdef _M_IX86
        m_disassembler.Archi = 0;
        if (m_hDll == nullptr)
        {
            m_hDll = LoadLibrary(L"BeaEngine_x86.dll");
            m_pDisasm = (pDisasm)GetProcAddress(m_hDll, "_Disasm@4");
        }
    #elif defined _M_AMD64
        m_disassembler.Archi = 64;
        if (m_hDll == nullptr)
        {
            m_hDll = LoadLibrary(L"BeaEngine_x64.dll");
            m_pDisasm = (pDisasm)GetProcAddress(m_hDll, "Disasm");
        }
    #else
    #error "Unsupported architecture"
    #endif
    }

with m_hDll and m_pDisasm being static, since there’s no need to retrieve these per instance. Since the code is meant to work on x86/x64, there are two separate versions of the DLL provided — one for use in x86 applications, the other for x64.

Now that the disassembly engine is loaded and initialized, it is time to actually begin disassembling code. There is an interesting problem that comes up, however. The debugger is attached to another process, but the disassembler is given an address in the current address space to disassemble at, i.e. the user can request disassembly at address 0x00411000 when prompted. The disassembly at address 0x00411000 in the debugger doesn’t have any relation to the disassembly at address 0x00411000 in the target, due to how virtual memory works. So the solution isn’t as easy as setting the target address to disassemble at to 0x00411000 and calling Disasm.

Instead, the memory at 0x00411000 in the target process must be read and that must be disassembled. Something like this was already done when implementing Interrupt Breakpoints; the original byte at the address was saved before replacing it with an 0xCC opcode. For this, it is still as simple as calling ReadProcessMemory and storing the buffer.

    const bool Disassembler::TransferBytes(const DWORD_PTR dwAddress)
    {
        SIZE_T ulBytesRead = 0;
        bool bSuccess = BOOLIFY(ReadProcessMemory(m_hProcess, (LPCVOID)dwAddress,
            m_bytes.data(), m_bytes.size(), &ulBytesRead));
        if (bSuccess && ulBytesRead == m_bytes.size())
        {
            return true;
        }
        else
        {
            fprintf(stderr, "Could not read from %p. Error = %X\n", dwAddress, GetLastError());
        }

        return false;
    }

Once that is done, the disassembly process is no more difficult than the BeaEngine example. The target disassembly address is set and the Disasm function is called through the function pointer retrieved from the DLL. This function fills the DISASM structure (m_disassembler in the code), and returns the length of the instruction. This can be added to the previous address to get the address of the next instruction, and the process repeats.

    const bool Disassembler::BytesAtAddress(DWORD_PTR dwAddress, size_t ulInstructionsToDisassemble /*= 15*/)
    {
        if (IsInitialized())
        {
            SetDisassembler(dwAddress);
            bool bFailed = false;
            while (!bFailed && ulInstructionsToDisassemble-- > 0)
            {
                int iDisasmLength = m_pDisasm(&m_disassembler);
                if (iDisasmLength != UNKNOWN_OPCODE)
                {
                    fprintf(stderr, "0x%p - %s\n", dwAddress, m_disassembler.CompleteInstr);
                    m_disassembler.EIP += iDisasmLength;
                    dwAddress += iDisasmLength;
                }
                else
                {
                    fprintf(stderr, "Error: Reached unknown opcode in disassembly.\n");
                    bFailed = true;
                }
            }
        }
        else
        {
            fprintf(stderr, "Could not show disassembly at address. Disassembler Dll was not loaded properly.\n");
            return false;
        }

        return true;
    }

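The decode-and-advance loop can be tried without BeaEngine or a live target. The sketch below is only an illustration: ToyInstructionLength is a hypothetical stand-in for Disasm that recognizes a handful of one-, two-, and five-byte 32-bit opcodes, and SweepAddresses is a made-up name for the loop itself. What it shows is the bookkeeping: the bytes are read from a local copy, while the addresses reported are those of the target.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a real decoder: returns the length of the
// instruction starting at p, or 0 if the opcode is outside this toy subset
// or the instruction would run past the end of the buffer.
inline int ToyInstructionLength(const unsigned char *p, std::size_t remaining)
{
    if (remaining == 0) return 0;
    const unsigned char op = p[0];
    int length = 0;
    if (op == 0x90 || op == 0xC3 || op == 0xCC) length = 1;       // nop, ret, int3
    else if (op >= 0x50 && op <= 0x5F) length = 1;                // push/pop register
    else if (op >= 0x70 && op <= 0x7F) length = 2;                // jcc rel8
    else if (op == 0xEB) length = 2;                              // jmp rel8
    else if (op == 0x68 || op == 0xE8 || op == 0xE9) length = 5;  // push imm32, call/jmp rel32
    return (length != 0 && static_cast<std::size_t>(length) <= remaining) ? length : 0;
}

// Linear sweep: decode at the current offset in the local copy, record the
// matching address in the target, then advance by the instruction length.
inline std::vector<std::uintptr_t> SweepAddresses(std::uintptr_t targetAddress,
    const std::vector<unsigned char> &localCopy, std::size_t maxInstructions)
{
    std::vector<std::uintptr_t> addresses;
    std::size_t offset = 0;
    while (maxInstructions-- > 0 && offset < localCopy.size())
    {
        const int length = ToyInstructionLength(localCopy.data() + offset, localCopy.size() - offset);
        if (length == 0) break;  // unknown opcode or truncated instruction
        addresses.push_back(targetAddress + offset);
        offset += static_cast<std::size_t>(length);
    }
    return addresses;
}
```

Feeding it the bytes 68 48 21 40 00, E8 14 00 00 00, 5F, C3 with a target address of 0x401009 yields instruction starts at 0x401009, 0x40100E, 0x401013 and 0x401014.
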
The SetDisassembler function is responsible for pointing the disassembler at the correct offset within the debugger's local copy of the target process's memory. The debugger keeps a 4096-byte cache (the default Windows page size) and uses it if the address to disassemble falls within the cached range. Otherwise, the read is performed again and the cache is re-initialized starting at the requested address.

    void Disassembler::SetDisassembler(const DWORD_PTR dwAddress)
    {
        // Cached means dwAddress lies in [m_dwStartAddress, m_dwStartAddress + m_bytes.size()).
        bool bIsCached = ((dwAddress - m_dwStartAddress) < m_bytes.size());
        bIsCached &= (dwAddress >= m_dwStartAddress);
        if (!bIsCached)
        {
            // Cache miss: refill the buffer starting at the requested address.
            (void)TransferBytes(dwAddress);
            m_disassembler.EIP = (UIntPtr)m_bytes.data();
            m_dwStartAddress = dwAddress;
        }
        else
        {
            // Cache hit: point the disassembler at the matching offset in the buffer.
            m_disassembler.EIP = (UIntPtr)&m_bytes.data()[dwAddress - m_dwStartAddress];
        }
    }
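
The cache-hit test is easy to check in isolation. Below is a minimal sketch, with hypothetical function names, of the range check and of a stricter variant that also leaves room for a whole instruction. The stricter variant is an assumption about what one might want, not something the example code does; it relies on x86 instructions being at most 15 bytes long.

```cpp
#include <cstddef>
#include <cstdint>

// True when address lies in [cacheStart, cacheStart + cacheSize).
// Unsigned subtraction wraps to a huge value when address < cacheStart,
// so a single comparison covers both bounds.
inline bool IsAddressCached(std::uintptr_t address, std::uintptr_t cacheStart, std::size_t cacheSize)
{
    return (address - cacheStart) < cacheSize;
}

// Stricter variant (an assumption, not from the example code): also require
// that a maximum-length instruction starting at address fits in the cache.
inline bool IsInstructionCached(std::uintptr_t address, std::uintptr_t cacheStart, std::size_t cacheSize)
{
    const std::size_t kMaxInstructionLength = 15;  // architectural limit on x86/x64
    if (!IsAddressCached(address, cacheStart, cacheSize)) return false;
    const std::size_t offset = static_cast<std::size_t>(address - cacheStart);
    return (cacheSize - offset) >= kMaxInstructionLength;
}
```

With a 4096-byte cache starting at 0x411000, address 0x411FFF is cached but an instruction beginning there could extend past the end of the buffer, which is the case the second function rejects.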

And that’s all it takes. The debugger can now print a disassembly listing at any readable address.

Step Over

Step into is the ability to step one instruction at a time as it executes and is something that is supported at the hardware level with the single step flag. Step over is implemented purely in code and is a convenience function that lets the user skip stepping into branches in the code. For example, take the following disassembly listing:

    0040108D 81 C4 C0 00 00 00    add esp, 0C0h
    00401093 3B EC                cmp ebp,esp
    00401095 E8 76 03 00 00       call SomeFunction (0401410h)
    0040109A 8B E5                mov esp,ebp
    ...

Assume that you are at a broken state at address 0x0040108D. You know that SomeFunction is not of any interest to you and you don’t want to single step through it. You’d rather get to the more interesting parts at address 0x0040109A and below. So what you do is when you’re at 0x00401093, you set a breakpoint at 0x0040109A and continue execution. This effectively skips the CALL instruction at 0x00401095 and hits your breakpoint at the instruction immediately following it, so you can continue debugging. Step over wraps these steps into one convenient function provided by a debugger.

In order to perform a step over, the debugger must know what the next instruction is. This is obviously needed because it is the instruction that the user wishes to break at next. The next instruction can be one of a few types:

Invalid

A non-branching instruction (e.g. add/mov/lea/push/…)

A conditional branching instruction (e.g. jz/jge/jb/…)

A non-conditional branching instruction (e.g. call/jmp/ret)

If it’s an invalid instruction, then it’s up to the debugger implementation to decide what to do next. In the second case, the next instruction is simply the address of the current one plus the length of the current instruction. The third case is interesting and is also partially implementation defined. If the user is broken on a conditional branch and wishes to step over, how should that be treated? For example, assume the user is looking at the following disassembly listing and is broken on 0x00401219:

    00401213 8B 45 F8          mov eax,dword ptr [a]
    00401216 3B 45 EC          cmp eax,dword ptr [b]
    00401219 7E 05             jle test+60h (0401220h)
    0040121B E8 50 FF FF FF    call d (0401170h)
    00401220 8B F4             mov esi,esp

Assume [a] is greater than [b], so the jump will not be taken and the next instruction will be 0x0040121B. The user decides that they want to step over, so they will land at 0x0040121B, which is correct. Now assume the opposite: that [a] is less than or equal to [b]. This means that the branch will be taken and the next address will be 0x00401220. If the user is at 0x00401219 and decides to step over, then what happens? Since 0x0040121B will not be reached, that step over point isn’t necessarily valid. Should execution continue because the step over point will not be reached, or should the debugger “fix” it for the user and break at 0x00401220? Different debuggers do different things here. I would personally go with the latter case just to be safe, especially since the debugger has access to the EFLAGS register and can tell whether the branch will be taken prior to execution of the instruction. This particular scenario is left undefined in the example code.
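
For readers who want to try the “fix it for the user” option, here is a rough sketch of how the branch outcome could be predicted from EFLAGS. It is not part of the example code, the function names are made up, and it only handles the two-byte short form of Jcc (opcodes 0x70 to 0x7F followed by an 8-bit displacement); the near form with a 32-bit displacement is left out.

```cpp
#include <cstdint>

constexpr std::uint32_t kCarryFlag = 1u << 0;
constexpr std::uint32_t kParityFlag = 1u << 2;
constexpr std::uint32_t kZeroFlag = 1u << 6;
constexpr std::uint32_t kSignFlag = 1u << 7;
constexpr std::uint32_t kOverflowFlag = 1u << 11;

// conditionCode is the low nibble of a short Jcc opcode. Even codes test a
// condition; the following odd code tests its negation (jz/jnz, jl/jge, ...).
inline bool IsConditionMet(unsigned conditionCode, std::uint32_t eflags)
{
    const bool cf = (eflags & kCarryFlag) != 0;
    const bool pf = (eflags & kParityFlag) != 0;
    const bool zf = (eflags & kZeroFlag) != 0;
    const bool sf = (eflags & kSignFlag) != 0;
    const bool of = (eflags & kOverflowFlag) != 0;

    bool result = false;
    switch ((conditionCode & 0xF) >> 1)
    {
    case 0: result = of; break;                // jo
    case 1: result = cf; break;                // jb
    case 2: result = zf; break;                // jz
    case 3: result = cf || zf; break;          // jbe
    case 4: result = sf; break;                // js
    case 5: result = pf; break;                // jp
    case 6: result = (sf != of); break;        // jl
    case 7: result = zf || (sf != of); break;  // jle
    }
    return (conditionCode & 1) ? !result : result;
}

// Address that executes after a short Jcc located at address.
inline std::uintptr_t NextAddressAfterShortJcc(std::uintptr_t address, unsigned char opcode,
    std::int8_t displacement, std::uint32_t eflags)
{
    const std::uintptr_t fallThrough = address + 2;  // opcode byte + rel8 byte
    if (!IsConditionMet(opcode & 0xFu, eflags)) return fallThrough;
    return fallThrough + static_cast<std::uintptr_t>(static_cast<std::intptr_t>(displacement));
}
```

For the jle at 0x00401219 (bytes 7E 05), this returns 0x0040121B when the flags say [a] is greater than [b], and 0x00401220 when ZF is set or SF differs from OF.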

The last scenario is that of an unconditional branch. The two unconditional branches that affect implementing step over are JMP (unconditional jump) and RET (return). Under both of these, the point of execution is guaranteed to change: either to the jump destination or to the return address on the stack. Stepping over a RET instruction is pretty useless, because the instruction following it won’t be hit. Likewise, stepping over a JMP instruction will also be useless in the vast majority of cases, since execution will most likely never come back to the instruction following the JMP. For these cases, the example code converts the step over into a step into and follows execution. Having said all of this, the next instruction retrieval function is implemented as follows:

    DWORD_PTR Disassembler::GetNextInstruction(const DWORD_PTR dwAddress, bool &bIsUnconditionalBranch)
    {
        DWORD_PTR dwNextAddress = 0;
        if (IsInitialized())
        {
            SetDisassembler(dwAddress);
            int iDisasmLength = m_pDisasm(&m_disassembler);
            if (iDisasmLength != UNKNOWN_OPCODE)
            {
                // RET and JMP never fall through, so there is no "next" address to break on.
                if (m_disassembler.Instruction.BranchType == RetType ||
                    m_disassembler.Instruction.BranchType == JmpType)
                {
                    bIsUnconditionalBranch = true;
                }
                else
                {
                    dwNextAddress = (dwAddress + iDisasmLength);
                }
            }
            else
            {
                fprintf(stderr, "Could not get next instruction. Unknown opcode at %p.\n", dwAddress);
            }
        }
        else
        {
            fprintf(stderr, "Could not get next instruction. Disassembler Dll was not loaded properly.\n");
        }

        return dwNextAddress;
    }

with the full StepOver function being implemented as follows:

    const bool Debugger::StepOver()
    {
        CONTEXT ctx = GetExecutingContext();
        bool bIsUnconditionalBranch = false;
    #ifdef _M_IX86
        DWORD_PTR dwStepOverAddress = m_pDisassembler->GetNextInstruction(ctx.Eip, bIsUnconditionalBranch);
    #elif defined _M_AMD64
        DWORD_PTR dwStepOverAddress = m_pDisassembler->GetNextInstruction(ctx.Rip, bIsUnconditionalBranch);
    #else
    #error "Unsupported platform"
    #endif
        if (bIsUnconditionalBranch)
        {
            return StepInto();
        }
        else if (dwStepOverAddress != 0)
        {
            m_pStepPoint->Disable();
            m_pStepPoint->ChangeAddress(dwStepOverAddress);
            (void)m_pStepPoint->Enable();
            ctx.EFlags &= ~0x100;
            (void)SetExecutingContext(ctx);
            return Continue(true);
        }

        return false;
    }

with m_pStepPoint being a breakpoint to the step over address.
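
The 0x100 mask cleared in StepOver is the trap flag, bit 8 of EFLAGS, which is what drives hardware single stepping. As a small illustration with hypothetical names (none of these appear in the example code), the flag manipulation and the three-way decision made by StepOver can be written as pure functions:

```cpp
#include <cstdint>

constexpr std::uint32_t kTrapFlag = 0x100;  // bit 8 of EFLAGS

// Request a single-step exception after the next instruction (step into).
inline std::uint32_t WithTrapFlag(std::uint32_t eflags) { return eflags | kTrapFlag; }

// Let the target run freely until the step-over breakpoint is hit.
inline std::uint32_t WithoutTrapFlag(std::uint32_t eflags) { return eflags & ~kTrapFlag; }

enum class StepOverAction { StepInto, RunToBreakpoint, Fail };

// Mirrors the branches of StepOver: unconditional branches become a step into,
// a valid next address becomes a temporary breakpoint, anything else fails.
inline StepOverAction ChooseStepOverAction(bool isUnconditionalBranch, std::uintptr_t nextAddress)
{
    if (isUnconditionalBranch) return StepOverAction::StepInto;
    if (nextAddress != 0) return StepOverAction::RunToBreakpoint;
    return StepOverAction::Fail;
}
```

Clearing the flag before continuing matters because a leftover trap flag would stop the target after one instruction instead of at the step-over address.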

Dump and modify memory

This last piece of functionality is nothing more than an exercise in calling ReadProcessMemory and WriteProcessMemory.

    const bool Debugger::PrintBytesAt(const DWORD_PTR dwAddress, size_t ulNumBytes /*= 40*/)
    {
        SIZE_T ulBytesRead = 0;
        std::unique_ptr<unsigned char[]> pBuffer = std::unique_ptr<unsigned char[]>(new unsigned char[ulNumBytes]);
        const bool bSuccess = BOOLIFY(ReadProcessMemory(m_hProcess(), (LPCVOID)dwAddress, pBuffer.get(), ulNumBytes, &ulBytesRead));
        if (bSuccess && ulBytesRead == ulNumBytes)
        {
            for (unsigned int i = 0; i < ulBytesRead; ++i)
            {
                fprintf(stderr, "%02X ", pBuffer.get()[i]);
            }
            fprintf(stderr, "\n");
            return true;
        }

        fprintf(stderr, "Could not read memory at %p. Error = %X\n", dwAddress, GetLastError());
        return false;
    }

    const bool Debugger::ChangeByteAt(const DWORD_PTR dwAddress, const unsigned char cNewByte)
    {
        SIZE_T ulBytesWritten = 0;
        const bool bSuccess = BOOLIFY(WriteProcessMemory(m_hProcess(), (LPVOID)dwAddress, &cNewByte, sizeof(unsigned char), &ulBytesWritten));
        if (bSuccess && ulBytesWritten == sizeof(unsigned char))
        {
            return true;
        }

        fprintf(stderr, "Could not change byte at %p. Error = %X\n", dwAddress, GetLastError());
        return false;
    }

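The formatting half of PrintBytesAt does not depend on the Windows API, so it can be pulled out and checked on its own. This is a small sketch, with FormatHexBytes being a hypothetical helper name, that produces the same "%02X " layout into a string instead of writing straight to stderr:

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Formats each byte as two uppercase hex digits followed by a space,
// matching the "%02X " format used when dumping memory.
inline std::string FormatHexBytes(const unsigned char *bytes, std::size_t count)
{
    std::string out;
    char cell[4];  // two digits, one space, terminating null
    for (std::size_t i = 0; i < count; ++i)
    {
        std::snprintf(cell, sizeof(cell), "%02X ", static_cast<unsigned>(bytes[i]));
        out += cell;
    }
    return out;
}
```

Separating the formatting from the read makes it possible to test the output layout without attaching to a process.
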
Testing the functionality

The same example program as in the previous posts will be used, with minor modifications:

    #include <stdio.h>

    void d()
    {
        printf("d called.\n");
    }

    void c()
    {
        int i = 0x1234;
        printf("c called.\n");
        printf("i is at %p with value %X.\n", &i, i);
        d();
        printf("i is at %p with value %X.\n", &i, i);
    }

    void b()
    {
        printf("b called.\n");
        c();
    }

    void a()
    {
        printf("a called.\n");
        b();
    }

    int main(int argc, char *argv[])
    {
        printf("Addresses:\n"
            "a: %p\n"
            "b: %p\n"
            "c: %p\n"
            "d: %p\n",
            a, b, c, d);

        getchar();
        while (true)
        {
            a();
            getchar();
        }

        return 0;
    }

To test memory modification, the i variable can be modified while the program is in a broken state in the d function. The entered commands are the single-letter lines and the values typed after each prompt.

    a
    [A]ddress or [s]ymbol name? s
    Name: d
    Received breakpoint at address 00401170.
    Press c to continue, s to step into, o to step over.
    i
    Enter address to print bytes at: 0x 18fcac
    34 12 00 00 CC CC CC CC 0C AD C2 AA 8C FD 18 00 8A 10 40 00 60 FE 18 00 94 FD 18 00 00 E0 FD 7F CC CC CC CC CC CC CC CC
    e
    Enter address to change byte at: 0x 18fcac
    Enter new byte: 0x 12
    e
    Enter address to change byte at: 0x 18fcad
    Enter new byte: 0x 34
    c
    Received step at address 00401171

Output from the target application:

    Addresses:
    a: 00401000
    b: 00401050
    c: 004010A0
    d: 00401170
    a called.
    b called.
    c called.
    i is at 0018FCAC with value 1234.
    d called.
    i is at 0018FCAC with value 3412.
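
The value changes from 1234 to 3412 because x86 stores integers in little-endian order: the byte at the lowest address is the least significant one. A short sketch (ReadLittleEndian32 is a hypothetical helper, not part of the debugger) makes the byte order explicit:

```cpp
#include <cstdint>

// Assembles a 32-bit value from four bytes laid out as they are in x86 memory:
// bytes[0] is the least significant byte, bytes[3] the most significant.
inline std::uint32_t ReadLittleEndian32(const unsigned char *bytes)
{
    return static_cast<std::uint32_t>(bytes[0]) |
           (static_cast<std::uint32_t>(bytes[1]) << 8) |
           (static_cast<std::uint32_t>(bytes[2]) << 16) |
           (static_cast<std::uint32_t>(bytes[3]) << 24);
}
```

The dump showed 34 12 00 00 at 0x18FCAC, which reads back as 0x1234. Writing 12 to 0x18FCAC and 34 to 0x18FCAD leaves 12 34 00 00 in memory, which reads back as 0x3412.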

Disassembly and step over are pretty straightforward to test when lined up with the Visual Studio debugger. For example, below is the disassembly relevant to the a function:

    //printf("a called.\n");
    00401009 68 48 21 40 00       push 402148h
    0040100E FF 15 94 20 40 00    call dword ptr ds:[402094h]
    00401014 83 C4 04             add esp,4
    //b();
    00401017 E8 14 00 00 00       call b (0401030h)
    0040101C 5F                   pop edi
    }
    ...

Setting a breakpoint on 0x00401009 and stepping over shows the following behavior in the debugger:

    a
    [A]ddress or [s]ymbol name? a
    Breakpoint address: 0x 401009
    Received breakpoint at address 00401009.
    Press c to continue, s to step into, o to step over.
    o
    Could not write back original opcode to address 00000000. Error = 1E7
    Received breakpoint at address 0040100E.
    Press c to continue, s to step into, o to step over.
    o
    Received breakpoint at address 00401014.
    Press c to continue, s to step into, o to step over.
    o
    Received breakpoint at address 00401017.
    Press c to continue, s to step into, o to step over.
    o
    Received breakpoint at address 0040101C.
    Press c to continue, s to step into, o to step over.

Lastly, a disassembly listing for all of this can be displayed:

    d
    Enter address to print disassembly at: 0x 401009
    0x00401009 - push 00402148h
    0x0040100E - call dword ptr [00402094h]
    0x00401014 - add esp, 04h
    0x00401017 - call 0067D3A3h
    0x0040101C - pop edi
    0x0040101D - pop esi
    0x0040101E - pop ebx
    0x0040101F - mov esp, ebp
    0x00401021 - pop ebp
    0x00401022 - ret
    0x00401023 - int3
    0x00401024 - int3
    0x00401025 - int3
    0x00401026 - int3
    0x00401027 - int3

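which lines up with what Visual Studio gives, with one exception: the relative call at 0x00401017 is shown with a destination of 0067D3A3h rather than 0401030h, presumably because the destination is computed from the location of the debugger’s local copy of the bytes and not from the address in the target.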
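
For reference, the destination of a relative CALL or JMP is the address of the following instruction plus the signed displacement encoded in the instruction. The sketch below uses a hypothetical helper name and shows the arithmetic with target-side addresses. BeaEngine’s DISASM structure is documented as having a VirtualAddr field intended for this purpose; whether setting it to the target address fixes the listing here is an assumption that was not tested against the example code.

```cpp
#include <cstddef>
#include <cstdint>

// Destination of a relative branch: the address of the next instruction
// plus the signed displacement encoded in the branch.
inline std::uintptr_t RelativeBranchTarget(std::uintptr_t instructionAddress,
    std::size_t instructionLength, std::int32_t displacement)
{
    const std::uintptr_t nextInstruction = instructionAddress + instructionLength;
    return nextInstruction + static_cast<std::uintptr_t>(static_cast<std::intptr_t>(displacement));
}
```

For the bytes E8 14 00 00 00 at 0x00401017, the instruction is 5 bytes long and the displacement is 0x14, giving 0x0040101C + 0x14 = 0x00401030, the address of b in the Visual Studio listing.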

Wrap up

Writing a debugger may seem like a daunting task, but it is certainly attainable. Aside from the disassembly engine, which could be a whole long series of posts in itself, everything was written from scratch in about 2,000 lines of code (doing a ‘\n’ regex search on the solution yields 2195 lines). Contained within those lines of code is the ability to

Add/Remove breakpoints

Step into / Step over instructions

Continue execution at a breakpoint or step

Print / Modify registers

Print a call stack

Match symbols to addresses / Dump symbols for a module

Print / Modify memory

Disassemble at an address

While it’s certainly not WinDbg or the Visual Studio debugger, it is an impressive amount for relatively little work. Hopefully those following this series of posts have gained a bit of insight into how the tools that they may use on a frequent basis work and what it takes to develop them. Thanks for reading.

Article Roadmap

The full source code relating to this can be found here. C++11 features were used, so MSVC 2012/2013 is most likely required.

