I was recently asked about a fairly old, unpublished project that I had actually forgotten about, and I decided to finally do something with it. I mentioned it once in this post: http://blog.rewolf.pl/blog/?p=856, where I shared a screenshot with a note that it would soon be published as part of dirtyJOE.

A year and a half has passed and I have done nothing about this integration. Moreover, at some point I came to the conclusion that this project does not fit dirtyJOE, as it is more a proof of concept than a proper solution to the problem (whether or not a proper solution even exists). I may still change my mind and eventually make it part of dirtyJOE, but I’m 100% sure that JVM Operand Stack Viewer deserves a release: an open source release with a full description of the method used to obtain the JVM operand stack.

So here it is: https://github.com/rwfpl/rewolf-jvm-operand-stack-viewer

The code is a bit ugly, but it does its job. The initial version worked only with the x64 JVM, but I recently spent some more time on it and adapted it to the x86 JVM as well. I was actually really surprised that it works on x86, as the method I came up with is more like voodoo.

JVM Operand Stack detection

I’m not using any Java debugging interface; the whole detection is based purely on scanning process memory. More precisely, I scan the stack of every thread running in the system, which is sufficient to correctly identify all Java threads. Pseudocode for this phase:

    for (auto p : Processes) {
        for (auto t : p.Threads) {
            CONTEXT ctx = { 0 };
            ctx.ContextFlags = CONTEXT_CONTROL;   // request the stack pointer register
            SuspendThread(t);
            GetThreadContext(t, &ctx);
            ResumeThread(t);

            DWORD_PTR stack[0x4000];
            SIZE_T readSize = 0;
            // ctx.Esp on x86; ctx.Rsp in a 64-bit build
            ReadProcessMemory(p, (LPCVOID)ctx.Esp, stack, sizeof(stack), &readSize);
            doJVMOpStackVooDoo(stack, readSize / sizeof(DWORD_PTR), ctx.Esp, p);
        }
    }

Yes, I’m aware that there is a race condition here: the thread is resumed before ReadProcessMemory runs, so the memory being read may no longer be a stack. It doesn’t actually matter, though, because if the particular thread is being debugged it should be in a suspended state anyway. Below is simplified code of the doJVMOpStackVooDoo function:

    int doJVMOpStackVooDoo(DWORD_PTR* stack, size_t cnt, DWORD_PTR ctx_rsp, HANDLE hProc) {
        int retStandard = -1;
        int retMonitor = -1;
        for (size_t i = 0; i < cnt; i++) {
            if (-1 == retStandard) {
                retStandard = VooDooCheckStandard(stack, i, ctx_rsp, hProc);
                if (-1 == retMonitor)
                    retMonitor = VooDooCheckMonitor(stack, i, ctx_rsp, hProc);
            }
            // do some other stuff for each stack entry
        }
        return (-1 != retStandard) ? retStandard : retMonitor;
    }

There are two different detection mechanisms, VooDooCheckStandard and VooDooCheckMonitor, and both are executed against each stack entry. VooDooCheckStandard is the more important one; most JVM threads are detected by it:

    int VooDooCheckStandard(DWORD_PTR* stack, int i, DWORD_PTR ctx_rsp, HANDLE hProc) {
        // Candidate: the entry holds its own address (ptr == *ptr).
        if ((0 != i) && (stack[i] == ctx_rsp + i*sizeof(DWORD_PTR))) {
            // Walk towards the top of the stack looking for a pointer into JITed code.
            for (int j = i - 1; j > 0; j--) {
                MEMORY_BASIC_INFORMATION mbi;
                if (0 == VirtualQueryEx(hProc, (LPCVOID)stack[j], &mbi, sizeof(mbi)))
                    continue;   // query failed, mbi would be uninitialized
                if ((PAGE_EXECUTE_READWRITE == mbi.Protect) && (MEM_PRIVATE == mbi.Type)) {
                    JVMJITCode jit = { 0 };
                    SIZE_T dummy;
                    ReadProcessMemory(hProc, mbi.AllocationBase, &jit, sizeof(jit), &dummy);
                    if (jit.unk_ffffffff == -1) {
                        return j + 1;   // operand stack top is one entry below the JIT pointer
                    }
                }
            }
        }
        return -1;
    }

First, the stack entry (stack[i]) is tested against the condition if (ptr == *ptr) (in assembly: cmp esp, [esp]), i.e. the entry holds its own address. This is pretty unusual, which makes it a good initial indicator of a JVM stack. The second step checks the stack entries above the value found this way (stack[i-1], stack[i-2], …). If any of those entries points to JITed code (PAGE_EXECUTE_READWRITE memory of type MEM_PRIVATE), then the top of the JVM operand stack is one entry below it. For example, if stack[i-8] points to JITed code, the top of the JVM operand stack is located at stack[i-7].

There is one additional check to ensure that the memory region that is supposed to be JITed code really belongs to the JVM JIT. The JVMJITCode structure describes some aspects of the JITed code and can be found at the beginning of the allocated RWX region (mbi.AllocationBase). I didn’t bother to look up the proper definition of this structure in the JDK, so my check for jit.unk_ffffffff == -1 may be plain wrong, but so far it works well. The JVMJITCode structure is defined as follows:

    #pragma pack(push, 1)
    struct JVMJITCode {
        DWORD_PTR unk01;    // x64: = 4; x86: = 2
        DWORD_PTR unk02;    // = 1
        DWORD_PTR jvm_offset_BufferBlobVtable;
        DWORD_PTR jvm_offset_flush_icache_stub;
        DWORD unk_tab01[5];
        DWORD unk_ffffffff; // = -1
    };
    #pragma pack(pop)
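To experiment with the standard check without a live JVM, the header test and the stack walk can be approximated in portable C++. This is only a rough sketch under assumptions: `JitHeader`, `hasJitMarker`, `findOperandStackTop` and the `JitPredicate` callback are hypothetical names that do not exist in the project, and the callback stands in for the VirtualQueryEx plus ReadProcessMemory test. The pointer width is a template parameter so that both layouts of the structure can be inspected from one build.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <vector>

// Mirror of JVMJITCode with the pointer width as a parameter (hypothetical
// helper type; the field meanings are as unknown here as in the original).
#pragma pack(push, 1)
template <typename Ptr>
struct JitHeader {
    Ptr unk01;
    Ptr unk02;
    Ptr jvm_offset_BufferBlobVtable;
    Ptr jvm_offset_flush_icache_stub;
    std::uint32_t unk_tab01[5];
    std::uint32_t unk_ffffffff;
};
#pragma pack(pop)

using JitHeader32 = JitHeader<std::uint32_t>;
using JitHeader64 = JitHeader<std::uint64_t>;

// The -1 marker sits at a different offset depending on the pointer width.
static_assert(offsetof(JitHeader32, unk_ffffffff) == 36, "x86 marker offset");
static_assert(offsetof(JitHeader64, unk_ffffffff) == 52, "x64 marker offset");

// True if the first bytes of a candidate region carry the -1 marker.
template <typename Header>
bool hasJitMarker(const std::vector<unsigned char>& bytes) {
    if (bytes.size() < sizeof(Header)) return false;
    Header h;
    std::memcpy(&h, bytes.data(), sizeof(h));
    return h.unk_ffffffff == 0xFFFFFFFFu;
}

using Word = std::uintptr_t;
// Assumption: stands in for "points into RWX private memory with a JIT header".
using JitPredicate = std::function<bool(Word)>;

// Scans a stack snapshot read from address `base`. Returns the index of the
// operand stack top (one entry below the JIT pointer), or -1 if nothing matches.
int findOperandStackTop(const std::vector<Word>& stack, Word base,
                        const JitPredicate& looksLikeJit) {
    for (std::size_t i = 1; i < stack.size(); ++i) {
        // Step 1: the entry must hold its own address (ptr == *ptr).
        if (stack[i] != base + i * sizeof(Word)) continue;
        // Step 2: walk towards the top of the stack, index 0 excluded.
        for (std::size_t j = i - 1; j > 0; --j) {
            if (looksLikeJit(stack[j])) return static_cast<int>(j + 1);
        }
    }
    return -1;
}
```

With a self-pointing entry at index 12 and a JIT pointer at index 4 (that is, stack[i-8]), the function returns 5 (stack[i-7]), which matches the example in the text.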

The second detection, VooDooCheckMonitor, is a bit more complicated, and I believe it may cause some problems that I’m not aware of. It only affects threads that use the monitorenter/monitorexit JVM opcodes. Once monitorenter is executed, the cmp esp, [esp] condition no longer holds and VooDooCheckStandard fails for such threads. What is even more interesting, monitorexit doesn’t fix the situation. Here is how the stack behaves after execution of those opcodes:

    before monitorenter:
        026AF1D0: E0369580
        026AF1E0: 026AF1E0

    after monitorenter:
        026AF1C8: E0369580
        026AF1E0: 026AF1D0

    before second monitorenter:
        026AF1B8: E036D2F0
        026AF1E0: 026AF1D0

    after second monitorenter:
        026AF1B0: 00000048
        026AF1E0: 026AF1C0

It’s now pretty obvious that after each monitorenter the bottom of the JVM stack is off by another 0x10 (on x64; on x86 it is 0x8, so basically two stack entries). So I’ve added the check below to VooDooCheckMonitor (it’s ugly, I know):

    const int maxMonitorDepth = min(i, 20) / 2;
    if ((i > 2) &&
        (stack[i] < ctx_rsp + i*sizeof(DWORD_PTR)) &&
        (stack[i] >= ctx_rsp + (i - maxMonitorDepth)*sizeof(DWORD_PTR)) &&
        ((stack[i] & BITMASK) == 0) &&
        ((ctx_rsp + i*sizeof(DWORD_PTR) - stack[i]) % (2*sizeof(DWORD_PTR)) == 0))
    {
        //...
    }
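The arithmetic behind this condition is easier to follow on concrete numbers. The sketch below re-states the range and alignment tests as a standalone function that also reports how many two-entry monitor slots separate the stored value from the slot's own address. Several things here are assumptions: the function name `countMonitorSlots` is made up, 8-byte entries are assumed (the 0x10 step in the dump suggests an x64 JVM), and since BITMASK is not defined in this post, a plain entry-alignment mask is used in its place.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::uint64_t kEntrySize = 8;  // assumption: x64 stack entries

// `value` is stack[i], `base` is the address the snapshot was read from.
// Returns the number of monitor slots (two entries each) between the stored
// value and the slot's own address, or -1 if the entry does not look like a
// JVM stack bottom shifted by monitors.
// `alignMask` is a hypothetical stand-in for the BITMASK constant.
int countMonitorSlots(std::uint64_t value, std::uint64_t base, std::size_t i,
                      std::uint64_t alignMask = kEntrySize - 1) {
    if (i <= 2) return -1;
    const std::uint64_t expected = base + i * kEntrySize;        // slot's own address
    const std::size_t maxDepth = std::min<std::size_t>(i, 20) / 2;
    const std::uint64_t lowest = base + (i - maxDepth) * kEntrySize;
    if (value >= expected || value < lowest) return -1;          // out of range
    if ((value & alignMask) != 0) return -1;                     // misaligned
    const std::uint64_t gap = expected - value;
    if (gap % (2 * kEntrySize) != 0) return -1;                  // not whole slots
    return static_cast<int>(gap / (2 * kEntrySize));
}
```

Taking the slot at 026AF1E0 from the dump and a made-up snapshot base of 026AF100 (so i = 28), the stored value 026AF1D0 gives one slot and 026AF1C0 gives two, while the unshifted value 026AF1E0 is rejected because that case belongs to VooDooCheckStandard.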

The second thing that can be observed, just by looking at the memory these monitor addresses point to, is that it usually points back to the stack entry above the monitor address. I’ve added this check as well:

    // Check only first monitor, maybe it's sufficient.
    DWORD_PTR monitorAddr = 0;
    SIZE_T dummy = 0;
    ReadProcessMemory(hProc, (LPCVOID)stack[i - 1], &monitorAddr, sizeof(monitorAddr), &dummy);
    if ((monitorAddr == ctx_rsp + (i - 2)*sizeof(DWORD_PTR)) || (0 == stack[i - 1]))
    {
        //...
    }

(0 == stack[i - 1]) is there to handle some cases after monitorexit, as it seems that the monitor addresses are zeroed but the bottom of the stack is not adjusted. This empty space is later reused by another monitorenter. The rest of VooDooCheckMonitor is similar to VooDooCheckStandard.
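The back-pointer test and the zeroed-slot case can also be tried on fake data. In the sketch below, remote memory is modelled by a callback so that nothing depends on a running JVM. All names (`ReadWord`, `readerFromMap`, `firstMonitorLooksValid`) are hypothetical and only illustrate the condition described above; they are not taken from the project.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <vector>

using Word = std::uintptr_t;
// Assumption: stands in for ReadProcessMemory; returns false if unreadable.
using ReadWord = std::function<bool(Word address, Word& out)>;

// Hypothetical helper: fake remote memory backed by a map, for experiments.
ReadWord readerFromMap(std::map<Word, Word> memory) {
    return [memory](Word address, Word& out) {
        const auto it = memory.find(address);
        if (it == memory.end()) return false;
        out = it->second;
        return true;
    };
}

// Checks only the first monitor, directly above stack[i], as the original does.
bool firstMonitorLooksValid(const std::vector<Word>& stack, std::size_t i,
                            Word base, const ReadWord& readWord) {
    if (i < 2 || i >= stack.size()) return false;
    const Word monitor = stack[i - 1];
    if (monitor == 0) return true;               // slot zeroed by monitorexit
    Word pointee = 0;
    if (!readWord(monitor, pointee)) return false;
    // The monitor should point back to the stack entry just above it.
    return pointee == base + (i - 2) * sizeof(Word);
}
```

A zero in stack[i - 1] is accepted without reading any memory, which mirrors the (0 == stack[i - 1]) branch; any other value has to resolve to the address of stack[i - 2].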

Final words

To handle the x86 JVM, the project should be compiled as a 32-bit application; to handle the x64 JVM, as a 64-bit one. I’ve tested this method with JRE 7/8 x86/x64 and it works really well. At the beginning I was actually using a completely different detection method, but it stopped working with JRE 8, so I had to rethink the whole thing and come up with something that works better. That’s it.