Linux, like other Unixes, allows you to register handlers for signals. Here we are interested in SIGSEGV. This signal is sent to your program when it tries to access a memory location it shouldn't, typically when dereferencing null.

Languages like Java throw a NullPointerException, which can be caught and recovered from. In a systems language, however, you usually get a cryptic « segmentation fault »: you cannot recover from it, and you get no information about it outside a debugger. Let's see how we can fix this.

As C++ and D are systems languages that support exceptions, we will use this mechanism to handle SIGSEGV. I'll do it in D in this post, but the same is doable in C++. If you understand why it works, porting it shouldn't be a problem.

It is not that simple

import core.sys.posix.signal; // sigaction, sigaction_t, SA_SIGINFO, SIGSEGV

shared static this() {
    sigaction_t action;
    action.sa_sigaction = &handleSignal;
    action.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &action, null);
}

With this simple code, we can register our handler, handleSignal. Unfortunately, things are not that simple. Inside handleSignal, you are not in a normal execution context: Linux has saved the whole state of your application, called your code, and will restore that state when you return. This makes it impossible to reliably throw an exception or get a correct stack trace from within the handler.
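
To try the registration on its own, here is a minimal, self-contained sketch, assuming Linux and druntime's core.sys.posix.signal bindings (recordSignal and installAndRaise are made-up names for this example). It sends SIGSEGV with raise() instead of causing a real fault, so returning from the handler is harmless, and the handler only stores plain values, which is about all a signal handler can safely do.

```d
import core.stdc.signal : raise;
import core.stdc.stdio : printf;
import core.sys.posix.signal;

// Hypothetical demo state, written only by the handler.
__gshared int lastSignal;
__gshared int lastCode;

// SA_SIGINFO handler signature: signal number, siginfo_t*, and the saved
// context (a ucontext_t*) passed as void*.
extern (C) void recordSignal(int signum, siginfo_t* info, void* context) nothrow @nogc
{
    // Keep the handler trivial: no allocation, no I/O, no throwing.
    lastSignal = signum;
    lastCode = info.si_code;
}

int installAndRaise()
{
    sigaction_t action;                  // default-initialized: empty mask
    action.sa_sigaction = &recordSignal;
    action.sa_flags = SA_SIGINFO;        // use the three-argument handler
    if (sigaction(SIGSEGV, &action, null) != 0)
        return -1;

    // raise() delivers SIGSEGV without a real memory fault, so returning from
    // the handler simply resumes here. After a genuine fault, returning would
    // re-execute the faulting instruction and fault again.
    raise(SIGSEGV);
    return lastSignal;
}

void main()
{
    immutable sig = installAndRaise();
    printf("handled signal %d (si_code %d)\n", sig, lastCode);
}
```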

Let's fool Linux into calling our code when returning from the handler!

Well, if we cannot do whatever we want within the signal handler, let's modify the stored context instead, so that Linux restores a different state, one that executes the code we want.

import core.sys.posix.ucontext; // ucontext_t, REG_EAX, REG_EDX, REG_EIP (x86 Linux)

static REG_TYPE saved_EAX, saved_EDX;

extern(C)
void handleSignal(int signum, siginfo_t* info, void* contextPtr) {
    auto context = cast(ucontext_t*) contextPtr;

    // Save registers into thread-local globals, to allow recovery.
    saved_EAX = context.uc_mcontext.gregs[REG_EAX];
    saved_EDX = context.uc_mcontext.gregs[REG_EDX];

    // Hijack the current context so that our handler gets called.
    auto eip = context.uc_mcontext.gregs[REG_EIP];
    auto addr = cast(REG_TYPE) info._sifields._sigfault.si_addr;
    context.uc_mcontext.gregs[REG_EAX] = addr;
    context.uc_mcontext.gregs[REG_EDX] = eip;
    context.uc_mcontext.gregs[REG_EIP] = cast(REG_TYPE) &sigsegv_userspace_handler;
}

OK, so what is happening here? First of all, Linux passes us a structure of type ucontext_t, which contains system-dependent information about the context in which the segfault happened. Linux uses this context to restore the program state after our signal handler has run. So let's modify it to call the code we want.

On x86, EIP is the register that holds the address of the instruction being executed. If we change the saved value of this register, execution resumes at the new address when Linux restores the program state. This is good, but not good enough.

We also have to keep the old EIP value; otherwise we lose the information about where the segfault happened. So we save the values of EAX and EDX into thread-local global variables, then put the faulting address into EAX and the old EIP value into EDX. Our handler now has everything it needs to generate a stack trace and react according to the faulting address.
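
The register shuffle can be checked without any signals at all. This sketch models the saved context as a small array (FakeContext and the Reg indices are hypothetical stand-ins for ucontext_t and the REG_* constants) and applies the same three steps: save EAX and EDX, move the faulting address and the old EIP into them, and point EIP at the handler.

```d
import std.stdio : writefln;

// Hypothetical stand-ins for the REG_* indices and ucontext_t.
enum Reg { EAX, EDX, EIP }

struct FakeContext
{
    uint[3] gregs; // indexed by Reg
}

// Module-level variables are thread-local by default in D, like saved_EAX/saved_EDX.
uint savedEAX, savedEDX;

// Same steps as handleSignal, applied to the fake context.
void hijack(ref FakeContext ctx, uint faultAddr, uint handlerAddr)
{
    // 1. Keep the real EAX/EDX so they can be restored if execution resumes.
    savedEAX = ctx.gregs[Reg.EAX];
    savedEDX = ctx.gregs[Reg.EDX];

    // 2. Hand the faulting address and the faulting EIP to the handler.
    immutable faultEIP = ctx.gregs[Reg.EIP];
    ctx.gregs[Reg.EAX] = faultAddr;
    ctx.gregs[Reg.EDX] = faultEIP;

    // 3. Resume at the handler instead of the faulting instruction.
    ctx.gregs[Reg.EIP] = handlerAddr;
}

void main()
{
    auto ctx = FakeContext([0x11, 0x22, 0x0804_8000]); // EAX, EDX, EIP at the fault
    hijack(ctx, 0x0, 0x0804_9000);                      // null access, handler address
    writefln("EAX=%x EDX=%x EIP=%x, saved EAX=%x EDX=%x",
             ctx.gregs[Reg.EAX], ctx.gregs[Reg.EDX], ctx.gregs[Reg.EIP],
             savedEAX, savedEDX);
}
```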

Our userspace handler cannot be a regular function (that would be too easy)

Our userspace handler is not called like a regular function. When the signal handler returns, execution jumps straight into its instructions, in the context of the faulting code: every register keeps the value it had at the fault, except EAX, EDX and EIP. We need to manipulate the stack to simulate a function call, and save all this state before doing anything else.

void sigsegv_userspace_handler() {
    asm {
        naked;

        push EDX; // return address (original EIP).
        push EBP; // old EBP
        mov EBP, ESP;

        pushf;    // Save flags.
        push ECX; // ECX is a scratch register, so preserve it as a local.

        // The address parameter is already in EAX.
        call sigsegv_userspace_process;

        // Restore register values and return.
        call restore_registers;

        pop ECX;
        popf;     // Restore flags.

        // Return
        pop EBP;
        ret;
    }
}

// The return value is stored in EAX and EDX, so this function restores the correct values of these registers.
REG_TYPE[2] restore_registers() {
    return [saved_EAX, saved_EDX];
}

The first three instructions simulate a standard function call: the return address (the original EIP, passed in EDX) is pushed on the stack, followed by the previous function's base pointer (EBP), and finally the base pointer is updated to reflect the new state of the stack.
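
To see why these three instructions are enough for a frame-pointer based stack walk, here is a toy model of a 32-bit stack. Machine, simulatePrologue and backtrace are hypothetical names; indices play the role of addresses and each slot stands for one 4-byte word. After the prologue, the word above the new EBP holds the faulting EIP, so the walker reports it as if the faulting code had made a call.

```d
import std.stdio : writefln;

// Toy model of a 32-bit x86 stack that grows downwards (hypothetical).
struct Machine
{
    uint[64] mem;
    uint esp = 64;   // empty stack
    uint ebp;        // 0 marks the end of the frame chain

    void push(uint value) { mem[--esp] = value; }
}

// The first three instructions of sigsegv_userspace_handler.
void simulatePrologue(ref Machine m, uint faultingEIP)
{
    m.push(faultingEIP); // push EDX: acts as the return address of a fake call
    m.push(m.ebp);       // push EBP: link to the faulting function's frame
    m.ebp = m.esp;       // mov EBP, ESP
}

// Frame-pointer walk: [EBP + 1 word] is the return address, [EBP] the previous EBP.
uint[] backtrace(ref const Machine m, size_t maxFrames = 16)
{
    uint[] trace;
    uint frame = m.ebp;
    while (frame != 0 && frame + 1 < m.mem.length && trace.length < maxFrames)
    {
        trace ~= m.mem[frame + 1];
        frame = m.mem[frame];
    }
    return trace;
}

uint[] demo()
{
    Machine m;
    // Frame of the function that is about to fault.
    m.push(0xA0);     // return address into its caller, pushed by `call`
    m.push(m.ebp);    // caller's EBP (0 here, ending the chain)
    m.ebp = m.esp;
    m.push(0x1234);   // a local variable

    simulatePrologue(m, 0xF0); // the fault happened at EIP = 0xF0
    return backtrace(m);
}

void main()
{
    writefln("%(%x %)", demo());
}
```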

Then the flags and ECX are saved on the stack. On x86, EAX, ECX and EDX are scratch (caller-saved) registers: a called function is not required to preserve their contents, so we must save them before calling anything. EAX and EDX were already saved by handleSignal, so only ECX is left. Finally, we call sigsegv_userspace_process, a function that turns the segfault into whatever we want.

If that function does not throw, the rest of the assembly code restores the CPU state and returns to the faulting instruction. This is useful if we want to play with page protection, but I will not go into the details in this post.

Now we have a routine that makes the segfault look just like a regular function call.

Wait a minute. What if EIP itself is causing the segfault?

One case isn't handled by our code: calling or jumping into memory that is page-protected. What do we do then? First of all, unless you write it by hand in assembly, the jump case cannot occur, so let's not handle it. After all, if you write assembly yourself, you are supposed to know what you are doing. With function pointers, however, it is possible to call into invalid memory.

void function() fun = null;

void main() {
    fun();
}

In this case, the CPU pushes the return address on the stack and then faults immediately, because it tries to execute code at an illegal address: the faulting address is EIP itself, and the called function never got to set up its frame. Let's handle this extra case in our sigsegv_userspace_handler.
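
The same toy stack model (repeated so the block stands alone) shows what the two extra instructions buy in this case. A caller, itself called from 0xA0, calls through a null pointer from return address 0xB0, so the fault happens at EIP 0 before the callee's prologue ever runs. In this model, building the missing frame first keeps 0xB0 in the trace; skipping it loses that call site. All names and addresses are hypothetical.

```d
import std.stdio : writefln;

// Toy 32-bit stack, growing downwards; indices are addresses (hypothetical).
struct Machine
{
    uint[64] mem;
    uint esp = 64;
    uint ebp;        // 0 ends the frame chain

    void push(uint value) { mem[--esp] = value; }
}

// Frame-pointer walk: [EBP + 1 word] is the return address, [EBP] the previous EBP.
uint[] backtrace(ref const Machine m, size_t maxFrames = 16)
{
    uint[] trace;
    uint frame = m.ebp;
    while (frame != 0 && frame + 1 < m.mem.length && trace.length < maxFrames)
    {
        trace ~= m.mem[frame + 1];
        frame = m.mem[frame];
    }
    return trace;
}

uint[] nullCallTrace(bool buildMissingFrame)
{
    Machine m;
    m.push(0xA0);     // caller's own return address
    m.push(m.ebp);    // caller's saved EBP (0)
    m.ebp = m.esp;    // caller's frame
    m.push(0xB0);     // `call fun` pushes its return address, then faults at 0

    immutable uint eip = 0; // EIP == faulting address
    if (buildMissingFrame)
    {
        // push EBP; mov EBP, ESP: the prologue the null "callee" never ran.
        m.push(m.ebp);
        m.ebp = m.esp;
    }
    // Common path: push EDX (old EIP); push EBP; mov EBP, ESP.
    m.push(eip);
    m.push(m.ebp);
    m.ebp = m.esp;
    return backtrace(m);
}

void main()
{
    writefln("with extra frame:    %(%x %)", nullCallTrace(true));
    writefln("without extra frame: %(%x %)", nullCallTrace(false));
}
```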

void sigsegv_userspace_handler() {
    asm {
        naked;

        // Handle the stack for an invalid function call (segfault at EIP).
        push EBP;
        mov EBP, ESP;

        // We jump directly here if we are in a valid function call case.
        push EDX; // return address (original EIP).
        push EBP; // old EBP
        mov EBP, ESP;

        // Same code here, not repeated for brevity.
    }
}

We also need to modify our signal handler so that it jumps to the right address depending on the case.

static REG_TYPE saved_EAX, saved_EDX;

extern(C)
void handleSignal(int signum, siginfo_t* info, void* contextPtr) {
    auto context = cast(ucontext_t*) contextPtr;

    // Save registers into thread-local globals, to allow recovery.
    saved_EAX = context.uc_mcontext.gregs[REG_EAX];
    saved_EDX = context.uc_mcontext.gregs[REG_EDX];

    // Hijack the current context so that our handler gets called.
    auto eip = context.uc_mcontext.gregs[REG_EIP];
    auto addr = cast(REG_TYPE) info._sifields._sigfault.si_addr;
    context.uc_mcontext.gregs[REG_EAX] = addr;
    context.uc_mcontext.gregs[REG_EDX] = eip;

    // Enter at offset 0 only when the fault is at EIP itself (invalid call);
    // otherwise skip the 3 bytes of "push EBP; mov EBP, ESP".
    context.uc_mcontext.gregs[REG_EIP] = (eip != addr)
        ? (cast(REG_TYPE) &sigsegv_userspace_handler + 0x03)
        : (cast(REG_TYPE) &sigsegv_userspace_handler);
}

So, if EIP is the same as the faulting address, we are in the case where an illegal address was called. If not, 0x03 is added to the handler's address to skip the instructions that handle this case: push EBP takes 1 byte and mov EBP, ESP takes 2. When I said black magic, I meant it!
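
The 0x03 offset is just the size of the two skipped instructions. This sketch writes out one possible encoding of them and mirrors the ternary from handleSignal as a plain function (badCallPrologue and handlerEntry are hypothetical names). It assumes, as the article's offset does, that the naked function starts with exactly these two instructions.

```d
import std.stdio : writefln;

// One possible encoding of the two instructions at the start of the handler.
// (mov EBP, ESP can also be encoded as 8B EC; both forms are 2 bytes.)
immutable ubyte[] badCallPrologue = [
    0x55,       // push EBP
    0x89, 0xE5, // mov EBP, ESP
];

// Same choice as the ternary in handleSignal: enter at offset 0 only when the
// fault happened at EIP itself, i.e. a call through an invalid pointer.
size_t handlerEntry(size_t handlerBase, size_t eip, size_t faultAddr)
{
    return eip == faultAddr ? handlerBase : handlerBase + badCallPrologue.length;
}

void main()
{
    enum size_t base = 0x1000; // hypothetical address of sigsegv_userspace_handler
    writefln("normal fault: %#x", handlerEntry(base, 0x2345, 0x8));
    writefln("null call:    %#x", handlerEntry(base, 0x0, 0x0));
}
```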

Note that no restoring is done in this case: it would be impossible anyway, since there is no valid instruction to return to, so the only way out is to throw.

So we finally have somewhere to handle that segfault!

Yes, and now it is easy.

void sigsegv_userspace_process(void* address) {
    // The first page is protected to detect null dereference.
    if ((cast(size_t) address) < MEMORY_RESERVED_FOR_NULL_DEFERENCE) {
        throw new NullPointerError();
    }

    throw new Error("SIGSEGV");
}
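
The decision in sigsegv_userspace_process can be tested without a real fault by factoring it into a predicate. In this sketch, NullPointerError is a hypothetical stand-in for the article's class, whose definition is not shown here, and nullGuardSize assumes a single 4 KiB page in place of MEMORY_RESERVED_FOR_NULL_DEFERENCE.

```d
import std.stdio : writeln;

// Hypothetical stand-in for MEMORY_RESERVED_FOR_NULL_DEFERENCE: one 4 KiB page.
enum size_t nullGuardSize = 4096;

// Hypothetical stand-in for the article's NullPointerError.
class NullPointerError : Error
{
    this(string file = __FILE__, size_t line = __LINE__)
    {
        super("null pointer dereference", file, line);
    }
}

// The test from sigsegv_userspace_process, as a pure predicate.
bool isNullDereference(const(void)* address)
{
    return cast(size_t) address < nullGuardSize;
}

// Same shape as sigsegv_userspace_process: it always throws.
void processFault(void* address)
{
    if (isNullDereference(address))
        throw new NullPointerError();
    throw new Error("SIGSEGV");
}

void main()
{
    // Address 8: e.g. a field at offset 8 read through a null pointer.
    try
    {
        processFault(cast(void*) 8);
    }
    catch (NullPointerError e)
    {
        writeln("caught: ", e.msg);
    }
}
```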

What about x86_64? What about recovering?

You now have everything you need to understand the code, so I'll just let you read it: sigsegv.d

The x86_64 assembly code is different because of the different architecture and calling convention. The example also shows how you can recover without throwing, by protecting a page and unprotecting it within the userspace handler.
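
The recovery idea can be tried in its simplest form without any assembly. This sketch assumes Linux and druntime's POSIX bindings. It protects one anonymous page with mmap, and its SIGSEGV handler makes the page writable again and returns, so the CPU retries the faulting store. Unlike sigsegv.d, it unprotects the page directly in the signal handler rather than in the userspace handler. mprotect is not on the POSIX list of async-signal-safe functions, so treat this as a Linux-specific demonstration. The names onSegv, runDemo and guardedPage are hypothetical.

```d
import core.stdc.stdio : printf;
import core.stdc.stdlib : abort;
import core.sys.posix.signal;
import core.sys.posix.sys.mman;
import core.sys.posix.unistd : sysconf, _SC_PAGESIZE;

// Hypothetical demo state: one guarded page and a count of recovered faults.
__gshared ubyte* guardedPage;
__gshared size_t pageSize;
__gshared int recoveredFaults;

extern (C) void onSegv(int signum, siginfo_t* info, void* context) nothrow @nogc
{
    auto addr = cast(ubyte*) info._sifields._sigfault.si_addr;
    if (addr < guardedPage || addr >= guardedPage + pageSize)
        abort(); // a real crash: not ours to fix

    ++recoveredFaults;
    // Make the page accessible again; returning retries the faulting store,
    // which now succeeds.
    mprotect(guardedPage, pageSize, PROT_READ | PROT_WRITE);
}

int runDemo()
{
    recoveredFaults = 0;
    pageSize = cast(size_t) sysconf(_SC_PAGESIZE);
    void* p = mmap(null, pageSize, PROT_NONE, MAP_PRIVATE | MAP_ANON, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    guardedPage = cast(ubyte*) p;

    sigaction_t action;
    action.sa_sigaction = &onSegv;
    action.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &action, null);

    guardedPage[0] = 42; // faults once, then succeeds after the handler returns
    guardedPage[1] = 43; // the page is already writable: no fault
    immutable faults = recoveredFaults;
    munmap(p, pageSize);
    return faults;
}

void main()
{
    printf("recovered faults: %d\n", runDemo());
}
```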

A pull request has been submitted to include this in the D runtime: https://github.com/D-Programming-Language/druntime/pull/187

Special thanks to Vladimir Panteleev and FeepingCreature for the ideas that led to this code and blog post.