DragonFFI: FFI/JIT for the C language using Clang/LLVM



Adrien Guinet

#Clang , #llvm


Introduction

import pydffi

CU = pydffi.FFI().cdef("int puts(const char* s);")
CU.funcs.puts("hello world!")



import pydffi

pydffi.dlopen("/path/to/libarchive.so")
CU = pydffi.FFI().cdef("#include <archive.h>")
a = CU.funcs.archive_read_new()
assert a
...



$ pip install pydffi



Related work

libffi

cffi

PyCParser

libffi does not support the Microsoft x64 ABI under Linux x64. Adding a new ABI is not trivial (it has to be written by hand and gotten exactly right), while a lot of effort has already been put into compilers to get these ABIs right.

PyCParser only supports a very limited subset of C (no includes, function attributes, ...).

lldb

eval()

lldb

it uses Clang to parse header files, allowing direct usage of a C library headers without adaptation;

it supports as many calling conventions and function attributes as Clang/LLVM do;

as a bonus, Clang and LLVM allow on-the-fly compilation of C functions, without relying on the presence of a compiler on the system (you still need the headers of the system's libc though, or the MSVCRT headers under Windows);

and this is a good way to have fun with Clang and LLVM! :)

Creating an FFI library for C

Supporting C ABIs

typedef struct {
  short a;
  int b;
} A;

void print_A(A s) {
  printf("%d %d\n", s.a, s.b);
}







target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"

@.str = private unnamed_addr constant [7 x i8] c"%d %d\0A\00", align 1

define void @print_A(i64) local_unnamed_addr {
  %2 = trunc i64 %0 to i32
  %3 = lshr i64 %0, 32
  %4 = trunc i64 %3 to i32
  %5 = shl i32 %2, 16
  %6 = ashr exact i32 %5, 16
  %7 = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([7 x i8], [7 x i8]* @.str, i64 0, i64 0), i32 %6, i32 %4)
  ret void
}
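In the IR above, the whole structure arrives in a single 64-bit integer, and the fields are recovered with shifts and truncations. The same bit manipulation can be reproduced in plain C. This is only a minimal sketch, assuming a little-endian target where `short` is 16 bits, `int` is 32 bits and `sizeof(A)` is 8 (as on x86-64 Linux). The helper names `pack_A`, `field_a` and `field_b` are made up for the illustration and are not part of DragonFFI.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct {
  short a;
  int b;
} A;

/* Copy the bytes of the structure into one 64-bit integer, mimicking the
   single i64 parameter of the IR (assumption: sizeof(A) == 8). */
static uint64_t pack_A(A s) {
  uint64_t raw = 0;
  assert(sizeof(s) == sizeof(raw));
  memcpy(&raw, &s, sizeof(s));
  return raw;
}

/* Same effect as "trunc to i32, shl 16, ashr 16": keep the low 16 bits and
   sign-extend them. The padding bytes above them are discarded. */
static int field_a(uint64_t raw) {
  return (int)(int16_t)(raw & 0xFFFF);
}

/* Same effect as "lshr 32, trunc to i32": the upper 32 bits hold field b. */
static int field_b(uint64_t raw) {
  return (int)(int32_t)(uint32_t)(raw >> 32);
}
```

Getting this packing right for every type and every platform is exactly the kind of work that the compiler already does, which is why reusing it is attractive.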



llvm::ArrayRef

Finding the right type abstraction

the function types, and their calling convention

for structures: field offsets and names

for union/enums: field names (and values)

[ ... ]

|-RecordDecl 0x5561d7f9fc20 <a.c:1:9, line:4:1> line:1:9 struct definition
| |-FieldDecl 0x5561d7ff4750 <line:2:3, col:9> col:9 referenced a 'short'
| `-FieldDecl 0x5561d7ff47b0 <line:3:3, col:7> col:7 referenced b 'int'



target triple = "x86_64-pc-linux-gnu"
%struct.A = type { i16, i32 }
@.str = private unnamed_addr constant [7 x i8] c"%d %d\0A\00", align 1

define void @print_A(i64) local_unnamed_addr !dbg !7 {
  %2 = trunc i64 %0 to i32
  %3 = lshr i64 %0, 32
  %4 = trunc i64 %3 to i32
  tail call void @llvm.dbg.value(metadata i32 %4, i64 0, metadata !18, metadata !19), !dbg !20
  tail call void @llvm.dbg.declare(metadata %struct.A* undef, metadata !18, metadata !21), !dbg !20
  %5 = shl i32 %2, 16, !dbg !22
  %6 = ashr exact i32 %5, 16, !dbg !22
  %7 = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([...] @.str, i64 0, i64 0), i32 %6, i32 %4), !dbg !23
  ret void, !dbg !24
}



[...]

!7 = distinct !DISubprogram(name: "print_A", scope: !1, file: !1, line: 6, type: !8, [...], variables: !17)
!8 = !DISubroutineType(types: !9)
!9 = !{null, !10}
!10 = !DIDerivedType(tag: DW_TAG_typedef, name: "A", file: !1, line: 4, baseType: !11)
!11 = distinct !DICompositeType(tag: DW_TAG_structure_type, file: !1, line: 1, size: 64, elements: !12)
!12 = !{!13, !15}
!13 = !DIDerivedType(tag: DW_TAG_member, name: "a", scope: !11, file: !1, line: 2, baseType: !14, size: 16)
!14 = !DIBasicType(name: "short", size: 16, encoding: DW_ATE_signed)
!15 = !DIDerivedType(tag: DW_TAG_member, name: "b", scope: !11, file: !1, line: 3, baseType: !16, size: 32, offset: 32)
!16 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)

[...]
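The metadata above gives, in bits, a size of 64 for the structure and an offset of 32 for the field `b`. These figures can be cross-checked from C with `offsetof` and `sizeof`. The sketch below also shows what a stripped-down field description could look like once only names, offsets and sizes are kept. The `FieldInfo` type and the `find_field` helper are hypothetical names chosen for this example, not DragonFFI's actual type system, and the expected values assume an x86-64 Linux layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
  short a;
  int b;
} A;

/* Hypothetical field descriptor: a name, an offset and a size, all in bits
   (the unit used by the debug metadata). */
typedef struct {
  const char *name;
  size_t offset_bits;
  size_t size_bits;
} FieldInfo;

/* Description of A, computed by the compiler for the current target. */
static const FieldInfo A_fields[] = {
  {"a", offsetof(A, a) * 8, sizeof(short) * 8},
  {"b", offsetof(A, b) * 8, sizeof(int) * 8},
};

/* Look a field up by name; returns NULL when A has no such field. */
static const FieldInfo *find_field(const char *name) {
  assert(name != NULL);
  for (size_t i = 0; i < sizeof(A_fields) / sizeof(A_fields[0]); ++i) {
    if (strcmp(A_fields[i].name, name) == 0) {
      return &A_fields[i];
    }
  }
  return NULL;
}
```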



Internals

create a type system that gathers only the necessary information from the metadata tree (we don't need the whole debug information)

make the public headers of the DragonFFI library free from any LLVM headers (so that the whole LLVM headers aren't needed to use the library)

DFFI FFI([...]);

CompilationUnit CU = FFI.cdef("int puts(const char* s);", [...]);
NativeFunc F = CU.getFunction("puts");
const char* s = "hello world!";
void* Args[] = {&s};
int Ret;
F.call(&Ret, Args);



void*

void*

void*

puts





void call_puts(void* Ret, void** Args) {
  *((int*)Ret) = puts((const char*)Args[0]);
}

typedef void (*puts_call_ty)(void*, void**);
puts_call_ty Wrapper = /* pointer to the compiled wrapper function */;
Wrapper(Ret, Args);
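The point of such a wrapper is that its own type never changes, whatever the wrapped function looks like, so the caller only ever needs to know how to call a `void(void*, void**)` function. Here is a self-contained sketch of that idea in plain C, with the wrapper written by hand instead of being compiled on the fly. It assumes that each entry of `Args` points to the storage of the corresponding argument, as in the `F.call` example earlier where `Args` holds the address of `s`. The names `call_ty` and `puts_via_wrapper` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Every wrapper has this single type, whatever function it wraps. */
typedef void (*call_ty)(void *Ret, void **Args);

/* Hand-written wrapper for `int puts(const char*)`. Assumption: Args[0]
   points to the argument's storage, so it is dereferenced before the call. */
static void call_puts(void *Ret, void **Args) {
  *(int *)Ret = puts(*(const char **)Args[0]);
}

/* Hypothetical caller side: pack the argument, then go through the wrapper
   without knowing anything about the wrapped function's real type. */
static int puts_via_wrapper(call_ty Wrapper, const char *s) {
  void *Args[] = {&s};
  int Ret = 0;
  assert(Wrapper != NULL);
  Wrapper(&Ret, Args);
  return Ret;
}
```

Calling `puts_via_wrapper(call_puts, "hello world!")` prints the string and returns whatever `puts` returned, which is a non-negative value on success.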







libffi

puts

void __dffi_wrapper_0(int32_t (__attribute__((cdecl)) *__FPtr)(char*), int32_t* __Ret, void** __Args) {
  *__Ret = (__FPtr)(*((char**)__Args[0]));
}
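The wrapper above receives the function to call as a parameter, which suggests that one wrapper can serve every function sharing the same type. The following sketch illustrates this under that assumption, using two unrelated functions of type `int32_t(const char*)`. All the names here (`int_of_cstr_fn`, `wrapper_int_of_cstr`, `parse_decimal`, `length_of`, `call_with`) are invented for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* The function type handled by the wrapper below. */
typedef int32_t (*int_of_cstr_fn)(const char *);

/* One wrapper for the whole function type: the callee is an argument. */
static void wrapper_int_of_cstr(int_of_cstr_fn FPtr, int32_t *Ret, void **Args) {
  *Ret = FPtr(*(const char **)Args[0]);
}

/* Two unrelated functions that happen to share that type. */
static int32_t parse_decimal(const char *s) { return (int32_t)atoi(s); }
static int32_t length_of(const char *s) { return (int32_t)strlen(s); }

/* Hypothetical caller side: the same wrapper is reused for both callees. */
static int32_t call_with(int_of_cstr_fn F, const char *s) {
  void *Args[] = {&s};
  int32_t Ret = 0;
  assert(F != NULL);
  wrapper_int_of_cstr(F, &Ret, Args);
  return Ret;
}
```

With this layout, the number of wrappers grows with the number of distinct function types rather than with the number of functions.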







DFFI::cdef

DFFI::compile

CompilationUnit::getFunction

Issues with Clang

DFFI::cdef

-g -femit-all-decls

typedef struct {
  short a;
  int b;
} A;

void print_A(A s);



$ clang -S -emit-llvm -g -femit-all-decls -o - a.c | grep print_A | wc -l

0



print_A





void __dffi_force_decl_print_A(A s) { }
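A dummy definition like this one has the same parameter types as the declaration it stands for, so the compiler has to emit them. One simple way to picture how such stubs could be produced is plain text templating over the declared function names. This is only an illustrative sketch: `make_force_decl` is a hypothetical helper, and nothing here claims that DragonFFI builds its stubs this way.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical generator: write a dummy definition named after the declared
   function and taking the same parameters.
   Returns 0 on success, -1 if the output buffer is too small. */
static int make_force_decl(char *out, size_t out_size,
                           const char *name, const char *params) {
  assert(out != NULL && name != NULL && params != NULL);
  int n = snprintf(out, out_size, "void __dffi_force_decl_%s(%s) { }",
                   name, params);
  return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

For `print_A` with the parameter list `A s`, this produces exactly the stub shown above.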



__dffi_force_decl_print_A

DFFI::compile

Python bindings

Project status

user and developer documentation exists!

another foreign language is supported (JS? Ruby?)

the DragonFFI main library API is considered stable

a non-negligible number of tests has been added

all the things in the TODO file have been done :)

Various ideas for the future

Parse embedded DWARF information

libarchive

Lightweight debug info?

on libarchive 3.3.2, DWARF goes from 1.8 MB to 536 KB, for an original binary code size of 735 KB

on zlib 1.2.11, DWARF goes from 162 KB to 61 KB, for an original binary code size of 99 KB

debug information is well supported on every platform nowadays: tools exist to parse it, embed it in or extract it from binaries, and so on

we already have DWARF and PDB: https://xkcd.com/927/

libffi

libffi

JIT code from the final language (like Python) to native function code

arguments are converted from Python to C according to the function type

the function pointer and wrapper are gathered from DragonFFI

the final call is made

void*

Reducing DragonFFI library size

compile DragonFFI, Clang and LLVM using (Thin) LTO, with visibility hidden for both Clang and LLVM. This could have the effect of removing code from Clang/LLVM that isn't used by DragonFFI.

make DragonFFI more modular:

- one core module that only has the parts of CodeGen that deal with ABIs. If the types and function prototypes are defined "by hand" (without DFFI::cdef), that is more or less the only part that is needed (along with LLVM, obviously)

- one optional module that includes the full Clang compiler (to provide the DFFI::cdef and DFFI::compile APIs)

libffi

cdef

compile

Conclusion

#dragonffi

Acknowledgments