Random Garbage

A friend of mine asked how to write a small interpreter so that his project could define new AI functions at runtime. His face went blank when I started explaining how to do it in Common Lisp (really, there isn't much to explain; doing such things is practically a fundamental property of Common Lisp), and he hastily mentioned that he wanted it in C.

Oh.

So, I hacked together a trivial demonstration program. It runs a very small interpreter that lets you compile a C file into a shared object, load the shared object, and bind the functions inside it to a structure full of function pointers that you can then invoke manually. The intent is that you write your various functions as different C files that you can load and swap out at runtime.

Here is the lame makefile, which compiles stuff.c into a program called stuff.

# Makefile
stuff: stuff.c
	gcc -Wall -g stuff.c -o stuff -ldl

clean:
	rm -f stuff *.o *.so

Here is stuff.c. This program is set up using a traditional interpreter design. However, it is totally barebones, and I don't deal with the interpreter environment in any meaningful way (other than its reification and global nature), since you can't define new variables or functions in the interpreter. Also, the lexical analysis and parsing of the interpreted forms are horrific at best. This is because doing such things in C is a pain in the ass unless you use flex and bison or are prepared to write a helluva lot more code. If I had done that, though, this wouldn't be the simple demonstration that it is.
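For what it's worth, one cheap step up from the sscanf() and strncmp() prefix matching used in stuff.c is to split the line into whitespace-separated tokens first and then compare whole words. This is only a sketch and is not part of stuff.c: tokenize(), is_command(), and MAX_TOKENS are hypothetical names I made up for illustration.

```c
#include <string.h>

#define MAX_TOKENS 8   /* hypothetical limit, not from stuff.c */

/* Split line in place on spaces, tabs, and newlines.
   Stores up to max token pointers in tokens[] and returns how many
   were stored. The buffer is modified: separators become '\0'. */
int tokenize(char *line, char *tokens[], int max)
{
    int n = 0;
    char *tok = strtok(line, " \t\r\n");

    while (tok != NULL && n < max) {
        tokens[n++] = tok;
        tok = strtok(NULL, " \t\r\n");
    }
    return n;
}

/* Whole-word comparison, so "helpme" does not match "help"
   the way a strncmp() on the first four characters would. */
int is_command(const char *token, const char *name)
{
    return strcmp(token, name) == 0;
}
```

With something like this, the command handlers could check the token count for arity and look at tokens[0] for the command name, and the trailing newline would not need to be stripped separately.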

Note that I chose to link to the loaded library functions via an explicit indirection: the f structure inside the Env structure. I could have just taken the func_name variable in eval_invoke(), performed a dlsym() call on it, and called the resulting pointer with the arguments. Had I done that, I could have called ANY function in the loaded library (well, any with the same prototype, at any rate). In some respects that approach is more general. However, I chose the method I did because the indirection lets me associate functions with different C linkage names with the symbols I use to identify them, just as the default stub functions have different names from the functions defined in foo.c and bar.c.
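To make that indirection concrete, here is a sketch of a small table that pairs each interpreter-level symbol with the C linkage name to look up and a fallback function. This is not what stuff.c does (it hard-codes three fields in Funcs); Binding, find_binding(), rebind(), and default_add() are hypothetical names introduced only for this illustration.

```c
#include <stddef.h>
#include <string.h>

typedef int (*FUNC)(int a, int b);

/* One entry per function the interpreter exposes. */
typedef struct Binding_t {
    const char *symbol;   /* name typed at the prompt, e.g. "fun1" */
    const char *c_name;   /* linkage name one would pass to dlsym() */
    FUNC fallback;        /* used when the library lacks c_name */
    FUNC fn;              /* what an invoke would currently call */
} Binding;

/* Stand-in default, playing the role of stub1() in stuff.c. */
int default_add(int a, int b)
{
    return a + b;
}

/* Find the entry for an interpreter symbol, or NULL if unknown. */
Binding *find_binding(Binding *table, size_t count, const char *symbol)
{
    size_t i;

    for (i = 0; i < count; i++) {
        if (strcmp(table[i].symbol, symbol) == 0) {
            return &table[i];
        }
    }
    return NULL;
}

/* Install a freshly resolved address (the result of a dlsym() call),
   or fall back to the default when the lookup returned NULL.
   POSIX permits converting dlsym()'s void * to a function pointer;
   ISO C by itself does not guarantee it. */
void rebind(Binding *b, void *resolved)
{
    b->fn = (resolved != NULL) ? (FUNC)resolved : b->fallback;
}
```

Under these assumptions, eval_load() would loop over the table calling rebind(&table[i], dlsym(handle, table[i].c_name)), and eval_invoke() would replace its chain of strncmp() tests with a single find_binding() call. The c_name can be anything the library exports, which is the whole point of the indirection.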

A real-world example of why the method I chose is useful: suppose I wanted multiple implementations of C functions with the exact same name loaded at the same time, so that I could pick and choose between them. With the method I chose, I could additionally associate a namespace (or package name) with each shared object. That means I'd pair the f and lib_handle fields into a 'Package' structure, keep a hash table of them in the Env keyed by a package name specified when loading the shared object, and add another syntax to the interpreter to state which function I want to call out of which namespace/package. Implementing this is left as an exercise for the reader.
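As a starting point for that exercise, here is a rough sketch of what the data structures might look like. It makes several assumptions that are not in stuff.c: a fixed-size linear table stands in for the hash table to keep things short, the "package:function" syntax is made up, and Package, PackageTable, find_package(), add_package(), and split_qualified() are all hypothetical names.

```c
#include <stddef.h>
#include <string.h>

#define MAX_PACKAGES 16   /* hypothetical fixed capacity */

typedef int (*FUNC)(int a, int b);
typedef struct Funcs_t { FUNC fun1; FUNC fun2; FUNC fun3; } Funcs;

/* One loaded shared object plus the functions bound from it. */
typedef struct Package_t {
    char name[64];
    void *lib_handle;     /* would hold the dlopen() handle */
    Funcs f;
} Package;

typedef struct PackageTable_t {
    Package pkgs[MAX_PACKAGES];
    size_t count;
} PackageTable;

/* Look a package up by name; NULL if it is not loaded. */
Package *find_package(PackageTable *t, const char *name)
{
    size_t i;

    for (i = 0; i < t->count; i++) {
        if (strcmp(t->pkgs[i].name, name) == 0) {
            return &t->pkgs[i];
        }
    }
    return NULL;
}

/* Return the existing package or create an empty one.
   NULL if the table is full or the name does not fit. */
Package *add_package(PackageTable *t, const char *name)
{
    Package *p = find_package(t, name);

    if (p != NULL) {
        return p;
    }
    if (t->count == MAX_PACKAGES || strlen(name) >= sizeof(t->pkgs[0].name)) {
        return NULL;
    }
    p = &t->pkgs[t->count++];
    memset(p, 0, sizeof(*p));
    strcpy(p->name, name);
    return p;
}

/* Split "pkg:func" in place. Returns 1 and sets *pkg and *func on
   success, 0 if either part is missing. */
int split_qualified(char *spec, char **pkg, char **func)
{
    char *colon = strchr(spec, ':');

    if (colon == NULL || colon == spec || colon[1] == '\0') {
        return 0;
    }
    *colon = '\0';
    *pkg = spec;
    *func = colon + 1;
    return 1;
}
```

With this in place, a load command could take a package name as an extra argument and fill in the matching Package, and an invoke of "foo:fun1" would look up package foo before choosing the function pointer. Each Package keeps its own handle, so nothing would need to be dlclose()d when a second library is loaded.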

/* This is stuff.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <dlfcn.h>

#define MATCH 0
#define TRUE 1
#define FALSE 0
#define DONE 0
#define NOT_DONE 1
#define BSIZE 1024

/* the type of the functions we care about in the compiled code */
typedef int (*FUNC)(int a, int b);

/* There are very obvious extensions to how I map functions from the
   shared library to an API for them. I didn't do any of them. */
typedef struct Funcs_t {
    FUNC fun1;
    FUNC fun2;
    FUNC fun3;
} Funcs;

/* The global environmental structure */
typedef struct Env_t {
    Funcs f;
    void *lib_handle;
} Env;

/* The invocable functions in the global environment are defaulted to
   these functions. */
int stub1(int a, int b)
{
    printf("Default stub1(%d, %d): called.\n", a, b);
    return a + b;
}

int stub2(int a, int b)
{
    printf("Default stub2(%d, %d): called.\n", a, b);
    return a + b;
}

int stub3(int a, int b)
{
    printf("Default stub3(%d, %d): called.\n", a, b);
    return a + b;
}

char *prompt_input(char *buf, int size, FILE *fin)
{
    printf("> ");
    fflush(NULL);
    return fgets(buf, size, fin);
}

int eval_help(char *args, Env *e)
{
    printf(
        "Help:\n"
        " help            This help message\n"
        " quit            Quits the program\n"
        " compile <file>  Produces shared library of C file, don't provide the .c extension\n"
        " load <file>     Loads library named NAME by loading libNAME.so\n"
    );
    return NOT_DONE;
}

int eval_quit(char *args, Env *e)
{
    printf("Quitting!\n");
    return DONE;
}

/* compile a source file (without the .c extension) and create a shared
   object we can load later. The error checking and reporting in this
   function is criminally bad. */
int eval_compile(char *args, Env *e)
{
    char cmd[BSIZE], file[BSIZE], buf[BSIZE];
    int ret;

    if (sscanf(args, "%s %s", cmd, file) != 2) {
        printf("eval_compile: bad arity!\n");
        return NOT_DONE;
    }

    /* construct and execute the compilation command. I hope everything
       is in your path. snprintf() keeps a long name from overrunning buf. */
    snprintf(buf, sizeof(buf), "gcc -Wall -DPIC -fpic -c %s.c", file);
    ret = system(buf);
    if (ret != 0) {
        printf("Sorry, an error happened during compilation.\n");
        return NOT_DONE;
    } else {
        printf("Compile [%s.c]: OK\n", file);
    }

    /* Now produce the shared object */
    snprintf(buf, sizeof(buf),
        "gcc -shared -Wl,-soname,lib%s.so.1 %s.o -lc -o lib%s.so",
        file, file, file);
    ret = system(buf);
    if (ret != 0) {
        printf("Sorry, an error happened during shared library generation.\n");
    } else {
        printf("Library generation [lib%s.so]: OK\n", file);
    }

    return NOT_DONE;
}

/* We only allow you to invoke the functions in the Env structure. You
   denote the names by "fun1" "fun2" and "fun3". This is a bare skeleton
   of how to do such things since I don't even create a symbol table for
   the mapping of the interpreter function symbol to actual C functions. */
int eval_invoke(char *cmd, Env *e)
{
    char buf[BSIZE], func_name[BSIZE];
    int arg0, arg1;
    int ret;

    if (sscanf(cmd, "%s %s %d %d", buf, func_name, &arg0, &arg1) != 4) {
        printf("eval_invoke: bad arity!\n");
        return NOT_DONE;
    }

    /* now execute the function we wanted to run with the arguments. */
    if (strncmp("fun1", func_name, 4) == MATCH) {
        printf("[Invoking function fun1...]\n");
        ret = (e->f.fun1)(arg0, arg1);
        printf("[Result] %d\n", ret);
    } else if (strncmp("fun2", func_name, 4) == MATCH) {
        printf("[Invoking function fun2...]\n");
        ret = (e->f.fun2)(arg0, arg1);
        printf("[Result] %d\n", ret);
    } else if (strncmp("fun3", func_name, 4) == MATCH) {
        printf("[Invoking function fun3...]\n");
        ret = (e->f.fun3)(arg0, arg1);
        printf("[Result] %d\n", ret);
    } else {
        printf("I'm sorry, there is no function to invoke by that name.\n");
    }

    return NOT_DONE;
}

int eval_load(char *cmd, Env *e)
{
    void *new_lib = NULL;
    char buf[BSIZE], lib_name[BSIZE];
    char name[BSIZE];

    if (sscanf(cmd, "%s %s", buf, lib_name) != 2) {
        printf("eval_load: bad arity!\n");
        return NOT_DONE;
    }

    snprintf(name, sizeof(name), "./lib%s.so", lib_name);

    new_lib = dlopen(name, RTLD_NOW | RTLD_LOCAL);
    if (new_lib == NULL) {
        printf("Failed to load library: %s\n", name);
        return NOT_DONE;
    }

    /* close any previous one */
    if (e->lib_handle != NULL) {
        dlclose(e->lib_handle);
    }

    /* keep a reference to the new one */
    e->lib_handle = new_lib;

    /* "link" the functions in the Env to the ones we just loaded */
    e->f.fun1 = (FUNC)dlsym(e->lib_handle, "fun1");
    if (e->f.fun1 == NULL) {
        printf("Warning, unable to resolve fun1() from library %s, "
            "assuming initial stub1().\n", name);
        e->f.fun1 = stub1;
    }

    e->f.fun2 = (FUNC)dlsym(e->lib_handle, "fun2");
    if (e->f.fun2 == NULL) {
        printf("Warning, unable to resolve fun2() from library %s, "
            "assuming initial stub2().\n", name);
        e->f.fun2 = stub2;
    }

    e->f.fun3 = (FUNC)dlsym(e->lib_handle, "fun3");
    if (e->f.fun3 == NULL) {
        printf("Warning, unable to resolve fun3() from library %s, "
            "assuming initial stub3().\n", name);
        e->f.fun3 = stub3;
    }

    printf("Functions Linked!\n");

    return NOT_DONE;
}

/* The basic structure of the interpreter */
int eval_command(char *cmd, Env *e)
{
    printf("Evaluating command: '%s'\n", cmd);

    /* check to see what I have and run the appropriate handler */
    if (strncmp("help", cmd, 4) == MATCH) {
        return eval_help(cmd, e);
    }
    if (strncmp("quit", cmd, 4) == MATCH) {
        return eval_quit(cmd, e);
    }
    if (strncmp("compile", cmd, 7) == MATCH) {
        return eval_compile(cmd, e);
    }
    if (strncmp("invoke", cmd, 6) == MATCH) {
        return eval_invoke(cmd, e);
    }
    if (strncmp("load", cmd, 4) == MATCH) {
        return eval_load(cmd, e);
    }

    printf("Sorry, I don't know how to do that command.\n");
    return NOT_DONE;
}

int main(void)
{
    char buf[BSIZE];
    int done = NOT_DONE;
    char *ret = NULL;
    char *nl = NULL;
    Env e;

    /* set up defaults */
    e.f.fun1 = stub1;
    e.f.fun2 = stub2;
    e.f.fun3 = stub3;
    e.lib_handle = NULL;

    /* run the read/eval/print loop until done */
    printf("Welcome to a simple demonstration interpreter.\n");
    eval_help(NULL, &e);

    ret = prompt_input(buf, BSIZE, stdin);
    while (ret != NULL && done == NOT_DONE) {
        /* I'm not doing any real whitespace trimming, so be VERY careful */

        /* get rid of newline */
        nl = strstr(buf, "\n");
        if (nl != NULL) {
            *nl = '\0';
        }

        done = eval_command(buf, &e);
        if (done == NOT_DONE) {
            ret = prompt_input(buf, BSIZE, stdin);
        }
    }

    /* Clean up, if any */
    if (e.lib_handle != NULL) {
        dlclose(e.lib_handle);
        e.lib_handle = NULL;
    }

    return 0;
}

Now, here is the first file that we'll be using as a replacement for the stub functions. This file (and bar.c below) must be in the current working directory when you start the stuff program.

/* This is foo.c */
#include <stdio.h>
#include <stdlib.h>

int fun1(int a, int b)
{
    printf("This is foo.c:fun1()\n");
    fflush(NULL);
    return a + b;
}

int fun2(int a, int b)
{
    printf("This is foo.c:fun2()\n");
    fflush(NULL);
    return a + b;
}

int fun3(int a, int b)
{
    printf("This is foo.c:fun3()\n");
    fflush(NULL);
    return a + b;
}

And here is bar.c, another definition of the above functions.

/* This is bar.c */
#include <stdio.h>
#include <stdlib.h>

int fun1(int a, int b)
{
    printf("This is bar.c:fun1()\n");
    fflush(NULL);
    return a + b;
}

int fun2(int a, int b)
{
    printf("This is bar.c:fun2()\n");
    fflush(NULL);
    return a + b;
}

int fun3(int a, int b)
{
    printf("This is bar.c:fun3()\n");
    fflush(NULL);
    return a + b;
}

Now that we have everything defined, here is an interaction with the program. Notice that the C files above get compiled because we ask the interpreter to compile them. Also notice how the output of the functions "fun1", "fun2", and "fun3" changes from the defaults to whatever each separate C file defines.

Linux black > ./stuff
Welcome to a simple demonstration interpreter.
Help:
 help            This help message
 quit            Quits the program
 compile <file>  Produces shared library of C file, don't provide .c
 load <file>     Loads library named NAME by loading libNAME.so
> invoke fun1 10 10
Evaluating command: 'invoke fun1 10 10'
[Invoking function fun1...]
Default stub1(10, 10): called.
[Result] 20
> invoke fun2 10 10
Evaluating command: 'invoke fun2 10 10'
[Invoking function fun2...]
Default stub2(10, 10): called.
[Result] 20
> invoke fun3 10 10
Evaluating command: 'invoke fun3 10 10'
[Invoking function fun3...]
Default stub3(10, 10): called.
[Result] 20
> compile foo
Evaluating command: 'compile foo'
Compile [foo.c]: OK
Library generation [libfoo.so]: OK
> compile bar
Evaluating command: 'compile bar'
Compile [bar.c]: OK
Library generation [libbar.so]: OK
> load foo
Evaluating command: 'load foo'
Functions Linked!
> invoke fun1 10 10
Evaluating command: 'invoke fun1 10 10'
[Invoking function fun1...]
This is foo.c:fun1()
[Result] 20
> invoke fun2 10 10
Evaluating command: 'invoke fun2 10 10'
[Invoking function fun2...]
This is foo.c:fun2()
[Result] 20
> invoke fun3 10 10
Evaluating command: 'invoke fun3 10 10'
[Invoking function fun3...]
This is foo.c:fun3()
[Result] 20
> load bar
Evaluating command: 'load bar'
Functions Linked!
> invoke fun1 10 10
Evaluating command: 'invoke fun1 10 10'
[Invoking function fun1...]
This is bar.c:fun1()
[Result] 20
> invoke fun2 10 10
Evaluating command: 'invoke fun2 10 10'
[Invoking function fun2...]
This is bar.c:fun2()
[Result] 20
> invoke fun3 10 10
Evaluating command: 'invoke fun3 10 10'
[Invoking function fun3...]
This is bar.c:fun3()
[Result] 20
> quit
Evaluating command: 'quit'
Quitting!

Enhancement of the interpreter would go in the direction of allowing all of the functions in the shared object to be discovered and shoved into a symbol table stored in the Env environment so they can be called. In addition, the arguments of the functions would be more flexibly defined so you can pass other data types to them or define them to have different arities. There is definitely more that can be done.
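As a hint of what the different-arities part might look like, here is a sketch of a symbol table entry that records how many arguments a function takes and dispatches on that. It sticks to int arguments only and none of it is in stuff.c: Symbol, call_symbol(), example_negate(), and example_add() are hypothetical names used purely for illustration.

```c
#include <stddef.h>

/* A symbol table entry that remembers the function's arity. */
typedef struct Symbol_t {
    const char *name;
    int arity;                     /* 0, 1, or 2 in this sketch */
    union {
        int (*f0)(void);
        int (*f1)(int);
        int (*f2)(int, int);
    } fn;
} Symbol;

/* Two example functions with different arities. */
int example_negate(int a)
{
    return -a;
}

int example_add(int a, int b)
{
    return a + b;
}

/* Call sym with argc integer arguments taken from argv.
   Returns 0 and stores the result in *result on success, or -1 when
   argc does not match the arity or the arity is unsupported. */
int call_symbol(const Symbol *sym, const int *argv, int argc, int *result)
{
    if (argc != sym->arity) {
        return -1;
    }
    switch (sym->arity) {
    case 0:
        *result = sym->fn.f0();
        return 0;
    case 1:
        *result = sym->fn.f1(argv[0]);
        return 0;
    case 2:
        *result = sym->fn.f2(argv[0], argv[1]);
        return 0;
    default:
        return -1;
    }
}
```

Supporting other data types would mean replacing the plain int arguments with some kind of tagged value, which is where the helluva lot more code comes in.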

End of Line.