Feb 18, 2016

Lately I’ve been using Jupyter (formerly, IPython) notebooks frequently for reproducible research, and I’ve been wondering how it all works under the hood. Furthermore, I’ve needed some custom functionality that IPython doesn’t include by default. Instead of extending IPython, I decided I would take a stab at building my own simple IPython-like kernel that runs on a remote server where my GPU farm lives. I won’t be worrying about security or concurrency, since I will be the only person with access to the server. The exercise should give you an idea of how server-based coding environments work in Python.

Since this is not a production server, Flask is perfect for our needs. Let’s start with a simple Flask server that does nothing. I’ll include some imports we will need later.

```python
import sys
import traceback
from cStringIO import StringIO  # Python 2; on Python 3 use io.StringIO
from flask import Flask, jsonify, request

app = Flask(__name__)

if __name__ == "__main__":
    app.run()
```



Executing Code

There is really only one magical piece to cover here: how does Python take a string of code, execute it, and return the output? Let’s start with the naive approach.

You can execute any Python statement using the exec() command. I’m going to create a Flask endpoint that takes a POST parameter named ‘code’, splits the command by newlines, and runs each command in sequence. Here is what the code looks like.

```python
import sys
import traceback
from cStringIO import StringIO
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route('/', methods=['POST'])  # the client POSTs code to the server root
def kernel():
    code_lns = request.form['code'].split('\n')
    for line in code_lns:
        exec(line)
    return 'Success'

if __name__ == "__main__":
    app.run()
```



Easy enough! You already have a minimal, Python-executing server in about 15 lines of code (including unused imports and blank lines). To test this, I use the Postman client to hit my local server with POST requests.

Send a POST request to http://localhost:5000/ with the POST parameter ‘code’ set to print('hello world') like the picture below and hit ‘Send’. As expected, the server reads the code, prints out ‘hello world’, then returns.

Redirecting Output

This isn’t very useful to us yet: although the server successfully receives and executes the code, the client only receives a “Success” message. Ideally, we want to redirect the output of the executing program back to the client. To achieve this, we must capture what is written to the standard out buffer in a string and return that string to the client. After some research, I determined this could be done by temporarily redirecting standard out to a StringIO buffer, like so:

```python
@app.route('/', methods=['POST'])
def kernel():
    code_lns = request.form['code'].split('\n')

    # Save the real stdout so we can restore it later
    old_stdout = sys.stdout

    # Point stdout at an in-memory string buffer
    sys.stdout = strstdout = StringIO()

    for line in code_lns:
        exec(line)

    # Restore stdout and return the captured output to the client
    sys.stdout = old_stdout

    return strstdout.getvalue()
```



Looking at the output from the Postman Client, we can see that the server is now relaying back the stdout to the client as expected.
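Incidentally, on Python 3 the standard library ships contextlib.redirect_stdout, which handles the save-and-restore dance for you (and restores sys.stdout even if the executed code raises). A sketch of the same capture, with io.StringIO standing in for cStringIO:

```python
import io
from contextlib import redirect_stdout

# Everything printed inside the with-block lands in the in-memory buffer
buf = io.StringIO()
with redirect_stdout(buf):
    print('hello world')

captured = buf.getvalue()  # 'hello world\n'
```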







Note: Redirecting standard out this way redirects the output for all connected clients, because sys.stdout is global to the process. Thus, if multiple people run code at the exact same time, the outputs will overlap. Don’t do this. That’s why I noted this is not a production-ready server.

Different Environments

There is another major problem in our implementation — everything is executed in the same environment. One of the nice things about IPython is that you can work in several different notebooks at the same time, and none of the variables or functionality overlap. This concept does not exist in our design: if I’m working on two different ideas at the same time, all of the variables between the two scripts would be shared.

The problem lies in the exec() command, which I called the naive approach earlier. Remember that in Python, everything in an environment (technically a namespace) is just stored as a dict in the __dict__ field (see this post for more information). We can execute code in a specific environment by doing something like this:

```python
env = {}
code = compile('j = 1', '<string>', 'exec')
exec code in env
```



After this snippet has executed, env['j'] holds the value 1. Furthermore, any variable already in env can be read by the executed code. We can take advantage of this technique to run code in multiple different environments.
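To see the isolation this buys us, here is a self-contained sketch; it uses the exec(code, env) call form, which behaves the same on Python 2 and 3:

```python
# Two independent namespace dicts: assignments in one never leak into the other
env_a, env_b = {}, {}

exec(compile('x = 1', '<string>', 'exec'), env_a)
exec(compile('x = x + 41', '<string>', 'exec'), env_a)  # reads x back out of env_a
exec(compile('x = 100', '<string>', 'exec'), env_b)

print(env_a['x'], env_b['x'])  # 42 100
```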

First, let’s introduce some boilerplate functionality for creating, deleting, and getting information about environments, stored in an environments variable (a dict mapping each environment id to its namespace dict).

```python
environments = {}

@app.route('/env/create', methods=['POST'])
def create():
    env_id = request.form['id']
    if env_id not in environments:
        environments[env_id] = {}
    return jsonify(envs=environments.keys())

@app.route('/env/delete', methods=['POST'])
def delete():
    env_id = request.form['id']
    if env_id in environments:
        del environments[env_id]
    return jsonify(envs=environments.keys())

@app.route('/env/getenv', methods=['POST'])
def getenv():
    env_id = request.form['id']
    if env_id in environments:
        return jsonify(env=environments[env_id].keys())
    else:
        return jsonify(error='Environment does not exist!')
```



Now, if I send a POST request to http://localhost:5000/env/create with the POST parameter id set to 1, the server creates a blank dictionary for that environment id and sends me back all environments that have been created. Similarly, I can delete an environment or list the variables it contains.

Hooking this up with our code execution is pretty simple as well.

```python
@app.route('/', methods=['POST'])
def kernel():
    env_id = request.form['id']
    if env_id not in environments:
        return jsonify(error='Kernel does not exist!')
    code_lns = request.form['code'].split('\n')
    old_stdout = sys.stdout
    sys.stdout = strstdout = StringIO()
    for line in code_lns:
        code = compile(line, '<string>', 'exec')
        exec code in environments[env_id]
    sys.stdout = old_stdout
    return jsonify(message=strstdout.getvalue())
```



Note that each statement is now executed in the environment matching the id provided.
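Because each request executes against the same persistent dict, variables defined by one request remain visible to the next, much like cells in a notebook sharing state. A standalone sketch of that persistence:

```python
# Simulate two sequential requests against one environment
env = {}
for line in ['x = 10', 'y = x * 2']:      # first "request" defines x and y
    exec(compile(line, '<string>', 'exec'), env)
for line in ['print(x + y)']:             # second "request" still sees them
    exec(compile(line, '<string>', 'exec'), env)
```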

Error Handling

There is one last, glaringly obvious bug in our code: our design fails miserably when an error occurs. If you had mistyped anything so far in the tutorial, such as sending prnt('hi') to the server, you would have received a solemn 500 error with no extra information from our server. Ideally, we would much rather receive the stack trace on the client side than a response that is so opaque!

Adding error handling to our server is as simple as catching errors and printing the stack trace to standard out. We can get the stack trace by calling traceback.format_exc(). Since I like to make it blatantly obvious that an error has occurred, I watch for an error, then send back the stack trace under the ‘error’ key.
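traceback.format_exc() returns, as a string, the same text the interpreter would normally print to stderr. For example:

```python
import traceback

try:
    prnt('hi')  # deliberate typo, raises NameError
except Exception:
    tb = traceback.format_exc()

print('NameError' in tb)  # True
```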

We can modify our kernel method slightly to get the functionality we require.

```python
@app.route('/', methods=['POST'])
def kernel():
    error = False
    env_id = request.form['id']
    if env_id not in environments:
        return jsonify(error='Kernel does not exist!')
    code_lns = request.form['code'].split('\n')
    old_stdout = sys.stdout
    sys.stdout = strstdout = StringIO()
    for line in code_lns:
        try:
            code = compile(line, '<string>', 'exec')
            exec code in environments[env_id]
        except Exception:
            # stdout is still redirected, so the trace lands in the buffer
            print(traceback.format_exc())
            error = True
    sys.stdout = old_stdout
    if error:
        return jsonify(error=strstdout.getvalue())
    else:
        return jsonify(message=strstdout.getvalue())
```



Final Thoughts

All in all, this code gets us a long way towards creating our own IPython-like server. Writing up a simple frontend to interact back and forth with the JSON-based server is outside the scope of what I was trying to do here, but it certainly isn’t hard.

As for the issues with concurrency and security, many of these could be resolved by the use of Docker containers, which allow sandboxing and could be spun up or broken down as clients connect. This sandboxing would also fix the standard out redirection issue.

Below is the final code. 52 lines of code for a fully functioning, elegant, session-based Python kernel is not too shabby if I do say so myself. Please let me know if you have any other ideas on how to simplify/improve the code.