2012-09-27

A few months ago, I announced pathod, a pathological HTTP daemon. The project started as a testing tool to let me craft standards-violating HTTP responses while working on mitmproxy. It soon became a free-standing project, and has turned out to be incredibly useful in security testing, exploit delivery and general creative mischief. In the last release, I added pathoc - pathod's malicious client-side twin. It does for HTTP requests what pathod does for HTTP responses, and uses the same hyper-terse specification language.

In this post, I show how pathoc can be used as a very simple fuzzer, by finding issues in a number of major pure-Python webservers. None of the tested servers failed catastrophically - they all caught the unexpected exception and continued serving requests. Nonetheless, I think it's reasonable to say that we've triggered a bug if a) the server returns a 500 Internal Server Error response or terminates the connection abnormally, and b) we see a traceback in our logs. By this definition, I found bugs in every pure-Python server I tested.

All of the problems I list below are simple failures of validation - what they have in common is that somewhere in the project, code is called with input that it doesn't expect and can't handle. This matters - in fact, I'd argue that the majority of security problems fall into this category. It's interesting to ponder why this type of issue is so ubiquitous in Python servers. I have no doubt that part of the answer lies in Python's use of exceptions - errors that would be explicit in other languages can be implicit in Python, and code that seems clean and intuitive might in fact be buggy. I think this is especially relevant right now, given the recent flurry of discussion surrounding the Go language and its error handling. It's pretty instructive to read Russ Cox's recent riposte to this post criticizing Go's explicit approach, while looking at the bugs below. I love Python and I think it's a fine language, but I also think the designers of Go probably made the right choice.
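To make the implicit-error point concrete, here is a minimal sketch of the pattern that shows up repeatedly in the results: a bare int() call on a header value raises ValueError on hostile input, whereas an explicit check forces the caller to decide what to do. The helper name parse_content_length and the plain dict of headers are hypothetical, chosen for illustration only; none of the servers tested is claimed to work this way.

```python
def parse_content_length(headers):
    """Return the declared body length, or None if the header is malformed.

    `headers` is assumed to be a plain dict of header name -> string value.
    """
    raw = headers.get("Content-Length")
    if raw is None:
        return 0  # no header present: treat the body as empty
    raw = raw.strip()
    # str.isdigit() alone accepts some non-ASCII digit characters that int()
    # rejects, so require ASCII as well.
    if not (raw.isascii() and raw.isdigit()):
        return None  # caller is expected to answer with a 400, not crash
    return int(raw)


# The naive version fails implicitly on the same input:
try:
    int({"Content-Length": "x"}.get("Content-Length", 0))
except ValueError as exc:
    print("naive parse raised:", exc)

print(parse_content_length({"Content-Length": "x"}))
print(parse_content_length({"Content-Length": "10"}))
```

The point is not that the check is hard to write, but that nothing in the naive version hints that it is missing.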

Basic fuzzing with pathoc

My methodology for these tests was very simple indeed. I launched each server in turn, and used pathoc to fire corrupted GET requests at it until I saw an error. I then looked at the logs, and boiled the distinct cases down to a minimal pathoc specification by hand. This exercises a rather shallow set of features in the server software - mostly parsing of the HTTP lead-in and request headers. It's possible to give software a much, much deeper workout with pathoc, but I'll leave that for a future post.

My pathoc fuzzing command looked something like this:

pathoc -n 1000 -p 8080 -t 1 localhost ' get:/:b@10:ir,"\x00" '

The most important flags here are -n, which tells pathoc to make 1000 consecutive requests, and -t, which tells pathoc to time out after one second (necessary to prevent hangs when daemons terminate improperly). The request specification itself breaks down as follows:

get          Issue a GET request
/            ... to the path /
b@10         ... with a body consisting of 10 random bytes
ir,"\x00"    ... and inject a NULL byte at a random location.

It's that last clause - the random injection - that makes the difference between simply crafting requests and basic fuzzing. Every time a new request is issued, the injection occurs at a different location. I varied the injected character between a NULL byte, a carriage return and a random alphabet letter. Each exposed different errors in different servers. For a complete description of the specification language, see the online docs.
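The idea behind the random injection can be sketched in a few lines of Python. This is an illustration of the technique only, not pathoc's implementation: the function name inject_at_random is hypothetical, and the serialized request below is an assumption about roughly what a GET with a 10-byte body looks like on the wire.

```python
import random


def inject_at_random(message: bytes, payload: bytes, rng: random.Random) -> bytes:
    """Return a copy of message with payload inserted at a random offset."""
    # randint is inclusive at both ends, so the payload may also land
    # before the first byte or after the last one.
    offset = rng.randint(0, len(message))
    return message[:offset] + payload + message[offset:]


rng = random.Random()
body = bytes(rng.randrange(256) for _ in range(10))  # 10 random bytes
base = b"GET / HTTP/1.1\r\nContent-Length: 10\r\n\r\n" + body

# Each call corrupts the same base request at a different position.
for _ in range(3):
    print(repr(inject_at_random(base, b"\x00", rng)))
```

Because the offset is chosen afresh each time, a long run of requests ends up probing the request line, the headers and the body in turn.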

Results

For each bug, I've given a traceback and a minimal pathoc call to trigger the issue. The tracebacks have been edited lightly to shorten file paths and remove irrelevances like timestamps.

CherryPy

pathoc -p 8080 localhost ' get:/:b@10:h"Content-Length"="x" '

ENGINE ValueError("invalid literal for int() with base 10: 'x'",)
Traceback (most recent call last):
  File "cherrypy/wsgiserver/wsgiserver2.py", line 1292, in communicate
    req.parse_request()
  File "cherrypy/wsgiserver/wsgiserver2.py", line 591, in parse_request
    success = self.read_request_headers()
  File "cherrypy/wsgiserver/wsgiserver2.py", line 711, in read_request_headers
    if mrbs and int(self.inheaders.get("Content-Length", 0)) > mrbs:
ValueError: invalid literal for int() with base 10: 'x'

pathoc -p 8080 localhost ' get:/:i4,"\r" '

ENGINE TypeError("argument of type 'NoneType' is not iterable",)
Traceback (most recent call last):
  File "cherrypy/wsgiserver/wsgiserver2.py", line 1292, in communicate
    req.parse_request()
  File "cherrypy/wsgiserver/wsgiserver2.py", line 580, in parse_request
    success = self.read_request_line()
  File "cherrypy/wsgiserver/wsgiserver2.py", line 644, in read_request_line
    if NUMBER_SIGN in path:
TypeError: argument of type 'NoneType' is not iterable

Tornado

pathoc -p 8080 localhost ' get:/:b@10:h"Content-Length"="x" '

[E 120927 11:42:26 iostream:307] Uncaught exception, closing connection.
Traceback (most recent call last):
  File "tornado/iostream.py", line 304, in wrapper
    callback(*args)
  File "tornado/httpserver.py", line 254, in _on_headers
    content_length = int(content_length)
ValueError: invalid literal for int() with base 10: 'x'
[E 120927 11:42:26 ioloop:435] Exception in callback <tornado.stack_context._StackContextWrapper object at 0x1012e28e8>
Traceback (most recent call last):
  File "tornado/ioloop.py", line 421, in _run_callback
    callback()
  File "tornado/iostream.py", line 304, in wrapper
    callback(*args)
  File "tornado/httpserver.py", line 254, in _on_headers
    content_length = int(content_length)
ValueError: invalid literal for int() with base 10: 'x'

pathoc -p 8080 localhost ' get:/:h"h\r\n"="x" '

[E iostream:307] Uncaught exception, closing connection.
Traceback (most recent call last):
  File "tornado/iostream.py", line 304, in wrapper
    callback(*args)
  File "tornado/httpserver.py", line 236, in _on_headers
    headers = httputil.HTTPHeaders.parse(data[eol:])
  File "tornado/httputil.py", line 127, in parse
    h.parse_line(line)
  File "tornado/httputil.py", line 113, in parse_line
    name, value = line.split(":", 1)
ValueError: need more than 1 value to unpack
[E ioloop:435] Exception in callback <tornado.stack_context._StackContextWrapper object at 0x1012bd7e0>
Traceback (most recent call last):
  File "tornado/ioloop.py", line 421, in _run_callback
    callback()
  File "tornado/iostream.py", line 304, in wrapper
    callback(*args)
  File "tornado/httpserver.py", line 236, in _on_headers
    headers = httputil.HTTPHeaders.parse(data[eol:])
  File "tornado/httputil.py", line 127, in parse
    h.parse_line(line)
  File "tornado/httputil.py", line 113, in parse_line
    name, value = line.split(":", 1)
ValueError: need more than 1 value to unpack
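The failure above comes from unpacking the result of split(":", 1) on a line that contains no colon. As a rough sketch of a more defensive approach (not Tornado's code, and not a complete header parser: continuation lines and other details are ignored), a hypothetical parse_header_line helper could use str.partition, which always returns three parts, and report malformed lines explicitly.

```python
def parse_header_line(line):
    """Split 'Name: value' into (name, value); return None if malformed."""
    # partition() never raises: if ":" is absent, sep is the empty string.
    name, sep, value = line.partition(":")
    if not sep or not name.strip():
        return None  # no colon, or an empty header name
    return name.strip(), value.strip()


# A header name containing "\r\n" splits into two lines on the wire,
# roughly "h" and ": x". Both are rejected instead of raising.
for line in ["Host: localhost", "h", ": x"]:
    print(repr(line), "->", parse_header_line(line))
```

The caller still has to decide how to respond to a None result, but the decision is now visible in the code.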

Twisted

pathoc -p 8080 localhost ' get:/:b@10:h"Content-Length"="x" '

[HTTPChannel,4,127.0.0.1] Unhandled Error
Traceback (most recent call last):
  File "twisted/python/log.py", line 84, in callWithLogger
    return callWithContext({"system": lp}, func, *args, **kw)
  File "twisted/python/log.py", line 69, in callWithContext
    return context.call({ILogContext: newCtx}, func, *args, **kw)
  File "twisted/python/context.py", line 118, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "twisted/python/context.py", line 81, in callWithContext
    return func(*args,**kw)
--- <exception caught here> ---
  File "twisted/internet/selectreactor.py", line 150, in _doReadOrWrite
    why = getattr(selectable, method)()
  File "twisted/internet/tcp.py", line 199, in doRead
    rval = self.protocol.dataReceived(data)
  File "twisted/protocols/basic.py", line 564, in dataReceived
    why = self.lineReceived(line)
  File "twisted/web/http.py", line 1558, in lineReceived
    self.headerReceived(self.__header)
  File "twisted/web/http.py", line 1580, in headerReceived
    self.length = int(data)
exceptions.ValueError: invalid literal for int() with base 10: 'x'

SimpleHTTP

pathoc -p 8080 localhost ' get:"/\0" '

Exception happened during processing of request from ('127.0.0.1', 54029)
Traceback (most recent call last):
  File "lib/python2.7/SocketServer.py", line 284, in _handle_request_noblock
    self.process_request(request, client_address)
  File "lib/python2.7/SocketServer.py", line 310, in process_request
    self.finish_request(request, client_address)
  File "lib/python2.7/SocketServer.py", line 323, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "lib/python2.7/SocketServer.py", line 638, in __init__
    self.handle()
  File "python2.7/BaseHTTPServer.py", line 340, in handle
    self.handle_one_request()
  File "lib/python2.7/BaseHTTPServer.py", line 328, in handle_one_request
    method()
  File "lib/python2.7/SimpleHTTPServer.py", line 44, in do_GET
    f = self.send_head()
  File "lib/python2.7/SimpleHTTPServer.py", line 68, in send_head
    if os.path.isdir(path):
  File "lib/python2.7/genericpath.py", line 41, in isdir
    st = os.stat(s)
TypeError: must be encoded string without NULL bytes, not str

Waitress

pathoc -p 8080 localhost ' get:/:i16," " '

ERROR:waitress:uncaptured python exception, closing channel <waitress.channel.HTTPChannel connected 127.0.0.1:62330 at 0x1007ca310>
(
  <type 'exceptions.IndexError'>:list index out of range
  [lib/python2.7/asyncore.py|read|83]
  [lib/python2.7/asyncore.py|handle_read_event|444]
  [lib/python2.7/site-packages/waitress/channel.py|handle_read|169]
  [lib/python2.7/site-packages/waitress/channel.py|received|186]
  [lib/python2.7/site-packages/waitress/parser.py|received|99]
  [lib/python2.7/site-packages/waitress/parser.py|parse_header|158]
  [lib/python2.7/site-packages/waitress/parser.py|get_header_lines|247]
)

Edit: The first version of this post had examples that were due to the test WSGI application, not waitress. I've replaced them with the traceback above, which has been reformatted for clarity.

Werkzeug

pathoc -p 8080 localhost ' get:/:h"Host"="n\r\0" '