Python and Chunked transfer encoding

Update

Problem solved in upcoming uWSGI release, yay!

Original

The best way to tell my story is to tell you what my day looked like. I recently implemented an RPC handler in Flask, and today was the day we were finally deploying it to production. Everything worked great in the development environment, so I was mostly looking forward to it being out there. I was proud.

Then our sysadmin showed me an error in the log that I hadn’t seen before.

RuntimeError: Parser error: < XML_ERR_DOCUMENT_END >

Panic. SOMETHING IS WRONG. WHY IS THIS ALWAYS HAPPENING TO ME! Yes, the RPC we use is a bit different from classic XML-RPC, but it should be compatible, so I expected things to be very similar. Well, they weren’t. At first I suspected that the caller had something different in their production config, as it wasn’t our own call of the RPC that caused this.

request.data

After a few hours of trying out different things, I learned that flask.request.data was empty, because the content type wasn’t known to it. How did this not happen in dev? Well, the library calling it is smart and remembers from the previous response whether the server can accept this content type. It might have happened in dev too, but we didn’t see the error and the library adjusted, leaving us thinking it was working.

Simple fix, right? Flask puts the raw body into request.stream, a file-like object. Surely we can read from a file, right? WRONG!

Content-Length

Fast forward about 4–5 hours of me asking everyone for help and always coming out with request.stream empty, and we found out that Werkzeug (the layer below Flask) has this cute thing in it:

if content_length is None:
    return safe_fallback and _empty_stream or stream

Great. So our library is not sending the Content-Length header with the request, right? Let me just file an issue… Nope. It was pointed out to me that there’s this wizardry called chunked transfer encoding going on, that it doesn’t use Content-Length because it sends the body chunk by chunk, and that there has to be another way to do this. Easy for them to say, right? Their handler for aiohttp was working perfectly.
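To make the "chunk by chunk" part less magical, here is a minimal sketch of how a chunked body looks on the wire and how it can be decoded. The function name decode_chunked is made up for illustration, and the sketch ignores chunk extensions and trailers, so treat it as a toy rather than a full HTTP/1.1 parser.

```python
def decode_chunked(raw: bytes) -> bytes:
    """Decode a chunked HTTP/1.1 body (toy version: no extensions, no trailers)."""
    body = bytearray()
    pos = 0
    while True:
        # Each chunk starts with its size in hexadecimal, terminated by CRLF.
        line_end = raw.index(b"\r\n", pos)
        size_field = raw[pos:line_end].split(b";", 1)[0].strip()
        size = int(size_field.decode("ascii"), 16)
        pos = line_end + 2
        # A zero-sized chunk marks the end of the body.
        if size == 0:
            break
        body += raw[pos:pos + size]
        # Skip the chunk data and the CRLF that follows it.
        pos += size + 2
    return bytes(body)


# Two chunks ("Wiki" and "pedia") followed by the terminating zero chunk.
wire = b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"
print(decode_chunked(wire))
```

Notice that nothing in the message says up front how long the whole body is. The receiver only learns it is done when the zero-sized chunk arrives, which is why there is no Content-Length to rely on.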

wsgi.input_terminated

If I had been paying more attention, I’d have noticed the line above the one that returns the empty stream:

if environ.get('wsgi.input_terminated'):
    return stream

But things aren’t that simple with me. I had to google for quite a while and find a proposal for this to be implemented in Werkzeug, and then I facepalmed. Easy then! Let’s just set that to True and get the stream. Except that the stream is nothing more than environ['wsgi.input'], which I had already tried reading from and come up empty-handed. I had to find something that supports input_terminated.
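To see how the two checks quoted above interact, here is a simplified sketch of the decision. It is a paraphrase written for this post, not Werkzeug’s actual code: pick_input_stream is a hypothetical name, and the real implementation wraps the stream differently when a length is known.

```python
import io


def pick_input_stream(environ, safe_fallback=True):
    """Simplified paraphrase of the stream selection logic (not Werkzeug's code)."""
    stream = environ["wsgi.input"]

    # The server promises that wsgi.input ends where the request body ends,
    # so the stream can be handed over even without a Content-Length.
    if environ.get("wsgi.input_terminated"):
        return stream

    raw_length = environ.get("CONTENT_LENGTH")
    content_length = int(raw_length) if raw_length else None

    # No length and no promise from the server: reading could block forever,
    # so the safe fallback is an empty stream.
    if content_length is None:
        return io.BytesIO() if safe_fallback else stream

    # Known length: expose only that many bytes.
    return io.BytesIO(stream.read(content_length))


body = b"<rpc/>"

# A chunked request arrives without CONTENT_LENGTH, so the body is hidden.
chunked = {"wsgi.input": io.BytesIO(body)}
empty = pick_input_stream(chunked).read()

# With the flag set, the raw stream is returned untouched.
terminated = {"wsgi.input": io.BytesIO(body), "wsgi.input_terminated": True}
full = pick_input_stream(terminated).read()

print(empty, full)
```

The catch is that the flag only describes what the server already does. Setting it by hand does not make the server decode the chunks, so wsgi.input can still come back empty.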

I’ll save you the search. Here’s a funny Twitter thread that’s not that old (21st July) by Armin Ronacher (creator of Flask), David Lord (maintainer of Flask) and Graham Dumpleton. If you don’t want to read it, here’s the tl;dr:

It was never adopted

So how did I deal with this? I decided to go around Flask and Werkzeug, locked myself into uWSGI (until I figure out how to do it with others) and just used uwsgi.chunked_read() to retrieve the request body. But I have the feeling that something that’s part of the HTTP/1.1 specification should not be this problematic. I am still not sure exactly which technology is to blame, but let’s deal with this, shall we? Tell me how I can help and let’s do this.
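For anyone wanting to try the same route, here is a minimal sketch of what reading the body through uwsgi.chunked_read() could look like. It assumes that chunked_read() returns one chunk per call and an empty value once the body is exhausted. read_chunked_body is a hypothetical helper, not part of uWSGI, and it takes the reader as an argument so the loop can be tried outside a uWSGI worker.

```python
def read_chunked_body(read_chunk):
    """Collect chunks until read_chunk() returns an empty value."""
    parts = []
    while True:
        chunk = read_chunk()
        # Assumption: an empty chunk signals the end of the request body.
        if not chunk:
            break
        parts.append(chunk)
    return b"".join(parts)


# Inside a uWSGI worker (the uwsgi module only exists there):
#     import uwsgi
#     body = read_chunked_body(uwsgi.chunked_read)

# Stand-in reader so the sketch runs anywhere.
fake_chunks = iter([b"<rpc>", b"</rpc>", b""])
print(read_chunked_body(lambda: next(fake_chunks)))
```

Passing the reader in also keeps the uWSGI dependency in one place, which should make it easier to swap in another server’s mechanism later.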