Three Bytes and a Space

or, Rust Bugs, Non-Compliance, and How I Learned to Love IRC.

I’ve been experimenting with Rust for the past couple of weeks, often learning in my downtime at work. Once I was reasonably familiar with the language, I went searching, a little desperately, for a project. That search led me to a script we use at my workplace, which for the purposes of this story I’ll call snap.py.

snap.py is a Python script with a very simple job: access a few web pages containing tables, examine each table, and pull data out of the columns in each row. It collects the data, logs it, and then prints it to the terminal. Simple enough. However, it had a few problems that didn’t really sit well with me:

No true multi-threading, due to Python’s GIL

Memory hogging, due to object creation for every row

Thrashing on low memory systems

Runs fairly slowly, due to its workload

Sitting at my desk, I thought to myself that Rust might be able to fix many, if not all, of those issues. If the rewrite turned out well enough, it might even catch the eye of the seniors on our development teams.

And thus, with a long, uneventful shift of work ahead of me:

cargo new --bin snaprs

The Journey Begins

I figured I’d start off by using reqwest and kuchiki for downloading the data and parsing through it, respectively. My script started off rather simply:

extern crate kuchiki;
extern crate reqwest;

use std::io::Read;

use kuchiki::parse_html;
use kuchiki::traits::*;
use reqwest::get;

static URL: &'static str = "http://internal.company.url/table1";

fn main() {
    // Fetch the page; unwrap() panics if the request fails.
    let mut response = get(URL).unwrap();

    // Read the whole response body into a String.
    let mut body = String::new();
    response.read_to_string(&mut body).unwrap();
}

I thought here would be a good place to stop and compile, because if nothing else, there was no way I could mess this up.

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Http(Status)', src/libcore/result.rs:799

Down the Rabbit Hole

I commented out the lines for body, removed the unwrap() on response, and wrote a quick little match to print the error I was getting. Omitting warning messages:

$ cargo run
Compiling snaprs v0.1.0 (file:///path/to/snaprs)
Finished debug [unoptimized + debuginfo] target(s) in 2.27 secs
Running `target/debug/snaprs`
Invalid Status provided
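The exact match I used isn’t reproduced here, but the pattern is simple: handle both arms of the Result instead of calling unwrap(). Below is a minimal sketch of it. The FetchError type and the fetch function are hypothetical stand-ins for reqwest’s get(URL) and its error, so the example runs without a network or any external crates.

```rust
use std::fmt;

// Hypothetical error type standing in for the HTTP client's error.
#[derive(Debug)]
enum FetchError {
    InvalidStatus,
}

impl fmt::Display for FetchError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match *self {
            FetchError::InvalidStatus => write!(f, "Invalid Status provided"),
        }
    }
}

// Hypothetical stand-in for `get(URL)`: it fails the same way the real
// request did, so the sketch runs without a network or external crates.
fn fetch(_url: &str) -> Result<String, FetchError> {
    Err(FetchError::InvalidStatus)
}

// Turn either outcome into a printable message instead of panicking.
fn describe(result: &Result<String, FetchError>) -> String {
    match *result {
        Ok(ref body) => format!("Received {} bytes", body.len()),
        Err(ref e) => e.to_string(),
    }
}

fn main() {
    let result = fetch("http://internal.company.url/table1");
    // Prints the error's Display text instead of a panic message.
    println!("{}", describe(&result));
}
```

Matching on the Result like this trades the opaque Http(Status) panic for the error’s own human-readable message.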

According to the documentation for reqwest, and by extension hyper, this error is returned when the HTTP client receives a non-standard HTTP status code. Even after building a proper client with reqwest::Client::new(), and even after resorting to hyper directly, I was still getting the same error.

As a sanity check, I dropped into ipython, imported requests, and tested the same URL: it received a 200 and downloaded the data without a problem. Loading the URL in my web browser with Developer Tools open, I could see the page downloading and rendering, again with a 200. Finally, using curl, I confirmed that the response headers for the same page showed a 200. Only Rust was failing.

Frustrated and flustered, I got on IRC and pleaded for help in #rust-beginners. There was some basic troubleshooting and explaining of my situation, which I’ll spare everyone from reading. Since the script accessed a web server internal to my company, I did most of the troubleshooting myself under the channel’s guidance. Eventually, they had me run snaprs under strace to see what it was actually receiving. With a little help, we got somewhere:

JoshTriplett - bushidoboy: So, the response you got seems completely reasonable; it has a valid status.
JoshTriplett - bushidoboy: “HTTP/1.1 200\r\n” [Truncated]
JoshTriplett - bushidoboy: So, the error from hyper means that it thought it got an invalid status code.
JoshTriplett - bushidoboy: That error in turn gets set from httparse.
bushidoboy - Yeah, I just don’t know why it looks at a 200 and thinks it’s an error
bushidoboy - Like, that’s the literal HTTP textbook definition of success

At the very least, I did feel a little better knowing that I was receiving a 200. Now, all that was left was to figure out why that wasn’t good enough for the client. What was going on? What was the hangup?

JoshTriplett - httparse’s parse_code just reads three literal bytes and expects something in the range 000–999.
JoshTriplett - Wait, no…
JoshTriplett - Oh.
JoshTriplett - Hang on, I think I might have figured it out.

I was on the edge of my seat. I had clocked out of my shift long before, but I stayed to find out what was behind all of this.

We were finally here.

This was it.

JoshTriplett - bushidoboy: “200” is not the textbook definition of success. “200 OK” is. :)

… That was it?

JoshTriplett - bushidoboy: httparse chokes on the lack of a space after 200.
JoshTriplett - bushidoboy: curl, on the other hand, puts up with it.
JoshTriplett - bushidoboy: You’re not *required* to use the literal string “OK”, but you have to have a space and then a “reason”.
JoshTriplett - bushidoboy: https://www.w3.org/Protocols/rfc2616/rfc2616-sec6.html#sec6.1
JoshTriplett - bushidoboy: No provision for leaving out the last SP.
* misdreavus *slow clap*

Reading those lines, I mused that this simple failure was actually rather representative of my life up to this point.
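Josh’s diagnosis fits in a few lines of Rust. The sketch below is a hypothetical parser written for illustration, not httparse’s actual code. Its parse_code mimics the “three literal bytes” check from the chat, and parse_status_line_strict follows the RFC 2616 section 6.1 grammar, Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF. The grammar allows an empty reason phrase but not a missing second SP. Note that “200” passes the three-digit check either way; it is the missing space that sinks the line.

```rust
/// A parsed HTTP/1.x status line, borrowing from the input.
#[derive(Debug, PartialEq)]
struct StatusLine<'a> {
    version: &'a str,
    code: u16,
    reason: &'a str,
}

/// Reads exactly three ASCII digits (000-999). A sketch of the check
/// described in the chat, not httparse's implementation.
fn parse_code(digits: &str) -> Option<u16> {
    let bytes = digits.as_bytes();
    if bytes.len() != 3 || !bytes.iter().all(|b| b.is_ascii_digit()) {
        return None;
    }
    Some(bytes.iter().fold(0u16, |acc, &b| acc * 10 + u16::from(b - b'0')))
}

/// Strict parse of: HTTP-Version SP Status-Code SP Reason-Phrase CRLF.
/// The reason phrase may be empty, but the second SP is mandatory.
fn parse_status_line_strict<'a>(line: &'a str) -> Option<StatusLine<'a>> {
    let line = line.strip_suffix("\r\n")?;
    let mut parts = line.splitn(3, ' ');
    let version = parts.next()?;
    if !version.starts_with("HTTP/") {
        return None;
    }
    let code = parse_code(parts.next()?)?;
    // `splitn` only yields a third piece if the second SP is present.
    let reason = parts.next()?;
    Some(StatusLine { version, code, reason })
}

fn main() {
    // Compliant line: parses, with reason "OK".
    println!("{:?}", parse_status_line_strict("HTTP/1.1 200 OK\r\n"));
    // What the forked server sent: rejected, because the second SP is missing.
    println!("{:?}", parse_status_line_strict("HTTP/1.1 200\r\n"));
}
```

By this grammar, "HTTP/1.1 200 \r\n" (trailing space, empty reason) would be accepted, which matches Josh’s point: the “OK” is optional, the space is not.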

But Wait! There’s More!

Though I was still quite upset, I was happy to know that this wasn’t my fault. Still, it struck me as strange that in all the times I’d used reqwest, this was the first time I’d run into a non-standard status line. None of the sites I had tested on the open internet had this problem.

Then it hit me:

bushidoboy - Oh goodness.
bushidoboy - You just reminded me
bushidoboy - We run a custom server we forked.
wyvern - yikes.

Yep. Our internally developed server returns non-compliant HTTP status lines. Two bugs in a single day: one in httparse, and one in my company’s custom web server. Don’t I feel special.

The denizens of #rust-beginners were also very sympathetic to my plight.

misdreavus - this just keeps getting better and better
* misdreavus reaches for the popcorn
JoshTriplett - httpopcorn
* cmyr refills popcorn
Xion - HTTP POP /corn

Conclusions

Thus ends the tale of my first real bug report for Rust. I have since submitted an issue on the httparse repo, so hopefully this will be fixed soon.

Update Dec. 4, 2016: Sean has pulled in a commit that allows HTTP status lines to omit the reason phrase. Huzzah!
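For contrast, here is roughly what “allowing statuses that omit reason-phrases” means in practice. This is a hypothetical sketch of the lenient behavior, not the actual httparse commit: a line with no second SP is accepted and treated as having an empty reason phrase.

```rust
/// Lenient status-line parse: a missing reason phrase (and the SP before it)
/// is accepted and reported as "". Returns (status code, reason phrase).
fn parse_status_line_lenient(line: &str) -> Option<(u16, &str)> {
    let line = line.strip_suffix("\r\n")?;
    let mut parts = line.splitn(3, ' ');
    let version = parts.next()?;
    let code_str = parts.next()?;
    // Still require an HTTP version and exactly three ASCII digits.
    if !version.starts_with("HTTP/")
        || code_str.len() != 3
        || !code_str.bytes().all(|b| b.is_ascii_digit())
    {
        return None;
    }
    let code: u16 = code_str.parse().ok()?;
    // The tolerant part: no second SP simply means an empty reason phrase.
    let reason = parts.next().unwrap_or("");
    Some((code, reason))
}

fn main() {
    for raw in ["HTTP/1.1 200 OK\r\n", "HTTP/1.1 200\r\n"] {
        // Both lines are accepted; the second yields an empty reason.
        println!("{:?}", parse_status_line_lenient(raw));
    }
}
```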

I would like to extend my deepest thanks to the folks in #rust-beginners — especially JoshTriplett and misdreavus. The community has been a big help through my entire journey with Rust.

As frustrating as this was, I really did learn a lot from it, and I’m happier for having gone through it. There’s much to be done, and much more to learn.