Librsvg gets Rusty

I've been wanting to learn Rust for some time. It has frustrated me for a number of years that it is quite possible to write GNOME applications in high-level languages, but for the libraries that everything else uses ("the GNOME platform"), we are pretty much stuck with C. Vala is a very nice effort, but to me it never seemed to catch much momentum outside of GNOME.

After reading this presentation called "Rust out your C", I got excited. It *is* possible to port C code to Rust, small bits at a time! You rewrite some functions in Rust, make them linkable to the C code, and keep calling them from C as usual. The contortions you need to do to make C types accessible from Rust are no worse than for any other language.
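As a rough illustration of "make them linkable to the C code", here is a minimal sketch of a Rust function exported with the C calling convention. The function name `sketch_distance` and its purpose are made up for this example; it is not part of librsvg.

```rust
use std::os::raw::c_double;

// Hypothetical function, not part of librsvg. The C side would declare it as:
//
//     double sketch_distance (double x1, double y1, double x2, double y2);
//
// #[no_mangle] keeps the symbol name intact so the C linker can find it
// (the 2024 edition of Rust spells this attribute #[unsafe(no_mangle)]),
// and extern "C" selects the C calling convention.
#[no_mangle]
pub extern "C" fn sketch_distance(
    x1: c_double,
    y1: c_double,
    x2: c_double,
    y2: c_double,
) -> c_double {
    // Plain Rust inside; only the signature has to be C-compatible.
    (x2 - x1).hypot(y2 - y1)
}
```

The Rust code would typically be built as a static library and linked into the existing C build; the details of that step depend on the build system and are not shown here.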

I'm going to use librsvg as a testbed for this.

Librsvg is an old library. It started as an experiment to write a SAX-based parser for SVG ("don't load the whole DOM into memory; instead, stream in the XML and parse it as we go"), and a renderer with the old libart (what we used in GNOME for 2D vector rendering before Cairo came along). Later it got ported to Cairo, and that's the version that we use now.

Outside of GNOME, librsvg gets used at Wikimedia to render the SVGs all over Wikipedia. We have gotten excellent bug reports from them!

Librsvg has a bunch of little parsers for the mini-languages inside SVG's XML attributes. For example, within a vector path definition, "M10,50 h20 V10 Z" means, "move to the coordinate (10, 50), draw a horizontal line 20 pixels to the right, then a vertical line to absolute coordinate 10, then close the path with another line". There are state machines, like the one that transforms that path definition into three line segments instead of the PostScript-like instructions that Cairo understands. There are some pixel-crunching functions, like Gaussian blurs and convolutions for SVG filters.
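To make the path mini-language concrete, here is a minimal sketch that understands only the four commands from the example above (M, h, V, Z) and turns them into absolute coordinates. The names `Command`, `number` and `parse_tiny_path` are invented for this illustration; this is not librsvg's parser, and real SVG path data has many more commands and number formats.

```rust
use std::iter::Peekable;
use std::str::Chars;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Command {
    MoveTo(f64, f64),
    LineTo(f64, f64),
    ClosePath,
}

/// Skips separators, then reads one number such as "10" or "-2.5".
fn number(chars: &mut Peekable<Chars>) -> Option<f64> {
    while chars.peek().map_or(false, |c| c.is_whitespace() || *c == ',') {
        chars.next();
    }
    let mut text = String::new();
    while let Some(&c) = chars.peek() {
        if c.is_ascii_digit() || c == '.' || (c == '-' && text.is_empty()) {
            text.push(c);
            chars.next();
        } else {
            break;
        }
    }
    text.parse().ok()
}

/// Returns None for anything outside the tiny subset handled here.
fn parse_tiny_path(d: &str) -> Option<Vec<Command>> {
    let mut commands = Vec::new();
    let (mut x, mut y) = (0.0_f64, 0.0_f64);
    let (mut start_x, mut start_y) = (x, y);
    let mut chars = d.chars().peekable();

    while let Some(c) = chars.next() {
        match c {
            // Absolute moveto: also remembers where the subpath starts.
            'M' => {
                x = number(&mut chars)?;
                y = number(&mut chars)?;
                start_x = x;
                start_y = y;
                commands.push(Command::MoveTo(x, y));
            }
            // Relative horizontal line: only x changes, by an offset.
            'h' => {
                x += number(&mut chars)?;
                commands.push(Command::LineTo(x, y));
            }
            // Absolute vertical line: only y changes, to a coordinate.
            'V' => {
                y = number(&mut chars)?;
                commands.push(Command::LineTo(x, y));
            }
            // Closepath: the current point returns to the subpath start.
            'Z' => {
                x = start_x;
                y = start_y;
                commands.push(Command::ClosePath);
            }
            c if c.is_whitespace() || c == ',' => {}
            _ => return None,
        }
    }
    Some(commands)
}
```

For "M10,50 h20 V10 Z" this yields a moveto at (10, 50), a line to (30, 50), a line to (30, 10), and a closepath.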

It should be quite possible to port those parts of librsvg to Rust, and to preserve the C API for general consumption.

Every once in a while someone discovers a bug in librsvg that makes it all the way to a CVE security advisory, and it's all due to using C. We've gotten double free()s, wrong casts, and out-of-bounds memory accesses. Recently someone did fuzz-testing with some really pathological SVGs, and found interesting explosions in the library. That's the kind of 1970s bullshit that Rust prevents.

I also hope that this will make it easier to actually write unit tests for librsvg. Currently we have some pretty nifty black-box tests for the whole library, which essentially take in complete SVG files, render them, and compare the results to a reference image. These are great for smoke testing and guarding against regressions. However, all the fine-grained machinery in librsvg has zero tests. It is always a pain in the ass to make static C functions testable "from the outside", or to make mock objects to provide them with the kind of environment they expect.

So, on to Rustification!

I've started with a bit of the code from librsvg that is fresh in my head: the state machine that renders SVG markers.

SVG markers

This image with markers comes from the official SVG test suite:

SVG markers let you put symbols along the nodes of a path. You can use them to draw arrows (arrowhead as an end marker on a line), points in a chart, and other visual effects.

In the example image above, this is what is happening. The SVG defines four marker types:

A purple square that always stays upright.

A green circle.

A blue triangle that always stays upright.

A blue triangle whose orientation depends on the node where it sits.

The top row, with the purple squares, is a path (the black line) that says, "put the purple-square marker on all my nodes".

The middle row is a similar path, but it says, "put the purple-square marker on my first node, the green-circle marker on my middle nodes, and the blue-upright-triangle marker on my end node".

The bottom row has the blue-orientable-triangle marker on all the nodes. The triangle is defined to point to the right (look at the bottommost triangles!). It gets rotated 45 degrees at the middle node, and 90 degrees so it points up at the top-left node.

This was all fine and dandy, until one day we got a bug about incorrect rendering when there are funny paths. What makes a path funny?

For the code that renders markers, a path is not in the "easy" case when it is not obvious how to compute the orientation of nodes. A node's orientation, when it is well-behaved, is just the average angle of the node's incoming and outgoing lines (or curves). But if a path has contiguous coincident vertices, or stray points that don't have incoming/outgoing lines (imagine a sequence of moveto commands), or curveto commands with Bézier control points that are coincident with the nodes... well, in those cases, librsvg has to follow the spec to the letter, for it says how to handle those things.

In short, one has to walk the segments away from the node in question, until one finds a segment whose "directionality" can be computed: a segment that is an actual line or curve, not a coincident vertex nor a stray point.

Librsvg's algorithm has two parts to it. The first part takes the linear sequence of PostScript-like commands (moveto, lineto, curveto, closepath) and turns them into a sequence of segments. Each segment has two endpoints and two tangent directions at those endpoints; if the segment is a line, the tangents point in the same direction as the line. Or, the segment can be degenerate and it is just a single point.

The second part of the algorithm takes that list of segments for each node, and it does the walking-back-and-forth as described in the SVG spec. Basically, it finds the first non-degenerate segment on each side of a node, and uses the tangents of those segments to find the average orientation of the node.
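The walking step can be sketched roughly as follows. This is a simplification under several assumptions, not librsvg's actual code: it ignores subpath boundaries and closed paths, it treats a segment whose control point coincides with its endpoint as having no direction at all, and it averages the two directions by adding unit vectors. The helper names (`unit_vector`, `incoming_direction`, `outgoing_direction`, `node_angle`, `line`) are made up for the example.

```rust
struct Segment {
    is_degenerate: bool,
    p1x: f64, p1y: f64,
    p2x: f64, p2y: f64,
    p3x: f64, p3y: f64,
    p4x: f64, p4y: f64,
}

/// Hypothetical helper: a straight line, with control points set the same
/// way as in the LINE_TO case of path_to_segments.
fn line(x1: f64, y1: f64, x2: f64, y2: f64) -> Segment {
    Segment {
        is_degenerate: false,
        p1x: x1, p1y: y1,
        p2x: x2, p2y: y2,
        p3x: x1, p3y: y1,
        p4x: x2, p4y: y2,
    }
}

/// Unit vector from (x1, y1) to (x2, y2), or None if the points coincide.
fn unit_vector(x1: f64, y1: f64, x2: f64, y2: f64) -> Option<(f64, f64)> {
    let (dx, dy) = (x2 - x1, y2 - y1);
    let len = dx.hypot(dy);
    if len == 0.0 { None } else { Some((dx / len, dy / len)) }
}

/// Direction in which a segment arrives at its end point (P3 -> P4).
fn incoming_direction(seg: &Segment) -> Option<(f64, f64)> {
    if seg.is_degenerate {
        None
    } else {
        unit_vector(seg.p3x, seg.p3y, seg.p4x, seg.p4y)
    }
}

/// Direction in which a segment leaves its start point (P1 -> P2).
fn outgoing_direction(seg: &Segment) -> Option<(f64, f64)> {
    if seg.is_degenerate {
        None
    } else {
        unit_vector(seg.p1x, seg.p1y, seg.p2x, seg.p2y)
    }
}

/// Orientation, in radians, of the node where segments[node] starts.
fn node_angle(segments: &[Segment], node: usize) -> f64 {
    // Walk backwards, then forwards, to the first segment with a direction.
    let incoming = segments[..node].iter().rev().find_map(incoming_direction);
    let outgoing = segments[node..].iter().find_map(outgoing_direction);

    match (incoming, outgoing) {
        // Adding two unit vectors gives a vector along their bisector.
        (Some((ix, iy)), Some((ox, oy))) => (iy + oy).atan2(ix + ox),
        (Some((x, y)), None) | (None, Some((x, y))) => y.atan2(x),
        (None, None) => 0.0,
    }
}
```

With a line going right, a zero-length line, and then a line going along positive y, the node in the middle gets an angle of 45 degrees, because the zero-length segment is skipped while walking forward.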

The path-to-segments code

In the C code I had this:

    typedef struct {
        gboolean is_degenerate; /* If true, only (p1x, p1y) are valid.  If false, all are valid */
        double p1x, p1y;
        double p2x, p2y;
        double p3x, p3y;
        double p4x, p4y;
    } Segment;

P1 and P4 are the endpoints of each Segment; P2 and P3 are, like in a Bézier curve, the control points from which the tangents can be computed.

This translates readily to Rust:

    struct Segment {
        is_degenerate: bool, /* If true, only (p1x, p1y) are valid.  If false, all are valid */
        p1x: f64, p1y: f64,
        p2x: f64, p2y: f64,
        p3x: f64, p3y: f64,
        p4x: f64, p4y: f64
    }

Then a little utility function:

    /* In C */

    #define EPSILON 1e-10
    #define DOUBLE_EQUALS(a, b) (fabs ((a) - (b)) < EPSILON)

    /* In Rust */

    const EPSILON: f64 = 1e-10;

    fn double_equals (a: f64, b: f64) -> bool {
        (a - b).abs () < EPSILON
    }
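Since unit tests for the fine-grained machinery are one of the goals here, this is roughly what a test for that private helper could look like. It is only a sketch; the test names and the values they check are chosen for illustration. The test module sits in the same file as the function, so the function does not need to be made public.

```rust
const EPSILON: f64 = 1e-10;

fn double_equals(a: f64, b: f64) -> bool {
    (a - b).abs() < EPSILON
}

// Compiled only when running the test harness (for example with "cargo test").
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn rounding_noise_compares_equal() {
        // 0.1 + 0.2 is not exactly 0.3 in floating point, but it is well
        // within EPSILON of it.
        assert!(double_equals(0.1 + 0.2, 0.3));
    }

    #[test]
    fn larger_differences_compare_different() {
        assert!(!double_equals(1.0, 1.0 + 1e-9));
    }
}
```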

And now, the actual code that transforms a cairo_path_t (a list of moveto/lineto/curveto commands) into a list of segments. I'll interleave C and Rust code with commentary.

    /* In C */

    typedef enum {
        SEGMENT_START,
        SEGMENT_END,
    } SegmentState;

    static void
    path_to_segments (const cairo_path_t *path,
                      Segment **out_segments,
                      int *num_segments)
    {

    /* In Rust */

    enum SegmentState {
        Start,
        End
    }

    fn path_to_segments (path: cairo::Path) -> Vec<Segment> {

The enum is pretty much the same; Rust prefers CamelCase for enums instead of CAPITALIZED_SNAKE_CASE. The function prototype is much nicer in Rust. The cairo::Path is courtesy of gtk-rs, the budding Rust bindings for GTK+ and Cairo and all that goodness.

The C version allocates the return value as an array of Segment structs, and returns it in the out_segments argument (... and the length of the array in num_segments). The Rust version returns a mentally easier vector of Segment structs.

Now, the variable declarations at the beginning of the function:

    /* In C */

    {
        int i;
        double last_x, last_y;
        double cur_x, cur_y;
        double subpath_start_x, subpath_start_y;
        int max_segments;
        int segment_num;
        Segment *segments;
        SegmentState state;

    /* In Rust */

    {
        let mut last_x: f64;
        let mut last_y: f64;
        let mut cur_x: f64;
        let mut cur_y: f64;
        let mut subpath_start_x: f64;
        let mut subpath_start_y: f64;
        let mut has_first_segment : bool;
        let mut segment_num : usize;
        let mut segments: Vec<Segment>;
        let mut state: SegmentState;

In addition to having different type names (double becomes f64), Rust wants you to say when a variable will be mutable, i.e. when it is allowed to change value after its initialization.

Also, note that in C there's an "i" variable, which is used as a counter. There isn't a similar variable in the Rust version; there, we will use an iterator. Also, in the Rust version we have a new "has_first_segment" variable; read on to see its purpose.

    /* In C */

        max_segments = path->num_data; /* We'll generate maximum this many segments */
        segments = g_new (Segment, max_segments);
        *out_segments = segments;

        last_x = last_y = cur_x = cur_y = subpath_start_x = subpath_start_y = 0.0;

        segment_num = -1;
        state = SEGMENT_END;

    /* In Rust */

        cur_x = 0.0;
        cur_y = 0.0;
        subpath_start_x = 0.0;
        subpath_start_y = 0.0;

        has_first_segment = false;
        segment_num = 0;
        segments = Vec::new ();
        state = SegmentState::End;

No problems here, just initializations. Note that in C we pre-allocate the segments array with a certain size. This is not the actual minimum size that the array will need; it is just an upper bound that comes from the way Cairo represents paths internally (it is not possible to compute the minimum size of the array without walking it first, so we use a good-enough value here that doesn't require walking). In the Rust version, we just create an empty vector and let it grow as needed.

Note also that the C version initializes segment_num to -1, while the Rust version sets has_first_segment to false and segment_num to 0. Read on!

    /* In C */

        for (i = 0; i < path->num_data; i += path->data[i].header.length) {
            last_x = cur_x;
            last_y = cur_y;

    /* In Rust */

        for cairo_segment in path.iter () {
            last_x = cur_x;
            last_y = cur_y;

We start iterating over the path's elements. Cairo, which is written in C, has a peculiar way of representing paths. path->num_data is the length of the path->data array. That array has elements in path->data[] that can be either commands, or point coordinates. Each command then specifies how many elements you need to "eat" to take in all its coordinates. Thus the "i" counter gets incremented on each iteration by path->data[i].header.length; this is the "how many to eat" magic value.
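To show that "how many to eat" layout outside of C, here is a small Rust model of it. It is only an illustration: in Cairo the array elements are a C union, whereas this sketch uses a tagged enum, and the names `Kind`, `PathData` and `walk` are invented. The header lengths follow Cairo's convention that the length counts the header element itself (2 for a moveto or lineto, 4 for a curveto, 1 for a closepath).

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    MoveTo,
    LineTo,
    CurveTo,
    ClosePath,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum PathData {
    /// A command, plus how many array elements it spans (itself included).
    Header { kind: Kind, length: usize },
    /// One coordinate pair belonging to the preceding header.
    Point { x: f64, y: f64 },
}

/// Groups the flat array into (command, points) pairs.
fn walk(data: &[PathData]) -> Vec<(Kind, Vec<(f64, f64)>)> {
    let mut out = Vec::new();
    let mut i = 0;

    while i < data.len() {
        match data[i] {
            PathData::Header { kind, length } => {
                // The points for this command are the elements after the
                // header, up to (but not including) the next header.
                let points: Vec<(f64, f64)> = data[i + 1..i + length]
                    .iter()
                    .filter_map(|d| match *d {
                        PathData::Point { x, y } => Some((x, y)),
                        PathData::Header { .. } => None,
                    })
                    .collect();
                out.push((kind, points));

                // "Eat" the header and all of its points in one step.
                i += length;
            }
            PathData::Point { .. } => panic!("expected a header at index {}", i),
        }
    }
    out
}
```

The gtk-rs iterator hides exactly this bookkeeping and hands out one command with its coordinates per iteration.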

The Rust version is more civilized. Get a path.iter() which feeds you Cairo path segments, and boom, you are done. That civilization is courtesy of the gtk-rs bindings. Onwards!

    /* In C */

            switch (path->data[i].header.type) {
            case CAIRO_PATH_MOVE_TO:
                segment_num++;
                g_assert (segment_num < max_segments);

    /* In Rust */

            match cairo_segment {
                cairo::PathSegment::MoveTo ((x, y)) => {
                    if has_first_segment {
                        segment_num += 1;
                    } else {
                        has_first_segment = true;
                    }

The C version switch()es on the type of the path segment. It increments segment_num, our counter-of-segments, and checks that it doesn't overflow the space we allocated for the results array.

The Rust version match()es on the cairo_segment, which is a Rust enum (think of it as a tagged union of structs). The first match case conveniently destructures the (x, y) coordinates; we will use them below.

If you recall from the above, the C version initialized segment_num to -1. This code for MOVE_TO is the first case in the code that we will hit, and that "segment_num++" causes the value to become 0, which is exactly the index in the results array where we want to place the first segment. Rust *really* wants you to use a usize value to index arrays ("unsigned size"). I could have used a signed size value starting at -1 and then incremented it to zero, but then I would have to cast it to unsigned — which is slightly ugly. So I introduce a boolean variable, has_first_segment, and use that instead. I think I could refactor this to have another state in SegmentState and remove the boolean variable.
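Another possibility, offered only as a sketch and not as what the code above does, is to drop the index entirely and ask the vector for its last element. The struct is trimmed to a few fields and the helper names `start_segment` and `extend_last_segment` are hypothetical.

```rust
struct Segment {
    is_degenerate: bool,
    p1x: f64,
    p1y: f64,
    // Control points omitted to keep the sketch short.
    p4x: f64,
    p4y: f64,
}

/// Hypothetical helper: start a new, possibly-degenerate segment at (x, y).
fn start_segment(segments: &mut Vec<Segment>, x: f64, y: f64) {
    segments.push(Segment {
        is_degenerate: true,
        p1x: x,
        p1y: y,
        p4x: x,
        p4y: y,
    });
}

/// Hypothetical helper: turn the most recent segment into a line that ends
/// at (x, y). Returns false if there is no segment to extend yet.
fn extend_last_segment(segments: &mut [Segment], x: f64, y: f64) -> bool {
    // last_mut() gives Option<&mut Segment>, so there is no counter to keep
    // in sync and no -1 sentinel to represent "nothing yet".
    match segments.last_mut() {
        Some(seg) => {
            seg.is_degenerate = false;
            seg.p4x = x;
            seg.p4y = y;
            true
        }
        None => false,
    }
}
```

The "no segment yet" case that the -1 sentinel encoded in C becomes the None branch here.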

    /* In C */

                g_assert (i + 1 < path->num_data);
                cur_x = path->data[i + 1].point.x;
                cur_y = path->data[i + 1].point.y;

                subpath_start_x = cur_x;
                subpath_start_y = cur_y;

    /* In Rust */

                    cur_x = x;
                    cur_y = y;

                    subpath_start_x = cur_x;
                    subpath_start_y = cur_y;

In the C version, I assign (cur_x, cur_y) from the path->data[], but first ensure that the index doesn't overflow. In the Rust version, the (x, y) values come from the destructuring described above.

    /* In C */

                segments[segment_num].is_degenerate = TRUE;

                segments[segment_num].p1x = cur_x;
                segments[segment_num].p1y = cur_y;

                state = SEGMENT_START;
                break;

    /* In Rust */

                    let seg = Segment {
                        is_degenerate: true,
                        p1x: cur_x,
                        p1y: cur_y,
                        p2x: 0.0, p2y: 0.0, p3x: 0.0, p3y: 0.0, p4x: 0.0, p4y: 0.0 // these are set in the next iteration
                    };

                    segments.push (seg);

                    state = SegmentState::Start;
                },

This is where my lack of Rust idiomatic skills really starts to show. In C I put (cur_x, cur_y) in the (p1x, p1y) fields of the current segment, and since it is_degenerate, I'll know that the other p2/p3/p4 fields are not valid — and like any C programmer who wears sandals instead of steel-toed boots, I leave their memory uninitialized. Rust doesn't want me to have uninitialized values EVER, so I must fill a Segment structure and then push() it into our segments vector.
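One way to avoid the placeholder zeros, sketched here as an assumption about how this could be restructured rather than as the code in this post, is to let the type itself say which fields exist. The variant and helper names are invented for the example.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Segment {
    /// A stray point: there are no other fields to leave unset.
    Degenerate { x: f64, y: f64 },
    /// A line or curve from P1 to P4, with P2 and P3 as control points.
    LineOrCurve {
        p1: (f64, f64),
        p2: (f64, f64),
        p3: (f64, f64),
        p4: (f64, f64),
    },
}

/// Hypothetical constructor for a straight line. The control points follow
/// the same convention as the LINE_TO case: P2 is the end point and P3 is
/// the start point.
fn line(from: (f64, f64), to: (f64, f64)) -> Segment {
    Segment::LineOrCurve {
        p1: from,
        p2: to,
        p3: from,
        p4: to,
    }
}

/// Replaces the is_degenerate flag with a question about the variant.
fn is_degenerate(seg: &Segment) -> bool {
    match *seg {
        Segment::Degenerate { .. } => true,
        Segment::LineOrCurve { .. } => false,
    }
}
```

With this shape, code that reads P2..P4 has to match on the LineOrCurve variant first, so it cannot accidentally read coordinates from a degenerate segment.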

So, the C version really wants to have a segment_num counter where I can keep track of which index I'm filling. Why is there a similar counter in the Rust version? We will see why in the next case.

    /* In C */

            case CAIRO_PATH_LINE_TO:
                g_assert (i + 1 < path->num_data);
                cur_x = path->data[i + 1].point.x;
                cur_y = path->data[i + 1].point.y;

                if (state == SEGMENT_START) {
                    segments[segment_num].is_degenerate = FALSE;
                    state = SEGMENT_END;
                } else /* SEGMENT_END */ {
                    segment_num++;
                    g_assert (segment_num < max_segments);

                    segments[segment_num].is_degenerate = FALSE;

                    segments[segment_num].p1x = last_x;
                    segments[segment_num].p1y = last_y;
                }

                segments[segment_num].p2x = cur_x;
                segments[segment_num].p2y = cur_y;

                segments[segment_num].p3x = last_x;
                segments[segment_num].p3y = last_y;

                segments[segment_num].p4x = cur_x;
                segments[segment_num].p4y = cur_y;

                break;

    /* In Rust */

                cairo::PathSegment::LineTo ((x, y)) => {
                    cur_x = x;
                    cur_y = y;

                    match state {
                        SegmentState::Start => {
                            segments[segment_num].is_degenerate = false;
                            state = SegmentState::End;
                        },

                        SegmentState::End => {
                            segment_num += 1;

                            let seg = Segment {
                                is_degenerate: false,
                                p1x: last_x,
                                p1y: last_y,
                                p2x: 0.0, p2y: 0.0, p3x: 0.0, p3y: 0.0, p4x: 0.0, p4y: 0.0 // these are set below
                            };

                            segments.push (seg);
                        }
                    }

                    segments[segment_num].p2x = cur_x;
                    segments[segment_num].p2y = cur_y;

                    segments[segment_num].p3x = last_x;
                    segments[segment_num].p3y = last_y;

                    segments[segment_num].p4x = cur_x;
                    segments[segment_num].p4y = cur_y;
                },

Whoa! But let's piece it apart bit by bit.

First we set cur_x and cur_y from the path data, as usual.

Then we roll the state machine. Remember we got a LINE_TO. If we are in the state START ("just have a single point, possibly a degenerate one"), then we turn the old segment into a non-degenerate, complete line segment. If we are in the state END ("we were already drawing non-degenerate lines"), we create a new segment and fill it in. I'll probably change the names of those states to make it more obvious what they mean.

In C we had a preallocated array for "segments", so the idiom to create a new segment is simply "segment_num++". In Rust we grow the segments array as we go, hence the "segments.push (seg)".

I will probably refactor this code. I don't like it that it looks like

    case move_to:
        start possibly-degenerate segment
    case line_to:
        are we in a possibly-degenerate segment?
            yes: make it non-degenerate and remain in that segment...
            no: create a new segment, switch to it, and fill its first fields...

        ... for both cases, fill in the last fields of the segment

That is, the "yes" case fills in fields from the segment we were handling in the *previous* iteration, while the "no" case fills in fields from a *new* segment that we created in the present iteration. That asymmetry bothers me. Maybe we should build up the next-segment's fields in auxiliary variables, and only put them in a complete Segment structure once we really know that we are done with that segment? I don't know; we'll see what is more legible in the end.
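A minimal sketch of that "auxiliary variables" idea follows, with loud caveats: it only handles moveto and lineto, it uses a hypothetical `PathCommand` enum as a stand-in for cairo::PathSegment so that it compiles without the gtk-rs bindings, and it stores only the endpoints of each line. It is one possible shape for the refactoring, not a finished replacement.

```rust
/// Hypothetical stand-in for cairo::PathSegment, limited to two commands.
#[derive(Debug, Clone, Copy, PartialEq)]
enum PathCommand {
    MoveTo(f64, f64),
    LineTo(f64, f64),
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Segment {
    Degenerate { x: f64, y: f64 },
    Line { x1: f64, y1: f64, x2: f64, y2: f64 },
}

fn path_to_segments(path: &[PathCommand]) -> Vec<Segment> {
    let mut segments = Vec::new();
    let (mut cur_x, mut cur_y) = (0.0_f64, 0.0_f64);

    // A moveto point that has not turned into a segment yet. It becomes a
    // degenerate segment only if no lineto follows it.
    let mut pending: Option<(f64, f64)> = None;

    for command in path {
        match *command {
            PathCommand::MoveTo(x, y) => {
                // The previous moveto never got a line: it was a stray point.
                if let Some((px, py)) = pending.take() {
                    segments.push(Segment::Degenerate { x: px, y: py });
                }
                pending = Some((x, y));
                cur_x = x;
                cur_y = y;
            }
            PathCommand::LineTo(x, y) => {
                // Both former cases collapse into one: every lineto pushes a
                // complete segment that starts at the current point.
                pending = None;
                segments.push(Segment::Line { x1: cur_x, y1: cur_y, x2: x, y2: y });
                cur_x = x;
                cur_y = y;
            }
        }
    }

    // A trailing moveto with nothing after it is also a stray point.
    if let Some((px, py)) = pending {
        segments.push(Segment::Degenerate { x: px, y: py });
    }
    segments
}
```

Because every pushed Segment is complete, no segment is ever modified after the fact, and the function can be exercised directly with a hand-written list of commands instead of a rendered SVG.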

The other two cases, for CURVE_TO and CLOSE_PATH, are analogous, except that CURVE_TO handles a bunch more coordinates for the control points, and CLOSE_PATH goes back to the coordinates from the last point that was a MOVE_TO.

And those tests you were talking about?

Well, I haven't written them yet! This is my very first Rust code, after reading a pile of getting-started documents.

Already in the case for CLOSE_PATH I think I've found a bug. It doesn't really create a segment for multi-line paths when the path is being closed. The reftests didn't catch this because none of the reference images with SVG markers uses a CLOSE_PATH command! The unit tests for this path_to_segments() machinery should be able to find this easily, and closer to the root cause of the bug.

What's next?

Learning how to link and call that Rust code from the C library for librsvg. Then I'll be able to remove the corresponding C code.

Feeling safer already?