ZOMG! I’m in Japan!

This year, I am fortunate enough to be one of the 750 attendees at Ruby Kaigi in Tokyo. If you are among one of them, I’d love to meet you and chat about Ruby, Rails, etc. Just look for my name tag!

For those of you who couldn’t make it this year, don’t feel left out! I’ll be writing a three-part blog post (one for each day) to cover the things I’ve learned here, so you can watch the talks online that interestes you when they become available. (You can also watch the live streams!)

Without further ado, here is day 1 at Ruby Kaigi!

Cool Stuff

All the talks are real-time translated into English and Japanese via these headsets.

You can purchase some Ruby Kaigi goodies from this veding machine.

Bento boxes for lunch!

Rule #1 and #2 of Ruby Kaigi – don’t upgrade your iPhone with the conference wifi!

CRuby Committers Who’s Who in 2014

@nogachika, who is known for his ruby-trunk-changes commentary (it’s like Rails weekly but for Ruby trunk), gave a behind-the-scene view of CRuby development.

There are 84 registered accounts on SVN (including a bot), 50 of which who had at least one commit since Ruby Kaigi 2013. @nogachika introduced some of them in his talk, including:

@matz who invented Ruby (and mruby), whose only commit of the year was to bump the Ruby version (Matz focuses more on the design aspect of Ruby as well as mruby development these days).

@ko1 who introduced incremental GC in Ruby 2.2

@nari who introduced Symbol GC, which played a key role in Rails’ plan to target 2.2+ exclusively in Rails 5

@tmm1 who introduced many performance optimizations, including speeding up Hash#[]

@kazu the typo fixer

Watch the talk online to get to know more of these people who built the language we love!

Building the Ruby interpreter – What is easy and what is difficult?

@ko1 works on Matz’s Ruby team at Heroku. Over the years, he has accumulated an impressive list of contributions to the Ruby language/implementation, including native thread locking, fibers, the new method cache, flonum, RGenGC and incremental GC in Ruby 2.2 (more about it in vol 0048 of the Rubyist Magazine and his talk at the upcoming RubyConf 2014).

Tradeoffs

The overarching theme of his talk is about making tradeoffs. Like many things in life, most decisions we make as programmers are tradeoffs that affects multiple intertwined goals that are often in conflict with each other. As engineers, it is our job to understand about these tradeoffs, carefully consider the factors in-play and overcome these challenges.

Parallel Execution

One of the specific example @ko1 talked about is the problem of supporting parallel execution in Ruby.

Simply providing parallel threads (i.e. getting rid of the GVL) to enable parallel execution is actually a relatively easy task. The difficulties here is to maintain good programming experience, good serial execution performance (due to increased synchronization) and CRuby source code quality/maintainability.

Running multiple threads in parallel under the typical “share everything” programming model often results in subtle bugs that falls under the following categories: race conditions, atomicity violation and order violation. Making matters worse, the non-deterministic nature of parallel execution make these bugs very difficult to reproduce. All of these problems contribute negatively to the “programmer happiness”, which is a very important cornerstone of the Ruby language.

@ko1 believes that the tradeoff here is not unlike other problems we faced in the past, such as the safety/performance tradeoff of garbage collection vs manual memory management. In this case, parallel threads is like manual memory management – while it’s highly performant and flexible, it’s also extremely error-prone and very difficult to reason about. He believes that Ruby’s job is to provide an alternative programming model for parallel execution that is safe and programmer-friendly, as well as good debugging tools. Ultimately these will be preferred over utilizing threads directly for most applications, just like how garbage collection is now preferred by many programmers.

He then briefly introduced a few ideas for implementing such models in Ruby and some academic research on making parallel execution deterministic.

GC, Benchmarking, Community and more!

In the rest of his talk, @ko1 did a similar analysis on other topics such as Ruby garbage collection, making measurements and becoming a CRuby committer. If those topics interests you, be sure to check out this talk!

Symbol GC

@nari is a CRuby committer from NaCl. He gave an overview of the “Symbol GC” feature he implemented for Ruby 2.2.

As I mentioned earlier, this is one of the key drivers for the Rails team to target Ruby 2.2 in Rails 5. A common misconception about this feature is to think that all symbols will be GC-ed so we don’t have to worry about it anymore. As @nari shows us in this talk, it turns out this is not entirely correct.

To understand this, we first have to understand the problems associated with using symbols in CRuby 2.1 and below. Currently, once a symbol is allocated, they are associated with a fixed numeric ID internally, and are never garbage collected. This includes symbol literals ( :a_literal in Ruby code), dynamic symbols created in Ruby land (e.g. "a string".to_sym ) and other side-effects (e.g. defining a method “foo” allocates a symbol :foo for use in the methods table).

The pitfall associated with this approach is that over the execution time of your program, you could accumulate many symbols that you no longer need, thus leaking memory. This is particularly problematic for programs that have a long lifetime and handles a lot of user inputs, as it has the potential to be exploited by malicious users and cause DOS attacks.

Web applications written in Ruby (e.g. Rails applications) happens to fit this description perfectly. For example, the following piece of pseudo-code is bad:

value = store.fetch(params[:key].to_sym)

As you can see, a user supplied string is being converted into a symbol (a dynamic symbol allocation). This is problematic, because an attack could send a bunch of different strings under this parameter and eventually cause the server to crash when it runs out of memory. In fact, this has been a common source of security issue in Rails (e.g. CVE-2012-3424), which explains the desire to require Ruby 2.2 in Rails 5.

Currently, symbols cannot be GC’ed because they need to maintain the same ID for the C code that depends on it (for example, if the corresponding symbol has been GC’ed as the C code looks it up via ID2SYM , then things would break).

To solve this problem, symbols are now classified into two categories – immortal symbols and mortal symbols. As the name implies, immortal symbols are not garbage collected by the runtime and have a stable ID, so baiscally how symbols work today. Mortal symbols, on the other hand, can be garbage collected by the runtime just like any other objects. They don’t have an ID, so they are usually only useful in Ruby-land.

As it becomes necessary, mortal symbols will be “pinned-down” by the runtime and becomes immortal symbols. Because they are marked as “uncollectable”, their memory address becomes their IDs and hence they can be used in C code. For example, consider this piece of code:

define_method(method_name.to_sym) { ... }

Here, the runtime first allocates a mortal symbol (from method_name.to_sym ) that can be garbage collected. As it is passed into define_method , however, the runtime would convert it into a immortal symbol so that it could be used in the methods table. Once this happens, the symbol can no longer be garbage collected – it is impossible to convert an immortal symbol back into a mortal symbol. Also, if an immortal symbol with the same name has previously been allocated from a different place, the same symbol will be reused here instead.

While this approach eliminates a entire class of potential bugs, it also introduces some new pitfalls. In particular, if a dynamic symbol (mortal by default) is converted into an immortal symbol, your program will suffer from the same vulnerability as before.

Therefore, as application programmers, we still need to be mindful of when a dynamic symbol might be converted into an immortal symbol. While you probably aren’t defining dynamic methods based on user inputs (if you are, you might have bigger problems to worry about!), it is possible that passing a dynamic symbol into a C function might cause it to become immortalized (e.g. when the C code calls SYM2ID on it). While most of the CRuby C code has been refactored to avoid this landmine, it is still a common problem in third-party extensions. @nari noted that our ecosystem is still undergoing a transition period, and things would get better overtime (Rails 5 is probably going to help accelerate that as well).

I highly recommend that you watch this talk, especially if you maintain a gem and/or C-extension.

Transactional Memory and Ruby

@ReiOdaira and @brucehsu did two different talks on how transactional memory can benefit the Ruby language.

Transactional memory is similar to how a database transaction works. It allows the programmer to place a block of code inside a memory transaction, during which all memory operations will be perceived as a single atomic operation from other threads (or fails and the operations are rolledback and aren’t observable from outside of the thread that requested the transaction).

@ReiOdaira’s talk focuses on his research at IBM that utilize hardware-based transactional memory to reduce Ruby’s dependency on the global VM lock (GVL) thus improving the degree of achievable parallization.

@brucehsu’s talk focuses on software-based transactional memory techniques, and showed the world how it could be done by writing a new Ruby VM in Go, aptly called Gobies.

If you are interested in how these cutting edge technologies can be used in Ruby, these talks are for you!

Wrapping Up

That about wraps up my notes for Day 1 of Ruby Kaigi 2014. Of course, there are many other talks that are equally interesting. However, it’s already 3AM here in Japan and I should probably get some rest ;) I encourage you to check out the conference schedule for a full list of talks.

Stay tuned for part 2 & 3!