April 9, 2020

For an internal project at work, I recently had to parse the names of Heroku review applications to retrieve some data. The application names looked like this:

<project_name>-pr-<pull_request_id>

At first, since each part I needed was separated by a dash, I had some code that looked like this:

*project_name, _, pull_request_id = application_name.split('-')
project_name = project_name.join('-')

Because the project name could also have some dashes in it, I needed to rejoin it after extracting the pull request data. At first, for a prototype, this worked fine. But when this internal project transitioned into being an important part of my team’s tooling, I started looking at a better and cleaner way to achieve the same result.
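To make the rejoin step concrete, here is a minimal sketch of that first approach. The application name is a made-up example whose project part contains dashes; it is not one of the real names from the project.

```ruby
# Hypothetical application name whose project part itself contains dashes.
application_name = 'my-cool-app-pr-1234'

# The splat collects every leading segment into an array;
# the last two segments are the literal "pr" and the pull request ID.
*project_name, _, pull_request_id = application_name.split('-')

# The project name was split apart as well, so it has to be glued back together.
project_name = project_name.join('-')

puts project_name     # my-cool-app
puts pull_request_id  # 1234
```

It works, but the intent is hidden behind positional destructuring, which is part of why a regular expression reads better here.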

Since we were already validating the format of the application name with a regular expression, I figured I’d use it to also retrieve the data using named captures.

Regular expressions in Ruby

For a refresher on regular expressions, I highly recommend this article by Dan Eden.

As a reminder, there are multiple ways to create regular expressions in Ruby:

Using /xxxx/

Using the percent literal: %r{}

Using the class initializer: Regexp.new
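As a quick sketch, the three forms produce equivalent regular expressions. The patterns and variable names below are only illustrative.

```ruby
literal  = /\d+/
percent  = %r{\d+}
from_new = Regexp.new('\d+') # single quotes keep the backslash intact

# Regexp#== compares the pattern source and options.
puts literal == percent   # true
puts literal == from_new  # true

# The percent literal is handy when the pattern contains slashes,
# because they do not need to be escaped.
path_pattern = %r{/pr/\d+}
puts path_pattern.match?('/pr/42') # true
```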

With your newly created regular expression, there are two main ways to check if a string matches a regular expression:

Calling String#match with the regular expression as argument: 'abc'.match(/a/) # => #<MatchData "a">

Calling Regexp#match on the regular expression with the string as argument: /a/.match('abc') # => #<MatchData "a">

If the string matches the regular expression, both methods return a MatchData object; otherwise they return nil. The MatchData object encapsulates the result of matching a String against a Regexp, including the different submatches. It also exposes any captures and named captures.
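Here is a small sketch of what a MatchData object gives you, and of the nil case. The pattern is an arbitrary example.

```ruby
match = /(b)(c)/.match('abc')

puts match[0]               # bc (the whole match)
puts match.captures.inspect # ["b", "c"]
puts match.pre_match        # a (the text before the match)

# A failed match returns nil, so guard before reading from the result.
no_match = /z/.match('abc')
puts no_match.nil?          # true
```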

Named captures

Named captures allow you to describe submatches of a regular expression and then retrieve them from the resulting MatchData object. In our case, our regular expression looked like this:

/.*-pr-\d+/

To use named captures, we first need to add capture groups to our regular expression. Adding a capture group is as simple as wrapping part of the pattern inside parentheses:

/(.*)-pr-(\d+)/
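At this stage the groups have no names, so they can only be read by position. A short sketch, using a sample application name:

```ruby
matches = /(.*)-pr-(\d+)/.match('my_app-pr-1234')

# Index 0 is the whole match; the groups start at index 1.
puts matches[1]               # my_app
puts matches[2]               # 1234
puts matches.captures.inspect # ["my_app", "1234"]
```

Positional access works, but the indexes say nothing about what each group holds, which is the problem names solve.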

Finally, name the different captures. To do this, we need to prefix the content of each capture group with its name, written as ?<name>:

/(?<project_name>.*)-pr-(?<pull_request_id>\d+)/

Now that we’ve done this, we can easily retrieve the data we want from the application name using our resulting object:

expression = /(?<project_name>.*)-pr-(?<pull_request_id>\d+)/
application_name = 'my_app-pr-1234'
matches = expression.match(application_name)
matches[:project_name] # => 'my_app'
matches.named_captures # => {"project_name"=>"my_app", "pull_request_id"=>"1234"}
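To tie this back to the original problem, here is one way the parsing could be wrapped up. This is a sketch rather than the code from the actual project: the constant, the method name, and the \A and \z anchors are my own assumptions. Because .* is greedy, a project name that contains dashes is still captured whole.

```ruby
# Hypothetical constant; \A and \z anchor the pattern to the whole string.
REVIEW_APP_PATTERN = /\A(?<project_name>.*)-pr-(?<pull_request_id>\d+)\z/

# Hypothetical helper: returns a hash of the parts, or nil when the name
# does not look like a review application.
def parse_review_app(application_name)
  matches = REVIEW_APP_PATTERN.match(application_name)
  return nil if matches.nil?

  {
    project_name: matches[:project_name],
    pull_request_id: matches[:pull_request_id]
  }
end

parsed = parse_review_app('my-cool-app-pr-1234')
puts parsed[:project_name]    # my-cool-app
puts parsed[:pull_request_id] # 1234

puts parse_review_app('not-a-review-app').inspect # nil
```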