On the first Tuesday of each month our Boston office hosts Boston.rb’s project night. This is an opportunity for the community to get together and hack away on each other’s projects. At a recent project night we encountered an interesting problem. We went through several iterations, before we found a solution that satisfied us.

We’re implementing the ability to search events. Search terms should be matched against a set of columns. If no matches are found in the first column, search the next column. If no results are found in any column, fall back to a fuzzy search.

Here is the initial implementation:

def self . search ( query ) matching_column = [ :name , :city , :title ]. detect do | column | Event . where ( column => query ). presence end if matching_column Event . where ( matching_column => query ) else Event . fuzzy_search ( query ) end end

First we query all the columns and find the first column that contains results. Then query the column again and return the results. If no column contains any results, perform a fuzzy search.

The presence method returns the object it is called on or nil if it is empty.

This solution performs a duplicate query after finding which column to search. In addition the implementation is a little awkward.

Our first attempt was to refactor the original implementation to avoid the extra queries.

def self . search ( query ) result = nil [ :name , :city , :title ]. detect do | column | result = Event . where ( column => query ). presence end if result result else Event . fuzzy_search ( query ) end end

We’ve added the local variable result to cache the query results. This takes care of the extra query. However, by initializing an empty variable and mutating it over the course of a loop we’ve introduced an anti-pattern.

This code is still tough to read.

We decided to go back to the drawing board and think about the problem in abstract terms. This led to:

def self . search ( query ) exact_search ( :name , query ). presence || exact_search ( :city , query ). presence || exact_search ( :title , query ). presence || fuzzy_search ( query ) end def self . exact_search ( field , query ) Event . where ( field => query ) end

It would be tempting here to see all these || s and attempt to construct a SQL OR statement.

def self . search ( query ) Event . where ( "name = :query OR city = :query OR title = :query" , query: query ) end

However, SQL OR is not equivalent to logical || . || will only return the first non- nil set. OR will return all results that match any of the criteria. The SQL query above is equivalent to:

def self . search ( query ) exact_search ( :name , query ) + exact_search ( :city , query ) + exact_search ( :title , query ) end def self . exact_search ( field , query ) Event . where ( field => query ) end

This would return a mix of name, city, and title results rather than results for the first field that returned any values.

The boolean approach was a clean solution to our problem. However, a new requirement emerged that the exact search should occur on an arbitrary array of columns.

We could have iterated on the previous solution by throwing in some meta programming but decided to explore some other approaches. Here we pull out some old-style procedural programming:

def self . search ( query , columns ) index = 0 results = Event . none while index < columns . length results = Event . where ( columns [ index ] => query ) break if results . exists? index += 1 end results . presence || Event . fuzzy_search ( query ) end

This solution is very un-Ruby like but does meet our new requirement. This uses a while loop to iterate over each element in the columns array and breaks if results are found.

This reintroduces the temporary variable + loop anti-pattern mentioned above.

We discovered that we really wanted was an array of queries that would only be evaluated as they were accessed. This led to building a custom Enumerator:

def self . search ( query , column_names ) results ( column_names ). detect ( & :exists? ) end def self . results ( column_names ) columns = column_names . each Enumerator . new do | yielder | loop do yielder << Event . where ( columns . next => query ) end yielder << Event . fuzzy_search ( query ) end end

To understand the code above, let’s dig into Ruby’s Enumerator . Enumerator implements the internal and external iterator patterns.

[ 1 ] pry ( main ) > results = Enumerator . new do | yielder | [ 1 ] pry ( main ) * yielder << "first" [ 1 ] pry ( main ) * yielder << "second" [ 1 ] pry ( main ) * end => #<Enumerator: ...> [ 2 ] pry ( main ) > results . next => "first" [ 3 ] pry ( main ) > results . next => "second" [ 4 ] pry ( main ) > results . next StopIteration : iteration reached an end from ( pry ): 7 :in `next'

Calling next on an enumerator will run the code inside its block until a value is sent to the yielder . The code block is then suspended. Subsequent invocations of next will continue running the block from the previous location. When the block reaches its end, a StopIteration error is raised.

Some collections are not just a set of hard-coded values.

[ 1 ] pry ( main ) > even_numbers = Enumerator . new do | yielder | [ 1 ] pry ( main ) * n = 1 [ 1 ] pry ( main ) * loop do [ 1 ] pry ( main ) * yielder << n if n . even? [ 1 ] pry ( main ) * n += 1 [ 1 ] pry ( main ) * end [ 1 ] pry ( main ) * end => #<Enumerator: ...> [ 2 ] pry ( main ) > even_numbers . next => 2 [ 3 ] pry ( main ) > even_numbers . next => 4 [ 4 ] pry ( main ) > even_numbers . next => 6 [ 5 ] pry ( main ) > even_numbers . take ( 10 ) => [ 2 , 4 , 6 , 8 , 10 , 12 , 14 , 16 , 18 , 20 ]

The code inside the loop block will be run until a value is sent to the yielder. The code inside loop could be executed multiple times before a value gets sent to yielder . In this case, it only yields on every other run. We’ve created an infinite series of even numbers. take will calculate the first n numbers of the series.

[ 1 ] pry ( main ) > evens = Enumerator . new do | yielder | [ 1 ] pry ( main ) * yielder << 'first' [ 1 ] pry ( main ) * n = 1 [ 1 ] pry ( main ) * loop do [ 1 ] pry ( main ) * if n < 7 [ 1 ] pry ( main ) * yielder << n if n . even? [ 1 ] pry ( main ) * else [ 1 ] pry ( main ) * raise StopIteration [ 1 ] pry ( main ) * end [ 1 ] pry ( main ) * n += 1 [ 1 ] pry ( main ) * end [ 1 ] pry ( main ) * yielder << 'last' [ 1 ] pry ( main ) * end => #<Enumerator: ...> [ 2 ] pry ( main ) > evens . to_a => [ "first" , 2 , 4 , 6 , "last" ]

This combines the two examples above and introduces the StopIteration error. This allows use to break our loop giving us a finite enumerator.

The final piece needed to understand our solution is to realize that calling each without a block returns an Enumerator .

[ 1 ] pry ( main ) > enum = [ 1 , 2 ]. each => #<Enumerator: ...> [ 2 ] pry ( main ) > enum . next => 1 [ 3 ] pry ( main ) > enum . next => 2 [ 4 ] pry ( main ) > enum . next StopIteration : iteration reached an end from ( pry ): 4 :in `next'

Now that we know about enumerators, let’s take a closer look at the proposed solution:

def self . search ( query , column_names ) results ( column_names ). detect ( & :exists? ) end def self . results ( column_names ) columns = column_names . each Enumerator . new do | yielder | loop do yielder << Event . where ( columns . next => query ) end yielder << Event . fuzzy_search ( query ) end end

By calling each without a block on column_names we get an enumerator. Inside the loop we call next on that enumerator to retrieve a column name. Once there are no more column names columns.next will raise a StopIteration error, breaking the loop . We add our default fuzzy search as the last call to yielder .

results now behaves like an array of queries that only get evaluated if they are accessed. Just what we set out to build.

We can now use detect to find the first result that exists. detect evaluates each value until one matches the given expression. The remaining values are not evaluted.

It turns out the solution above is quite similar to lazy enumerators introduced by Ruby 2.0:

def self . search ( query , column_names ) queries = column_names . lazy . map do | column | Event . where ( column => query ) end queries . detect ( & :exists? ) || Event . fuzzy_search ( query ) end

Lazy enumerators will pass a value through the entire chain of methods before evaluating the next value.

[ 1 ] pry ( main ) > def query ( q ) [ 1 ] pry ( main ) * puts "querying #{ q } " [ 1 ] pry ( main ) * q [ 1 ] pry ( main ) * end => :query [ 2 ] pry ( main ) > [ 'title' , 'name' , 'city' ]. map { | n | query ( n ) }. detect { true } querying title querying name querying city => "title" [ 3 ] pry ( main ) > [ 'title' , 'name' , 'city' ]. lazy . map { | n | query ( n ) }. detect { true } querying title => "title"

As we can see above, lazy.map.detect only calls query with ‘title’ while map.detect calls query for every element in the array.