I have a collection of people, named peeps:

db.peeps.insert({ UserName : 'BOB' })

db.peeps.insert({ UserName : 'bob' })

db.peeps.insert({ UserName : 'Bob' })

db.peeps.insert({ UserName : 'Sally' })



And I want to be able to find the user named “Bob”. Except I don’t know if the user name is lower, upper or mixed case:

db.peeps.find({ UserName : 'bob' }).count()

db.peeps.find({ UserName : 'BOB' }).count()

db.peeps.find({ UserName : 'Bob' }).count()



But I don’t really want to require my users to type the name the same case as was entered into the database…

db.peeps.find({ UserName : /bob/i }).count()



Me: Yey, Regex!!!

MongoDB: Ahem! Not so fast… Look at the query plan.

db.peeps.find({ UserName : /bob/ }).explain()









Me: Oh, I’ll create an index!



db.peeps.createIndex({ UserName : 1 });



db.peeps.find({ UserName : /bob/ }).explain()















db.peeps.find({ UserName : /bob/i }).explain()













Me: Yey!!!

MongoDB: Dude, dig deeper… and don’t forget to left-anchor your query.



db.peeps.find({ UserName : /^bob/i }).explain( true )















Me: Yey?

MongoDB: Each key in the index was examined! That’s not scalable… for a million documents, mongo will have to evaluate a million keys.

Me: But, but, but…

db.peeps.find({ UserName : /^bob/ }).explain( true )















Me: This is back to exact match :-) Only one document returned. I want case insensitive match!

Old MongoDB: ¯\(ツ)/¯… Normalize string case for that field, or add another field where you store a lowercase version just for this comparison, then do an exact match?

Me:

New MongoDB: Dude: Collation!

Me: Oh?

Me: (Googles MongoDB Collation frantically…)

Me: Ahh!

db.peeps.createIndex({ UserName : -1 }, { collation : { locale : 'en' , strength : 2 } )



db.peeps.find({ UserName : 'bob' }).collation({ locale : 'en' , strength : 2 })

















Me: Squee!

MongoDB: Indeed.

Collation is a very welcome addition to MongoDB.

You can set Collation on a whole collection, or use it in specific indexing strategies.

The main pain point it solves for me is the case-insensitive string match, which previously required either changing the schema just for that (ick!), or using regex (index supported, but not nearly as efficient as exact match).

Beyond case-sensitivity, collation also addresses character variants, diacritics, and sorting concerns. This is a very important addition to the engine, and critical for wide adoption in many languages.

Check out the docs: Collation