Those Poor Maligned ENUMs

Tag: SQL

I think it is safe to say that the majority of web developers these days are not excessively familiar with sql and databases. This is perfectly understandable. When I was starting out, we did client/server. A VB/Powerbuilder/Delphi front end with a database. Not knowing sql meant not knowing half your job. These days there is a huge assortment of services and skills the average developer needs to keep abreast of while at the same time more and more tools and patterns that isolate them from the actual database.

What this has lead to, unfortunately, is a few cases where oversimplified and occasionally incorrect "rules of thumb" are used rather than knowledgeable decisions making. The "N+1" rule is an example that comes to mind; for years you would read people recommending loops instead of joins because "joins are slow, and MYSQL is really fast at connections". More likely, the writer is simply uncomfortable writing joins. This sort of ill-informed advice does no one any good.

Another one is the topic in hand - ENUMs. "Top 12 Reasons Why ENUMs are Evil!" I don't think so. Strings are Evil if your complaint is that you can't multiply them. ENUMs are a tool, and if you understand how they work, then you understand when they are the correct tool. So let's do that - let's study the ENUM a little, see how we do things, what we can't do easily, and get comfortable with them so we can make informed decisions.

CREATE TABLE `enumtest` ( `id` int(11) NOT NULL, `test` enum('delta','alpha','gamma','beta') NOT NULL ) ENGINE=InnoDB; INSERT INTO `enumtest` (`id`, `test`) VALUES (1, 'delta'), (2, 'alpha'), (3, 'gamma'), (4, 'gamma'), (5, 'beta'), (6, 'delta');

The code above is valid for Mysql. One of the criticisms of ENUMs is that it is not SQL-92 compliant. This means that that code won't necessarily work for other databases - each has its own particular albeit similar syntax and underlying code. You'll have to decide if this is important to you.

Does your shop require sql compliance? Then its a no-starter. If not, are you writing many scripts that may need to be ported someday? My guess is, probably not. Most people, whether via their framework or directly, are using some form of ORM and migrations library that is handling the internal syntax for them. For that reason, I don't personally see it as being an issue for most people, just something you need to be aware of before starting.

Let's look at a few sorting queries to get at the crux of what many people dislike. Run this:

SELECT * FROM `enumtest` ORDER BY test; // Results: 1 - delta 6 - delta 2 - alpha 3 - gamma 4 - gamma 5 - beta

Right. What happened there? Let's try one more query to figure it out.

SELECT id, test, test + 0 FROM `enumtest` ORDER BY test; // Results: 1 - delta - 1 6 - delta - 1 2 - alpha - 2 3 - gamma - 3 4 - gamma - 3 5 - beta - 4

Now we see what's going on. `enum('delta','alpha','gamma','beta')` is what controls the ordering of our field. This is because we actually have a sort of hash table laid over the top of this field and in the field itself, the enum's indexes are being stored. (More technical readers please note - I am trying to explain in a manner useful to the majority here. It will vary by database implementation and might be a C++ ENUM structure or other solutions. The effect for the "end user" is an array.) When we do an ORDER BY, we are sorting on the field's values - only they aren't what we thought they were.

SELECT id, test, test + 0 FROM `enumtest` ORDER BY CAST(test AS CHAR); // Results: 2 - alpha - 2 5 - beta - 4 1 - delta - 1 6 - delta - 1 3 - gamma - 3 4 - gamma - 3

That's what we wanted. Casting the ENUM as a string allows us sort on it in the expected fashion. There are some other interesting things you can do in your queries.

SELECT id, test, test + 0 FROM `enumtest` WHERE test IN ('alpha', 4); // Results: 2 - alpha - 2 5 - beta - 4

WHERE clauses will search for the values as either strings or indexes. Because the values stored are integers, the filtering itself is integer-based and faster than than searching strings. A quick lookup to get the integer indexes is run, then the main query. Storage space, as well, is the same as a foreign key on a lookup table.

This helps us understand how to use them, but let's consider one more important case before deciding if we should. What happens if we want to add additional values to the ENUM? Let's look at what happens if we put it in the middle of our list:

ALTER TABLE `enumtest` CHANGE `test` `test` ENUM('delta','epsilon','alpha','gamma','beta'); INSERT INTO `enumtest` (`id`, `test`) VALUES (NULL, 'epsilon'); SELECT id, test, test + 0 FROM `enumtest` ORDER BY test; // Results: 1 - delta - 1 6 - delta - 1 8 - epsilon - 2 2 - alpha - 3 3 - gamma - 4 4 - gamma - 4 5 - beta - 5

You see that the index values have updated. This didn't use to be the case - and was one of the biggest criticisms of ENUMs (although I don't really know why, as the need to insert a value in the middle of the list is negligible in my opinion).

So what can't ENUMs do? When should we avoid them?

They cannot hold meta data. If you need additional information about those test values, use a normal relational table

They are not really intended to be updated or inserted/deleted from something like an admin panel (although there are no technical restrictions preventing it)

The list can be up to 65,535 in MySql. But seriously. Use this for small, fixed lists

If the values are meant to be used in more than one table, use a relational table (Postgres seems to work by creating an ENUM object, like a User Defined Datatype, which could be reused. Still, I think a table for these values is probably going to serve you better)

That's not every single advantage and restriction, but I hope it gives you enough to go on when deciding if it is appropriate for your use case.