MigrationWTF: Specified key was too long; max key length is 3072 bytes

Published on Saturday 2, December 2017

I recently hit a curveball with a custom migration to import location data into Drupal. I am helping a family friend merge their bespoke application into Drupal. Part of the application involves managing Locations. Locations have a parent location, so I used taxonomy terms to harness the relationship capabilities. Importing levels one, two, three, and four worked great! Then, all the sudden, when I began to import levels five and six I ran into an error: Specified key was too long; max key length is 3072 bytes

False positives

My first reaction was a subtle table flip and bang on the table. I had just spent the past thirty minutes writing a Kernel test which executed my migrations and verified the imported data. Everything was green. Term hierarchy was preserved. Run it live: table flip breakage. But, why?

When I run tests I do so with PHPUnit directly to just execute my one test class or a single module. Here is the snippet from my project's phpunit.xml

Turns out when running SQLite there are no constraints on the index key length, or there's some other deeper explanation I didn't research. The actual site is running on MySQL/MariaDB. As soon as I ran my Console command to run the migrations the environment change revealed the failure.

Finding a possible fix

Following the wrong path to fixing a bug, I went in a frantic Google search trying to find an answer which asserted by expected fix: there is some bogus limitation on the byte size that I can somehow bypass. So reviewed the fact that InnoDB has the limit of 3072 bytes, and that MyISAM has a different one (later, I found it is worse at 1000 bytes and one reason why we're all using MyISAM.)

I found an issue on Drupal.org which was reported about the index key being too long. The bug occurs by having too many keys listed in the Migration source plugin. Keys identify unique rows in the migration source. They are also used as the index in the create migrate_map_MIGRATION_ID. So if you require too many keys for identifying unique rows, you will experience this error. In most cases, you can probably break up your CSV and normalize it to be it easier to parse.

Indexes were added in Drupal 8.3 to improve performance. So, I had to find a fix. First I tried swapping back to MyISAM and realizing that was a fool's errand, but I was desperate.

I thought about trying to normalize the CSV data and make different, smaller, files. But there was a problem: some locations share the same name but have different parent hierarchies. A location in Illinois could share the same name as a location in a Canadian territory. I needed to preserve the Country -> Administrative Area -> Locality values in a single CSV.

A custom idMap plugin to the rescue

If you have worked with the Migrate module in Drupal you are familiar with process plugins and possibly familiar with source plugins. The former helps you transform data and the latter brings data into the migration. The migrate also has an id_map plugin. There is one single plugin provided by Drupal core: sql. I never knew or thought about this plugin because it is not defined. In fact, we never have to.

/** * {@inheritdoc} */ public function getIdMap() { if (!isset($this->idMapPlugin)) { $configuration = $this->idMap; $plugin = isset($configuration['plugin']) ? $configuration['plugin'] : 'sql'; $this->idMapPlugin = $this->idMapPluginManager->createInstance($plugin, $configuration, $this); } return $this->idMapPlugin; }

If a migration does not provide an idMap definition it defaults to the core default sql mapping.

Hint: if you want to migrate into a non-SQL database you're going to need a custom id_map plugin!

Once I found this plugin I was able to find out where it created its table and the index information. Bingo! \Drupal\migrate\Plugin\migrate\id_map\Sql::ensureTables

/** * Create the map and message tables if they don't already exist. */ protected function ensureTables() {

In this method, it creates the schema that the table will use. There is some magic in here. By default, all keys are treated as a varchar field with a length of 64. But, then, it matches up those keys with known destination field values. So if you have a source value going to a plain text field it will change to a length of 255.

// Generate appropriate schema info for the map and message tables, // and map from the source field names to the map/msg field names. $count = 1; $source_id_schema = []; $indexes = []; foreach ($this->migration->getSourcePlugin()->getIds() as $id_definition) { $mapkey = 'sourceid' . $count++; $indexes['source'][] = $mapkey; $source_id_schema[$mapkey] = $this->getFieldSchema($id_definition); $source_id_schema[$mapkey]['not null'] = TRUE; } $source_ids_hash[static::SOURCE_IDS_HASH] = [ 'type' => 'varchar', 'length' => '64', 'not null' => TRUE, 'description' => 'Hash of source ids. Used as primary key', ]; $fields = $source_ids_hash + $source_id_schema; // Add destination identifiers to map table. // @todo How do we discover the destination schema? $count = 1; foreach ($this->migration->getDestinationPlugin()->getIds() as $id_definition) { // Allow dest identifier fields to be NULL (for IGNORED/FAILED cases). $mapkey = 'destid' . $count++; $fields[$mapkey] = $this->getFieldSchema($id_definition); $fields[$mapkey]['not null'] = FALSE; }

This was my issue. The destination fields are the term name field, which has a length of 255. Great. But there is no way to interject here and change that value. All of the field schemas are coming from field and typed data information.

The solution? Make my own plugin. The following is my sql_large_key ID mapping class.

<?php namespace Drupal\mahmodule\Plugin\migrate\id_map; use Drupal\migrate\Plugin\migrate\id_map\Sql; /** * Defines the sql based ID map implementation. * * It creates one map and one message table per migration entity to store the * relevant information. * * @PluginID("sql_large_key") */ class LargeKeySql extends Sql { protected function getFieldSchema(array $id_definition) { $schema = parent::getFieldSchema($id_definition); if ($schema['type'] == 'varchar') { $schema['length'] = 100; } return $schema; } }

The following is an example migration using my custom idMap definition.

id: company_location6 status: true migration_tags: - company idMap: plugin: sql_large_key source: plugin: csv_by_key path: data/company.csv header_row_count: 1 # Need each unique key to build proper hierarchy keys: - Location1 - Location2 - Location3 - Location4 - Location5 - Location6 process: name: plugin: skip_on_empty method: row source: Location6 vid: plugin: default_value default_value: locations # Find parent using key from previous level migration. parent_id: - plugin: migration migration: - company_location5 source_ids: company_location5: - Location1 - Location2 - Location3 - Location4 - Location5 parent: plugin: default_value default_value: 0 source: '@parent_id' destination: plugin: 'entity:taxonomy_term' migration_dependencies: { }

And, voila! 6,000 locations later there is a preserved hierarchy!