At Dia, we’ve built many of our backend operations tools from the ground up, including our Inventory and Warehouse Management Systems, on Rails, Postgres, and AWS.

In our Inventory Management System we manage thousands of products and styles. We also track every one of the several million garments that have been entered into our inventory.

The feature: Mass upload through CSV

We were tasked with building a feature that lets our merchandising team mass upload new items, purchased from various vendors, through a CSV file.

Before this feature, our merchandising team had to add new styles to our system manually, one at a time.

Example CSV
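A hypothetical file of this shape (the exact headers and values are assumptions, based on the style properties described below):

```csv
brand,color,size,pattern
Gap,blue,M,floral
Levi's,red,S,striped
```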

Complexity & handling associations

Our styles have many required properties, including color, size, pattern, and brand.

Encoding this data in a CSV creates a lot of room for error on import, because many of the properties of our styles are not simple string columns in our database; they are often associations represented by a foreign key.

For example, the data in the pattern column is represented by a name, not an ID. When we upload the CSV, we must query our patterns table independently to ensure a pattern with that name exists.

There are many scenarios where a CSV import can go wrong. We wanted our unit tests to be robust while staying flexible and maintainable.

Saying no to fixtures

The traditional way to test CSV handling in Rails is with fixtures: canned data that you test against, saved as separate files in your repository.

We quickly found that fixtures were not flexible enough, because:

- We would need dozens of fixture files to handle the many different cases.
- Fixtures can be hard to maintain. If we add an extra column, or change a column's type from string to integer, we have to update all of the fixture files.
- It can be tough to grok your test cases, since the data is stuffed in another file.

A more flexible solution

We came up with a solution: build up CSV files on the fly, and define our test data inside of our spec file.

At a high level, we:

1. Define the rows and columns of a potential CSV scenario.
2. Write that data to an actual CSV file.
3. Run that CSV through our CSV import class.
4. Test the results with RSpec.
5. Delete the file.

This allows us to spin up hundreds of different CSV scenarios in one go.

Breaking down the code

Defining the columns and rows with RSpec:
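The spec setup itself isn't reproduced here; a sketch of what it might look like (the class name, column names, and values are all assumptions):

```ruby
RSpec.describe CsvImport do
  # Column names and values are illustrative, not the real schema.
  # Each row is just an array of strings.
  let(:header) { %w[brand color size pattern] }
  let(:row1)   { ["Gap", "blue", "M", "floral"] }
  let(:row2)   { ["Levi's", "red", "S", "striped"] }
  let(:rows)   { [header, row1, row2] }
end
```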

This allows for a lot of flexibility. For example, we might only want to change one thing in row2 in another scenario. To do this, we can add one more let call inside of a new context:
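A sketch of what that looks like (class name, columns, and values are assumptions); the overriding `let` shadows the outer one, and every other row is inherited unchanged:

```ruby
RSpec.describe CsvImport do
  let(:header) { %w[brand color size pattern] }
  let(:row1)   { ["Gap", "blue", "M", "floral"] }
  let(:row2)   { ["Levi's", "red", "S", "striped"] }
  let(:rows)   { [header, row1, row2] }

  context "when the pattern does not exist" do
    # Only row2 changes; header, row1, and rows are inherited.
    let(:row2) { ["Levi's", "red", "S", "nonexistent-pattern"] }
  end
end
```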

Building up the CSV:

With our defined rows, we build up a CSV file inside the filesystem.

We save it in the /tmp directory as test.csv.
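In the spec this lives in a setup hook; as a plain-Ruby sketch (the helper name is an assumption):

```ruby
require "csv"

# Sketch: write the spec's row arrays out as a real CSV file.
# Each element of `rows` is an array of strings (one CSV row).
def write_test_csv(rows, path = "/tmp/test.csv")
  CSV.open(path, "w") do |csv|
    rows.each { |row| csv << row }
  end
  path
end
```

In the spec, a `before` block can call `write_test_csv([header, row1, row2])` so every example starts from a freshly written file.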

Actually testing the CSV:

This is the actual class that we are testing, and all we have to do is pass in the file path, since we always save our file there right before this code is run.
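The import class itself isn't shown here. As a rough, hypothetical stand-in (the class name, method names, and the injected pattern list are all assumptions; the real class queries the patterns table with ActiveRecord):

```ruby
require "csv"

# Hypothetical stand-in for the CSV import class under test.
# The real version looks patterns up in the database, e.g.
# Pattern.find_by(name: row["pattern"]); here an injected list
# of known pattern names illustrates the same validation.
class CsvImport
  attr_reader :errors

  def initialize(file_path, known_patterns:)
    @file_path = file_path
    @known_patterns = known_patterns
    @errors = []
  end

  # Returns the valid rows as hashes; invalid rows are skipped
  # and a validation error is recorded for each.
  def call
    CSV.foreach(@file_path, headers: true).map do |row|
      pattern = row["pattern"]
      unless @known_patterns.include?(pattern)
        @errors << "Unknown pattern: #{pattern}"
        next
      end
      row.to_h
    end.compact
  end
end
```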

Deleting the file:

After every test we delete the file just to be safe.
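A sketch of the cleanup (the helper name is an assumption; in the spec it runs in an `after` hook):

```ruby
# In the spec:
#   after { delete_test_csv("/tmp/test.csv") }
def delete_test_csv(path)
  # Guard with exist? so cleanup is a no-op when the file
  # was never written (or was already removed).
  File.delete(path) if File.exist?(path)
end
```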

Testing the results

With that code in place, we wrote actual tests of the business logic.

For us, this meant testing at the database level to ensure records are created and updated as expected. For the negative scenarios, we tested that the proper validation errors are captured and presented to the user.
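A sketch of what those expectations might look like in RSpec (the Style model, the errors accessor, and the import API are all assumptions):

```ruby
it "creates a style for each valid row" do
  expect { CsvImport.new(file_path).call }.to change(Style, :count).by(2)
end

context "when the pattern does not exist" do
  it "captures a validation error instead of creating records" do
    import = CsvImport.new(file_path)
    expect { import.call }.not_to change(Style, :count)
    expect(import.errors).to include(a_string_matching(/pattern/i))
  end
end
```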

Limitations of our approach

As with most technical solutions, there are downsides.

This solution was a byproduct of a single CSV upload trying to do too many things. When one upload handles so many different scenarios of data input, the code gets intricate and error-prone, since most of the business logic is encapsulated in one Ruby class.

For this reason, we are working with our product team to break the upload into several smaller, more specific uploads so we can separate out some of the CSV processor logic into more maintainable chunks.

As a feature becomes more complex, it’s important to embrace this separation of concerns.