Adventures in Ecto and PostgreSQL: Partial unique indexes and upserts with associations Mathias Polligkeit Follow Feb 16 · 8 min read

Let’s say you were asked by your government to implement a national dog registry in order to help them to enforce their new one-dog-policy. Everyone who wants to buy a dog will need to register it at the municipal government.

Since your government cares a lot about the well-being of the nation’s dogs and generally invests a lot of time and resources into research endeavours, they want to collect some additional data on all the dogs and want all dog owners to check in once a year to keep the information up to date. Because a year is a long time between updates to conduct proper research and also to keep the feedback loop short regarding new dog-happiness policies, the government also plans to equip dogs and dog-related products with sensors in the future.

Setup

Versions: Elixir 1.10.0, ecto 3.3.3, ecto_sql 3.3.4, PostgreSQL 11.5

Let’s start off with this Ecto migration:

defmodule DogRegistry.Repo.Migrations.CreateTables do

use Ecto.Migration def change do

create table(:owners, primary_key: false) do

add :id, :binary_id, primary_key: true

add :name, :string, null: false

add :city, :string, null: false timestamps()

end create table(:dogs, primary_key: false) do

add :id, :binary_id, primary_key: true

add :name, :string, null: false

add :status, :string, null: false add :owner_id, references(:owners, on_delete: :nilify_all, type: :uuid) timestamps()

end create table(:dog_toys, primary_key: false) do

add :id, :binary_id, primary_key: true

add :product, :string

add :rating, :integer add :dog_id, references(:dogs, on_delete: :nilify_all, type: :uuid) timestamps()

end create index(:dog_toys, :dog_id)

end

end

With this we create an owners table, a dogs table and a dog_toys table. A dog belongs to an owner and a dog toy belongs to a dog. The government will need a lot more information on the dogs, of course, including various psychological measures and health statistics, but let’s keep it simple for now.

We also create Ecto schemas for each of these tables, but since they are quite unremarkable, I’ll only give you the Dog schema:

defmodule DogRegistry.Dog do

use Ecto.Schema import Ecto.Changeset alias DogRegistry.DogToy

alias DogRegistry.Owner

alias DogRegistry.Repo @primary_key {:id, :binary_id, autogenerate: true}

@foreign_key_type :binary_id

schema "dogs" do

field :name, :string

field :status, :string belongs_to :owner, Owner

has_many :toys, DogToy, on_replace: :delete timestamps()

end def changeset(dog, attrs) do

dog

|> cast(attrs, [:name, :owner_id, :status])

|> cast_assoc(:toys)

|> validate_required([:name, :owner_id, :status])

|> validate_inclusion(:status, ["alive", "dead"])

|> foreign_key_constraint(:owner_id)

end

end

Partial unique indexes

You decided to handle both dog registrations and updates to existing dogs with a single function. Don’t ask me why, you didn’t tell me.

This means we need to implement this as an upsert. The first thing we need is a unique index on the dogs table to prevent the clerk from inserting a second dog with the same owner. However, one-dog-policy does not mean that you may only have one single dog throughout your entire lifetime. Once your dog died, you can get a new one. This means the unique index may only apply if the dog is alive.

PostgreSQL allows you to define partial indexes to do just that.

In SQL that index would be defined like this:

CREATE UNIQUE INDEX dogs_owner_id_alive_ix ON dogs (owner_id, status)

WHERE status = 'alive';

Ecto.Migration.unique_index/3 allows you to add a where option, so the statement above looks like this in your application:

defmodule DogRegistry.Repo.Migrations.DogsOwnerIdAliveIx do

use Ecto.Migration def change do

create unique_index(:dogs, [:owner_id, :status],

name: "dogs_owner_id_alive_ix",

where: "status = 'alive'"

)

end

end

The first tests

We create a test module that uses the DataCase module Phoenix creates and sets up some fixtures. The complete code example is linked below. The first describe block only has a simple test to make sure you can insert a new dog into the database:

describe "create_or_update/1" do

test "creates a dog" do

owner = insert_owner!(@owner) assert {:ok, %Dog{}} = Dog.create_or_update(params(@living_dog_1, owner))

assert [%Dog{}] = Repo.all(Dog)

end test "allows to add multiple dead dogs" do

owner = insert_owner!(@owner) assert {:ok, %Dog{}} = Dog.create_or_update(params(@dead_dog_1, owner))

assert {:ok, %Dog{}} = Dog.create_or_update(params(@dead_dog_2, owner))

assert {:ok, %Dog{}} = Dog.create_or_update(params(@living_dog_1, owner))

assert {:ok, %Dog{}} = Dog.create_or_update(params(@dead_dog_3, owner))

assert [_, _, _, _] = Repo.all(Dog)

end

end

The params/2 function does nothing but add the ID of the second argument to the map that is passed as the first argument. I know that you know that there is a pipe operator.

This is the function that passes the tests:

def create_or_update(attrs) do

%__MODULE__{}

|> changeset(attrs)

|> Repo.insert()

end

The upsert test

Now on to the more interesting test case.

test "replaces previous dog if alive" do

owner = insert_owner!(@owner) assert {:ok, %Dog{}} = Dog.create_or_update(params(@living_dog_1, owner))

assert {:ok, %Dog{}} = Dog.create_or_update(params(@living_dog_2, owner))

assert [%Dog{name: name}] = Repo.all(Dog)

assert name == @living_dog_2.name

end

This looks similar as the previous test, but now you ensure that there is still only one dog in the database after calling the function a second time. This means if there is already a living dog for the owner in the database, the entry will be replaced silently. You slowly come to the conclusion that this behaviour is probably not what you would want in this scenario, but since you are already invested in this example, you decide to continue.

Unsurprisingly, this test fails with the current implementation. To perform an upsert, you can set the :on_conflict option when calling Repo.insert/2 . When using PostgreSQL, you also need to set the :conflict_target option to tell PostgreSQL which constraints you expect. This can be a list of columns or the constraint name itself. Since we named the unique index, we can try this:

def create_or_update(attrs) do

%__MODULE__{}

|> changeset(attrs)

|> Repo.insert(

on_conflict: :replace_all,

conflict_target: {:constraint, :dogs_owner_id_alive_ix}

)

end

However, now we're getting this error in all the tests:

** (Postgrex.Error) ERROR 42704 (undefined_object) constraint "dogs_owner_id_alive_ix" for table "dogs" does not exist

What’s going on?

Ecto turns the options above into this SQL fragment:

ON CONFLICT ON CONSTRAINT "dogs_owner_id_alive_ix"`

This is syntactically correct, but PostgreSQL requires you to specify the constraint including the whole WHERE clause. So we need to find a way to tell Ecto to do that. Unfortunately, there is no shortcut for this at the moment, but instead of passing the constraint name, Ecto also allows us to pass an unsafe SQL fragment.

def create_or_update(attrs) do

%__MODULE__{}

|> changeset(attrs)

|> Repo.insert(

on_conflict: :replace_all,

conflict_target:

{:unsafe_fragment, "(owner_id, status) WHERE status = 'alive'"}

)

end

This works!

Associations

Your plan was to update all the associations of the dog (except for the owner) with that same function call. So far, the test fixtures look like this:

@dead_dog_1 %{name: "Clarence", status: "dead"}

@dead_dog_2 %{name: "Penelope", status: "dead"}

@dead_dog_3 %{name: "Christopher", status: "dead"} @living_dog_1 %{name: "Dwayne", status: "alive"}

@living_dog_2 %{name: "Claire", status: "alive"}

Very simple, no associations. Let’s give Dwayne some toys and see what happens.

@living_dog_1 %{

name: "Dwayne",

status: "alive",

toys: [

%{product: "round chewie thing", rating: 5},

%{product: "long catchie thing", rating: 4}

]

}

Running the tests again, we get a foreign key constraint error for the test create_or_update/1 replaces previous dog if alive :

1) test create_or_update/1 replaces previous dog if alive (DogRegistry.DogTest)

test/dog_registry/dog_test.exs:50

** (Ecto.ConstraintError) constraint error when attempting to insert struct: * dog_toys_dog_id_fkey (foreign_key_constraint) If you would like to stop this constraint violation from raising an

exception and instead add it as an error to your changeset, please

call `foreign_key_constraint/3` on your changeset with the constraint

`:name` as an option. The changeset defined the following constraints: * dogs_owner_id_fkey (foreign_key_constraint)

The other tests pass, though. So inserting a dog including toys is no problem, but updating an existing dog with toys is.

Let’s remove the toys from the fixtures for a moment and update the test like this:

test "replaces previous dog if alive" do

owner = insert_owner!(@owner) assert {:ok, %Dog{id: id_1}} =

Dog.create_or_update(params(@living_dog_1, owner)) assert {:ok, %Dog{id: id_2}} =

Dog.create_or_update(params(@living_dog_2, owner)) IO.inspect(id_1, label: "ID1")

IO.inspect(id_2, label: "ID2") assert [_] = Repo.all(Dog)

end

You will get an output similar to this:

ID1: "6a274d9b-f0c1-4f06-b8db-1ef3c7103da4"

ID2: "7e9f7b9c-7a64-4a96-a0a3-2aaa08360b33"

As it happens, passing on_conflict: :replace_all to Repo.insert/2 will not only replace the values in the changeset, but also generate new values for all autogenerated fields including timestamps and the ID. If we want to keep some of the existing values, we can pass {:replace_all_except, fields} or {:replace, fields} as the option value. Let's change the on_conflict option of create_or_update/1 to:

on_conflict: {:replace_all_except, [:id, :inserted_at]}

We want to keep the original ID, but we also want to keep the inserted_at value, while we are happy to get an updated updated_at value.

If we run the test again, we will still see two different IDs. Why is that?

We configured our application with UUIDs as primary keys. Unlike auto-incrementing integer IDs, UUIDs are auto-generated by Ecto, not in the database (unless you set autogenerate to false in your schema and configure your database to generate them). And since Ecto already knows all values it generated or set itself, it doesn’t ask them from the database when running the query. So in our case, Ecto generates a UUID, sends an insert statement to the database, the database sees a conflict and makes a replacement instead, but keeps the old ID, and this existing ID is not returned back to Ecto.

The fix is easy, though. Repo.insert/2 accepts another option called :returning , which specifies which fields should be returned from the database. If you set returning to true , all fields will be returned.

So the final version of create_or_update/1 looks like this:

def create_or_update(attrs) do

%__MODULE__{}

|> changeset(attrs)

|> Repo.insert(

returning: [:id],

on_conflict: {:replace_all_except, [:id, :inserted_at]},

conflict_target:

{:unsafe_fragment, "(owner_id, status) WHERE status = 'alive'"}

)

end

Now the IDs in the test output will match and the tests will also pass with fixtures including associations.

However, our tests still don't have any assertions about the associations. Let's change that by adding this test:



owner = insert_owner!( test "replaces all associations" doowner = insert_owner!( @owner

assert {:ok, %Dog{}} = Dog.create_or_update(params( assert {:ok, %Dog{}} = Dog.create_or_update(params( @living_dog_1 , owner))assert {:ok, %Dog{}} = Dog.create_or_update(params( @living_dog_2 , owner)) assert [%DogToy{product: "small squeaky thing"}] = Repo.all(DogToy)

end

Similar to the previous test, except that we actually have a look at the dog toys after the upsert. @living_dog_1 had two toys, @living_dog_2 had only one, so we would expect to be only the single toy of dog 2 in the database after the update. But this test fails, the database includes both toys of dog 1 and the toy of dog 2.

The problem is that PostgreSQL can handle the replacement of the dog entry itself, but it doesn't know anything about the associations, and since we only passed an empty struct as the first argument to Ecto.Changeset.changeset/2 in our create_or_update/2 function, Ecto has no way of tracking the associations either.

Now, we could change that by 1) changing the function signature of create_or_update/2 and 2) passing a fully preloaded struct to the function, but that would defeat the purpose of the upsert function, because we could then as easily call a regular update function with that preloaded resource. Which means that if you want to do a complete replacement of the existing associations, you shouldn't use an upsert function after all.

Complete source code: https://github.com/woylie/article-dog-registry

Don't look directly into the sun.