Imagine that you’re implementing a system for a large bookstore. Many operations have to take place at the same time, multiple customers may simultaneously want to purchase the same book, prices of some books may change, new products are still being delivered, etc. As you know, a single action done by a user is run as a transaction in a database. Let's see what can happen if you allow multiple transactions to work on the same data.

For this example, we will work on the product table, which looks like this:

At first, the content of this table is:

id name price description 1 Effective Java 80.00 Are you looking for a deeper understanding of the Java programming language so that... 2 Java Transaction Processing 50.00 Transaction processing is at the heart of... 3 Drive: The Surprising Truth About What Motivates Us 100.00 Most people believe that the best way to motivate...

Dirty Reads

Look at this sequence of operations:

Transaction 1 Transaction 2 /* Query 1 */ UPDATE product SET price = 90.00 WHERE id = 1; /* No commit here */ /* Query 2 */ SELECT * FROM product WHERE id = 1; id name price description 1 Effective Java 90.00 Are you looking for a deeper understanding of the Java programming language... ROLLBACK;

What is happening here? Transaction 1 changes the price of “Effective Java.” Transaction 2 then read the new price of 90.00. But Transaction 1 is then rolled back: maybe the bookstore staff changed their mind, or maybe there was an error which prevented the transaction from being committed to the database. But Transaction 2 has already read the new uncommitted price. Perhaps there was even an order made with the new price!

This phenomenon is called dirty reads. It happens when a row is changed by Transaction 1, but the changes are not committed. Next, Transaction 2 reads the uncommitted data. If the changes are rolled back, the view of data in the records of Transaction 2 may be wrong.

Nonrepeatable Reads

Now look at this sequence of operations:

Transaction 1 Transaction 2 /* Query 1 */ SELECT * FROM product WHERE id = 1; id name price description 1 Effective Java 80.00 Are you looking for a deeper understanding of the Java programming language... /* Query 2 */ UPDATE product SET price = 90.00 WHERE id = 1; COMMIT; /* Query 3 */ SELECT * FROM product WHERE id = 1; id name price description 1 Effective Java 90.00 Are you looking for a deeper understanding of the Java programming language so that... COMMIT;

Transaction 1 selects info about the book “Effective Java.” In the meantime, Transaction 2 changes the price of the book. Transaction 1 reads the “Effective Java” details again and it sees that the price is now different. How come? Transaction 1 has not changed the data, how could it have changed?

We can imagine a situation where a user of our system saw the old price of a book so he ordered it. Now imagine how surprised he will be when he finds a different price on the bill!

This phenomenon is called nonrepeatable reads. It occurs when a transaction re-reads a single record and finds that it has been changed or deleted.

Phantom Reads

Let’s look at the example:

Transaction 1 Transaction 2 /* Query 1 */ SELECT * FROM product WHERE price BETWEEN 50.00 AND 90.00; id name price description 2 Java Transaction Processing 50.00 Transaction processing is at the heart of... 1 Effective Java 90.00 Are you looking for a deeper understanding of the Java programming language... /* Query 2 */ INSERT INTO product VALUES (4, 'Screw It, Lets Do It',60.00, 'Global entrepreneur Sir Richard Branson has built a business empire and made billions…'); COMMIT; /* Query 3 */ SELECT * FROM product WHERE price BETWEEN 50.00 AND 90.00; id name price description 2 Java Transaction Processing 50.00 Transaction processing is at the heart of... 1 Effective Java 90.00 Are you looking for a deeper understanding of the Java programming language so that... 4 Screw It, Let's Do It 60.00 Global entrepreneur Sir Richard Branson has built a business empire and made billions... COMMIT;

Transaction 1 looks for books with prices between 50 and 90 and finds out there are two books satisfying this condition. In the meantime, Transaction 2 adds a new book with a price of 60. When Transaction 1 queries the data again it finds out that there are now three books.

Let’s consider a scenario in which you find out that books from the mid-price segment are not selling very well. You make a decision to try selling books priced between 50 and 90 at a discount: you run an appropriate database update. At the same moment, a new book is inserted into the system by your employee. It belongs to the mid-price segment but is also likely to become a best-seller. Your price update will – surprisingly for you – also include the newly inserted book.

This phenomenon is called phantom reads. It occurs when a transaction performs the same query twice and sees new, unseen beforehand results.

Note that this is a different phenomenon than non-repeatable read. With non-repeatable read you see the same set of rows but some of the data in the rows are changed.

Transaction Isolation Levels: a Trade-Off Between Concurrency and Efficiency

The three phenomena illustrated above show that when all operations are immediately visible to other transactions, problems may occur. You want to have some sort of isolation between transactions: the intermediate, 'dirty' states should not be visible to other transactions. But the more isolation you enforce on a database, the less concurrency you may have.

On one hand, you may have no isolation whatsoever. All changes can be seen immediately after they are performed. On the other side of the spectrum, you could run transactions in a sequential manner: one transaction after the other is finished. You would have absolutely no concurrency. The efficiency of the system would be very low but no read phenomena would occur.

None of these solutions is easily acceptable in practice. That is why intermediate solutions are applied, with different degrees of separation. According to how we separate transactions from each other, various anomalies can be expected.

ANSI/ISO SQL Standard Levels of Transaction Isolation

The ANSI/ISO SQL standard defines four levels of transaction isolation. The four levels are different: one sequence of operations can result in different outcomes, depending on the level of isolation.

Read uncommitted – transactions can see data that are not yet committed. This is the least restrictive level of isolation. It does not protect against any of the phenomena we talked about in this article.

– transactions can see data that are not yet committed. This is the least restrictive level of isolation. It does not protect against any of the phenomena we talked about in this article. Read commited , which guarantees that any data read by a transaction has been committed. It prevents the reader from seeing any read that is uncommitted (think: intermediate, ‘dirty’). Still, it does not exclude the appearance of phantoms or non-repeatable reads.

, which guarantees that any data read by a transaction has been committed. It prevents the reader from seeing any read that is uncommitted (think: intermediate, ‘dirty’). Still, it does not exclude the appearance of phantoms or non-repeatable reads. Repeatable reads guarantees all of the above and it additionally guarantees that once the data are read, they won’t change. If the transaction reads the same data again, it will find the previously read data unchanged.

guarantees all of the above and it additionally guarantees that once the data are read, they won’t change. If the transaction reads the same data again, it will find the previously read data unchanged. Serializable guarantees that all its predecessors do and subsequent reads can’t see any new data. This is the most restrictive isolation level of all. It also allows for the least concurrency. In this level no read phenomena are possible.

You can see all that summed up in the table below:

Isolation Level Dirty read Nonrepeatable Read Phantom Read Efficiency Read uncommited Possible Possible Possible Very high Read commited Not possible Possible Possible High Repeatable read Not possible Not possible Possible Medium Serializable Not possible Not possible Not possible Low

That’s all for now. I hope that this article clarifies the issue of isolation in databases. In the next article, I will deal with database isolation levels in different database implementations.