Modification Anomalies

Once our E-R model has been converted into relations, we may find that some relations are not properly specified. There can be a number of problems: Deletion Anomaly : Deleting one fact or data point from a relation results in other information being lost. Insertion Anomaly : Inserting a new fact or tuple into a relation requires we have information from two or more entities – this situation might not be feasible. Update Anomaly : Updating one fact in a relation requires us to update multiple tuples.



Anomaly Example 1

Here is an example to illustrate these anomalies: Consider a very common CUSTOMER relation: CUSTOMER( CustomerID , CustomerName, Street, City, State, PostalCode)

In the United States, the PostalCode (or ZipCode) references a specific City and State so one might have data such as:

CustomerID Name Street City State PostalCode C101 Bill Smith 123 First St. New Brunswick NJ 07101 C102 Mary Green 11 Birch St. Old Bridge NJ 07066 C103 Ted Jones 3 Academy St. Old Bridge NJ 07066 C104 Sally Taylor 446 First Ave. New Brunswick NJ 07101 C105 Mary Miller 44 Toga Ct. Farmingdale NY 11735 Insertion Anomaly: What happens if we go to add a new Customer: C106, Joe Feldman, 99 Ninth St., Springfield, NJ

What we know about Joe is that he lives in Springfield, NJ (one fact) but we may not know his PostalCode.

We will need to get that additional fact (the fact that the PostalCode for Springfield, NJ is 07081.

What we know about Joe is that he lives in Springfield, NJ (one fact) but we may not know his PostalCode. We will need to get that additional fact (the fact that the PostalCode for Springfield, NJ is 07081. Deletion Anomaly: What happens if we delete customer C105 : Then we not only remove the customer information but we also remove (lose) the fact that Farmingdale, NY has postal code 11735.

: Then we not only remove the customer information but we also remove (lose) the fact that Farmingdale, NY has postal code 11735. Modification Anomaly: It is possible that when a town grows in population, the zip code will be split into two (or more) new zip codes.

For example, if Old Bridge, NJ splits its zip code, then we will have to update many different tuples even though we are only changing one “fact” about Old Bridge’s zip code.

Anomaly Example 2

Here is another example to illustrate anomalies: A company has a Purchase Order form:





Our dutiful consultant creates the E-R Model directly matching the purchase order:





When we follow the steps to convert to a set of relations this results in two relations (keys are underlined): PO_HEADER ( PO_Number , PODate, Vendor, Ship_To, ...) LINE_ITEMS ( PO_Number , ItemNum , PartNum, Description, Price, Qty)

Consider some sample data for the LINE_ITEMS relation:

PO_Number ItemNum PartNum Description Price Qty O101 I01 P99 Plate $3.00 7 O101 I02 P98 Cup $1.00 11 O101 I03 P77 Bowl $2.00 6 O102 I01 P99 Plate $3.00 5 O102 I02 P77 Bowl $2.00 5 O103 I01 P33 Fork $2.50 8

What are some of the problems with this relation ? What happens if we want to add the fact that Order O103 has quantity 5 of part P99 ? What happens when we delete item I02 from Order O101 ? What happens if we want to change the price of the Plate (P99)?

These problems occur because the relation in question contains data about 2 or more themes.

Typical way to solve these anomalies is to split the relation in to two new relations – This is part of the Process called Normalization discussed next.

On the next page we will formally define the Normal Forms and the Normalization Process.