Most existing big data storage systems built on HDFS lack an upsert feature (if the record exists, update it; otherwise, insert it). As a result, you may run into several problems:

1. You simply cannot update records, yet incremental data synchronization requires upsert.

2. While you are writing data, other jobs may be unable to write, or even read, the same data.

3. A huge number of small files will seriously hurt both storage (NameNode memory) and query performance. To reduce the number of small files, you may run a job that compacts small files into big ones. But this behavior triggers problem 2: the compaction job may take a long time, and during it the other jobs are forced to stop, otherwise they may throw exceptions.
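To make the missing feature concrete, here is a minimal sketch of upsert semantics in plain Python (purely illustrative, not Delta Lake code): records are keyed by a hypothetical `id` column, and an incoming record replaces an existing one with the same key, otherwise it is appended.

```python
def upsert(table, incoming, key="id"):
    """Return a new table with `incoming` records upserted by `key`."""
    merged = {row[key]: row for row in table}  # index existing rows by key
    for row in incoming:
        merged[row[key]] = row                 # update if present, else insert
    return list(merged.values())

table = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
delta = [{"id": 2, "v": "b2"}, {"id": 3, "v": "c"}]
print(upsert(table, delta))
# [{'id': 1, 'v': 'a'}, {'id': 2, 'v': 'b2'}, {'id': 3, 'v': 'c'}]
```

This is trivial in memory; the hard part, which HDFS-based storage cannot do, is performing the same key-based replacement on immutable files at scale.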

[Delta Lake](https://delta.io) is an open-source storage layer that brings ACID transactions to Apache Spark™ and big data workloads.

It includes the following key features:

1. ACID Transactions

2. Scalable Metadata Handling

3. Time Travel (data versioning)

4. Open Format (Parquet)

5. Unified Batch/Streaming Source and Sink

6. Schema Enforcement

7. Schema Evolution

With the help of these features, you can write, update, and read the same data in parallel. You can run a compaction job without affecting the other jobs that write or read the same data collection. This is really incredible. However, the bad news is that the latest version of Delta Lake is 0.2.0, and the Upsert/Delete/Compaction features are still a work in progress. The good news is that I have created a new open-source project, [delta-plus](https://github.com/allwefantasy/delta-plus), which has already added Upsert/Delete/Compaction features on top of Delta Lake 0.2.0.
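The idea that makes concurrent compaction safe can be sketched in a few lines of plain Python (again illustrative, not Delta Lake's actual implementation): each commit records the full set of live data files in an append-only log, a reader pins one version, and it keeps seeing that version even after compaction commits a new one.

```python
class VersionedTable:
    """Toy model of a versioned commit log over immutable data files."""

    def __init__(self):
        self.log = []  # commit log: version number -> list of live files

    def commit(self, files):
        self.log.append(list(files))
        return len(self.log) - 1           # the new version number

    def snapshot(self, version=None):
        if version is None:
            version = len(self.log) - 1    # default to the latest version
        return self.log[version]           # an immutable view at that version

t = VersionedTable()
v0 = t.commit(["part-0.parquet", "part-1.parquet", "part-2.parquet"])
reader_view = t.snapshot(v0)               # a long-running reader pins v0

# Compaction rewrites the small files into one big file, then commits.
t.commit(["part-compacted.parquet"])

print(reader_view)    # the pinned reader still sees the three small files
print(t.snapshot())   # new readers see only the compacted file
```

Because old data files are never modified in place, the compaction job and the readers never step on each other; obsolete files can be cleaned up later, once no snapshot references them.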

So in this post, I will also cover the features that are available in [delta-plus](https://github.com/allwefantasy/delta-plus).

The design of Delta Lake is really amazing: it is simple, but it works. A delta table is just a directory containing two collections of files: a bunch of Parquet data files and a bunch of metadata files in JSON/Parquet format. The layout can be pictured as follows:
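Roughly, a delta table directory looks like this (file names are illustrative; real data files carry long generated UUIDs):

```
my-delta-table/
├── part-00000-<uuid>.snappy.parquet        <- data files (Parquet)
├── part-00001-<uuid>.snappy.parquet
└── _delta_log/
    ├── 00000000000000000000.json           <- commit metadata (JSON)
    ├── 00000000000000000001.json
    └── 00000000000000000010.checkpoint.parquet   <- periodic checkpoint (Parquet)
```

The data files hold the actual records, while the `_delta_log` directory is the transaction log: each numbered JSON file describes one commit, and checkpoints periodically summarize the log so readers do not have to replay every commit from the beginning.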

The first question is: with this design, how can we add new data to the table?