1 Tb Firebird database: preliminary report

Dmitry Kuzmenko, last update 31-03-2014



Translations of this document: Portuguese, Russian, Chinese

Read our article about an even bigger (1.7 terabyte) database: More Details About 1.7 Terabyte Database.

Why create a terabyte Firebird database?

Many companies work with big Firebird databases and rely on them to support important business operations. Some Firebird databases are already hundreds of gigabytes in size and continue to grow (see the “Who's big?” section), and it's easy to predict the moment when they become 2, 3 or 5 times bigger. So database administrators and vendors are interested in investigating Firebird's behaviour with big databases and in getting recommendations on how to manage them.

Another important reason we had in mind while creating the 1Tb Firebird database was the final elimination of the prevalent perception of Firebird as a database engine for “small databases” only. This myth seems to be dead now, but some analysts and journalists raise it from the grave regularly, and we hope to put this ridiculous perception to rest at last.

Hardware

Firebird is well-known for its incredible scalability, and this investigation has confirmed it once again. The initial purpose of this experiment was simply to create a Firebird database 1Tb in size, so we used a usual desktop computer:

Table 1: Hardware

Component                          Parameters
CPU                                AMD Athlon 64 X2 5200
RAM                                4GB
Motherboard                        MSI K9N Platinum
HDD1 (operating system and temp)   ST3160815AS, 160GB, SATA II
HDD2 (auxiliary)                   HDT721064SLA360, 640GB, SATA II
HDD3 (auxiliary)                   HDS728080PLA380, 80GB, SATA I
HDD4 (database)                    ST31500341AS, 1.5TB, SATA II (firmware CC1H)

In essence, we've put the 1.5Tb HDD into one of our office desktops, without any other modifications. This HDD was formatted with a cluster size of 16Kb (the same as the database page size, as you can see below).

Software

As it is a desktop computer, the operating system is Windows XP Professional SP3, 32-bit. To actually perform the test we used the loader from our TPC-based toolkit (download it from http://ibdeveloper.com/tests/tpc-c/; both binaries and sources are available). We'd like to emphasize that the loader inserts data as it would be inserted in a real-life scenario: records are inserted (and placed in physical disk areas) into several master-detail-subdetail tables, not table by table.

Table 2: Software

Software           Version
Operating system   Windows XP Professional SP3, 32-bit
Firebird           2.1.3 SuperServer (snapshot)
Loader             Custom loader from TPC-based test

Who's big?

There are several companies who handle significantly big Firebird databases. Here we've listed three of them as examples of real-world big Firebird databases in three key industries: retail, finance and medicine.

Bas-X

Bas-X (http://www.basx.com.au/, Australia) is a leading provider of enterprise information technologies to independent retailers, particularly multi-site operators and management groups. Bas-X is a true leader in Firebird-based development: two of their customers have Firebird databases more than 450Gb in size, and several others have databases of more than 200Gb. An interesting thing about Bas-X is that they offer Software-as-a-Service solutions and use Firebird as the database management system for thousands of clients. This is definitely one of the brightest examples of real-life cloud computing offerings, and it uses Firebird, which is absolutely good enough for this hard job.

Watermark Technologies

Watermark Technologies (http://www.watermarktech.co.uk/, UK) is a great example of utilizing Firebird technology to serve enterprises in the finance and government sectors. Watermark Technologies produces software which uses Firebird for document management, including indexed OCR for full-text search. It is used by financial advisers, insurance companies and so on. There are several 300+Gb Firebird databases deployed at present. Free Firebird licensing is one of the things which enables Watermark Technologies to offer a flexible subscription model to end customers, so they can avoid upfront payments and pay as they go.

Profitmed

Profitmed (http://www.profitmed.net/, Russia) is a joint stock company and one of the largest Russian pharmaceutical distributors. They have a relatively small database (only ~40Gb), but we decided to mention them because they run an extremely high load in terms of simultaneously active connections, serving hundreds of small resellers and drug stores across Russia. Though this database looks smaller than the others, it contains very dense data: SKUs of medicines and warehouse movements are represented as numbers, and, thanks to Firebird's data compression mechanism, these data consume a very modest amount of disk storage.

Plan

We had a very straightforward plan for this experiment:

1. Create the database and load it with 1Tb of data, without indices
2. Create primary keys and appropriate indices (so the actual database size is more than 1Tb)
3. Gather database statistics
4. Run several SQL queries and estimate database performance

Database and Firebird server configuration

The database has a page size of 16384 bytes, the same as the HDD cluster size, to maximize disk throughput (one page is read or written per I/O operation).
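A quick back-of-the-envelope check shows what this alignment buys. The arithmetic below is our own illustration (it ignores page headers and Firebird's record compression); the 146-byte average record size is taken from the loading statistics in Table 3:

```python
# Rough estimate of how many records one page/cluster-aligned I/O moves.
# Ignores page headers and record compression (simplifying assumptions
# for illustration only).
PAGE_SIZE = 16384      # database page size = disk cluster size
AVG_RECORD = 146       # average record size, from the loading statistics

records_per_page = PAGE_SIZE // AVG_RECORD
print(records_per_page)    # ~112 records transferred per single disk I/O
```

So with matched page and cluster sizes, a single physical read brings in on the order of a hundred records at once.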

In the Firebird configuration we set up an additional directory for temp space and pointed it to the 640Gb disk (where ~300Gb was free).

Loading step

Data were loaded into the database in several steps. The computer was used during the loading operations as a usual desktop (MS Office, Firefox, IBAnalyst, etc.: about 8-12 programs running at the same time). If we had dedicated the hardware to this task only, loading would probably have been faster, so please consider these values a low-end example; they are definitely not the top results.

Table 3: Loading operations

Description               Value
Time to load              ~70 hours
Total records inserted    6.2 billion
Average insertion speed   24500 records/second
Average record size       146 bytes (min 13 bytes, max 600 bytes)
Transactions              646489

We spent ~4 days on loading, and after that we had a Firebird database exactly 1Tb in size (i.e., 1 099 900 125 184 bytes).
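The reported loading figures can be cross-checked against each other; this small sketch (our own arithmetic, not part of the original toolkit) confirms they are consistent:

```python
# Cross-check of the loading statistics in Table 3.
records = 6_200_000_000      # total records inserted
hours = 70                   # approximate loading time
transactions = 646_489       # transactions issued by the loader

# Average insertion speed: should be close to the reported 24500 rec/s.
print(round(records / (hours * 3600)))   # ~24603 records/second

# Implied batch size per transaction (not reported directly).
print(round(records / transactions))     # ~9590 records per transaction
```

The computed speed matches the reported average, and the implied batch size shows the loader commits in large chunks rather than per record.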

Below you can see the database growth and transaction dynamics in FBDataGuard Viewer:

Indices

We created the indices one by one, recording each creation time and the size of the temp file used for sorting.

The biggest index was created for table ORDER_LINE. Its primary key contains four fields (Smallint, Smallint, Integer and Smallint). The temp file for sorting this index was 182Gb, and the final index size in the database is 29.3Gb.

It's interesting to see that even the index for a table with 3.8 billion records has depth = 3, because the page size is 16384 bytes; as a result there is no extra overhead when searching this table by primary key.
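To see why depth = 3 is plausible for such a huge table, note that a three-level B-tree (root, one intermediate level, leaf level) addresses up to fanout³ entries. The arithmetic below is our own illustration; the resulting ~10 bytes per entry is an assumption that Firebird's prefix compression of index keys makes realistic:

```python
import math

# Minimum per-page fan-out needed for a 3-level index over ORDER_LINE.
RECORDS = 3_720_051_796     # rows in ORDER_LINE
PAGE_SIZE = 16384           # index page size

required_fanout = math.ceil(RECORDS ** (1 / 3))
print(required_fanout)                # ~1550 entries per index page needed

# Average bytes available per entry at that fan-out:
print(PAGE_SIZE // required_fanout)   # ~10 bytes per entry, feasible with
                                      # prefix compression of index keys
```

With a smaller page size the required fan-out would not fit, pushing the tree to depth 4 and adding one extra page read to every keyed lookup.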

Statistics

After that we gathered database statistics, which took 7 hours 32 minutes 45 seconds.

We've put the key statistics into one table and included some queries and time measurements:

Table 4: Consolidated statistics for 1Tb database

Table name      Record count   Size, Gb   select count(*) time   Index creation time   Tmp file size, Gb   Index size, Gb
WAREHOUSE       1240           0.002      0s                     0                     0                   0.0
ITEM            100000         0.012      0.7s                   -                     -                   0.0
DISTRICT        124000         0.017      0.7s                   6                     -                   0.0
NEW_ORDER       111600000      32         20m 00s                23m 00s               4.56                0.8
CUSTOMER        372000000      224        -                      41m 00s               -                   2.6
  customer_last                                                  1h 52m 32s            12.4                2.3
  fk_cust_ware                                                   2h 10m 51s            -                   2.3
HISTORY         372000000      32         -                      -                     -                   -
ORDERS          372000000      25         32m 00s                45m 41s               15.2                2.5
STOCK           1240000000     404        -                      3h 34m 44s            41.5                9.2
ORDER_LINE      3720051796     359        -                      12h 6m 18s            182.0               29.3

Database statistics can be downloaded from here.

You can use the free FBDataGuard Community Edition Viewer to interpret the text data and see not only database performance metrics but also CPU and memory consumption.

Queries

First of all, we ran select count(*) queries on several tables (see the 4th column in Table 4 above). As you know, due to the multi-version nature of Firebird, select count(*) over a whole table is an expensive operation for the server, because it requires visiting every page of the table, and experienced Firebird developers do not use select count(*); however, we used it to demonstrate the overall performance ratio of the database and hardware.

After the select count queries we ran queries from a real-life scenario and, to be honest, we were amazed by how good the results were. See for yourself:

Query 1:

select w_id, w_name, c_id, c_last from WAREHOUSE, customer where c_w_id = w_id

PLAN JOIN (WAREHOUSE NATURAL, CUSTOMER INDEX (FK_CUST_WARE))
------ Performance info ------
Prepare time = 15ms
Execute time = 79ms
Avg fetch time = 6.08 ms
Current memory = 272 264 476
Max memory = 272 514 048
Memory buffers = 16 384
Reads from disk to cache = 82
Writes from cache to disk = 0
Fetches from cache = 3 648

Simple join of tables with 1240 and 372000000 records, no WHERE conditions. “Avg fetch time = 6.08 ms” is for fetching the first row.

Query 2:

select w_id, w_name, c_id, c_last from WAREHOUSE, customer where c_w_id = w_id and c_w_id = 10000

PLAN JOIN (WAREHOUSE INDEX (WAREHOUSE_PK), CUSTOMER INDEX (FK_CUST_WARE))
------ Performance info ------
Prepare time = 16ms
Execute time = 78ms
Avg fetch time = 6.00 ms
Current memory = 272 266 148
Max memory = 272 514 048
Memory buffers = 16 384
Reads from disk to cache = 88
Writes from cache to disk = 0
Fetches from cache = 3 656

Join of the same tables with a condition which forces selection of recent records. “Avg fetch time = 6.00 ms” is for fetching the first row.

Query 3:

select count(*) from WAREHOUSE, customer where c_w_id = w_id and c_w_id = 10000

Result = 30000

PLAN JOIN (WAREHOUSE INDEX (WAREHOUSE_PK), CUSTOMER INDEX (FK_CUST_WARE))
------ Performance info ------
Prepare time = 0ms
Execute time = 453ms
Avg fetch time = 453.00 ms
Current memory = 272 263 844
Max memory = 272 514 048
Memory buffers = 16 384
Reads from disk to cache = 1 048
Writes from cache to disk = 0
Fetches from cache = 60 024

Counts the records for the previous query.

Query 4:

SELECT * FROM ORDER_LINE WHERE OL_W_ID = 500

PLAN (ORDER_LINE INDEX (ORDER_LINE_PK))
------ Performance info ------
Prepare time = 0ms
Execute time = 94ms
Avg fetch time = 7.23 ms
Current memory = 136 445 536
Max memory = 136 592 176
Memory buffers = 8 192
Reads from disk to cache = 150
Writes from cache to disk = 0
Fetches from cache = 2 402

Query to the biggest table (3.8B records). “Avg fetch time = 7.23 ms” is for fetching the first row.

Query 5:

SELECT * FROM ORDER_LINE WHERE OL_W_ID = 500

PLAN (ORDER_LINE INDEX (ORDER_LINE_PK))
------ Performance info ------
Prepare time = 0ms
Execute time = 3s 438ms
Avg fetch time = 0.01 ms
Current memory = 136 445 496
Max memory = 136 592 176
Memory buffers = 8 192
Reads from disk to cache = 1 840
Writes from cache to disk = 0
Fetches from cache = 598 636

The same query to the biggest table (3.8B records), but this time we fetched all records (299245 records fetched).

Query 6:

select w_id, w_name, c_id, c_last from WAREHOUSE, customer where c_w_id = w_id and (c_w_id > 8000) and (c_w_id < 10000)

PLAN JOIN (WAREHOUSE INDEX (WAREHOUSE_PK), CUSTOMER INDEX (FK_CUST_WARE))
------ Performance info ------
Prepare time = 0ms
Execute time = 125ms
Avg fetch time = 9.62 ms
Current memory = 272 270 824
Max memory = 272 514 048
Memory buffers = 16 384
Reads from disk to cache = 91
Writes from cache to disk = 0
Fetches from cache = 3 659

Join of tables with 1240 records and 372M records.

Query 7:

select count(*) from WAREHOUSE, customer where c_w_id = w_id and (c_w_id > 8000) and (c_w_id < 10000)

Result = 59 970 000

PLAN JOIN (WAREHOUSE INDEX (WAREHOUSE_PK), CUSTOMER INDEX (FK_CUST_WARE))
------ Performance info ------
Prepare time = 0ms
Execute time = 13m 4s 718ms
Avg fetch time = 784 718.00 ms
Current memory = 272 268 532
Max memory = 272 514 048
Memory buffers = 16 384
Reads from disk to cache = 2 332 583
Writes from cache to disk = 0
Fetches from cache = 119 977 902

Counts the records for the previous query.
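Even the slowest query above looks less alarming when expressed as throughput; this quick calculation (our own, for illustration) relates the counted rows to the elapsed time:

```python
# Effective throughput of the 13-minute count(*) over the join.
rows = 59_970_000               # Result reported by the query
seconds = 13 * 60 + 4.718       # Execute time = 13m 4s 718ms

print(round(rows / seconds))    # ~76400 rows counted per second
```

So the long wall-clock time reflects the sheer volume of records visited, not a slow engine: on a single desktop SATA disk the server still counted tens of thousands of joined rows every second.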

Summary

In this experiment Firebird showed the following results:

Undoubted ability to handle big databases. We are pretty sure that it's possible to create and use a 32Tb database on appropriate hardware, and Firebird will show the same high performance as it does for smaller databases (i.e., 1Tb and below).

Good scalability and an amazingly small footprint. The 1Tb database was created on a usual desktop computer and, more importantly, it can be used to perform general queries: if you don't fetch millions of records, query speed is the same as for moderate-size databases (10-15Gb).

This is not the end of the experiment: we intend to run more queries, gather additional statistics and publish a more detailed report shortly. Please stay tuned.

Contacts

Send all your questions and enquiries to [email protected]