BSBM V3.1 Results (April 2013)




Document Version: 0.9

Publication Date: 04/22/2013

1. Introduction

The Berlin SPARQL Benchmark (BSBM) is a benchmark for comparing the performance of storage systems that expose SPARQL endpoints. Such systems include native RDF stores, Named Graph stores, systems that map relational databases into RDF, and SPARQL wrappers around other kinds of data sources. The benchmark is built around an e-commerce use case, where a set of products is offered by different vendors and consumers have posted reviews about products.

We note that the data and query generator used here are an updated version of the original BSBM tools (http://sf.net/projects/bibm), which provides several modifications to the test driver and the data generator. These changes have been adopted in the official V3.1 BSBM benchmark definition. The changes are as follows:





The test driver reports more metrics, in more detail, including "power" and "throughput" scores.
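The report does not spell out how these scores are computed. As a rough illustration only (borrowing the TPC-H convention of a geometric-mean "power" score and a per-hour "throughput" score; the BIBM driver's exact formulas may differ), the two metrics can be sketched as:

```python
from math import prod

def power_score(query_times_s, scale_factor=1.0):
    """Geometric-mean-based power score (TPC-H style): higher is better,
    and no single slow query dominates the figure."""
    n = len(query_times_s)
    geo_mean = prod(query_times_s) ** (1.0 / n)
    return scale_factor * 3600.0 / geo_mean

def throughput_score(total_queries, elapsed_s, scale_factor=1.0):
    """Throughput score: queries completed, scaled to one hour."""
    return scale_factor * total_queries * 3600.0 / elapsed_s

times = [0.5, 2.0, 8.0]                      # made-up per-query times (s)
print(round(power_score(times), 1))          # 1800.0 (geo mean is 2.0 s)
print(round(throughput_score(300, 600), 1))  # 1800.0
```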



The test driver has a drill-down mode that starts at a broad product category and then, in subsequent queries, zooms in on smaller categories. Previously, the product category query parameter was picked randomly for each query; if this was a broad category, the query would be very slow, and if it was a very specific category, it would run very fast. This made it hard to compare individual query runs and also introduced large variation in the overall result metric. The drill-down mode makes the metric more stable and also tests a query pattern (drill down) that is common in practice.
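The drill-down idea can be sketched as follows; the category names and tree below are made up for illustration and are not the benchmark's actual product hierarchy:

```python
import random

# Toy category tree (hypothetical): each category maps to its
# subcategories; leaf categories map to an empty list.
TREE = {
    "products": ["electronics", "garden"],
    "electronics": ["phones", "cameras"],
    "garden": [],
    "phones": [],
    "cameras": [],
}

def drill_down_path(root="products", seed=42):
    """Pick a broad category first, then zoom into ever narrower
    subcategories, as drill-down mode does, instead of sampling an
    arbitrary category for every query."""
    rng = random.Random(seed)
    node = root
    path = [node]
    while TREE[node]:                  # descend until we reach a leaf
        node = rng.choice(TREE[node])
        path.append(node)
    return path

print(drill_down_path())
```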

One query (BI Q6) was removed because its result size grows quadratically with the dataset. This query would become very expensive in the 1B and larger tests, so its performance would dominate the overall result.

The text data in the generated strings is more realistic. This means you can do (more) sensible keyword queries on it.

The new generator was adapted to enable parallel data generation. Specifically, one can let it generate a subset of the data files. By starting multiple data generators on multiple machines, one can thus hand-parallelize data generation. This is quite handy for the larger-size tests, which would otherwise take weeks.





As with the original BSBM benchmark, the test driver can perform single-user or multi-user runs.



This document presents the results of an April 2013 experiment in which the Berlin SPARQL Benchmark Version 3.1 was used to measure the performance of:

BigData (rev. 6528 as of July 02 2012)

BigOwlim (version 5.2.5524) + BigOwlim (version 5.3.5777) for the cluster edition

TDB (version 0.9.4)

Virtuoso (06.04.3132-pthreads for Linux as of May 14 2012)

Virtuoso (07.00.3202-pthreads for Linux as of Jan 1 2013)



The stores were benchmarked with datasets of up to 150 billion triples. Details about the dataset sizes are shown in the following table.





                            Single Machine   Single Machine   Cluster
Use cases                   Explore          BI               Explore & BI
Datasets (million triples)  100, 200, 1000   10, 100, 1000    10000, 50000, 150000



These results extend the state of the art in three dimensions:

scale: this is the first time that RDF store benchmark results at such a large size have been published. The previously published BSBM results were on 200M triples; the 150B experiments thus mark a 750x increase in scale.

workload: this is the first time that results on the Business Intelligence (BI) workload are published. In contrast to the Explore workload, which features short-running "transactional" queries, the BI workload consists of queries that go through possibly billions of tuples, grouping and aggregating them (using the respective functionality, new in SPARQL 1.1). In contrast to one year ago, we find that the majority of the RDF stores is now able to run the BI workload.

architecture: this is the first time that RDF store technology with cluster functionality has been publicly benchmarked. These experiments include tests using the Virtuoso7 Cluster Edition as well as the BigOwlim 5.3 cluster edition.

2. Benchmark Datasets

We ran the benchmark using the Triple version of the BSBM dataset. The benchmark was run for different dataset sizes. The datasets were generated using the BIBM data generator and fulfill the characteristics described in the BSBM specification.

Details about the benchmark datasets are summarized in the following table:



Number of Triples               10M       100M       200M       1B          10B          50B           150B
Number of Products              28480     284800     569600     2848000     28480000     142400000     427200000
Number of Producers             559       5623       11232      56288       563142       2815554       8446788
Number of Product Features      19180     47531      93876      167836      423832       796470        1593390
Number of Product Types         585       2011       3949       7021        22527        42129         84259
Number of Vendors               284       2838       5675       28439       284610       1421729       4264028
Number of Offers                569600    5696000    11392000   56960000    569600000    2848000000    8544000000
Number of Reviewers             14613     145961     291923     1459584     14599162     72989573      218974622
Number of Reviews               284800    2848000    5696000    28480000    284800000    1424000000    4272000000
Exact Total Number of Triples*  10119864  100062249  199945456  999700717   9967546016   49853640808   149513009920
File Size Turtle (unzipped)     467 MB    4.6 GB     9.2 GB     48 GB       568 GB       2.8 TB        8.6 TB

(*: As the 10B, 50B, and 150B datasets were generated in parallel on 8 machines, the number of triples is approximated by multiplying the number of triples generated on one machine by 8.)



Note: All datasets were generated with the -fc option for forward chaining.

The BSBM dataset generator and test driver can be downloaded from SourceForge.



The RDF representation of the benchmark datasets can be generated in the following way:



To generate the 100M dataset as Turtle file type the following command in the BSBM directory:



./generate -fc -s ttl -fn dataset_100M -pc 284826 -pareto





To generate the 150B dataset as 1000 Turtle files on multiple machines (e.g., 8 machines with 125 files each), type the following command in the BSBM directory:



./generate -fc -s ttl -fn dataset150000m -pc 427200000 -nof 1000 -nom 8 -mId <machineID> -pareto



(The <machineID> is 1, 2, 3, …, 8, according to which of the 8 machines the command is run on.)
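The per-machine invocations differ only in the -mId argument, so they can be generated mechanically. A minimal sketch that prints the eight command lines (how they are then distributed to the machines, e.g. via ssh, is site-specific):

```python
# Build the generator command line for each of the 8 machines; only the
# machine ID varies between invocations.
BASE = ("./generate -fc -s ttl -fn dataset150000m "
        "-pc 427200000 -nof 1000 -nom 8 -mId {mid} -pareto")

commands = [BASE.format(mid=mid) for mid in range(1, 9)]
for cmd in commands:
    print(cmd)
```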





Variations:



* generate N-Triples instead of Turtle:



use -s nt instead of -s ttl



* generate update dataset for the Explore and Update use case:



add -ud



* Generate multiple files instead of one, for example 100 files:



add -nof 100



* Write test driver data to a different directory (default is td_data), for example for the 100M dataset:



add -dir td_data_100M



3. Benchmark Machine

We used the CWI Scilens (www.scilens.org) cluster for the benchmark experiment. This cluster is designed for high I/O bandwidth and consists of multiple layers of machines. In order to get large amounts of RAM, we used only the "bricks" layer, which contains its most powerful machines. The machines were connected via Mellanox MCX353A-QCBT ConnectX3 VPI HCA cards (QDR IB 40Gb/s and 10GigE) through an InfiniScale IV QDR InfiniBand switch (Mellanox MIS5025Q). Each machine has the following specification.



Hardware (8 machines):

Processors: 2 x Intel(R) Xeon(R) CPU E5-2650 @ 2.00GHz (8 cores, hyperthreading), Sandy Bridge architecture



Memory: 256GB



Hard Disks: 3 x 1.8TB (7,200 rpm) SATA in RAID 0 (180MB/s sequential throughput).



Software:

Operating System: Linux version 3.3.4-3.fc16.x86_64
Filesystem: ext4





Java Version and JVM: Version 1.6.0_31, 64-Bit Server VM (build 20.6-b01).



BSBM generator and test driver version: bibm-0.7.8



The total cost of this configuration was EUR 70,000 when acquired in 2012.

4. Benchmark Results for the Explore Use Case



This section reports the results of running the Explore use case of the BSBM benchmark against:



BigData (rev. 6528)

BigOwlim (version 5.2.5524)

TDB (version 0.9.4)

Virtuoso6 (06.04.3132-pthreads for Linux as of May 14 2012)

Virtuoso7 (07.00.3202-pthreads for Linux as of Jan 1 2013)



Test Procedure

The load performance of the systems was measured by loading the Turtle representation of the BSBM datasets into the triple stores. The loaded datasets were forward chained and contained all rdf:type statements for product types. Thus the systems under test did not have to do any inferencing.

The query performance of the systems was measured by running 500 BSBM query mixes against the systems over the SPARQL protocol. The test driver and the system under test (SUT) were running on the same machine in order to reduce the influence of network latency. In order to measure the sustainable performance of the SUTs, a large number of warm-up runs were executed before the actual single-client test runs (as a ramp-up period). Drill-down mode was used for all tests.

We applied the following test procedure to each store:

Load data into the store.

Shut down the store (optionally clearing OS caches and swap), then restart it. Execute a single-client test run (500-mix performance measurement, randomizer seed: 9834533) with 2000 warm-up runs:

./testdriver -seed 9834533 -w 2000 -runs 500 -drill -o result_single.xml http://sparql-endpoint

Execute multi-client runs (4, 8 and 64 clients; randomizer seeds: 8188326, 9175932 and 4187411). For each run, use twice the number of clients as the number of warm-up query mixes.

For example for a run with 4 clients execute:



./testdriver -seed 8188326 -w 8 -mt 4 -drill -o results_4clients.xml http://sparql-endpoint



The different runs use distinct randomizer seeds for choosing query parameters. This ensures that the test driver produces distinctly parameterized queries over all runs and makes it harder for the stores to apply query caching.
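As an illustration of why distinct seeds matter: a seeded pseudo-random generator reproduces the same parameter sequence for the same seed and a different sequence otherwise, so queries from different runs hit different parameter values. The 284800-product range below matches the 100M dataset in the table above; the driver's real parameter-selection logic is more involved:

```python
import random

def query_parameters(seed, n=5):
    """Deterministic parameter choice: the same seed always yields the
    same sequence of (here: product-ID) parameters."""
    rng = random.Random(seed)
    return [rng.randrange(284800) for _ in range(n)]

print(query_parameters(9834533))  # same seed -> same parameters
print(query_parameters(8188326))  # different seed -> different parameters
```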

An overview of the load times for the SUTs and the different datasets is given in the following table (in hh:mm:ss):

SUT        10M       100M      200M      1B
BigData    00:02:39  00:25:35  00:59:25  -
BigOwlim   00:02:31  00:22:47  00:47:19  04:09:39
TDB        00:09:41  01:37:55  03:34:59  -
Virtuoso6  00:07:06  00:19:26  00:31:30  01:10:30
Virtuoso7  -         00:03:09  -         00:27:11

*: The datasets were split into 1, 10, 20, and 100 Turtle files, respectively.
-: We did not test/load this dataset with the SUT.



4.1 BigData





BigData homepage

4.1.1 Configuration

The following changes were made to the default configuration of the software:



BigData: Version rev. 6528

The bibm3 directory was copied into the bigdata-perf directory.

For loading and starting the server the ANT script in the directory "bigdata-perf/bibm3" was used.





4.1.2 Load Time

The table below summarizes the load times for the Turtle files (in hh:mm:ss):

100M      200M
00:25:35  00:59:25





4.1.3 Benchmark Query results: QpS (Queries per Second)

The table below summarizes the query throughput for each type of query over all 500 runs (in QpS):

4.1.4 Benchmark Overall results: QMpH for the 100M and 200M datasets for all runs

For the 100M and 200M datasets we ran a test with 1, 4, 8 and 64 clients. The results are in Query Mixes per Hour (QMpH) meaning that larger numbers are better.
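QMpH simply scales the number of completed query mixes to an hour. A one-line sketch:

```python
def qmph(num_mixes, elapsed_seconds):
    """Query Mixes per Hour: mixes completed, scaled to one hour."""
    return num_mixes * 3600.0 / elapsed_seconds

# e.g. 500 query mixes completed in 143.9 seconds:
print(round(qmph(500, 143.9), 3))
```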



       1          4          8          64
100M   12512.278  17949.632  19574.007  20422.626
200M   10059.940  9762.856   11572.433  12935.595





4.1.5 Result Summaries



BigData 100M: Number of clients: Single / 4 / 8 / 64. Download links: xml, xml, xml, xml
BigData 200M: Number of clients: Single / 4 / 8 / 64. Download links: xml, xml, xml, xml





4.2 BigOwlim





Owlim homepage

4.2.1 Configuration

The following changes were made to the default configuration of the software:



BigOwlim: Version 5.2.5524

Tomcat: Version 7.0.30

Modified heap size:

JAVA_OPTS="-Dinfo.aduna.platform.appdata.basedir=`pwd`/data -Xmx200G "

Sesame: Version 2.6.8

Config files: Bigowlim template for Sesame







4.2.2 Load Time

The table below summarizes the load times for the Turtle files (in hh:mm:ss):

100M      200M      1B
00:22:47  00:47:19  04:09:39



4.2.3 Benchmark Query results: QpS (Queries per Second)

The table below summarizes the query throughput for each type of query over all 500 runs (in QpS):

4.2.4 Benchmark Overall results: QMpH for the 100M, 200M, 1B datasets for all runs

For the 100M, 200M, 1B datasets we ran a test with 1, 4, 8 and 64 clients. The results are in Query Mixes per Hour (QMpH) meaning that larger numbers are better.



       1          4          8          64
100M   14029.453  17184.314  11677.860  8321.202
200M   9170.083   8130.137   5614.489   5150.768
1B     1669.899   2246.865   1081.508   912.518





4.2.5 Result Summaries



Bigowlim 100M: Number of clients: Single / 4 / 8 / 64. Download links: xml, xml, xml, xml

Bigowlim 200M: Number of clients: Single / 4 / 8 / 64. Download links: xml, xml, xml, xml

Bigowlim 1000M: Number of clients: Single / 4 / 8 / 64. Download links: xml, xml, xml, xml

4.3 TDB





TDB homepage



Fuseki homepage



4.3.1 Configuration

The following changes were made to the default configuration of the software:



TDB: Version 0.9.4



Loading was done with tdbloader2



Statistics for the BGP optimizer were generated with the "tdbconfig stats" command and copied into the database directory.



Fuseki : Version 0.2.5

Started server with: ./fuseki-server --loc /database/tdb /bsbm





4.3.2 Load Time

The table below summarizes the load times for the Turtle files (in hh:mm:ss):

100M     200M
1:37:55  3:34:59





4.3.3 Benchmark Query results: QpS (Queries per Second)

The table below summarizes the query throughput for each type of query over all 500 runs (in QpS):

4.3.4 Benchmark Overall results: QMpH for the 100M and 200M datasets for all runs

For the 100M and 200M datasets we ran a test with 1, 4, 8 and 64 clients. The results are in Query Mixes per Hour (QMpH) meaning that larger numbers are better.



       1          4          8          64
100M   15381.857  19036.097  24646.705  14838.483
200M   10573.858  9540.452   18610.896  8265.151







4.3.5 Result Summaries



TDB 100M: Number of clients: Single / 4 / 8 / 64. Download links: xml, xml, xml, xml

TDB 200M: Number of clients: Single / 4 / 8 / 64. Download links: xml, xml, xml, xml

4.4 Virtuoso6 & Virtuoso7





Virtuoso homepage

4.4.1 Configuration

The following changes were made to the default configuration of the software:



Virtuoso6: Version 06.04.3132-pthreads for Linux as of May 14 2012

Virtuoso7: Version 07.00.3202-pthreads for Linux as of Jan 1 2013

Loading of datasets:



The loading was done by running multiple loading processes (each calling the rdf_loader_run() function).

For the 100M, 200M, 1B datasets 10, 20, 100 files were generated, respectively.



For the configuration see the "virtuoso.ini" file.
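For reference, a sketch of how such a parallel bulk load is typically set up with Virtuoso's bulk loader (ld_dir, rdf_loader_run, and checkpoint are Virtuoso's documented loader functions; the isql credentials, port, file path, and graph IRI below are placeholders, not taken from this experiment). The snippet only assembles the command lines rather than executing them:

```python
# One ld_dir registration, several concurrent rdf_loader_run() workers,
# then a checkpoint; we build the isql command lines to be launched.
ISQL = ["isql", "1111", "dba", "dba"]            # host port, credentials: placeholders

register = ISQL + ["exec=ld_dir('/data/ttl', '*.ttl', 'http://bsbm');"]
loaders = [ISQL + ["exec=rdf_loader_run();"] for _ in range(8)]  # e.g. 8 workers
finish = ISQL + ["exec=checkpoint;"]

for cmd in [register] + loaders + [finish]:
    print(" ".join(cmd))
```

In practice, the loader processes are started concurrently (e.g. one isql session per core) and the checkpoint is issued after all of them return.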







4.4.2 Load Time

The table below summarizes the load times for the Turtle files (in hh:mm:ss):

Virtuoso6

100M      200M      1B
00:19:26  00:31:30  01:10:30

Virtuoso7

100M      1B
00:03:09  00:27:11







4.4.3 Benchmark Query results: QpS (Queries per Second)

The table below summarizes the query throughput for each type of query over all 500 runs (in QpS):

Virtuoso6



Virtuoso7



4.4.4 Benchmark Overall results: QMpH for the 100M, 200M, 1B datasets for all runs

For the 100M, 200M, 1B datasets we ran a test with 1, 4, 8 and 64 clients. The results are in Query Mixes per Hour (QMpH) meaning that larger numbers are better.

Virtuoso6



       1          4          8           64
100M   37678.319  64885.747  112388.811  20647.413
200M   32969.006  31387.107  77224.941   14480.812
1B     8984.789   15637.439  14343.728   2800.053







Virtuoso7



       1          4          8           64
100M   47178.820  91505.200  188632.144  216118.852
1B     27933.682  56714.875  79261.626   132685.957



4.4.5 Result Summaries

Virtuoso6 100M: Number of clients: Single / 4 / 8 / 64. Download links: xml, xml, xml, xml

Virtuoso6 200M: Number of clients: Single / 4 / 8 / 64. Download links: xml, xml, xml, xml

Virtuoso6 1B: Number of clients: Single / 4 / 8 / 64. Download links: xml, xml, xml, xml

Virtuoso7 100M: Number of clients: Single / 4 / 8 / 64. Download links: xml, xml, xml, xml

Virtuoso7 1B: Number of clients: Single / 4 / 8 / 64. Download links: xml, xml, xml, xml





5. Benchmark Results for the BI Use Case

This section reports the results of running the BI use case of the BSBM benchmark against:



BigData (rev. 6528)

BigOwlim (version 5.2.5524)

TDB (version 0.9.4)

Virtuoso6 (06.04.3132-pthreads for Linux as of May 14 2012)

Virtuoso7 (07.00.3202-pthreads for Linux as of Jan 1 2013)



Test Procedure

The load process is the same as for the Explore use case. (See section 4)

The test procedure is similar to that of the Explore use case; however, for the single-client run we only use 25 warm-up runs. Since a single BI query mix touches most of the data, a few warm-up runs are enough to warm up the SUTs so that they reach sustainable performance.



We applied the following test procedure to each store:

Load data into the store.

Shut down the store (optionally clearing OS caches and swap), then restart it. Execute a single-client test run (10-mix performance measurement, randomizer seed: 9834533) with 25 warm-up runs:

./testdriver -seed 9834533 -uc bsbm/bi -w 25 -runs 10 -drill -o result_single.xml http://sparql-endpoint

Execute multi-client runs (4, 8 and 64 clients; randomizer seeds: 8188326, 9175932 and 4187411). For each run, use twice the number of clients as the number of warm-up query mixes.

For example for a run with 4 clients execute:



./testdriver -seed 8188326 -uc bsbm/bi -w 8 -mt 4 -drill -o results_4clients.xml http://sparql-endpoint



The different runs use distinct randomizer seeds for choosing query parameters. This ensures that the test driver produces distinctly parameterized queries over all runs and makes it harder for the stores to apply query caching.

5.1 BigData





BigData homepage

5.1.1 Configuration

The following changes were made to the default configuration of the software:



BigData: Version rev. 6528

The bibm3 directory was copied into the bigdata-perf directory.

For loading and starting the server the ANT script in the directory "bigdata-perf/bibm3" was used.





5.1.2 Load Time

The table below summarizes the load times for the Turtle files (in hh:mm:ss):

10M
00:02:39





5.1.3 Benchmark Query results: QpS (Queries per Second)

The table below summarizes the query throughput for each type of query over all 10 runs (in QpS):

5.1.4 Benchmark Overall results: QMpH for the 10M dataset for all runs

For the 10M dataset we ran a test with 1, 4, 8 and 64 clients. The results are in Query Mixes per Hour (QMpH) meaning that larger numbers are better.



      1      4       8       64
10M   7.290  16.222  16.439  18.812





5.1.5 Result Summaries



BigData 10M: Number of clients: Single / 4 / 8 / 64. Download links: xml, xml, xml, xml





5.2 BigOwlim





Owlim homepage

5.2.1 Configuration

The following changes were made to the default configuration of the software:



BigOwlim: Version 5.2.5524

Tomcat: Version 7.0.30

Modified heap size:

JAVA_OPTS="-Dinfo.aduna.platform.appdata.basedir=`pwd`/data -Xmx200G "

Sesame: Version 2.6.8

Config files: Bigowlim template for Sesame







5.2.2 Load Time

The table below summarizes the load times for the Turtle files (in hh:mm:ss):

10M       100M      1B
00:02:31  00:22:47  04:09:39

5.2.3 Benchmark Query results: QpS (Queries per Second)

The table below summarizes the query throughput for each type of query over all 10 runs (in QpS):

5.2.4 Benchmark Overall results: QMpH for the 10M, 100M, 1B datasets for all runs

For the 10M, 100M, and 1B datasets we ran a test with 1, 4, 8 and 64 clients. The results are in Query Mixes per Hour (QMpH) meaning that larger numbers are better.



       1        4        8        64
10M    121.841  265.294  177.338  218.678
100M   15.512   33.986   20.263   15.076
1B     1.400    3.465    2.323    *



(*: No error occurred, but this 64-client run was stopped after it had run for more than 2 days.)





5.2.5 Result Summaries



Bigowlim 10M: Number of clients: Single / 4 / 8 / 64. Download links: xml, xml, xml, xml

Bigowlim 100M: Number of clients: Single / 4 / 8 / 64. Download links: xml, xml, xml, xml

Bigowlim 1B: Number of clients: Single / 4 / 8. Download links: xml, xml, xml

5.3 TDB





TDB homepage



Fuseki homepage



5.3.1 Configuration

The following changes were made to the default configuration of the software:



TDB: Version 0.9.4



Loading was done with tdbloader2



Statistics for the BGP optimizer were generated with the "tdbconfig stats" command and copied into the database directory.



Fuseki : Version 0.2.5

Started server with: ./fuseki-server --loc /database/tdb /bsbm





5.3.2 Load Time

The table below summarizes the load times for the Turtle files (in hh:mm:ss):

10M
00:09:41



5.3.3 Benchmark Query results: QpS (Queries per Second)

The table below summarizes the query throughput for each type of query over all 10 runs (in QpS):

5.3.4 Benchmark Overall results: QMpH for the 10M dataset for all runs

For the 10M dataset we ran a test with 1, 4, 8 and 64 clients. The results are in Query Mixes per Hour (QMpH) meaning that larger numbers are better.



      1      4       8      64
10M   7.468  17.698  9.503  8.414





5.3.5 Result Summaries



TDB 10M: Number of clients: Single / 4 / 8 / 64. Download links: xml, xml, xml, xml

5.4 Virtuoso6 & Virtuoso7





Virtuoso homepage

5.4.1 Configuration

The following changes were made to the default configuration of the software:



Virtuoso6: Version 06.04.3132-pthreads for Linux as of May 14 2012

Virtuoso7: Version 07.00.3202-pthreads for Linux as of Jan 1 2013

Loading of datasets:



The loading was done by running multiple loading processes (each calling the rdf_loader_run() function).

For the 100M, 200M, 1B datasets 10, 20, 100 files were generated, respectively.



For the configuration see the "virtuoso.ini" file.







5.4.2 Load Time

The table below summarizes the load times for the Turtle files (in hh:mm:ss):

Virtuoso6

10M       100M      1B
00:07:06  00:19:26  01:10:30

Virtuoso7

100M      1B
00:03:09  00:27:11









5.4.3 Benchmark Query results: QpS (Queries per Second)

The table below summarizes the query throughput for each type of query over all 10 runs (in QpS):

Virtuoso6

Virtuoso7

5.4.4 Benchmark Overall results: QMpH for the 10M, 100M, 1B datasets for all runs

For the 10M, 100M, and 1B datasets we ran a test with 1, 4, 8 and 64 clients. The results are in Query Mixes per Hour (QMpH) meaning that larger numbers are better.

Virtuoso6



       1        4         8         64
10M    431.465  2667.657  3915.854  1401.186
100M   35.342   191.431   268.428   99.321
1B     2.383    17.777    21.457    8.355







Virtuoso7



       1        4         8         64
100M   996.795  5644.323  6402.190  7132.212
1B     75.236   348.666   361.205   134.459



5.4.5 Result Summaries

Virtuoso6 10M: Number of clients: Single / 4 / 8 / 64. Download links: xml, xml, xml, xml

Virtuoso6 100M: Number of clients: Single / 4 / 8 / 64. Download links: xml, xml, xml, xml

Virtuoso6 1B: Number of clients: Single / 4 / 8 / 64. Download links: xml, xml, xml, xml

Virtuoso7 100M: Number of clients: Single / 4 / 8 / 64. Download links: xml, xml, xml, xml

Virtuoso7 1B: Number of clients: Single / 4 / 8 / 64. Download links: xml, xml, xml, xml

6. Benchmark Results for the Cluster Edition



This section reports the results of running the Explore and BI use cases of the BSBM benchmark with the cluster editions of



BigOwlim (version 5.2.5524)

Virtuoso7 (07.00.3202-pthreads for Linux as of Jan 1 2013)



Test Procedure

Load data into the store.

Shut down the store (optionally clearing OS caches and swap), then restart it. Execute a single-client test run of the Explore use case (100-mix performance measurement, randomizer seed: 9834533) with 100 warm-up runs:

./testdriver -seed 9834533 -uc bsbm/explore -w 100 -runs 100 -drill -o result_single.xml http://sparql-endpoint

Execute a multi-client run of the Explore use case with 8 clients (randomizer seed 9175932) and 16 warm-up runs:

./testdriver -seed 8188326 -uc bsbm/explore -w 16 -mt 8 -drill -o results_8clients.xml http://sparql-endpoint

Execute a single-client test run of the BI use case (1-mix performance measurement, randomizer seed: 9834533) with no warm-up runs:

./testdriver -seed 9834533 -uc bsbm/bi -runs 1 -drill -o result_single_bi.xml http://sparql-endpoint

Execute a multi-client run of the BI use case with 8 clients (randomizer seed 9175932) with no warm-up runs:

./testdriver -seed 8188326 -uc bsbm/bi -mt 8 -drill -o results_8clients_bi.xml http://sparql-endpoint





6.1 BigOwlim



For the 10B triples dataset, we applied the test procedure above to each store. The 50B and 150B triples datasets were run with the Virtuoso7 cluster edition only; for these datasets, in the BI use case, no specific warm-up was used and the single-user run was executed immediately following a cold start of the multi-user run.



Owlim homepage

6.1.1 Configuration

The following changes were made to the default configuration of the software:



BigOwlim : Version 5.3.5777

Modified heap size and cache-memory in example.sh



-Xmx200G -Xms160G -Dcache-memory=100G





6.1.2 Load Time

We used the application in the getting-started directory to load the data.



The dataset was first generated into 100 .nt files (~100 million triples per file) and then copied to getting-started/preload for loading. For Bigowlim, the data generator was also modified so that it writes the first 100 million triples to the first file, the next 100 million triples to the second file, and so on. (Note: the original data generator writes triples to the 100 files in round-robin fashion, i.e., the first triple goes to the first file, the second to the second file, ..., the 100th to the 100th file, the 101st back to the first file, and so on.)
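The difference between the two file-assignment schemes can be sketched as follows (toy sizes here; the real generator writes 100 million triples per file):

```python
def round_robin(triples, nfiles):
    """Original generator: triple i goes to file i % nfiles."""
    return [i % nfiles for i in range(len(triples))]

def blocked(triples, nfiles, per_file):
    """Modified generator: the first `per_file` triples go to file 0,
    the next `per_file` to file 1, and so on."""
    return [min(i // per_file, nfiles - 1) for i in range(len(triples))]

t = list(range(10))                    # 10 toy triples, 5 files
print(round_robin(t, 5))               # [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
print(blocked(t, 5, 2))                # [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
```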



However, since we had to stop and resume the loading process many times to tune parameters and to solve problems that occurred during loading, it is hard to calculate the loading time.



After the getting-started app finished the loading process, the resulting database was manually copied to each worker node. With 8 machines in the cluster, we thus have 8 replicas.



6.1.3 Benchmark Query results: QpS (Queries per Second)

Explore use case

BI use case



The table below summarizes the query throughput for each type of query in single-client runs (in QpS):

6.1.4 Benchmark Overall results: QMpH for the 10B dataset for all runs

For the 10B dataset we ran tests with 1 and 8 clients. The results are in Query Mixes per Hour (QMpH) meaning that larger numbers are better.

Explore use case

      1       8
10B   16.506  257.399

BI use case

      1      8
10B   0.044  0.120









6.1.5 Result Summaries



Bigowlim 10B Explore use case: Number of clients: Single / 8. Download links: xml, xml

Bigowlim 10B BI use case: Number of clients: Single / 8. Download links: xml, xml

6.2 Virtuoso7 cluster





Virtuoso homepage

6.2.1 Configuration

The following changes were made to the default configuration of the software:



Virtuoso7 : Version 07.00.3202-pthreads for Linux as of Jan 1 2013

- Loading of datasets:



The loading was done by executing multiple loading processes on all cluster nodes (calling cl_exec('rdf_ld_srv()')).



For all datasets 1000 files were generated (125 files in each node).

This means that multiple files are read at the same time by the multiple cores of each CPU.





- The best performance was obtained with 7 loading threads per server process.

Hence, with two server processes per machine and 8 machines, 112 files were being read at the same time.





6.2.2 Load Time

The table below summarizes the load times for the Turtle files (in hh:mm:ss):

10B       50B       150B
01:05:00  06:28:00  *











*: The largest load (150B) was slowed down by one machine showing markedly lower disk write throughput than the others. On the slowest machine, iostat showed continuous disk activity of about 700 device transactions per second, writing anywhere from 1 to 3 MB of data per second. On the other machines, disks were mostly idle, with occasional flushing of database buffers to disk producing up to 2000 device transactions per second and 100 MB/s write throughput. Since the data is evenly divided, and 2 of the 16 processes were not runnable because the OS had too many buffered disk writes, this could stall the whole cluster for up to several minutes at a stretch. Our theory is that these problems were caused by a hardware malfunction. To complete the 150B load, we interrupted the stalling server processes, moved the data directories to different drives, and resumed the loading. The need for manual intervention and the prior period of very slow progress make it hard to calculate the total time the 150B load took.





6.2.3 Benchmark Query results: QpS (Queries per Second)

We configured the BSBM driver to use 4 SPARQL endpoints for these query tests, so that not all clients connect through the same machine.

The table below summarizes the query throughput for each type of query in single-client runs (in QpS):

Explore use case

BI use case



6.2.4 Benchmark Overall results: QMpH for the 10B, 50B, 150B datasets for all runs

For the 10B dataset we ran tests with 1 and 8 clients. For the 50B and 150B datasets, we ran tests with 1 and 4 clients. The results are in Query Mixes per Hour (QMpH) meaning that larger numbers are better.

Explore use case





       1         4         8
10B    2360.210  -         4978.511
50B    4253.157  2837.285  -
150B   2090.574  1471.032  -







BI use case





       1       4      8
10B    13.078  -      20.554
50B    0.964   1.588  -
150B   0.285   0.480  -



6.2.5 Result Summaries

Virtuoso7 10B - Explore use case: Number of clients: Single / 8. Download links: xml, xml

Virtuoso7 10B - BI use case: Number of clients: Single / 8. Download links: xml, xml

Virtuoso7 50B - Explore use case: Number of clients: Single / 4. Download links: xml, xml

Virtuoso7 50B - BI use case: Number of clients: Single / 4. Download links: xml, xml

Virtuoso7 150B - Explore use case: Number of clients: Single / 4. Download links: xml, xml

Virtuoso7 150B - BI use case: Number of clients: Single / 4. Download links: xml, xml





7. Comparison of Results

This section compares the SPARQL query performance of the different stores.

7.1 Query Mixes per Hour for Single Clients



Running 500 query mixes against the different stores resulted in the following performance numbers (in QMpH). The best performance figure for each dataset size is set in bold in the tables.



7.1.1 QMpH: Explore use case



The complete query mix is given here.

           100M       200M       1B
BigData    12512.278  10059.940  -
BigOwlim   14029.453  9170.083   1669.899
TDB        15381.857  10573.858  -
Virtuoso6  37678.319  32969.006  8984.789
Virtuoso7  47178.820  -          27933.682

A much more detailed view of the results for the Explore use case is given under Detailed Results For The Explore-Query-Mix Benchmark Run.

7.1.2 QMpH: BI use case



           10M      100M     1B
BigData    7.290    -        -
BigOwlim   121.841  15.512   1.400
TDB        7.468    -        -
Virtuoso6  431.465  35.342   2.383
Virtuoso7  -        996.795  75.236

A much more detailed view of the results for the BI use case is given under Detailed Results For The BI-Query-Mix Benchmark Run.





7.1.3 QMpH: Cluster edition



Explore use case

           10B       50B       150B
BigOwlim   16.506    -         -
Virtuoso7  2360.210  4253.157  2090.574

BI use case

           10B     50B    150B
BigOwlim   0.044   -      -
Virtuoso7  13.078  0.964  0.285

7.2 Query Mixes per Hour for Multiple Clients

Explore use case



Dataset Size 100M Number of clients 1 4 8 64 BigData 12512.278

17949.632

19574.007

20422.626

BigOwlim

14029.453 17184.314 11677.860 8321.202 TDB

15381.857 19036.097 24646.705 14838.483 Virtuoso6

37678.319 64885.747 112388.811 20647.413 Virtuoso7

47178.820

91505.200

188632.144

216118.852







Dataset size: 200M
Clients     1            4            8            64
BigData     10059.940    9762.856     11572.433    12935.595
BigOwlim    9170.083     8130.137     5614.489     5150.768
TDB         10573.858    9540.452     18610.896    8265.151
Virtuoso6   *32969.006*  *31387.107*  *77224.941*  *14480.812*





Dataset size: 1B
Clients     1            4            8            64
BigOwlim    1669.899     2246.865     1081.508     912.518
Virtuoso6   8984.789     15637.439    14343.728    2800.053
Virtuoso7   *27933.682*  *56714.875*  *79261.626*  *132685.957*





BI use case



Dataset size: 10M
Clients     1           4           8           64
BigData     7.290       16.222      16.439      18.812
BigOwlim    121.841     265.294     177.338     218.678
TDB         7.468       17.698      9.503       8.414
Virtuoso6   *431.465*   *2667.657*  *3915.854*  *1401.186*





Dataset size: 100M
Clients     1          4           8           64
BigOwlim    15.512     33.986      20.263      15.076
Virtuoso6   35.342     191.431     268.428     99.321
Virtuoso7   996.795    5644.323    6402.190    7132.212





Dataset size: 1B
Clients     1          4           8           64
BigOwlim    1.400      3.465       2.323       -
Virtuoso6   2.383      17.777      21.457      8.355
Virtuoso7   *75.236*   *348.666*   *361.205*   *134.459*

Cluster - Explore use case (10B only)



Dataset size: 10B
Clients     1            8
BigOwlim    16.506       257.399
Virtuoso7   *2360.210*   *4978.511*

Cluster - BI use case (10B only)



Dataset size: 10B
Clients     1          8
BigOwlim    0.044      0.120
Virtuoso7   *13.078*   *20.554*

7.3 Detailed Results For The Explore-Query-Mix Benchmark Run

The details of running the Explore query mix are given here. There are two different views:

7.3.1 Queries per Second by Query and Dataset Size

Running 500 query mixes against the different stores led to the following query throughput for each type of query over all 500 runs (in Queries per Second). The best performance figure for each dataset size is set bold in the tables.
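The per-query figures below divide the number of executions of one query type by the total time spent on it; a minimal sketch under that assumption (names are illustrative):

```python
def queries_per_second(exec_times_s: list[float]) -> float:
    """Throughput of one query type: executions divided by total time spent."""
    return len(exec_times_s) / sum(exec_times_s)

# e.g. four executions taking 0.5 s each
print(queries_per_second([0.5, 0.5, 0.5, 0.5]))  # -> 2.0
```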

        BigData   BigOwlim   TDB       Virtuoso6   Virtuoso7
100M    49.955    93.773     119.048   *232.234*   125.786
200M    49.520    52.094     94.877    *217.865*   -
1B      -         25.128     -         *87.245*    75.324

        BigData   BigOwlim   TDB        Virtuoso6   Virtuoso7
100M    42.769    115.960    *158.755*  109.445     68.929
200M    43.713    65.158     *151.883*  110.019     -
1B      -         34.181     -          *79.791*    68.820

        BigData   BigOwlim   TDB       Virtuoso6   Virtuoso7
100M    37.280    170.242    84.660    *180.245*   117.426
200M    38.355    61.155     70.492    *174.216*   -
1B      -         26.042     -         *119.104*   62.243

        BigData   BigOwlim   TDB      Virtuoso6   Virtuoso7
100M    36.846    *140.607*  70.912   116.604     58.514
200M    36.830    *127.747*  52.759   111.732     -
1B      -         12.868     -        *42.586*    30.473

        BigData   BigOwlim   TDB      Virtuoso6   Virtuoso7
100M    2.684     1.868      1.959    *9.976*     *21.182*
200M    1.799     1.199      1.308    *7.168*     -
1B      -         0.198      -        1.201       *6.064*

        BigData   BigOwlim   TDB        Virtuoso6   Virtuoso7
100M    16.172    75.746     *196.754*  30.001      54.484
200M    16.548    98.357     184.349    32.918      -
1B      -         32.593     -          31.840      *55.356*

        BigData   BigOwlim   TDB        Virtuoso6   Virtuoso7
100M    37.498    93.467     *228.258*  117.247     93.336
200M    38.721    193.087    *199.362*  124.502     -
1B      -         60.702     -          *127.698*   97.248

        BigData   BigOwlim   TDB       Virtuoso6   Virtuoso7
100M    59.524    202.041    355.999   *397.456*   173.898
200M    61.476    105.759    319.489   *363.042*   -
1B      -         38.391     -         132.459     *176.772*

        BigData   BigOwlim   TDB        Virtuoso6   Virtuoso7
100M    41.326    146.327    *297.619*  122.926     107.968
200M    42.427    69.411     *267.094*  123.487     -
1B      -         60.357     -          99.433      *101.678*

        BigData   BigOwlim   TDB       Virtuoso6   Virtuoso7
100M    62.375    368.732    483.092   *539.957*   214.133
200M    63.784    74.074     450.045   *493.583*   -
1B      -         65.428     -         *500.501*   225.124

        BigData   BigOwlim   TDB       Virtuoso6   Virtuoso7
100M    50.989    *244.738*  204.834   220.167     126.743
200M    52.094    197.239    192.901   *215.424*   -
1B      -         61.418     -         *207.641*   137.287

7.3.2 Queries per Second by Dataset Size and Query

Removed.


7.4 Detailed Results For The BI-Query-Mix Benchmark Run

The details of running the BI query mix are given here. There are two different views:

7.4.1 Queries per Second by Query and Dataset Size

Running 10 query mixes against the different stores led to the following query throughput for each type of query over all 10 runs (in Queries per Second). The best performance figure for each dataset size is set bold in the tables.

        BigData   BigOwlim   TDB     Virtuoso6   Virtuoso7
10M     0.453     1.426      0.488   *1.469*     -
100M    -         0.176      -       0.118       *11.558*
1B      -         0.016      -       0.009       *0.462*

        BigData   BigOwlim   TDB     Virtuoso6   Virtuoso7
10M     0.445     0.069      0.023   *37.707*    -
100M    -         0.009      -       7.931       *28.969*
1B      -         0.001      -       0.635       *2.409*

        BigData   BigOwlim   TDB     Virtuoso6   Virtuoso7
10M     0.300     *2.540*    0.018   0.768       -
100M    -         0.105      -       0.090       *0.886*
1B      -         0.002      -       0.007       *0.035*

        BigData   BigOwlim   TDB     Virtuoso6   Virtuoso7
10M     0.167     0.150      0.140   *1.183*     -
100M    -         0.027      -       0.216       *3.773*
1B      -         0.003      -       0.020       *0.644*

        BigData   BigOwlim   TDB     Virtuoso6   Virtuoso7
10M     1.992     1.923      0.008   1.920       -
100M    -         0.240      -       0.240       5.496
1B      -         0.020      -       0.009       *0.468*

        BigData   BigOwlim   TDB      Virtuoso6   Virtuoso7
10M     9.917     *23.923*   16.202   14.988      -
100M    -         15.538     -        10.767      *18.997*
1B      -         *13.951*   -        3.726       10.517

        BigData   BigOwlim   TDB     Virtuoso6   Virtuoso7
10M     0.006     2.232      0.849   *9.849*     -
100M    -         0.369      -       1.466       *14.816*
1B      -         0.040      -       0.122       *1.912*

        BigData   BigOwlim   TDB     Virtuoso6   Virtuoso7
10M     0.568     *1.395*    0.018   0.592       -
100M    -         0.191      -       0.048       2.512
1B      -         0.016      -       0.003       *0.215*

7.4.2 Queries per Second by Dataset Size and Query

Running 10 query mixes against the different stores led to the following query throughput for each type of query over all 10 runs (in Queries per Second). The best performance figure for each query is set bold in the tables.

8. Thanks

Thanks a lot to BSBM authors Chris Bizer and Andreas Schultz for providing instructions and sharing the software/scripts at the very beginning of our benchmark experiment. We also want to thank the store vendors and implementors for helping us set up and configure their stores for the experiment. Lots of thanks to Orri Erling, Ivan Mikhailov, Mitko Iliev, Hugh Williams, Alexei Kaigorodov, Zdravko Tashev, Barry Bishop, Bryan Thompson, and Mike Personick.

The work on the BSBM Benchmark Version 3 is funded through the LOD2 - Creating Knowledge out of Linked Data project.

Please send comments and feedback about the benchmark to Peter Boncz and Minh-Duc Pham.