BlazingSQL is Now Open Source

All of It.

BlazingSQL, the GPU-accelerated SQL engine of the RAPIDS ecosystem, is now 100% open source, licensed under Apache 2.0!

Check out the code on our GitHub page.

BlazingSQL is not a database, which is why we changed our name from BlazingDB to BlazingSQL. It is a SQL engine that processes (almost) any data you want.

Working within RAPIDS has been game-changing. There are now over 100 developers contributing to our community. Most of these developers come from enterprises, and their contributions add valuable features to BlazingSQL, like support for more file formats.

As RAPIDS adoption continues to explode, open-sourcing BlazingSQL accelerates our development cycle, gets our product in the hands of more users, and aligns our licensing and messaging with the greater RAPIDS.ai ecosystem.

NVIDIA is building the future data center for data science. RAPIDS is positioned to exploit this new architecture, and so is BlazingSQL, making us the de facto standard of GPU SQL engines for data science, and part of an incredibly well thought out, rapidly maturing, and forward-thinking ecosystem.

“NVIDIA and the RAPIDS ecosystem are delighted that BlazingSQL is open-sourcing their SQL engine built on RAPIDS,” said Josh Patterson, GM of data science at NVIDIA. “By leveraging Apache Arrow on GPUs and integrating with Dask, BlazingSQL will extend open-source functionality, and drive the next wave of interoperability in the accelerated data science ecosystem.”

We went all-in on RAPIDS before it had a name. Now, open-sourcing is the culmination of a strategy by NVIDIA and BlazingSQL.

NVIDIA stepped up to ensure RAPIDS would solve customer problems at scale. BlazingSQL, in addition to contributing heavily to the RAPIDS ecosystem, will focus on the services and support agreements necessary to make RAPIDS + BlazingSQL deployments successful and accessible to all.

Customer Challenges

When we talk with customers about the challenges they face in their analytics pipelines, we hear the same complaints over and over: processing data at scale is expensive, slow, and incredibly complex.

Expensive: Customers cluster thousands of servers together for data science at scale. BlazingSQL + RAPIDS requires a small fraction of the infrastructure to run at an equivalent scale.

Slow: Workloads and queries can take hours or days on large data sets. BlazingSQL + RAPIDS provides GPU-accelerated results in seconds, allowing data scientists to quickly iterate over new models.

Complex: Workloads are prototyped at small scale and then rebuilt for distributed systems. BlazingSQL + RAPIDS enables users to write code once and dynamically change the scale of distribution with a single line of code.

BlazingSQL addresses these customer concerns not only with an incredibly fast, distributed GPU SQL engine, but also a zealous focus on simplicity.

With a few lines of code, BlazingSQL can query your raw data wherever it resides and interoperate with your existing analytics stack and RAPIDS.
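As a rough sketch of what "a few lines" can look like, the Python below registers a raw Parquet file as a table and queries it in place. It assumes the `BlazingContext` interface (`create_table` and `sql`) from the `blazingsql` Python package. The helper `query_raw_file`, the table name `taxi`, the column names, and the file path are hypothetical, and `main()` will only run on a machine with a supported GPU and BlazingSQL installed.

```python
def query_raw_file(bc, table_name, path, query):
    """Register the file at `path` as `table_name` on context `bc`, then run `query`."""
    # No separate load/import step: the table points at the raw file.
    bc.create_table(table_name, path)
    # With a real BlazingContext, the result comes back as a GPU DataFrame.
    return bc.sql(query)


def main():
    # Imported lazily so this module can still be loaded on a machine without a GPU.
    from blazingsql import BlazingContext

    bc = BlazingContext()
    result = query_raw_file(
        bc,
        "taxi",                      # hypothetical table name
        "/data/taxi.parquet",        # hypothetical file path
        "SELECT passenger_count, AVG(fare_amount) AS avg_fare "
        "FROM taxi GROUP BY passenger_count",
    )
    print(result)
```

Keeping the two calls in a small helper that receives the context is only a convenience for this sketch; in a notebook you would typically call `bc.create_table` and `bc.sql` directly.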

The Future of Analytics

RAPIDS is the next-generation analytics ecosystem. SQL forms a fundamental pillar of every major analytics ecosystem to date, and BlazingSQL is the SQL standard for RAPIDS.

For this reason, we are fully integrated with the greater RAPIDS team and contribute heavily to cuDF. BlazingSQL is built entirely on top of cuDF and cuIO. New features pushed to these projects directly impact BlazingSQL features and performance, and because BlazingSQL runs on GPU DataFrames (GDFs), it is 100% interoperable with the rest of RAPIDS.

We want to make one thing very clear: if you use RAPIDS, or are considering RAPIDS (which you honestly should), you need to check out BlazingSQL and add it to your stack. BlazingSQL offers RAPIDS users countless benefits, including:

Reduce code complexity: SQL is easy and can replace dozens to hundreds of cuDF function calls with a single statement.

Connect to data lakes: Never sync another database. BlazingSQL can query raw files in your cloud/networked filesystem.

Make RAPIDS faster: Advanced SQL optimizers help the RAPIDS stack run smarter, not just harder.
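To make the code-complexity point concrete, here is a small CPU-only illustration. It uses Python's built-in `sqlite3` module as a stand-in for a SQL engine (it is not BlazingSQL and involves no GPU), and the `trips` data and function names are made up for the example. The same filter, group, aggregate, and sort logic is written once by hand and once as a single SQL statement.

```python
import sqlite3
from collections import defaultdict

# Made-up sample rows: (cab type, passenger count, fare)
trips = [
    ("yellow", 2, 12.0),
    ("green", 1, 7.0),
    ("yellow", 1, 30.0),
    ("green", 3, 4.5),
    ("yellow", 4, 9.0),
]


def average_fare_by_hand(rows, min_fare):
    """Filter, group, average, and sort as separate hand-written steps."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for cab, _passengers, fare in rows:
        if fare >= min_fare:
            totals[cab] += fare
            counts[cab] += 1
    averages = [(cab, totals[cab] / counts[cab]) for cab in totals]
    return sorted(averages, key=lambda pair: pair[1], reverse=True)


def average_fare_with_sql(rows, min_fare):
    """Express the same logic as one SQL statement (sqlite3 as a stand-in engine)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trips (cab TEXT, passengers INTEGER, fare REAL)")
    conn.executemany("INSERT INTO trips VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT cab, AVG(fare) AS avg_fare FROM trips "
        "WHERE fare >= ? GROUP BY cab ORDER BY avg_fare DESC",
        (min_fare,),
    ).fetchall()
    conn.close()
    return result
```

Both functions return the same rows for the sample data. The SQL version states the intent in one statement and leaves the execution plan to the engine, which is the part a GPU SQL engine can then optimize.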

“Open-sourcing redefines what’s possible, and now partners, like NVIDIA, are contributing code to the BlazingSQL codebase to provide customers with holistic data science solutions,” said Felipe Aramburu, CTO.

Time to Roll Up Your Sleeves

So if it isn’t abundantly clear, this is an open-source project. The only thing left to do is try BlazingSQL out, work with it, BREAK it (because you will), and maybe even help fix it.

You can get started easily, and on free GPUs, through our BlazingSQL Notebooks. You can also install on any device of your choosing through our Docker Hub container, or, if you really want the guts, you can build from the source code on GitHub.