Recent advances in machine learning have demonstrated that AI can outperform classical software in domains such as computer vision, speech recognition and language translation. The adoption of AI tools will result in ever-increasing demand for computational resources. At the same time, the rise of blockchain technologies has created a huge distributed network of computation infrastructure. At Snark AI, we have figured out a way to run crypto-mining and AI applications simultaneously without hurting the hash rate much. We want to help AI applications tap into the abundant computational power in the blockchain world.

The Big Little Lie of GPU Utilization

As an AI researcher, I never had enough GPU computing power for training neural networks. In February 2018, I decided to take out a loan to buy 12 GPUs, pay back the loan by mining crypto-currencies on them, and assemble a Deep Learning workstation at the end of the year once the loan was paid off.

When I started running Excavator, the commercial GPU mining software from NiceHash, my room heated up very quickly. Nvidia-SMI told me that the GPUs were utilized at 100% and Excavator told me that my GTX 1080 was mining at 500 hashes per second. Everything looked good.

But there’s one caveat: the volatile GPU-utilization rate that Nvidia-SMI reports only shows the percentage of time during which at least one GPU kernel is running. Is it possible that the CUDA cores and memory resources on the GPU are actually under-utilized while the crypto-mining kernels are running?
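To make the distinction concrete, here is a minimal sketch of the two ways of measuring "utilization". It is a toy model, not how Nvidia-SMI is implemented: the trace, the core count and both function names are made up for illustration.

```python
# Toy illustration (NOT how nvidia-smi works internally): compare a
# "was any kernel running?" metric with the average share of busy cores.

def time_based_utilization(samples):
    """Fraction of samples in which at least one core was doing work."""
    busy = sum(1 for active, _total in samples if active > 0)
    return busy / len(samples)

def core_utilization(samples):
    """Average fraction of cores doing work across all samples."""
    return sum(active / total for active, total in samples) / len(samples)

# Hypothetical trace of (active cores, total cores) per sampling interval.
# A kernel is resident in every interval, but keeps only 38% of cores busy.
TOTAL_CORES = 100
trace = [(38, TOTAL_CORES)] * 10

print(f"time-based utilization: {time_based_utilization(trace):.0%}")
print(f"core utilization:       {core_utilization(trace):.0%}")
```

In this made-up trace the time-based metric reads 100% while the cores are busy only 38% of the time, which is the kind of gap a profiler can reveal.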

The little lie told by Nvidia-SMI was quickly unveiled with the help of Nvidia Visual Profiler. It turned out that 62% of the CUDA cores, 25% of CUDA warps, 20% of GPU shared memory and 93% of GPU global memory were not utilized at all when running the Equihash mining algorithm on NiceHash Excavator!

From Figure 1, we can see that the compute utilization rate is only 38%. The latency analysis in Figure 2 confirms this: in 62% of the profiling samples, the CUDA cores were idle, waiting for memory reads/writes to complete.

Figure 1: Compute and Memory Load/Store units utilization of GTX 1080 running Equihash

Figure 2: Distribution of latency stall causes

Equihash is an ASIC-resistant, IO-intensive algorithm that relies heavily on low-latency memory resources such as registers and shared memory on GPUs. Computations are often stalled by memory access, and the CUDA cores have to wait for the memory read/write to complete. To address this problem, CUDA has a great latency-hiding mechanism: warp scheduling. Each warp is composed of 32 threads sharing the same instruction flow. When one warp is stalled by IO, the warp scheduler swaps it for another active warp to run on the CUDA cores with zero context-switching overhead. With a lot of warps switching in and out, we can keep the CUDA cores busy and hide IO latency. As we can see from the Profiler, 25% of the warp resources are under-utilized and can be used to schedule other jobs onto the CUDA cores.
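The latency-hiding idea can be sketched with a toy simulation. It assumes a single issue slot per cycle and warps that alternate between a fixed number of compute cycles and a fixed memory stall; the cycle counts and the `simulate` function are illustrative assumptions, not measurements of real hardware.

```python
# Toy model of latency hiding through warp scheduling. ASSUMPTIONS: one
# issue slot per cycle, and every warp alternates between `compute_cycles`
# of work and a memory stall of `stall_cycles`. Real schedulers are richer.

def simulate(num_warps, compute_cycles=2, stall_cycles=6, total_cycles=10_000):
    """Return the fraction of cycles in which the issue slot was busy."""
    work_left = [compute_cycles] * num_warps  # compute cycles before next stall
    ready_at = [0] * num_warps                # cycle at which each warp is ready
    busy = 0
    for cycle in range(total_cycles):
        for w in range(num_warps):
            if ready_at[w] <= cycle:
                # Issue from the first ready warp; switching costs nothing.
                busy += 1
                work_left[w] -= 1
                if work_left[w] == 0:
                    # The warp now waits on memory for `stall_cycles`.
                    work_left[w] = compute_cycles
                    ready_at[w] = cycle + 1 + stall_cycles
                break
    return busy / total_cycles

for n in (1, 2, 4, 8):
    print(f"{n} warp(s): {simulate(n):.0%} of cycles busy")
```

With these made-up cycle counts, a single warp keeps the slot busy a quarter of the time, and four or more warps are enough to hide the stall completely. Spare warp slots are therefore room for additional work.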

CUDA threads in GPUs have to be organized into ‘blocks’, which are independent scheduling units on the GPU. For Equihash, each block has 1024 threads and asks for 32768 Bytes of registers and 42756 Bytes of shared memory. Because of hardware limits, only two blocks can run at the same time on each of the Streaming Multiprocessors (SMs) of the GTX 1080. The application is bottlenecked by registers but leaves 20% of shared memory under-utilized.
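The way per-block resource requests cap the number of resident blocks can be sketched as follows. Two things here are assumptions rather than facts from this post: the per-SM limits (typical published values for a Pascal-class GPU) and the reading of the register figure as a count of 32-bit registers. The `resident_blocks` helper is hypothetical.

```python
# Sketch of a per-SM block-residency calculation. The helper and the limit
# values are illustrative; check the occupancy data for your own device.

def resident_blocks(per_block, sm_limits):
    """Return (blocks that fit on one SM, block limit per resource)."""
    limits = {name: sm_limits[name] // need for name, need in per_block.items()}
    return min(limits.values()), limits

# Per-block requirements quoted in the text. ASSUMPTION: the register
# figure is interpreted as a number of 32-bit registers.
equihash_block = {"threads": 1024, "registers": 32768, "shared_mem_bytes": 42756}

# ASSUMPTION: per-SM limits for a Pascal-class GPU such as the GTX 1080.
sm_limits = {"threads": 2048, "registers": 65536, "shared_mem_bytes": 98304}

blocks, limits = resident_blocks(equihash_block, sm_limits)
print(f"resident blocks per SM: {blocks}")
print(f"limit per resource:     {limits}")
```

Under these assumptions two blocks exhaust the register file, so registers cap residency at two blocks per SM. The thread and shared-memory limits happen to land on the same count, with some shared memory left over.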

Besides, each GTX 1080 has 8GB of global memory, but the Equihash mining algorithm only needs 0.5GB of it, leaving 93% of the global memory free for other applications.

Okay, so 62% of the CUDA cores, 25% of CUDA warps, 20% of GPU shared memory and 93% of GPU global memory are not utilized at all for running Equihash mining with NiceHash Excavator. Is it possible that we can use these resources to run deep learning? That question inspired us to start Snark AI.

After some engineering effort, we were able to run image classification with a deep neural network (AlexNet) and crypto-mining simultaneously without hurting the mining hash rate! The test was done on a GTX 1050 GPU and an Intel i7-7700HQ 2.8GHz CPU. Right now we are working on optimizing for the GTX 1080, which is a more challenging task.

AlexNet on CPU — 7 images/sec

AlexNet on GPU — 208 images/sec

Equihash on GPU — 110 Hashes/sec

AlexNet+Equihash on GPU — 18 images/sec and 110 Hashes/sec

Blockchain GPUs Unchained

Everybody knows that there is an insane number of GPUs working on crypto-mining today. But how many of them are there exactly? Let’s do a quick estimate. Let’s take the network hashrate of ZCash, a popular coin that can be mined only on GPUs right now, and divide it by the Equihash hashing power of a single GTX 1080.

Hashrate(ZCash) = 650,000,000 H/s

Hashrate(NVIDIA GTX 1080) = 500 H/s

Number of GPUs = 650,000,000 / 500 = 1,300,000 (GTX 1080 equivalent)

That comes to more than a million underutilized GTX 1080s. Moreover, the profits those GPUs make are small. At the time of writing, http://whattomine.com/coins reports that a single NVIDIA 1080 GPU earns less than $0.65 per 24 hours of mining. On AWS, renting a much weaker GPU (Nvidia K80, p2.xlarge instance) for 24 hours would cost around $8, which is more than a 10x difference! At Snark AI, we are working on tools that give Deep Learning researchers and product developers access to these under-utilized and cheap computational resources.
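The back-of-the-envelope numbers in this section can be reproduced in a few lines. All inputs are the rounded figures quoted in the text; the variable names are just for illustration.

```python
# Rough estimate using the rounded figures quoted in the text.
ZCASH_NETWORK_HASHRATE = 650_000_000  # H/s
GTX_1080_HASHRATE = 500               # H/s for Equihash on one card

gtx_1080_equivalents = ZCASH_NETWORK_HASHRATE // GTX_1080_HASHRATE
print(f"{gtx_1080_equivalents:,} GTX 1080 equivalents on the network")

MINING_REVENUE_PER_DAY = 0.65  # USD, upper bound quoted for one GTX 1080
AWS_K80_COST_PER_DAY = 8.00    # USD, approximate rental figure from the text

price_gap = AWS_K80_COST_PER_DAY / MINING_REVENUE_PER_DAY
print(f"cloud rental costs about {price_gap:.1f}x what the card earns by mining")
```

With these inputs the division gives 1.3 million GTX 1080 equivalents and a price gap of roughly 12x, in line with the "more than a million" and "more than 10x" claims.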

Snark Infer

We have recently launched Snark Infer, a lightweight Python library for running neural network prediction. With Snark Infer, you can run neural networks without owning GPUs and without any configuration. Here’s how simple it is to run a network:

If you want to try the package, sign up for a free account at https://hub.snark.ai/ and install:

pip3 install snark

snark login

The models that Snark Infer runs are taken from Snark Hub, which is a deep learning model repository. Anybody can share their models by uploading them to Snark Hub. Each model can be flagged as either private or public; private models are accessible only to the user who uploaded them. Check out what’s already available on Snark Hub through https://hub.snark.ai/explore!

Prior to upload, the models need to be converted to ONNX (Open Neural Network Exchange format). The following links can help you transform your PyTorch, Caffe2 and TensorFlow models to ONNX. Native TensorFlow and Caffe2 model support is currently under development.

Under the hood, Snark Infer serves as a router between the users and the available mining hardware. When you submit an Infer request, our servers match your task with a GPU. These GPUs come either from partnering mining farms or from idle machines in partnering enterprise data centers.

We match your task with the optimal GPU for the job. We consider GPU speed, network bandwidth, latency, and whether or not the machine has the model already cached. When the data to be processed is sensitive, Snark Infer will dispatch the task only to Privacy Shield Verified GPU providers.
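To give a feel for how such criteria might be combined, here is a purely hypothetical sketch. It is not Snark Infer's actual matching logic: the `GpuOffer` fields, the cost model and the example providers are all invented for illustration.

```python
# HYPOTHETICAL matcher sketch; not Snark Infer's real algorithm or API.
from dataclasses import dataclass

@dataclass
class GpuOffer:
    name: str
    images_per_sec: float   # assumed throughput for the requested model
    bandwidth_mbps: float   # network bandwidth to the provider
    latency_ms: float       # round-trip network latency
    model_cached: bool      # is the model already on the machine?
    privacy_verified: bool  # may this provider handle sensitive data?

def estimated_seconds(offer, n_images, payload_mb, model_mb):
    """Rough completion time: latency + data transfer + compute."""
    transfer_mb = payload_mb + (0 if offer.model_cached else model_mb)
    transfer = transfer_mb * 8 / offer.bandwidth_mbps
    compute = n_images / offer.images_per_sec
    return offer.latency_ms / 1000 + transfer + compute

def pick_gpu(offers, n_images, payload_mb, model_mb, sensitive=False):
    """Pick the fastest eligible offer, or None if nothing qualifies."""
    eligible = [o for o in offers if o.privacy_verified or not sensitive]
    if not eligible:
        return None
    return min(eligible,
               key=lambda o: estimated_seconds(o, n_images, payload_mb, model_mb))

# Made-up providers for illustration only.
offers = [
    GpuOffer("farm-a", 200, 100, 40, model_cached=False, privacy_verified=False),
    GpuOffer("farm-b", 120, 100, 60, model_cached=True, privacy_verified=False),
    GpuOffer("dc-c", 80, 500, 20, model_cached=False, privacy_verified=True),
]

print(pick_gpu(offers, n_images=100, payload_mb=10, model_mb=240).name)
print(pick_gpu(offers, n_images=100, payload_mb=10, model_mb=240, sensitive=True).name)
```

In this toy setup the slower GPU with the model already cached wins for ordinary data, because skipping the model download outweighs its lower throughput. Sensitive data is routed only to the verified provider.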

Snark Infer makes it simple and cheap to run any service that incorporates neural network computation. There is no need to manage and scale instances or to pay for time the machines sit idle, and simple per-task pricing makes expenses easy to predict.

Snark Lab

You can also get raw hardware access to the distributed GPU workforce with pre-installed deep learning libraries. To try it out, register on https://lab.snark.ai/ and install:

pip3 install snark

To access a GPU, run:

snark start

The snark start command creates a temporary instance (a Snark Pod). Unlike with Snark Infer, no background mining is running on your Snark Pod. You get full sudo access within your pod, and you can specify the number of GPUs to attach to the pod with the -g flag. You can also specify the pod type and pod id:

snark start --pod_id my_pod --pod_type tensorflow -g 2

The default Snark Pod configuration is a PyTorch pod with 1 GPU. To stop your pod, run:

snark stop

That’s it! If you’d like to learn more, check out our documentation at https://github.com/snarkai/snark-doc

While we’re still in the development stage, we give all new users 10 free GPU hours to play around. Try it out!

Currently your Snark Pods will run on NVIDIA P106 GPUs, which are similar in performance to a single GPU of a K80. GTX 1080 and 1070 support is coming very soon. We charge 9.5 cents per hour of GPU time. This is by far the best offer on the market. Give it a shot and let us know what you think at support@snark.ai!

Jason, Sergiy and Davit

Founders at Snark AI, Inc.