The NVIDIA-powered G4 instances that I promised you earlier this year are available now and you can start using them today in nine AWS regions in six sizes! You can use them for machine learning training & inferencing, video transcoding, game streaming, and remote graphics workstation applications.

The instances are equipped with up to four NVIDIA T4 Tensor Core GPUs, each with 320 Turing Tensor cores, 2,560 CUDA cores, and 16 GB of memory. The T4 GPUs are ideal for machine learning inferencing, computer vision, video processing, and real-time speech & natural language processing. The T4 GPUs also offer RT cores for efficient, hardware-powered ray tracing. The NVIDIA Quadro Virtual Workstation (Quadro vWS) is available in AWS Marketplace. It supports real-time ray-traced rendering and can speed creative workflows often found in media & entertainment, architecture, and oil & gas applications.

G4 instances are powered by AWS-custom Second Generation Intel® Xeon® Scalable (Cascade Lake) processors with up to 64 vCPUs, and are built on the AWS Nitro System. Nitro's local NVMe storage building block provides direct access to up to 1.8 TB of fast, local NVMe storage. Nitro's network building block delivers high-speed ENA networking. The Intel AVX-512 Deep Learning Boost feature extends AVX-512 with a new set of Vector Neural Network Instructions (VNNI for short). These instructions accelerate the low-precision multiply & add operations that reside in the inner loop of many inferencing algorithms.
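To make the VNNI idea concrete, here is a minimal NumPy sketch of the low-precision multiply & add pattern those instructions accelerate: unsigned 8-bit activations times signed 8-bit weights, accumulated into 32-bit integers so the sum cannot overflow. The variable names are illustrative; on real hardware an instruction such as VPDPBUSD fuses the widen-multiply-accumulate into a single step.

```python
import numpy as np

# Illustrative sketch of the INT8 multiply-accumulate inner loop that
# AVX-512 VNNI speeds up. All names and sizes here are made up for the
# example; the hardware performs the same math in one fused instruction.
rng = np.random.default_rng(0)
activations = rng.integers(0, 255, size=64, dtype=np.uint8)   # u8 inputs
weights = rng.integers(-128, 127, size=64, dtype=np.int8)     # s8 weights

# Widen to int32 before multiplying, as the hardware accumulator does,
# so 8-bit products can be summed without overflow.
acc = np.sum(activations.astype(np.int32) * weights.astype(np.int32))

# Sanity check against the same dot product computed in float64.
assert acc == int(np.dot(activations.astype(np.float64),
                         weights.astype(np.float64)))
```

Because each operand is only 8 bits wide, four times as many values fit in a vector register compared to FP32, which is where the inference speedup comes from.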

Here are the instance sizes:

| Instance Name | NVIDIA T4 Tensor Core GPUs | vCPUs | RAM | Local Storage | EBS Bandwidth | Network Bandwidth |
|---|---|---|---|---|---|---|
| g4dn.xlarge | 1 | 4 | 16 GiB | 1 x 125 GB | Up to 3.5 Gbps | Up to 25 Gbps |
| g4dn.2xlarge | 1 | 8 | 32 GiB | 1 x 225 GB | Up to 3.5 Gbps | Up to 25 Gbps |
| g4dn.4xlarge | 1 | 16 | 64 GiB | 1 x 225 GB | Up to 3.5 Gbps | Up to 25 Gbps |
| g4dn.8xlarge | 1 | 32 | 128 GiB | 1 x 900 GB | 7 Gbps | 50 Gbps |
| g4dn.12xlarge | 4 | 48 | 192 GiB | 1 x 900 GB | 7 Gbps | 50 Gbps |
| g4dn.16xlarge | 1 | 64 | 256 GiB | 1 x 900 GB | 7 Gbps | 50 Gbps |

We are also working on a bare metal instance that will be available in the coming months:

| Instance Name | NVIDIA T4 Tensor Core GPUs | vCPUs | RAM | Local Storage | EBS Bandwidth | Network Bandwidth |
|---|---|---|---|---|---|---|
| g4dn.metal | 8 | 96 | 384 GiB | 2 x 900 GB | 14 Gbps | 100 Gbps |

If you want to run graphics workloads on G4 instances, be sure to use the latest version of the NVIDIA AMIs (available in AWS Marketplace) so that you have access to the requisite GRID and Graphics drivers, along with an NVIDIA Quadro Workstation image that contains the latest optimizations and patches. Here’s where you can find them:

- NVIDIA Gaming – Windows Server 2016
- NVIDIA Gaming – Windows Server 2019
- NVIDIA Gaming – Ubuntu 18.04

The newest AWS Deep Learning AMIs include support for G4 instances. The team that produces the AMIs benchmarked a g3.16xlarge instance against a g4dn.12xlarge instance and shared the results with me. Here are some highlights:

- MXNet Inference (resnet50v2, forward pass without MMS) – 2.03 times faster.
- MXNet Inference (with MMS) – 1.45 times faster.
- MXNet Training (resnet50_v1b, 1 GPU) – 2.19 times faster.
- TensorFlow Inference (resnet50v1.5, forward pass) – 2.00 times faster.
- TensorFlow Inference with TensorFlow Serving (resnet50v2) – 1.72 times faster.
- TensorFlow Training (resnet50_v1.5) – 2.00 times faster.

The benchmarks used FP32 numeric precision; you can expect an even larger boost if you use mixed precision (FP16) or low precision (INT8).
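To see why low precision helps without wrecking accuracy, here is a small NumPy sketch of symmetric INT8 weight quantization, the kind of trick the T4's Tensor Cores and VNNI exploit. The scale formula and tolerance are illustrative, not a description of what any particular framework does internally.

```python
import numpy as np

# Illustrative symmetric per-tensor INT8 quantization of FP32 weights.
# The weights and scale here are made-up example values.
rng = np.random.default_rng(42)
weights_fp32 = rng.standard_normal(1024).astype(np.float32)

# Map the range [-max|w|, +max|w|] onto the signed range [-127, 127].
scale = np.abs(weights_fp32).max() / 127.0
weights_int8 = np.round(weights_fp32 / scale).astype(np.int8)

# Dequantize to see how much information the 8-bit representation lost.
weights_dequant = weights_int8.astype(np.float32) * scale
max_err = np.abs(weights_fp32 - weights_dequant).max()

# Rounding error is bounded by half a quantization step.
assert max_err <= scale / 2 + 1e-6
```

The INT8 tensors are a quarter the size of their FP32 originals, so more of them fit in cache and vector registers per cycle, which is where the "even larger boost" for INT8 inference comes from.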

You can launch G4 instances today in the US East (N. Virginia), US East (Ohio), US West (Oregon), US West (N. California), Europe (Frankfurt), Europe (Ireland), Europe (London), Asia Pacific (Seoul), and Asia Pacific (Tokyo) Regions, in Amazon SageMaker, and (as of October 1, 2019) Amazon EKS clusters.
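If you want to try one from the command line, launching a G4 instance looks like any other EC2 launch with the AWS CLI. The AMI ID, key pair, and security group below are placeholders; substitute your own values (for example, a Deep Learning AMI ID for your region).

```shell
# Hypothetical launch of a g4dn.xlarge instance with the AWS CLI.
# ami-0123456789abcdef0, my-key-pair, and sg-0123456789abcdef0 are
# placeholders -- replace them with real values for your account.
aws ec2 run-instances \
    --region us-east-1 \
    --image-id ami-0123456789abcdef0 \
    --instance-type g4dn.xlarge \
    --key-name my-key-pair \
    --security-group-ids sg-0123456789abcdef0 \
    --count 1
```

Running this requires configured AWS credentials and permission to launch EC2 instances in the chosen region.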

— Jeff;