NVIDIA cuDNN

The NVIDIA CUDA® Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. cuDNN provides highly tuned implementations for standard routines such as forward and backward convolution, pooling, normalization, and activation layers.

Deep learning researchers and framework developers worldwide rely on cuDNN for high-performance GPU acceleration. It allows them to focus on training neural networks and developing software applications rather than spending time on low-level GPU performance tuning. cuDNN accelerates widely used deep learning frameworks, including Caffe2, Chainer, Keras, MATLAB, MXNet, PyTorch, and TensorFlow. For access to NVIDIA-optimized deep learning framework containers with cuDNN already integrated, visit NVIDIA GPU Cloud (NGC) to learn more and get started.

Chart configuration: 8x Tesla V100 + cuDNN 7.6 on the 20.03 NGC container vs. 8x Tesla A100 + cuDNN 8.0 Preview on a pre-release NGC container. Benchmarks: MaskRCNN (PyTorch, TF32 vs. FP32, batch size 8); GNMT (PyTorch, TF32 vs. FP32, batch size 512); WaveGlow (PyTorch, TF32 vs. FP32, batch size 10); U-Net Medical (TensorFlow, FP16 mixed vs. FP16, batch size 16); U-Net Industrial (TensorFlow, FP16 mixed vs. FP16, batch size 24); Tacotron2 (PyTorch, FP16 mixed vs. FP16, batch size 128).

What’s New in cuDNN 8

cuDNN 8 is optimized for A100 GPUs, delivering up to 5x higher out-of-the-box performance versus V100 GPUs, and includes new optimizations and APIs for applications such as conversational AI and computer vision. It has been redesigned for ease of use and application integration, and it offers greater flexibility to developers. cuDNN 8 highlights include:

Tuned for peak performance on NVIDIA A100 GPUs, including the new TensorFloat-32 (TF32) format as well as FP16 and FP32

Redesigned low-level API provides direct access to cuDNN kernels for greater control and performance tuning

Backward compatibility layer maintains support for the cuDNN 7.x API, letting developers manage their transition to the new cuDNN 8 API

New optimizations for computer vision, speech, and language understanding networks

Fuse operators to accelerate convolutional neural networks with a new API

cuDNN 8 is now available as six smaller libraries, providing finer granularity when integrating it into applications. Developers can download cuDNN or pull it from framework containers on NGC. Read the latest cuDNN release notes for a detailed list of new features and enhancements.



Key Features

Tensor Core acceleration for all popular convolutions, including 2D, 3D, grouped, depthwise-separable, and dilated, with NHWC and NCHW inputs and outputs

Optimized kernels for computer vision and speech models, including ResNet, ResNeXt, SSD, MaskRCNN, U-Net, V-Net, BERT, GPT-2, Tacotron2, and WaveGlow

Supports the FP32, FP16, and TF32 floating-point formats and the INT8 and UINT8 integer formats

Arbitrary dimension ordering, striding, and sub-regions for 4D tensors mean easy integration into any neural network implementation

Speed up fused operations on any CNN architecture

cuDNN is supported on Windows and Linux with Ampere, Turing, Volta, Pascal, Maxwell, and Kepler GPU architectures in data center and mobile GPUs.

cuDNN Accelerated Frameworks
