The Pascal architecture is continuing to find its way through Nvidia's product line-up, with today marking the introduction of its Tesla P4 and P40 GPUs.

According to Nvidia's spec sheet, the P40 delivers 12 teraflops of single-precision performance and 47 trillion int8 operations per second, with 3,840 CUDA cores and 24GB of GDDR5 memory offering 346GBps of bandwidth. The less powerful P4 offers 5.5 teraflops of single-precision performance and 22 trillion int8 operations per second, backed by 2,560 CUDA cores and 8GB of GDDR5 memory with 192GBps of bandwidth.
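As a rough back-of-the-envelope check on those figures, the short Python sketch below compares each card's quoted int8 rate with its quoted single-precision rate. The dictionary layout and the function name are illustrative only; the numbers are the ones Nvidia lists.

```python
# Figures as quoted from Nvidia's spec sheet (trillions of operations per second).
SPECS = {
    "P40": {"fp32_tflops": 12.0, "int8_tops": 47.0},
    "P4": {"fp32_tflops": 5.5, "int8_tops": 22.0},
}


def int8_speedup(card):
    """Return the ratio of quoted int8 throughput to quoted FP32 throughput."""
    spec = SPECS[card]
    return spec["int8_tops"] / spec["fp32_tflops"]


if __name__ == "__main__":
    for name in SPECS:
        print(f"{name}: int8 rate is about {int8_speedup(name):.1f}x the FP32 rate")
```

On both cards, the quoted int8 rate works out to roughly four times the single-precision rate.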

Designed for artificial intelligence and running neural networks, the P40 will be available next month, while the P4 will arrive in November.

The company said the chips offer four times the performance of the M40 and M4 it launched last year. Compared with an Intel Xeon E5-2690v4, which launched earlier this year, Nvidia claimed its offering is 40 times more power efficient and responds 45 times faster.

"A single server with a single Tesla P4 replaces 13 CPU-only servers for video inferencing workloads, delivering over 8x savings in total cost of ownership, including server and power costs," the company boasted.

Alongside its hardware, Nvidia is releasing its TensorRT library and DeepStream SDK. TensorRT takes 16-bit or 32-bit trained neural networks and "optimises" them for reduced-precision 8-bit operations, while DeepStream can analyse up to 93 HD video streams simultaneously in real time.
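To give a sense of what converting a network to 8-bit precision can involve, here is a minimal Python sketch of symmetric linear quantisation, one common textbook approach. It is not TensorRT's API, nor is it necessarily the method TensorRT uses; the function names and sample weights are hypothetical.

```python
def quantize_int8(weights):
    """Map floats to int8 codes using one shared scale (symmetric scheme)."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero input, which would otherwise give a zero scale.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the symmetric int8 range.
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Recover approximate float values from the int8 codes."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.5, -1.27, 0.003, 1.0]  # made-up sample values
    codes, scale = quantize_int8(weights)
    restored = dequantize_int8(codes, scale)
    print(codes)
    print(restored)
```

Each value is rounded to one of 255 levels, so very small weights such as 0.003 in this sample collapse to zero. That loss of precision is the trade-off made in exchange for cheaper 8-bit arithmetic.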

"[DeepStream] addresses one of the grand challenges of AI: Understanding video content at-scale for applications such as self-driving cars, interactive robots, filtering, and ad placement," the company said.

Nvidia took the wraps off its Pascal architecture in May when it launched its GTX 1080 and 1070. At the time, Nvidia CEO Jen-Hsun Huang claimed the 1080 had twice the performance and three times the efficiency of its former beefiest card, the Titan X.

Intel, however, has argued for keeping such workloads on CPUs. Diane Bryant, Intel executive vice president and general manager of its Data Center Group, told ZDNet in June that customers still prefer a single environment.

"Most customers will tell you that a GPU becomes a one-off environment that they need to code and program against, whereas they are running millions of Xeons in their datacentre, and the more they can use single instruction set, single operating system, single operating environment for all of their workloads, the better the performance of lower total cost of operation," she said.

"We already have the in-memory analytics world, we already have the deep scale-out world, we have 90 percent share of the server market, and machine learning [is] the next algorithm to be accelerated. It makes sense to do it on a [familiar] platform.

"Over 90 percent of all cloud service providers' servers are two-socket Xeons."

Nvidia today also announced its Drive PX 2 platform, which is set to be deployed by Baidu in its self-driving car. The chip maker said the PX 2 will enable automated highway driving and HD mapping, and will process input from cameras, lidar, radar, and ultrasonic sensors.