IT Creations Partners

NVIDIA L40 GPU Accelerator - System Overview

Performance

NVIDIA’s Ada Lovelace architecture gives the L40 18,176 CUDA cores, compared with up to 10,752 CUDA cores on the previous-generation, Ampere-based NVIDIA A40. The GPU uses a PCIe Gen4 x16 interface with a bi-directional transfer rate of 64GB/s, which is double the throughput of PCIe Gen3. Its 142 3rd Generation Ray Tracing (RT) Cores deliver 2x to 3x the ray-tracing speed of previous-generation cards. Data science and AI model training benefit from 568 4th Generation Tensor Cores, which provide up to 90.5 TF32 TFLOPS.
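To put the numbers above in proportion, here is a minimal Python sketch of the arithmetic. It uses only the figures quoted in the paragraph; the function names are illustrative, and a core-count ratio should not be read as a measured performance ratio.

```python
# Figures quoted in the overview paragraph above.
L40_CUDA_CORES = 18_176
A40_CUDA_CORES = 10_752
PCIE_GEN4_X16_BIDIR_GB_S = 64  # GB/s, both directions combined


def core_ratio(new_cores: int, old_cores: int) -> float:
    """Return how many times more CUDA cores the newer card has."""
    return new_cores / old_cores


def per_direction_gb_s(bidirectional_gb_s: float) -> float:
    """Split a bi-directional rate into its one-way share."""
    return bidirectional_gb_s / 2


def previous_gen_bidir_gb_s(gen4_bidir_gb_s: float) -> float:
    """The text states Gen4 is double Gen3, so Gen3 is half of Gen4."""
    return gen4_bidir_gb_s / 2


if __name__ == "__main__":
    ratio = core_ratio(L40_CUDA_CORES, A40_CUDA_CORES)
    print(f"CUDA core ratio, L40 vs A40: {ratio:.2f}x")
    print(f"PCIe Gen4 x16 per direction: {per_direction_gb_s(PCIE_GEN4_X16_BIDIR_GB_S):.0f} GB/s")
    print(f"PCIe Gen3 x16 bi-directional: {previous_gen_bidir_gb_s(PCIE_GEN4_X16_BIDIR_GB_S):.0f} GB/s")
```

The L40 therefore carries roughly 1.7 times as many CUDA cores as the A40, and the 64GB/s bi-directional figure corresponds to 32GB/s in each direction.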

Memory

The L40 GPU has 48GB of GDDR6 memory with Error Correction Code (ECC) and a memory bandwidth of 864GB/s. With NVIDIA RTX Virtual Workstation (RTX vWS) virtual GPU (vGPU) software, that memory can be allocated to multiple users across different teams, such as creative, data science, and design.
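As a rough planning aid, the sketch below estimates how many users one card could serve if its 48GB were split into equal slices. The slice sizes are hypothetical examples, not a list of supported vGPU profiles, and the calculation assumes no memory is held back for overhead; check the NVIDIA vGPU documentation for the profiles actually available.

```python
TOTAL_MEMORY_GB = 48  # L40 memory capacity, from the text above

# Hypothetical per-user slice sizes in GB, for illustration only.
EXAMPLE_SLICE_SIZES_GB = [4, 8, 12, 24, 48]


def users_per_card(slice_gb: int, total_gb: int = TOTAL_MEMORY_GB) -> int:
    """Return how many equal-size slices fit in the card's memory."""
    if slice_gb <= 0 or slice_gb > total_gb:
        raise ValueError("slice size must be between 1 GB and the card total")
    return total_gb // slice_gb


if __name__ == "__main__":
    for size in EXAMPLE_SLICE_SIZES_GB:
        print(f"{size:>2} GB per user: up to {users_per_card(size)} users")
```

Smaller slices suit light design or office workloads, while a data science user may need the whole card.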

Cooling and Power

The card is rated for a maximum power consumption of 300W and uses a 16-pin power connector. The NVIDIA L40 GPU features bi-directional passive cooling and depends on the airflow provided by the host server chassis to maintain operating temperatures.

Summary

Offering impressive performance for simulation, ray tracing, AI modeling, and virtual production, the NVIDIA L40 GPU Accelerator is a great general-purpose GPU and ideal for demanding datacenter workloads. The card provides 48GB of GDDR6 memory, 142 3rd Generation RT Cores, 18,176 CUDA cores, and 568 4th Generation Tensor Cores.

Check Availability

NVIDIA L40 GPU Accelerator - Specifications

Memory
  • GPU Memory: 48GB GDDR6
  • Memory Bandwidth: Up to 864GB/s
Cores
  • NVIDIA Ada Lovelace CUDA Cores: 18,176
  • NVIDIA 3rd Generation RT Cores: 142
  • NVIDIA 4th Generation Tensor Cores: 568
Performance
  • FP32 performance: 90.5 TFLOPS
  • TF32 Tensor Core performance: 90.5 TFLOPS
  • TF32 Tensor Core performance with Sparsity: 181 TFLOPS
  • BFLOAT16 Tensor Core performance: 181.05 TFLOPS
  • BFLOAT16 Tensor Core performance with Sparsity: 362 TFLOPS
  • FP16 Tensor Core performance: 181.05 TFLOPS
  • FP16 Tensor Core performance with Sparsity: 362 TFLOPS
  • FP8 Tensor Core performance: 362 TFLOPS
  • FP8 Tensor Core performance with Sparsity: 724 TFLOPS
Card Interface
  • PCIe Gen4 x16
  • 64GB/s bi-directional
Display Connectors
  • 4 x DisplayPort 1.4a
Power Consumption
  • 300W
Form Factor
  • Board Height: 4.4"
  • Board Length: 10.5"
  • Dual Slot
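
The Tensor Core figures in the Performance list follow a simple pattern: each halving of precision width roughly doubles throughput, and sparsity doubles it again. The Python sketch below reproduces that pattern from the listed TF32 value. The scale factors are inferred from the listed numbers rather than taken from an NVIDIA formula, and the helper names are illustrative.

```python
TF32_DENSE_TFLOPS = 90.5  # from the Performance list above

# Throughput multiplier relative to TF32, inferred from the listed figures.
PRECISION_SCALE = {"TF32": 1, "BF16": 2, "FP16": 2, "FP8": 4}


def dense_tflops(precision: str) -> float:
    """Estimate dense Tensor Core TFLOPS for a given precision."""
    return TF32_DENSE_TFLOPS * PRECISION_SCALE[precision]


def sparse_tflops(precision: str) -> float:
    """Estimate TFLOPS with sparsity, which doubles the dense figure."""
    return 2 * dense_tflops(precision)


if __name__ == "__main__":
    for name in PRECISION_SCALE:
        print(f"{name}: {dense_tflops(name):g} dense, {sparse_tflops(name):g} with sparsity")
```

The estimate gives 181 TFLOPS for BFLOAT16 and FP16, where the list shows 181.05; the small gap comes from rounding in the published TF32 value.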

Get a quote for NVIDIA L40 GPU Accelerator

If you know what you want but can't find the exact configuration you're looking for, have one of our knowledgeable sales staff contact you. Give us a list of the components you would like to incorporate into the system, and the quantities, if more than one. We will get back to you immediately with an official quote.
