NVIDIA DGX A100 101: What are the NVIDIA DGX A100 specifications?
This file helps you get started with the NVIDIA DGX A100 system.
The NVIDIA DGX A100 is a powerful AI system that combines eight NVIDIA A100 GPUs with 320GB or 640GB of total GPU memory, delivering up to 5 petaFLOPS of AI performance¹. The system is designed to handle a range of AI workloads, such as training, inference, and analytics, on a single platform. Here are the main specifications of the DGX A100 system:
- GPUs: The system has eight NVIDIA A100 Tensor Core GPUs, each with 40GB or 80GB of HBM2e memory and 6912 CUDA cores. The GPUs support Multi-Instance GPU (MIG) technology, which allows each GPU to be partitioned into up to seven independent instances for different workloads. The GPUs also feature third-generation Tensor Cores, which deliver up to 20x faster mixed-precision performance for AI applications.
- CPU: The system has two AMD EPYC 7742 (Rome) CPUs, with 128 cores total, a 2.25 GHz base clock, and a 3.4 GHz max boost clock. The CPUs support PCIe Gen4, which enables faster data transfer between the CPUs and the GPUs.
- System Memory: The system has 1TB of DDR4-3200 system memory (2TB on the 640GB model), which provides high bandwidth and low latency for the CPUs and the GPUs.
- Networking: The system has eight single-port Mellanox ConnectX-6 VPI 200Gb/s HDR InfiniBand network adapters, which enable high-speed and low-latency communication between the GPUs and other systems. The system also has a dual-port Mellanox ConnectX-6 VPI 10/25/50/100/200Gb/s Ethernet network adapter, which provides flexible connectivity options for the system.
- Storage: The system has two 1.92TB M.2 NVMe drives for the operating system, and four 3.84TB U.2 NVMe drives for internal data storage. The internal data drives can be combined into a RAID array, trading off performance against redundancy.
- Software: The system runs DGX OS, an Ubuntu Linux-based operating system, and comes with the NVIDIA DGX software stack, which includes NVIDIA Base Command, NVIDIA NGC, NVIDIA AI Enterprise, and the NVIDIA HPC SDK. The system also supports NVIDIA CUDA, NVIDIA TensorRT, NVIDIA RAPIDS, NVIDIA Triton Inference Server, and other NVIDIA libraries and frameworks for AI development and deployment.
- Power: The system draws up to 6.5kW at maximum load, and requires a 200-240V AC input. Power is delivered by multiple redundant, hot-swappable power supplies that balance the load across the system.
- Dimensions and Weight: The system has a 6U rack-mount form factor, with a height of 10.4 in (264.0 mm), a width of 19.0 in (482.3 mm), and a length of 35.3 in (897.1 mm). The system weighs 271 lbs (123 kg).
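The system-level totals above follow directly from the per-GPU figures. As a quick sanity check, here is a minimal Python sketch (not NVIDIA tooling; the constants are simply the numbers listed in this document) that multiplies them out:

```python
# Per-GPU figures from the DGX A100 spec list above.
GPUS = 8                    # A100 GPUs per system
CUDA_CORES_PER_GPU = 6912   # CUDA cores per A100
MIG_INSTANCES_PER_GPU = 7   # max MIG partitions per A100

def total_gpu_memory_gb(per_gpu_gb: int, gpus: int = GPUS) -> int:
    """Aggregate GPU memory across the system (40GB or 80GB per GPU)."""
    return gpus * per_gpu_gb

def total_cuda_cores(gpus: int = GPUS) -> int:
    """Total CUDA cores across all GPUs."""
    return gpus * CUDA_CORES_PER_GPU

def max_mig_instances(gpus: int = GPUS) -> int:
    """Upper bound on concurrent MIG instances across all GPUs."""
    return gpus * MIG_INSTANCES_PER_GPU

print(total_gpu_memory_gb(40))  # 320 -> the 320GB system variant
print(total_gpu_memory_gb(80))  # 640 -> the 640GB system variant
print(total_cuda_cores())       # 55296
print(max_mig_instances())      # 56
```

So a fully partitioned system can expose up to 56 independent MIG instances, which is what makes the DGX A100 practical for mixed training and inference workloads on one box.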
To learn more, check out:
(1) NVIDIA DGX A100 Datasheet. https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/nvidia-dgx-a100-datasheet.pdf.
(2) DGX A100: Universal System for AI Infrastructure | NVIDIA. https://www.nvidia.com/en-us/data-center/dgx-a100/.
(3) NVIDIA DGX A100 (financial services datasheet). https://www.nvidia.com/content/dam/en-zz/Solutions/industries/finance/finance-industry-dgx-a100-us-nvidia-datasheet-web.pdf.
(4) NVIDIA DGX Station A100 Datasheet. https://www.nvidia.com/content/dam/en-zz/zh_tw/Solutions/Data-Center/dgx-a100/NVIDIA-DGX-Station-A100-Datasheet_2020-Nov.pdf.
Learn more at https://www.youtube.com/c/ITGuides/search?query=DGX.