AMD has officially announced its next-generation, CDNA GPU-based Instinct MI100 accelerator, which it calls the fastest HPC GPU in the world. The Instinct MI100 is designed to offer the world’s fastest double-precision compute capabilities while also delivering huge amounts of GPU performance for AI and deep learning workloads.
AMD Instinct MI100 32 GB HPC Accelerator Official, The World’s Fastest HPC GPU Based on 1st Gen CDNA Architecture
AMD’s Instinct MI100 will be utilizing the CDNA architecture, which is entirely different from the RDNA architecture that gamers will have access to later this month. The CDNA architecture has been designed specifically for the HPC segment and will be pitted against NVIDIA’s Ampere A100 and similar accelerator cards.
Some of the key highlights of the AMD Instinct MI100 GPU accelerator include:
- All-New AMD CDNA Architecture – Engineered to power AMD GPUs for the exascale era and at the heart of the MI100 accelerator, the AMD CDNA architecture offers exceptional performance and power efficiency.
- Leading FP64 and FP32 Performance for HPC Workloads – Delivers industry-leading 11.5 TFLOPS peak FP64 performance and 23.1 TFLOPS peak FP32 performance, enabling scientists and researchers across the globe to accelerate discoveries in industries including life sciences, energy, finance, academics, government, defense, and more.
- All-New Matrix Core Technology for HPC and AI – Supercharged performance for a full range of single and mixed-precision matrix operations, such as FP32, FP16, bFloat16, Int8, and Int4, engineered to boost the convergence of HPC and AI.
- 2nd Gen AMD Infinity Fabric Technology – Instinct MI100 provides ~2x the peer-to-peer (P2P) peak I/O bandwidth over PCIe 4.0 with up to 340 GB/s of aggregate bandwidth per card with three AMD Infinity Fabric Links. In a server, MI100 GPUs can be configured with up to two fully-connected quad GPU hives, each providing up to 552 GB/s of P2P I/O bandwidth for fast data sharing.
- Ultra-Fast HBM2 Memory – Features 32 GB high-bandwidth HBM2 memory at a clock rate of 1.2 GHz and delivers an ultra-high 1.23 TB/s of memory bandwidth to support large data sets and help eliminate bottlenecks in moving data in and out of memory.
- Support for Industry’s Latest PCIe Gen 4.0 – Designed with the latest PCIe Gen 4.0 technology support providing up to 64GB/s peak theoretical transport data bandwidth from CPU to GPU.
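The 64 GB/s PCIe Gen 4.0 figure in the last bullet is the usual marketing-style number. A minimal back-of-envelope sketch, assuming the figure counts the raw ~2 GB/s per lane per direction and sums both directions (ignoring 128b/130b encoding overhead):

```python
# Rough check of the "up to 64 GB/s" PCIe Gen 4.0 x16 figure.
# Assumption: the quoted number uses the raw 2 GB/s per lane per direction
# (16 GT/s before encoding overhead) and counts both directions.
lanes = 16
raw_gbps_per_lane_per_dir = 2.0   # GB/s per lane, per direction
directions = 2

peak = lanes * raw_gbps_per_lane_per_dir * directions
print(f"PCIe 4.0 x16 peak (both directions): {peak:.0f} GB/s")  # -> 64 GB/s
```

After encoding overhead, the usable rate is closer to ~31.5 GB/s in each direction, which is why real-world CPU-to-GPU transfers land below the headline number.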
AMD Instinct MI100 ‘CDNA GPU’ Specifications – 7680 Cores & 32 GB HBM2 VRAM
The AMD Instinct MI100 features the 7nm CDNA GPU, which packs a total of 120 Compute Units or 7680 stream processors. Internally referred to as Arcturus, the CDNA GPU powering the Instinct MI100 is said to measure around 720 mm². The GPU has a clock speed configured around 1500 MHz and delivers a peak performance throughput of 11.5 TFLOPs in FP64, 23.1 TFLOPs in FP32, and a massive 185 TFLOPs in FP16 compute workloads. The accelerator will feature a total power draw of 300 Watts.
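Those peak figures follow directly from the published configuration. A minimal sketch, assuming 64 stream processors per Compute Unit, an FMA counted as two FLOPs, a ~1.5 GHz peak clock, half-rate FP64, and Matrix Core FP16 at 8x the FP32 rate:

```python
# Back-of-envelope derivation of the MI100's quoted peak throughput.
compute_units = 120
sp_per_cu     = 64      # assumed stream processors per CU
clock_ghz     = 1.502   # assumed peak engine clock (~1500 MHz)
flops_per_sp  = 2       # one FMA per clock = 2 FLOPs

fp32 = compute_units * sp_per_cu * flops_per_sp * clock_ghz / 1000  # TFLOPs
fp64 = fp32 / 2         # half-rate double precision
fp16 = fp32 * 8         # Matrix Core FP16 rate

print(f"FP32: {fp32:.1f} TFLOPs, FP64: {fp64:.1f} TFLOPs, FP16: {fp16:.1f} TFLOPs")
# -> roughly 23.1 / 11.5 / 184.6 TFLOPs, in line with the quoted numbers
```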
For memory, AMD will be equipping its Instinct MI100 HPC accelerator with a total of 32 GB of HBM2 memory. The accelerator can be deployed in 4- and 8-GPU configurations and communicates through the new Infinity Fabric X16 interconnect, which has a rated bandwidth of 276 GB/s. AMD is using HBM2 chips rated to provide an effective bandwidth of 1.23 TB/s, while NVIDIA’s A100 features HBM2e memory dies with 1.536 TB/s of bandwidth.
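The bandwidth figures likewise fall out of the bus width and memory clock. A quick sketch, assuming a 4096-bit HBM2 interface on the MI100 and a 5120-bit HBM2e interface on the A100, both double data rate at roughly a 1.2 GHz memory clock:

```python
# Peak HBM bandwidth = bus width (bytes) x memory clock x 2 (double data rate).
def hbm_bandwidth_tbps(bus_width_bits, mem_clock_ghz):
    return bus_width_bits / 8 * mem_clock_ghz * 2 / 1000  # TB/s

print(f"MI100 (4096-bit @ 1.2 GHz): {hbm_bandwidth_tbps(4096, 1.2):.2f} TB/s")   # ~1.23
print(f"A100  (5120-bit @ 1.2 GHz): {hbm_bandwidth_tbps(5120, 1.2):.3f} TB/s")   # ~1.536
```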
AMD Radeon Instinct Accelerators 2020
Accelerator Name | AMD Radeon Instinct MI6 | AMD Radeon Instinct MI8 | AMD Radeon Instinct MI25 | AMD Radeon Instinct MI50 | AMD Radeon Instinct MI60 | AMD Radeon Instinct MI100 |
---|---|---|---|---|---|---|
GPU Architecture | Polaris 10 | Fiji XT | Vega 10 | Vega 20 | Vega 20 | Arcturus |
GPU Process Node | 14nm FinFET | 28nm | 14nm FinFET | 7nm FinFET | 7nm FinFET | 7nm FinFET |
GPU Cores | 2304 | 4096 | 4096 | 3840 | 4096 | 7680 |
GPU Clock Speed | 1237 MHz | 1000 MHz | 1500 MHz | 1725 MHz | 1800 MHz | ~1500 MHz |
FP16 Compute | 5.7 TFLOPs | 8.2 TFLOPs | 24.6 TFLOPs | 26.5 TFLOPs | 29.5 TFLOPs | 185 TFLOPs |
FP32 Compute | 5.7 TFLOPs | 8.2 TFLOPs | 12.3 TFLOPs | 13.3 TFLOPs | 14.7 TFLOPs | 23.1 TFLOPs |
FP64 Compute | 384 GFLOPs | 512 GFLOPs | 768 GFLOPs | 6.6 TFLOPs | 7.4 TFLOPs | 11.5 TFLOPs |
VRAM | 16 GB GDDR5 | 4 GB HBM1 | 16 GB HBM2 | 16 GB HBM2 | 32 GB HBM2 | 32 GB HBM2 |
Memory Clock | 1750 MHz | 500 MHz | 945 MHz | 1000 MHz | 1000 MHz | 1200 MHz |
Memory Bus | 256-bit bus | 4096-bit bus | 2048-bit bus | 4096-bit bus | 4096-bit bus | 4096-bit bus |
Memory Bandwidth | 224 GB/s | 512 GB/s | 484 GB/s | 1 TB/s | 1 TB/s | 1.23 TB/s |
Form Factor | Single Slot, Full Length | Dual Slot, Half Length | Dual Slot, Full Length | Dual Slot, Full Length | Dual Slot, Full Length | Dual Slot, Full Length |
Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling |
TDP | 150W | 175W | 300W | 300W | 300W | 300W |
AMD’s Instinct MI100 ‘CDNA GPU’ Performance Numbers, An FP32 & FP64 Compute Powerhouse
In terms of performance, the AMD Instinct MI100 was compared to the NVIDIA Volta V100 and the NVIDIA Ampere A100 GPU accelerators. Comparing the numbers, the Instinct MI100 offers a 19.5% uplift in FP64 and an 18.5% uplift in FP32 performance over the A100. In FP16 performance, the NVIDIA A100 holds a 69% advantage over the Instinct MI100.
Surprisingly, NVIDIA’s Tensor Core numbers (with sparsity where applicable) are still higher than what AMD can crunch: 19.5 TFLOPs vs 11.5 TFLOPs in FP64, 156 TFLOPs vs 23.1 TFLOPs in FP32, and 624 TFLOPs vs 185 TFLOPs in FP16.
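For context, a quick calculation of those gaps from the quoted figures (note that the 156 TFLOPs figure is the A100’s TF32 Tensor Core rate set against the MI100’s standard FP32 rate):

```python
# Ratios between the quoted A100 Tensor Core rates and the MI100 peak rates.
pairs = {"FP64": (19.5, 11.5), "FP32/TF32": (156.0, 23.1), "FP16": (624.0, 185.0)}
for name, (a100, mi100) in pairs.items():
    print(f"{name}: A100 is {a100 / mi100:.1f}x the MI100's peak")
# -> roughly 1.7x, 6.8x and 3.4x respectively
```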
AMD Instinct MI100 vs NVIDIA’s Ampere A100 HPC Accelerator
In terms of actual workload performance, the AMD Instinct MI100 offers a 2.1x performance-per-dollar advantage over the A100 in FP64 and FP32 workloads. Once again, these numbers are compared against the non-sparsity performance that the NVIDIA A100 has to offer. AMD also provides performance figures for various HPC workloads such as NAMD, CHOLLA, and GESTS, where the MI100 is up to 3x, 1.4x, and 2.6x faster, respectively, than the Volta-based V100 GPU accelerator. Compared to the MI60, the Instinct MI100 offers a 40% improvement in the PIConGPU workload.
According to AMD, its Instinct MI100 GPU accelerator will be available through OEMs and ODMs and integrated into the first systems by the end of 2020. These systems will pack AMD’s EPYC CPUs alongside the Instinct accelerators. Partners include HPE, Dell, Supermicro, and Gigabyte, which already have servers based on the new Instinct MI100 accelerator ready to ship. There was no word regarding the price of this particular accelerator.