NVIDIA Grace and Grace Hopper Superchips are poised to become AI and HPC powerhouses

Data center AI powerhouse NVIDIA offered more details this week at the Hot Chips 2022 Symposium regarding its upcoming Grace processor architecture, its Grace Superchip implementation for HPC (High Performance Computing) and cloud workloads, and its Grace Hopper Superchip, which integrates Grace CPU silicon with a Hopper GPU for large-scale AI training. These new NVIDIA multi-chip module products mark the company’s first foray into data center and compute-intensive CPU technologies, combined with its ubiquitous GPU technologies. It is also a milestone for Arm-based architectures in the data center.

NVIDIA’s Superchip Anatomy – Enter Grace

NVIDIA’s Grace processor architecture is based on Arm Neoverse cores and the Armv9 instruction set. Grace’s design has been heavily optimized for scalability, shared memory, and cache coherency, and the Arm architecture itself has been gaining traction in the industry as major chip players like NVIDIA get behind it.

A base Grace processor core complex is a 72-core implementation with 117 MB of distributed cache, and it can scale up to 144 cores (234 MB of cache) in a single Grace Superchip implementation – or 72 cores paired with a Hopper GPU in a Grace Hopper implementation – all tied together by what NVIDIA calls its Scalable Coherency Fabric, or SCF. SCF offers a mesh fabric architecture and distributed cache design with up to 3.2 TB/s of bisection bandwidth.
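As a quick illustration of those configurations, here is a minimal, hypothetical host-side sketch (plain C++ on Linux, compilable with nvcc or any C++ compiler) that infers which Grace configuration it is running on from the number of online cores; the 72- and 144-core figures are NVIDIA’s published counts, while the detection logic itself is just an assumption for illustration.

```cuda
#include <cstdio>
#include <unistd.h>  // sysconf

// Hypothetical helper: infer the Grace configuration from the online
// core count. 72 cores = one Grace complex; 144 cores = a Grace
// Superchip (two 72-core dies joined over NVLink-C2C).
int main() {
    long cores = sysconf(_SC_NPROCESSORS_ONLN);
    if (cores >= 144)
        std::printf("%ld cores online: consistent with a Grace Superchip (2 x 72)\n", cores);
    else if (cores >= 72)
        std::printf("%ld cores online: consistent with a single Grace complex\n", cores);
    else
        std::printf("%ld cores online: fewer than a full Grace complex\n", cores);
    return 0;
}
```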

From there, further scalability is achieved via NVIDIA’s high-speed NVLink-C2C chip-to-chip interconnect, which provides 900 GB/s of bi-directional bandwidth between chips in a single package (Grace Superchip or Grace Hopper) and additional scalability up to four-socket designs, with SCF support for coherency across sockets. It also enables an expanded GPU memory pool in Grace Hopper implementations, allowing Hopper to directly access Grace-attached LPDDR5X memory over the NVLink network.

All told, not only does NVIDIA’s Grace employ a state-of-the-art Arm core architecture for modern data center, cloud, AI, and HPC workloads, it also has very robust plumbing for maximum inter-chip communication and scalability.

NVIDIA Grace Hopper – A Leading Namesake for AI

While the Grace Superchip is optimized for more general-purpose HPC and cloud workloads, NVIDIA’s Grace Hopper Superchip combines a 72-core Arm processor complex with an NVIDIA Hopper GPU on a single module via NVLink-C2C. Named after US Navy Rear Admiral and computer programming pioneer Grace Hopper, the chip is optimized for more intensive HPC and AI training workloads, with shared memory and coherency throughout.

In fact, as I briefly noted earlier, a Hopper GPU can also access the memory of a neighboring Grace processor via NVLink-C2C. This saves cost compared to adding more expensive HBM2e memory to the Hopper GPU module and significantly increases memory capacity, while also being more power-efficient, for workloads that don’t necessarily need the massive bandwidth that HBM2e provides.
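To make that shared-memory model concrete, here is a minimal CUDA sketch, assuming a system with any CUDA-capable GPU: one managed allocation is touched by both the CPU and a GPU kernel through a single pointer. On a Grace Hopper module, the backing pages can physically reside in Grace’s LPDDR5X and be reached coherently over NVLink-C2C, whereas a PCIe-attached GPU would have to migrate or copy them; the kernel and buffer names below are purely illustrative.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel: scales a vector in place. On Grace Hopper the
// data it touches may physically live in CPU-attached LPDDR5X, reached
// coherently over NVLink-C2C rather than copied into GPU memory first.
__global__ void scale(float *data, float factor, size_t n) {
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const size_t n = 1 << 20;
    float *data = nullptr;

    // One pointer, valid on both CPU and GPU. On a PCIe system the CUDA
    // runtime migrates these pages on demand; on Grace Hopper they can
    // stay in Grace's memory and be accessed directly by the GPU.
    cudaMallocManaged(&data, n * sizeof(float));
    for (size_t i = 0; i < n; ++i) data[i] = 1.0f;

    scale<<<(n + 255) / 256, 256>>>(data, 2.0f, n);
    cudaDeviceSynchronize();

    std::printf("data[0] = %f\n", data[0]);  // expect 2.000000
    cudaFree(data);
    return 0;
}
```

The practical upshot is capacity: working sets too large for the Hopper module’s HBM can spill into Grace’s much larger LPDDR5X pool without restructuring the code around explicit transfers.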

Expectations, market impact and key takeaways

NVIDIA is making bold claims about the performance of its Grace and Grace Hopper Superchips, and as the proverbial 800-pound gorilla in data center and cloud AI acceleration, the company has already staked out a major beachhead from which to launch these full-platform server solutions.

Coming in the first half of 2023, NVIDIA’s Grace Superchips will add to the building momentum behind the highly scalable Armv9 architecture in the data center. Other major players, like Ampere Computing, which already has a 128-core “cloud-native” processor on the market, are also bolstering the Arm ecosystem, and this will give Intel’s and AMD’s legacy x86 architectures a run for their money in the coming year.

With major OEMs and cloud providers like HPE, Google Cloud, and Microsoft Azure announcing commitments to Arm-based solutions, the TAM expansion opportunity for NVIDIA in the new year is excellent.
