Access best-in-class AI software for your NVIDIA H800 Tensor Core GPU on mainstream servers. The NVIDIA AI Enterprise software suite is the operating system for NVIDIA’s AI platform and is essential for production-ready applications built using NVIDIA’s extensive library of frameworks for voice AI, recommenders, customer service chatbots, and more.
The NVIDIA H800 PCIe GPU includes NVIDIA AI Enterprise software and support.
NVIDIA AI Enterprise is an end-to-end secure cloud-native AI software suite that solves new challenges while improving operational efficiency. It can accelerate data science processes and simplify the development and deployment of predictive AI models to automate basic processes and quickly gain insights from data. With a rich full-stack software library, including AI solution workflows, frameworks, pre-trained models, and infrastructure optimization, the possibilities are endless.
The Ampere-based A100 series compute cards have been adopted by many high-performance computing (HPC) clusters over the past three years. Last year, NVIDIA introduced a new generation of H100 series cards based on the Hopper architecture that further raises compute performance; these GPUs are heavily used for artificial intelligence and deep learning tasks.
For well-known reasons, NVIDIA launched the A800 series compute card, sold exclusively in the Chinese market, to work around the export restrictions imposed last year. The A800’s specifications are basically the same as the original A100 series; the major difference is the NVLink interconnect rate, which is 600 GB/s on the A100 but capped at 400 GB/s on the A800.
According to media reports, NVIDIA took the same approach this year: it cut the interconnect rate of the ordinary H100 PCIe model roughly in half and launched the H800 series compute card for the Chinese market. Compared with a normal H100, the capped interconnect slows data exchange between GPUs, which increases latency in some large-model training and reduces the effective throughput of the workload.
Media outlets contacted NVIDIA to ask about the difference between the H100 and H800, but NVIDIA declined to elaborate, saying only that the H800 series compute cards fully comply with export control regulations.
The full GH100 chip is configured with 8 GPCs, 72 TPCs, and 144 SMs, for a total of 18,432 FP32 CUDA cores. It uses fourth-generation Tensor Cores, 576 in all, and carries 60MB of L2 cache. Not all of these units are enabled in shipping products, however: the SXM5 version enables 132 SMs, giving 16,896 FP32 CUDA cores, 528 Tensor Cores, and 50MB of L2 cache, while the PCIe 5.0 version enables 114 SMs, for only 14,592 FP32 CUDA cores. The former has a TDP of 700W, the latter 350W.
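These core counts follow directly from Hopper’s per-SM layout: 128 FP32 CUDA cores and 4 fourth-generation Tensor Cores per SM, with 2 SMs per TPC and 9 TPCs per GPC. A minimal Python sketch (not an official tool; those per-SM figures are the only assumptions) reproduces the numbers quoted above:

```python
# Reproduce the GH100/H100 core counts from per-SM figures.
# Assumptions: 128 FP32 CUDA cores and 4 fourth-gen Tensor Cores per SM.
FP32_CORES_PER_SM = 128
TENSOR_CORES_PER_SM = 4

configs = {
    "Full GH100": 144,  # 8 GPCs x 9 TPCs x 2 SMs
    "H100 SXM5": 132,   # enabled SMs in the SXM5 product
    "H100 PCIe": 114,   # enabled SMs in the PCIe 5.0 product
}

for name, sms in configs.items():
    fp32 = sms * FP32_CORES_PER_SM
    tensor = sms * TENSOR_CORES_PER_SM
    print(f"{name}: {sms} SMs -> {fp32} FP32 CUDA cores, {tensor} Tensor Cores")

# Full GH100: 144 SMs -> 18432 FP32 CUDA cores, 576 Tensor Cores
# H100 SXM5: 132 SMs -> 16896 FP32 CUDA cores, 528 Tensor Cores
# H100 PCIe: 114 SMs -> 14592 FP32 CUDA cores, 456 Tensor Cores
#   (the 456 Tensor Cores for the PCIe version are derived, not quoted above)
```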
In addition, the H100 supports NVIDIA’s fourth-generation NVLink interface, which provides up to 900 GB/s of bandwidth. The H100 is also the first GPU to support the PCIe 5.0 standard and the first to use HBM3, supporting up to six HBM3 stacks for 3TB/s of bandwidth, 1.5 times that of the A100 with HBM2E, with a default memory capacity of 80GB.
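As a quick sanity check on those memory figures: the 1.5x ratio implies an A100 80GB HBM2E bandwidth of roughly 2 TB/s (NVIDIA specifies about 2,039 GB/s for that model), and the default 80GB capacity with 16GB HBM3 stacks implies five of the six stack sites are populated. A short back-of-the-envelope sketch under those two assumptions:

```python
# Sanity-check the quoted H100 memory figures.
# Assumptions (not stated in the text): A100 80GB HBM2E bandwidth of
# ~2,039 GB/s, and 16GB of capacity per HBM3 stack.
h100_hbm3_bw_gbs = 3000    # 3 TB/s, as quoted above
a100_hbm2e_bw_gbs = 2039   # A100 80GB spec (assumption)

ratio = h100_hbm3_bw_gbs / a100_hbm2e_bw_gbs
print(f"H100/A100 bandwidth ratio: {ratio:.2f}x")  # ~1.47x, i.e. the "1.5 times"

gb_per_stack = 16          # per-stack HBM3 capacity (assumption)
print(f"Stacks populated for 80GB: {80 // gb_per_stack} of 6 sites")  # 5 of 6
```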