Cloud-based AI training got a little more diverse this week after Amazon Web Services (AWS) and Google Cloud announced the general availability of their latest custom AI accelerators.
Kicking things off with Amazon, the cloud provider’s Trainium chips are now generally available on AWS. First previewed at AWS re:Invent last year, Amazon’s Trainium-powered Trn1 instances are designed to train large machine-learning models, such as those used in natural language processing and image recognition.
Amazon claims the instances deliver between 40 and 250 percent higher performance on BF16 and 32-bit TensorFlow workloads than its Nvidia A100-powered P4d instances, per its internal benchmarks. The accelerator also supports FP32, FP16, UINT8, and a configurable FP8 datatype. FP8 has become popular in the AI world in recent years as a means to trade accuracy for raw performance.
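To make that accuracy-for-performance tradeoff concrete, here is a rough sketch — a hypothetical helper, not Trainium code — that emulates narrower float formats by truncating float32 mantissa bits. A bfloat16-style value keeps 7 mantissa bits and an E4M3-style FP8 value keeps 3; the sketch ignores the formats’ differing exponent widths and rounding modes, so it only illustrates the precision loss:

```python
import numpy as np

def truncate_mantissa(x, keep_bits: int) -> np.ndarray:
    """Zero out the low (23 - keep_bits) mantissa bits of float32 values,
    crudely emulating the precision of a narrower float format."""
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    mask = np.uint32((0xFFFFFFFF << (23 - keep_bits)) & 0xFFFFFFFF)
    return (bits & mask).view(np.float32)

x = np.float32(3.14159265)
bf16_like = truncate_mantissa([x], keep_bits=7)[0]  # ~bfloat16 precision -> 3.140625
fp8_like = truncate_mantissa([x], keep_bits=3)[0]   # ~E4M3 FP8 precision -> 3.0
```

Each halving of mantissa bits roughly doubles the worst-case relative error, which is the accuracy a chip gives up in exchange for packing more multiply-accumulate units into the same silicon.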
The instances are available in two sizes: Amazon’s trn1.2xlarge pairs eight vCPUs with a single Trainium chip, 64GB of memory divided evenly between the CPU and accelerator, 12.5Gbit/sec networking, and 500GB of local SSD storage. Meanwhile, for larger workloads, the trn1.32xlarge is 16 times larger, packing 128 vCPUs, 16 Trainium chips, 1TB of combined memory, and 800Gbit/sec of network bandwidth per instance.
For large-scale model training, multiple trn1.32xlarge instances can be clustered using Amazon’s FSx for Lustre storage service and “petabit-class” non-blocking top-of-rack switches.
The accelerator uses the same Neuron SDK as Amazon’s previously announced Inferentia inferencing chip, which comes with a compiler, framework extensions, a runtime library, and developer tools. Put together, Amazon claims workloads written in popular ML frameworks, such as PyTorch and TensorFlow, can be adapted to run on Trainium with minimal refactoring.
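As a sketch of what “minimal refactoring” looks like in practice, here is a vanilla PyTorch training loop. Per AWS, porting code like this to Trainium is mostly a device swap — the Neuron SDK’s torch-xla-based extension supplies an XLA device in place of "cpu" or "cuda" — while the model, loss, and optimizer code stay largely unchanged. This sketch runs on CPU; the Trainium-specific substitutions are noted in comments only:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
device = torch.device("cpu")  # on Trn1: an XLA device provided by the Neuron SDK

# An ordinary model and training setup -- nothing here is Trainium-specific
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 2)).to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(16, 32, device=device)
y = torch.randint(0, 2, (16,), device=device)

losses = []
for _ in range(5):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()  # the Neuron docs add a step-marking call at this boundary on Trn1
    losses.append(float(loss))
```

The Neuron compiler then traces and compiles the resulting computation graph for the accelerator, which is why the framework-level code can stay put.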
The Trn1 instances are available this week in Amazon’s US East and US West regions.
Google’s TPU v4 now generally available
Google also unveiled a bundle of hardware updates at its Cloud Next event this week, including the general availability of its fourth-gen Tensor Processing Units (TPU).
Google Cloud’s TPU v4-powered virtual machines are available in configurations ranging from four chips — a single TPU module — to a pod packed with up to 4,096 chips all connected over a high-speed fabric.
For those who aren’t familiar, Google’s TPU accelerators were specifically designed to accelerate large machine-learning models in hardware, such as those used in natural language processing, recommender systems, and computer vision.
At a high level, the accelerator is essentially a bunch of big bfloat16 matrix math engines, called MXUs, supported by high-bandwidth memory and a few CPU cores to make it programmable; the CPU cores feed a workload’s AI math operations into the MXUs for high-speed processing. Each TPU VM consists of four chips, each with two processing cores, and a total of 128GB of memory.
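That division of labor — general-purpose cores streaming work through a fixed-size matrix unit — can be sketched in numpy. Published TPU descriptions put the MXU at a 128×128 systolic array (multiplying in bfloat16, accumulating in float32); the sketch below treats each 128-wide tile product as “one MXU op” and accumulates the tiles on the host side:

```python
import numpy as np

TILE = 128  # one pass through the matrix unit handles a 128x128 tile (simplified)

def tiled_matmul(a: np.ndarray, b: np.ndarray, tile: int = TILE) -> np.ndarray:
    """Compute a @ b by accumulating tile-by-tile products, mimicking how a
    host core would stream a large matmul through a fixed-size matrix unit."""
    m, k = a.shape
    k2, n = b.shape
    assert k == k2, "inner dimensions must match"
    out = np.zeros((m, n), dtype=np.float32)
    for i in range(0, m, tile):
        for j in range(0, n, tile):
            for p in range(0, k, tile):
                # each tile product below stands in for one matrix-unit operation
                out[i:i+tile, j:j+tile] += a[i:i+tile, p:p+tile] @ b[p:p+tile, j:j+tile]
    return out

rng = np.random.default_rng(0)
a = rng.random((256, 256), dtype=np.float32)
b = rng.random((256, 256), dtype=np.float32)
result = tiled_matmul(a, b)
```

The point of the design is that almost all the arithmetic in a neural network boils down to matmuls like this, so dedicating silicon to the tile operation pays off across workloads.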
For a full breakdown of Google’s latest TPU architecture, we recommend checking our sister site The Next Platform.
The custom accelerators were designed to speed up Google’s own AI workloads, but were later opened up to customers on GCP. As you’d expect, TPUs support a variety of popular ML frameworks including JAX, PyTorch, and TensorFlow. And according to Google, the TPU v4 is more than twice as fast as its predecessor, while also delivering 40 percent better performance per dollar.
TPU v4 Pod slices are available now in GCP’s Oklahoma region, at a rate of between $0.97 and $3.22 per chip, per hour. For Google’s smallest instance, that works out to $5,924 a month with a one-year commitment.
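A quick back-of-the-envelope check squares those figures. Assuming roughly 730 hours in an average month (8,760 hours a year over 12 months), the quoted monthly price for the smallest four-chip slice implies an effective per-chip rate that lands inside the advertised band:

```python
# Sanity-check Google's quoted TPU v4 pricing.
# Assumption: ~730 hours in an average month (8,760 / 12).
HOURS_PER_MONTH = 8760 / 12          # ~730
chips = 4                            # smallest slice: one four-chip module
monthly_committed = 5924             # USD/month, one-year commitment

per_chip_hour = monthly_committed / (chips * HOURS_PER_MONTH)
print(f"effective rate: ${per_chip_hour:.2f}/chip/hour")  # prints ~$2.03
```

That works out to about $2.03 per chip per hour — comfortably between the quoted $0.97 and $3.22 bounds, which presumably correspond to the longest commitment and on-demand rates respectively.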
Google offers a peek at Intel’s next-gen CPUs, smartNICs
Intel’s Sapphire Rapids CPUs and Mount Evans IPUs also made an appearance in Google Cloud as a private preview this week.
Select customers can now give Intel’s long-delayed Sapphire Rapids CPUs a spin, though today’s announcement offers few hints as to what we can expect from the microprocessors. Instead, the biz played up the Mount Evans IPUs it co-developed with Intel.
“A first of its kind in any public cloud, C3 VMs will run workloads on 4th Gen Intel Xeon Scalable processors while they free up programmable packet processing to the IPUs securely at line rates of 200Gbit/sec,” Nick McKeown, who leads Intel’s network and edge group, said in a statement.
Announced at Intel’s Architecture Day last year, Mount Evans — now rebranded as the E2000 — is Intel’s first IPU ASIC. IPU stands for Infrastructure Processing Unit: essentially another class of hardware accelerator for networking and storage tasks.
The smartNIC-class chip will be used to speed up Google’s cloud infrastructure workloads. One of the first will be storage. The cloud provider claims its IPU-boosted C3 instances offer 10x higher IOPS and 4x the throughput of its outgoing C2 instances, when using its recently announced Hyperdisk service.
IPUs, data processing units, and SmartNICs are hardly a new phenomenon in the cloud world. Amazon, Microsoft Azure, and Alibaba Cloud are also using SmartNICs to offload infrastructure tasks, like networking, storage, and security from the host, freeing up CPU cycles for use by tenant workloads in the process.
Intel’s Sapphire Rapids still stuck in the cloud
Despite teasing the C3 instances as the “first VM in the public cloud” powered by Sapphire Rapids, “public” is probably the wrong word here. Google’s C3 instances remain limited to select customers by application, presumably under a strict NDA.
As of this week, Intel has yet to announce a launch date for its Sapphire Rapids processor family, which is already more than a year behind schedule. However, with the launch of AMD’s fourth-gen Epyc processors slated for this fall, Intel appears more eager than ever to get its next-gen datacenter chips in some customers’ hands — at least virtually.
Google is only the latest Intel partner to make Sapphire Rapids-based resources available to customers in some capacity. While Google is offering cloud VMs, Supermicro and Intel are each offering remote access to bare-metal systems so customers can explore the new capabilities enabled by the chips.
Intel has begun shipping Sapphire-Rapids-powered fourth-gen Xeon Scalable processors to some OEMs, cloud pals, and government agencies. However, it’s unclear how many chips the x86 titan has managed to get out to customers. ®