RISC-V chip biz SiFive says its processors are being used to manage AI workloads to some degree in Google datacenters.
According to SiFive, the processor in question is its Intelligence X280, a multi-core RISC-V design with vector extensions, optimized for AI/ML applications in the datacenter. When combined with the matrix multiplication units (MXU) lifted from Google’s Tensor Processing Units (TPUs), this is claimed to deliver greater flexibility for programming machine-learning workloads.
Essentially, the X280’s general-purpose RV64 cores run code that manages the device, and feed machine-learning calculations into Google’s MXUs as required to complete jobs. The X280 also includes its own vector math unit that can handle operations the accelerator units can’t.
SiFive and Google were a little coy, perhaps for commercial reasons, about exactly how this is packaged and used, though it sounds to us as though Google has placed its custom acceleration units in a multi-core X280 system-on-chip, connecting the Google-designed MXU blocks directly to the RISC-V core complex. These chips are used in Google’s datacenters, in “AI compute hosts” according to SiFive, to speed up machine-learning work.
We imagine that if these chips are used in production, they are handling tasks within Google's services. We note that you can’t rent this hardware directly on Google Cloud, which offers AI-optimized virtual machines powered by traditional x86, Arm, TPU, and GPU tech.
The details were disclosed at the AI Hardware Summit in Silicon Valley earlier this month, in a talk by SiFive co-founder and chief architect Krste Asanović and Google TPU Architect Cliff Young, and in a SiFive blog post this week.
According to SiFive, it noticed that following the introduction of the X280, some customers started using it as a companion core alongside an accelerator, in order to handle all the housekeeping and general-purpose processing tasks that the accelerator was not designed to perform.
Many customers found that a full-featured software stack was needed to manage the accelerator, the chip biz says, and realized an X280 core complex next to their large accelerator could provide it: the RISC-V CPU cores handle all the maintenance and operations code, perform math operations the big accelerator cannot, and provide various other functions. Essentially, the X280 can serve as a kind of management node for the accelerator.
To capitalize on this, SiFive worked with customers such as Google to develop what it calls the Vector Coprocessor Interface eXtension (VCIX), which allows customers to tightly link an accelerator directly to the vector register file of the X280, providing increased performance and greater data bandwidth.
According to Asanović, the benefit is that customers can bring their own coprocessor into the RISC-V ecosystem and run a complete software stack and programming environment, with the ability to boot Linux with full virtual memory and cache-coherent support, on a chip containing a mix of general-purpose CPU cores and acceleration units.
From Google’s point of view, it wanted to focus on improving its family of TPU technologies, and not waste time crafting its own application processor from scratch, and so pairing these acceleration functions with a ready-made general-purpose processor seemed like the right way to go, according to Young.
VCIX essentially glues the MXUs to the RISC-V cores with low latency, skipping the need to spend many cycles shuttling data between CPU and acceleration unit via memory, cache, or PCIe. Instead, we’re told, it’s just tens of cycles through vector register access. That also suggests the RISC-V CPU complex and the custom accelerators sit on the same die, packaged as a system-on-chip.
The application code runs on the general-purpose RISC-V cores, and any work that can be accelerated by the MXU is passed over via the VCIX. According to Young, this approach has advantages beyond efficiency: the programming model is simplified, giving a single program with scalar, vector, and co-processor instructions interleaved, and allowing a single software toolchain where developers can code in C/C++ or assembler as preferred.
“With SiFive VCIX-based general purpose cores ‘hybridized’ with Google MXUs, you can build a machine that lets you ‘have your cake and eat it too,’ taking full advantage of all the performance of the MXU and the programmability of a general CPU as well as the vector performance of the X280 processor,” Young said.
The ability to make a custom chip like this is likely to remain the domain of the hyperscalers like Google, or those with niche requirements and deep pockets, but it does demonstrate what can be achieved thanks to the flexibility of the open ecosystem RISC-V model.
That flexibility and openness appears to be enough to lure Google – a long-time supporter of RISC-V, with RV cores used in some of its other products – into using the upstart architecture as opposed to shoehorning its custom coprocessors into x86 chips or Arm-licensed designs. ®
PS: Remember when Google was toying with using the POWER CPU architecture in its datacenters?