AI took centre stage at Google Cloud Next this week, as Google announced TPU updates, virtual machine instances powered by Nvidia GPUs, enhancements to Google Distributed Cloud, and a premium edition of GKE.
The traditional ways of designing and building computing infrastructure, Google said in a release, are no longer adequate for the growing demands of workloads like generative AI and large language models (LLMs).
Cloud TPU v5e (tensor processing unit), now available in preview, is the most cost-efficient, versatile and scalable Cloud TPU to date, the company said. Google Cloud TPUs are custom AI chips designed for training and inference of large AI models.
TPU v5e offers integration with Google Kubernetes Engine (GKE), Google’s fully managed Kubernetes service, as well as with Vertex AI and frameworks like PyTorch, JAX and TensorFlow, so that customers can get started with familiar interfaces.
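The “familiar interfaces” claim is easy to see in code: on a Cloud TPU VM, a stock JAX program discovers and uses the chips with no TPU-specific changes. A minimal sketch, assuming a Cloud TPU VM with JAX’s TPU support installed:

```python
# A minimal sketch: confirm JAX sees the TPU chips, then run a jit-compiled op.
# Assumes a Cloud TPU VM with JAX installed (e.g. pip install "jax[tpu]").
import jax
import jax.numpy as jnp

print(jax.devices())  # lists the TPU cores visible to this host

x = jnp.ones((1024, 1024))
y = jax.jit(lambda a: a @ a.T)(x)  # compiled by XLA and executed on the TPU
print(y.shape)  # (1024, 1024)
```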
At less than half the cost of TPU v4, its previous-generation AI chip released in 2021, TPU v5e delivers higher training and inference performance for more complex AI models, Google asserts.
Further, Google is kicking training-job performance up a notch via Multislice technology, available in preview, which lets developers scale workloads across up to tens of thousands of Cloud TPU v5e or TPU v4 chips. Previously, training jobs using TPUs were limited to a single slice of TPU chips (a slice is a subset of a group of interconnected TPU devices known as a Pod), capping the largest jobs at a maximum slice size of 3,072 chips for TPU v4.
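In practice, a Multislice job is meant to look much like a single-slice job to framework code, with the runtime stitching the slices together. A minimal sketch of how a JAX program would see the chips, assuming the job has already been provisioned as a Multislice deployment (slice topology and host coordination are handled by the environment):

```python
# A minimal sketch of a JAX program's view of chips under Multislice.
# Assumes the hosts were launched as part of a multi-slice TPU job; the
# coordinator address and process IDs are auto-detected on Cloud TPU.
import jax

jax.distributed.initialize()  # joins this host to the multi-host job

# The global view spans every chip in every slice; the local view is
# just the chips attached to this host.
print("global devices:", jax.device_count())
print("local devices: ", jax.local_device_count())
```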
Nvidia’s AI chips also headlined at Next, as Google announced that A3 virtual machine instances, powered by eight Nvidia H100 GPUs, dual 4th Gen Intel Xeon Scalable processors and 2TB of memory, will be generally available next month. These instances were first announced at Google I/O in May.
Combining Nvidia’s GPUs with Google Cloud’s infrastructure technologies is a “huge leap forward in supercomputing capabilities, with 3x faster training and 10x greater networking bandwidth compared to the prior generation,” Google said.
Nvidia chief executive officer (CEO) Jensen Huang joined Google Cloud CEO Thomas Kurian during Tuesday’s opening keynote to highlight the longstanding partnership between the two companies. He explained that doing frontier work in generative AI and LLMs is breakthrough, cutting-edge computer science, and that both companies are working together to re-engineer and re-optimize the software stack.
Google also announced three new data and AI optimizations for Google Distributed Cloud (GDC), including Vertex AI integrations, AlloyDB Omni, and Dataproc Spark.
The Vertex AI integrations, including Prediction, Pipelines and the Document Translation API service, will be available in preview in Q2 2024.
AlloyDB Omni is a new managed database engine in preview, touted to be more than 2x faster than standard PostgreSQL for transactional workloads.
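Because AlloyDB Omni is PostgreSQL-compatible, standard Postgres drivers connect to it unchanged. A minimal sketch using psycopg2; the host, database and credentials below are hypothetical placeholders:

```python
# A minimal sketch: connecting to AlloyDB Omni with a standard PostgreSQL
# driver. Endpoint and credentials are hypothetical placeholders.
import psycopg2

conn = psycopg2.connect(
    host="alloydb-omni.internal",  # hypothetical GDC endpoint
    dbname="appdb",
    user="app_user",
    password="change-me",
)
with conn, conn.cursor() as cur:
    cur.execute("SELECT version();")
    print(cur.fetchone()[0])
conn.close()
```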
Dataproc Spark is Google Cloud’s managed service for running open-source data tools, which Google said will allow customers to run Spark at a 54 per cent lower total cost of ownership (TCO).
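The Spark side of that pitch is that code written against the open-source APIs is what the managed service runs. A minimal PySpark sketch of the kind of job involved, with a hypothetical input path:

```python
# A minimal sketch of a standard open-source Spark job; the input path is a
# hypothetical placeholder. Assumes pyspark is available locally or the
# script is submitted to a managed Spark cluster.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("word-count-sketch").getOrCreate()

lines = spark.read.text("gs://your-bucket/input.txt")  # hypothetical path
words = lines.selectExpr("explode(split(value, ' ')) AS word")
words.groupBy("word").count().show()

spark.stop()
```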
Google also introduced an updated hardware stack for GDC featuring 4th Gen Intel Xeon Scalable Processors and high-performance network fabrics with up to 400 Gbps throughput. Plus, it added new hardware configurations for GDC Edge that are designed to be resilient against inconsistent network connectivity.
Finally, Google launched GKE Enterprise, a new premium edition of GKE designed to increase workload velocity across multiple teams and reduce TCO with a fully integrated and managed solution from Google Cloud.