Google Cloud Compute Engine provides a broad selection of NVIDIA GPU models, integrated with different machine types, to accelerate artificial intelligence, machine learning, data processing, and graphics-intensive workloads on virtual machines. This documentation details the available GPU models, how they're attached depending on the machine series, and the performance characteristics users can expect across the fleet. Accelerator-optimized machine families such as A4X, A4, A3, A2, and G2 come with GPUs pre-attached, while N1 general-purpose instances let users attach various GPU models based on specific workload needs.
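The attachment model differs by series: accelerator-optimized families fix the GPU count per machine type, while on N1 you choose the GPU yourself at instance creation. A minimal sketch using the gcloud CLI (the instance name, zone, and image below are illustrative placeholders; verify GPU availability in your chosen zone first):

```shell
# Attach one NVIDIA T4 to a general-purpose N1 VM.
# "my-gpu-vm", the zone, and the image are illustrative, not prescribed values.
gcloud compute instances create my-gpu-vm \
    --zone=us-central1-a \
    --machine-type=n1-standard-8 \
    --accelerator=type=nvidia-tesla-t4,count=1 \
    --maintenance-policy=TERMINATE \
    --image-family=debian-12 \
    --image-project=debian-cloud
```

The `--maintenance-policy=TERMINATE` flag is required because VMs with attached GPUs do not support live migration during host maintenance.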
The range of available GPUs includes cutting-edge options like the NVIDIA GB200 Grace Blackwell Superchips, B200 Blackwell, H200 and H100 SXM, A100, L4, T4, V100, P4, and P100. Each serves a segment of compute needs, from exascale training of large language models to cost-optimized inference or high-end visualization. For example, A4X machines combine Arm-based Grace CPUs with multiple B200 GPUs for foundation model workloads, while A3 series provides access to H100 and H200 GPUs ideal for demanding training and serving. G2 and N1 series support NVIDIA RTX Virtual Workstations (vWS), making them suitable for high-end 3D visualization and graphics, as well as remote workstations.
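The pairing of GPU model, machine series, and workload segment described above can be sketched as a small lookup table. This is an illustrative summary of the mapping in this section, not an exhaustive or authoritative catalog:

```python
# Illustrative pairing of GPU models with the machine series that offers them
# and their typical workload segment, as summarized in the text above.
GPU_CATALOG = {
    "GB200": {"series": "A4X", "use": "foundation-model training"},
    "H200":  {"series": "A3",  "use": "demanding training and serving"},
    "H100":  {"series": "A3",  "use": "demanding training and serving"},
    "L4":    {"series": "G2",  "use": "inference, graphics, and RTX vWS"},
    "T4":    {"series": "N1",  "use": "cost-optimized inference and vWS"},
    "V100":  {"series": "N1",  "use": "training and HPC"},
    "P100":  {"series": "N1",  "use": "training and HPC"},
    "P4":    {"series": "N1",  "use": "inference and vWS"},
}

def series_for(gpu: str) -> str:
    """Return the machine series that offers the given GPU model."""
    return GPU_CATALOG[gpu]["series"]
```

A helper like `series_for("H100")` then resolves to `"A3"`, which can be handy when scripting instance provisioning across heterogeneous fleets.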
Comprehensive comparisons of each GPU model are available, covering memory, interconnects, and support for features such as Virtual Workstation licenses, as well as raw computational performance (FP64, FP32, FP16, and INT8 throughput) and tensor core capabilities. Tables delineate memory bandwidth, ideal workload types, and restrictions for each configuration. For dense, performance-optimized supercomputing environments, users can leverage the AI Hypercomputer platform, designed for large-scale artificial intelligence and machine learning tasks with deep integrations for orchestration through tools such as Kubernetes and Slurm. Additional guidance covers regional availability, pricing, storage options, and network bandwidth considerations, helping organizations select the optimal GPU configuration for a wide spectrum of modern workloads.
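Because GPU availability varies by region and zone, it is worth querying the current inventory before committing to a configuration. A sketch using the gcloud CLI (the zone filter is an illustrative example):

```shell
# List the GPU accelerator types offered in a given zone.
# The zone in the filter is illustrative; substitute your target zone.
gcloud compute accelerator-types list \
    --filter="zone:us-central1-a"
```

The output enumerates accelerator type names (for example `nvidia-tesla-t4`) that can be passed to the `--accelerator` flag when creating an instance.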