AWS unveiled Trainium3 during its re:Invent conference in Las Vegas as a new ASIC for internal AI workloads and for select external customers. The chip delivers 2.52 PetaFLOPS of FP8 compute and raises on-chip memory capacity to 144 GB of HBM3E with 4.9 TB/s of memory bandwidth. Trainium3 supports both dense and expert-parallel model topologies and introduces compact data types, MXFP8 and MXFP4, aimed at improving the balance between memory and compute for real-time, multimodal, and long-context reasoning tasks. The device is manufactured on TSMC's 3 nm N3 node and is now available in Amazon EC2 Trn3 UltraServer instances.
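To make the MXFP8/MXFP4 mention concrete, here is a minimal sketch of microscaling (MX) block quantization in Python, following the OCP Microscaling Formats convention of a shared power-of-two scale per 32-element block with FP8 E4M3 elements. The block size, element format, and exact scale rule are assumptions drawn from that spec; AWS has not published how Trainium3 implements these formats, and integer rounding below stands in for the true FP8 element grid.

```python
import numpy as np

BLOCK = 32          # elements per shared-scale block (per the OCP MX spec; assumed here)
FP8_E4M3_MAX = 448  # largest finite value representable in FP8 E4M3

def quantize_mx_block(x: np.ndarray) -> tuple[np.ndarray, float]:
    """Quantize one block to a shared power-of-two scale plus narrow-format values."""
    amax = np.max(np.abs(x))
    # Pick a power-of-two scale so the largest element fits the FP8 range.
    # (The OCP spec derives the scale slightly differently and saturates overflows.)
    scale = 2.0 ** np.ceil(np.log2(amax / FP8_E4M3_MAX)) if amax > 0 else 1.0
    # Integer rounding approximates the element encoding; real hardware
    # would snap each scaled value to the E4M3 representable grid instead.
    q = np.clip(np.round(x / scale), -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return q, scale

def dequantize_mx_block(q: np.ndarray, scale: float) -> np.ndarray:
    return q * scale

x = np.random.randn(BLOCK).astype(np.float32)
q, s = quantize_mx_block(x)
err = np.max(np.abs(dequantize_mx_block(q, s) - x))
print(f"shared scale = {s}, max abs reconstruction error = {err:.4f}")
```

The appeal of this scheme for the memory/compute balance the article describes is that each 32-element block stores one small scale plus 8-bit (or 4-bit, for MXFP4) elements, cutting memory traffic roughly in half or quarter versus FP16 while keeping a wider dynamic range than a single per-tensor scale would allow.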
Trn3 UltraServers can scale up to 144 Trainium3 chips in a single server, achieving approximately 362 FP8 PetaFLOPS, and multiple servers can be combined into EC2 UltraClusters 3.0 for larger deployments. A fully equipped UltraServer provides about 20.7 TB of HBM3E memory and around 706 TB/s of aggregate memory bandwidth. The platform also incorporates the NeuronSwitch-v1 fabric, which doubles inter-chip interconnect bandwidth compared to the previous UltraServer generation, enabling higher throughput across the server footprint.
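The server-level aggregates follow directly from multiplying the per-chip figures quoted above by the 144-chip count, as a quick check confirms:

```python
# Back-of-the-envelope check of the UltraServer aggregates from the
# per-chip specs: 2.52 PFLOPS FP8, 144 GB HBM3E, 4.9 TB/s bandwidth.
chips = 144
pflops = chips * 2.52        # 362.88  -> ~362 FP8 PetaFLOPS
hbm_tb = chips * 144 / 1000  # 20.736  -> ~20.7 TB of HBM3E
bw_tbs = chips * 4.9         # 705.6   -> ~706 TB/s aggregate bandwidth
print(f"{pflops:.1f} PFLOPS, {hbm_tb:.1f} TB HBM, {bw_tbs:.1f} TB/s")
```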
AWS highlights generational gains over Trainium2, citing up to 4.4x higher performance, 3.9x greater memory bandwidth, and about 4x better performance per watt. The company also reports improvements in inference and token efficiency across various Amazon services, positioning Trainium3 and Trn3 UltraServers as an internally developed option that reduces reliance on third-party accelerator hardware. Overall, the announcement emphasizes larger on-chip memory, new compact numeric formats, and expanded system-level scale as the primary vectors for performance and efficiency gains in AI workloads.
