AWS releases Trainium3 ASIC for Artificial Intelligence workloads

AWS introduced Trainium3, a purpose-built ASIC for Artificial Intelligence workloads, at re:Invent. The chip delivers 2.52 PetaFLOPS of FP8 compute and is available in Amazon EC2 Trn3 UltraServer instances.

AWS unveiled Trainium3 during its re:Invent conference in Las Vegas as a new ASIC for internal Artificial Intelligence workloads and select external customers. The chip delivers 2.52 PetaFLOPS of FP8 compute and raises on-package memory capacity to 144 GB of HBM3E with 4.9 TB/s of memory bandwidth. Trainium3 supports both dense and expert-parallel model topologies and introduces compact data types, MXFP8 and MXFP4, aimed at improving the balance between memory and compute for real-time, multimodal, and long-context reasoning tasks. The device is manufactured on TSMC's N3 (3 nm) node and is now available in Amazon EC2 Trn3 UltraServer instances.
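MXFP8 and MXFP4 follow the OCP Microscaling (MX) convention, in which a block of 32 low-precision elements shares a single power-of-two scale, letting 4-bit or 8-bit values cover a wide dynamic range at a fraction of the memory footprint. The Python sketch below illustrates the structure of an MXFP4-style (E2M1) quantizer; it is a format illustration under the public MX convention, not AWS's implementation, and the block size, element grid, and scaling rule are assumptions drawn from that spec rather than anything Trainium-specific.

```python
import numpy as np

# Illustrative sketch of MX-style block quantization, following the OCP
# Microscaling (MX) convention: 32-element blocks sharing one power-of-two
# scale, with FP4 (E2M1) elements whose representable magnitudes are
# {0, 0.5, 1, 1.5, 2, 3, 4, 6}. A format illustration, not AWS code.

FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
BLOCK = 32
E2M1_EMAX = 2  # exponent of the largest FP4 value: 6.0 = 1.5 * 2**2

def quantize_mxfp4(x):
    """Quantize a 1-D array (length a multiple of 32) to MXFP4-like values."""
    blocks = x.reshape(-1, BLOCK)
    amax = np.abs(blocks).max(axis=1, keepdims=True)
    # Shared E8M0-style scale: a power of two chosen so the block's largest
    # magnitude lands within the FP4 element range.
    exp = np.floor(np.log2(np.where(amax > 0, amax, 1.0)))
    scale = 2.0 ** (exp - E2M1_EMAX)
    # Snap each scaled element to the nearest representable FP4 magnitude.
    scaled = blocks / scale
    idx = np.abs(np.abs(scaled)[..., None] - FP4_GRID).argmin(axis=-1)
    return np.sign(scaled) * FP4_GRID[idx], scale

def dequantize_mxfp4(q, scale):
    return (q * scale).reshape(-1)

x = np.random.randn(64)
q, s = quantize_mxfp4(x)
print("max abs error:", np.abs(x - dequantize_mxfp4(q, s)).max())
```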

Trn3 UltraServers can scale up to 144 Trainium3 chips in a single server, achieving approximately 362 FP8 PetaFLOPS, and multiple servers can be combined into EC2 UltraClusters 3.0 for larger deployments. A fully equipped UltraServer provides about 20.7 TB of HBM3E memory and around 706 TB/s of aggregate memory bandwidth. The platform also incorporates the NeuronSwitch-v1 fabric, which doubles inter-chip interconnect bandwidth compared to the previous UltraServer generation, enabling higher throughput across the server footprint.
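The aggregate figures follow directly from the per-chip specifications quoted above; a quick back-of-the-envelope check (decimal units, rounded as in the text):

```python
# Back-of-the-envelope check of the Trn3 UltraServer aggregate figures,
# using the per-chip numbers quoted in the announcement.
chips = 144
fp8_pflops = 2.52      # FP8 PetaFLOPS per chip
hbm_gb = 144           # GB of HBM3E per chip
hbm_bw_tb_s = 4.9      # TB/s of memory bandwidth per chip

print(chips * fp8_pflops)     # 362.88 -> ~362 FP8 PetaFLOPS
print(chips * hbm_gb / 1000)  # 20.736 -> ~20.7 TB of HBM3E
print(chips * hbm_bw_tb_s)    # 705.6  -> ~706 TB/s aggregate bandwidth
```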

AWS highlights generational gains versus Trainium2, citing up to 4.4x higher performance, 3.9x greater memory bandwidth, and about 4x better performance per watt. The company also reports improvements in inference and token efficiency across various Amazon services, positioning Trainium3 and Trn3 UltraServers as an internally developed option to reduce reliance on third-party accelerator hardware. Overall, the announcement emphasizes larger on-package memory, new compact numeric formats, and expanded system-level scale as the primary vectors for performance and efficiency gains in Artificial Intelligence workloads.

Impact Score: 70

Samsung completes HBM4 development, awaits NVIDIA approval

Samsung says it has cleared Production Readiness Approval for its first sixth-generation HBM (HBM4) and has shipped samples to NVIDIA for evaluation. Initial samples exceed NVIDIA's next-generation GPU requirement of 11 Gbps per pin, and HBM4 promises roughly 60% higher bandwidth than HBM3E.
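Per-stack bandwidth is simply pin rate times interface width, and the quoted figures are consistent with HBM4's 2048-bit per-stack interface, double HBM3E's 1024 bits. The short calculation below is a sketch: the ~9.8 Gbps HBM3E baseline and the 8 Gbps HBM4 base rate are assumptions for comparison, not figures from the article; only the 11 Gbps sample speed is quoted above.

```python
# Per-stack bandwidth = pin rate (Gbps) * interface width (bits) / 8 -> GB/s.
# HBM4 doubles the per-stack interface to 2048 bits (HBM3E: 1024 bits).
# The 9.8 Gbps HBM3E and 8 Gbps HBM4 base rates are assumed baselines.
def stack_bw_gb_s(pin_rate_gbps, width_bits):
    return pin_rate_gbps * width_bits / 8

print(stack_bw_gb_s(9.8, 1024))   # ~1254 GB/s: typical HBM3E stack
print(stack_bw_gb_s(8.0, 2048))   # ~2048 GB/s: HBM4 base spec, ~60% higher
print(stack_bw_gb_s(11.0, 2048))  # ~2816 GB/s: at the 11 Gbps sample speed
```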

NVIDIA and AWS expand full-stack partnership for Artificial Intelligence compute platform

NVIDIA and AWS expanded integration around Artificial Intelligence infrastructure at AWS re:Invent, announcing support for NVIDIA NVLink Fusion with Trainium4, Graviton, and the Nitro System. The move aims to unify NVIDIA's scale-up interconnect and MGX rack architecture with AWS custom silicon to speed cloud-scale Artificial Intelligence deployments.

The state of Artificial Intelligence and DeepSeek strikes again

The Download highlights a new MIT Technology Review and Financial Times feature on the uneven economic effects of Artificial Intelligence, along with a roundup of major technology items, including DeepSeek's latest model claims and an Amsterdam welfare Artificial Intelligence investigation.
