Microsoft unveils Maia 200 artificial intelligence inference accelerator

Microsoft has introduced Maia 200, a custom artificial intelligence inference accelerator built on a 3 nm process and designed to improve the economics of token generation for large models, including GPT-5.2. The chip targets higher performance per dollar for services such as Microsoft Foundry and Microsoft 365 Copilot, and it also supports synthetic data pipelines for next-generation models.

Microsoft describes Maia 200 as a breakthrough inference accelerator engineered to dramatically improve the economics of artificial intelligence token generation. The company positions the chip as an inference powerhouse for large-scale model workloads, with better performance per dollar across its cloud and product stack.

The Maia 200 accelerator is built on TSMC’s 3 nm process with native FP8/FP4 tensor cores, a redesigned memory system with 216 GB of HBM3e at 7 TB/s and 272 MB of on-chip SRAM, and data movement engines intended to keep massive models fed and the hardware highly utilized. Microsoft states that this combination makes Maia 200 the most performant first-party silicon from any hyperscaler, with three times the FP4 performance of the third-generation Amazon Trainium and FP8 performance above Google’s seventh-generation TPU. The company also says Maia 200 is the most efficient inference system it has ever deployed, with 30% better performance per dollar than the latest-generation hardware in its fleet today.
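To put two of the quoted figures in perspective, the short Python sketch below works through the arithmetic. It is a back-of-the-envelope illustration, not a Microsoft benchmark. It assumes that 1 TB equals 1000 GB, that the 7 TB/s figure is a sustained peak, and that "performance per dollar" can be read as work done per dollar spent. The function names are hypothetical.

```python
# Figures quoted in the article for Maia 200.
HBM_CAPACITY_GB = 216          # HBM3e capacity
HBM_BANDWIDTH_GB_PER_S = 7000  # 7 TB/s, assuming 1 TB = 1000 GB
PERF_PER_DOLLAR_GAIN = 0.30    # 30% better performance per dollar


def full_memory_sweep_ms(capacity_gb: float, bandwidth_gb_per_s: float) -> float:
    """Time to read the whole HBM once at peak bandwidth, in milliseconds."""
    return capacity_gb / bandwidth_gb_per_s * 1000.0


def cost_reduction_fraction(perf_per_dollar_gain: float) -> float:
    """Fractional drop in cost per unit of work for a given perf-per-dollar gain."""
    return 1.0 - 1.0 / (1.0 + perf_per_dollar_gain)


sweep_ms = full_memory_sweep_ms(HBM_CAPACITY_GB, HBM_BANDWIDTH_GB_PER_S)
saving = cost_reduction_fraction(PERF_PER_DOLLAR_GAIN)
print(f"Full HBM sweep at peak bandwidth: {sweep_ms:.1f} ms")
print(f"Cost per unit of work falls by about {saving:.1%}")
```

Under these assumptions, one full pass over the 216 GB of memory takes roughly 31 ms, and a 30% gain in performance per dollar corresponds to a cost reduction of about 23% for the same amount of work, not 30%.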

Maia 200 is part of Microsoft’s heterogeneous artificial intelligence infrastructure and is intended to serve multiple models, including the latest GPT-5.2 models from OpenAI, bringing a performance-per-dollar advantage to Microsoft Foundry and Microsoft 365 Copilot. The Microsoft Superintelligence team will use Maia 200 for synthetic data generation and reinforcement learning to improve next-generation in-house models. For synthetic data pipelines, Microsoft says Maia 200’s design helps accelerate the rate at which high-quality, domain-specific data can be generated and filtered, providing downstream training systems with fresher and more targeted signals.


