Microsoft has introduced Maia 200, describing it as a breakthrough inference accelerator engineered to dramatically improve the economics of AI token generation. The company positions Maia 200 as an AI inference powerhouse, aimed at handling large-scale model workloads while improving performance per dollar across its cloud and product stack.
The Maia 200 accelerator is built on TSMC’s 3 nm process with native FP8/FP4 tensor cores, a redesigned memory system with 216 GB of HBM3e at 7 TB/s and 272 MB of on-chip SRAM, plus data movement engines that keep massive models fed, fast, and highly utilized. Microsoft states that this combination makes Maia 200 the most performant first-party silicon from any hyperscaler, with three times the FP4 performance of the third-generation Amazon Trainium and FP8 performance above Google’s seventh-generation TPU. The company also says Maia 200 is the most efficient inference system it has ever deployed, with 30% better performance per dollar than the latest-generation hardware in its fleet today.
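To see why the quoted HBM bandwidth and low-precision formats matter for inference economics, a rough back-of-the-envelope calculation is useful: autoregressive decoding is typically memory-bandwidth bound, since the weights must be streamed from HBM for every generated token. The sketch below works through that arithmetic using only the 7 TB/s figure from the announcement; the model size, batch size, and bandwidth-efficiency factor are illustrative assumptions, not published Maia 200 or model figures.

```python
# Back-of-the-envelope estimate of memory-bandwidth-bound decode throughput.
# The model size, batch size, and efficiency factor below are illustrative
# assumptions, not Microsoft or OpenAI figures.

HBM_BANDWIDTH_TBPS = 7.0                      # quoted HBM3e bandwidth, TB/s
BYTES_PER_PARAM = {"fp8": 1.0, "fp4": 0.5}    # weight storage per parameter

def decode_tokens_per_second(params_billion: float, fmt: str,
                             batch: int = 1, efficiency: float = 0.6) -> float:
    """Tokens/s if every generated token must stream all weights from HBM.

    efficiency is an assumed fraction of peak bandwidth actually achieved;
    with batching, one pass over the weights serves `batch` sequences at once.
    """
    weight_bytes = params_billion * 1e9 * BYTES_PER_PARAM[fmt]
    passes_per_second = HBM_BANDWIDTH_TBPS * 1e12 * efficiency / weight_bytes
    return passes_per_second * batch

if __name__ == "__main__":
    for fmt in ("fp8", "fp4"):
        rate = decode_tokens_per_second(params_billion=200, fmt=fmt, batch=32)
        print(f"{fmt}: ~{rate:,.0f} tokens/s for a hypothetical "
              f"200B-parameter model at batch 32")
```

Under these assumptions, halving the bytes per parameter by moving from FP8 to FP4 roughly doubles decode throughput for the same bandwidth, which is the basic mechanism behind the performance-per-dollar claims.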
Maia 200 is part of Microsoft’s heterogeneous AI infrastructure and is intended to serve multiple models, including the latest GPT-5.2 models from OpenAI, bringing a performance-per-dollar advantage to Microsoft Foundry and Microsoft 365 Copilot. The Microsoft Superintelligence team will use Maia 200 for synthetic data generation and reinforcement learning to improve next-generation in-house models. For synthetic data pipeline use cases, Microsoft says Maia 200’s design helps accelerate the rate at which high-quality, domain-specific data can be generated and filtered, providing downstream training systems with fresher and more targeted signals.
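The generate-and-filter pattern mentioned above is straightforward to sketch: candidates are produced by an inference pass, scored by a filter, and only the samples that clear a quality bar reach downstream training. The following minimal Python sketch shows the shape of such a loop; the generator, scorer, and threshold are stand-ins for whatever models a real pipeline would use and do not reflect Microsoft's actual implementation.

```python
# Minimal sketch of a generate-then-filter synthetic data loop.
# generate() and quality_score() are placeholders for the inference and
# judging models a real pipeline would call; nothing here reflects
# Microsoft's actual synthetic data system.
import random
from typing import Iterator

def generate(prompt: str, n: int) -> Iterator[str]:
    """Placeholder generator: a real pipeline would call an inference endpoint."""
    for i in range(n):
        yield f"{prompt} -> candidate answer {i}"

def quality_score(sample: str) -> float:
    """Placeholder scorer: a real filter might use a reward or judge model."""
    return random.random()

def synthesize(prompt: str, n: int = 100, threshold: float = 0.8) -> list[str]:
    """Keep only candidates whose score clears the threshold, so downstream
    training sees fresher, more targeted examples."""
    return [s for s in generate(prompt, n) if quality_score(s) >= threshold]

if __name__ == "__main__":
    kept = synthesize("Explain FP4 quantization to a new engineer.")
    print(f"kept {len(kept)} of 100 candidates")
```

Faster, cheaper inference raises the throughput of the generate and score steps, which is where an accelerator like Maia 200 would feed such a pipeline.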
