Meta is introducing four generations of its in-house Meta Training and Inference Accelerator (MTIA), developed with Broadcom: MTIA 300, 400, 450, and 500 are scheduled to reach its data centers over the next two years. Early MTIA units are already handling ranking and recommendation workloads, while later designs are aimed at real-time model serving across some of the largest social platforms on the web. The roadmap is explicitly inference-first, reflecting a focus on making social media browsing and recommendation algorithms feel effectively instant.
Instead of chasing raw peak arithmetic alone, Meta is prioritizing memory throughput and inference efficiency to reduce latency and energy use at scale. According to the specification table, HBM bandwidth and capacity rise substantially across the series while compute grows more gradually; the bet is that increasing on-package bandwidth and capacity is what cuts latency and power costs for production inference, where serving is typically bound by memory traffic rather than arithmetic. The accelerators incorporate hardware support for attention primitives and mixture-of-experts layers, along with low-precision data formats tuned for inference to minimize conversion overhead in modern neural networks.
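A quick roofline-style calculation shows why bandwidth, not peak FLOPS, tends to dominate inference latency. The sketch below uses illustrative placeholder numbers (1 PFLOP/s of compute, 2 TB/s of HBM bandwidth, a 70B-parameter model), not published MTIA specifications; it estimates the arithmetic intensity of one autoregressive decode step and compares it to the machine balance of the hypothetical chip.

```python
# Back-of-the-envelope roofline check: is a decode step memory-bound?
# All hardware numbers here are illustrative placeholders, not MTIA specs.

PEAK_FLOPS = 1.0e15   # hypothetical peak compute: 1 PFLOP/s
HBM_BW = 2.0e12       # hypothetical HBM bandwidth: 2 TB/s

# Machine balance: FLOPs the chip can execute per byte streamed from HBM.
machine_balance = PEAK_FLOPS / HBM_BW  # 500 FLOPs/byte for these numbers

def decode_step_intensity(n_params: float, bytes_per_param: float) -> float:
    """Arithmetic intensity of one autoregressive decode step.

    Each generated token touches every weight once (~2 FLOPs per
    parameter for a multiply-accumulate), so intensity is roughly
    2 / bytes_per_param, independent of model size.
    """
    flops = 2.0 * n_params
    bytes_moved = n_params * bytes_per_param
    return flops / bytes_moved

for fmt, bpp in [("fp16", 2.0), ("fp8", 1.0), ("int4", 0.5)]:
    ai = decode_step_intensity(70e9, bpp)
    bound = "memory-bound" if ai < machine_balance else "compute-bound"
    print(f"{fmt}: {ai:.1f} FLOPs/byte -> {bound}")
```

With an intensity of 1 to 4 FLOPs per byte against a machine balance of hundreds, decode is deeply memory-bound: token latency is set by how fast weights stream out of HBM, which is exactly the dimension Meta is scaling across the MTIA series. It also shows why the low-precision formats matter, since halving bytes per parameter roughly halves the memory traffic per token.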
Software compatibility and operational flexibility are central to the design. Meta says the MTIA software stack runs natively on common machine learning frameworks, so existing production models can be deployed on both GPUs and MTIA without major rewrites, simplifying adoption in live services. Multiple MTIA generations are engineered to share the same chassis, racks, and networking, which allows upgrades through module swaps rather than full data center retrofits and helps explain the fast release cadence across an infrastructure spanning millions of chips. With MTIA chips already running at kilowatt power budgets and delivering petaflops of compute, the accelerators are positioned to compete directly with leading parts from NVIDIA and AMD as well as in-house silicon from other hyperscalers.
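The portability claim boils down to keeping device selection out of model code. The sketch below shows the general pattern in PyTorch; the "mtia" backend name and availability check are assumptions for illustration, not a documented description of Meta's stack, and the model itself is a stand-in.

```python
# Minimal sketch of framework-level portability: the same model code runs
# on whichever accelerator backend is present. The "mtia" device name is
# an assumption for illustration, not a confirmed detail of Meta's stack.
import torch
import torch.nn as nn

def pick_device() -> torch.device:
    # Prefer a hypothetical MTIA backend if the build exposes one,
    # then fall back to CUDA GPUs, then CPU.
    if getattr(torch, "mtia", None) is not None and torch.mtia.is_available():
        return torch.device("mtia")
    if torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")

device = pick_device()

# A stand-in production model: no device-specific branches inside it.
model = nn.Sequential(nn.Linear(512, 2048), nn.ReLU(), nn.Linear(2048, 8))
model = model.to(device).eval()

with torch.inference_mode():
    batch = torch.randn(32, 512, device=device)
    scores = model(batch)  # identical call path on GPU, MTIA, or CPU
    print(scores.shape, device)
```

If the stack really does keep this contract, swapping a rack of GPUs for MTIA modules becomes a scheduling and capacity decision rather than a porting project, which is what makes the shared chassis and fast upgrade cadence practical.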
