Meta details MTIA roadmap for high performance inference

Meta is rolling out four generations of its Meta Training and Inference Accelerator designed with Broadcom, prioritizing memory bandwidth, inference efficiency, and seamless deployment alongside GPUs in its massive data centers.

Meta is introducing four generations of its in-house Meta Training and Inference Accelerator developed with Broadcom, with MTIA 300, 400, 450, and 500 scheduled to be integrated into its data centers over the next two years. Early MTIA units are already handling ranking and recommendation workloads, while later designs are aimed at real-time model serving across some of the largest social platforms on the web. The roadmap is explicitly inference first, reflecting a focus on making social media browsing and recommendation algorithms effectively instant.

Instead of chasing raw peak arithmetic alone, Meta is prioritizing memory throughput and inference efficiency to reduce latency and energy use at scale. According to the specification table, HBM bandwidth and capacity rise substantially across the series while compute grows more modestly, underscoring Meta's bet that increasing on-package bandwidth and capacity is what cuts latency and power costs for production inference. The accelerators incorporate hardware support for attention primitives and mixture-of-experts layers, along with low precision data formats tuned for inference to minimize conversion overhead in modern neural networks.
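Why bandwidth rather than peak arithmetic dominates inference latency can be seen with a back-of-the-envelope roofline calculation. The sketch below uses hypothetical, illustrative numbers (not Meta's published specs): at batch size 1, every decoded token must stream the model's weights from HBM, so the memory-transfer time, not the compute time, sets the floor on latency.

```python
# Illustrative roofline arithmetic for single-batch token decoding.
# All figures below are hypothetical examples, not MTIA specifications.

def decode_step_time(params_bytes, flops, hbm_gbps, peak_tflops):
    """Lower-bound time per token: the slower of weight streaming and compute."""
    mem_s = params_bytes / (hbm_gbps * 1e9)       # time to stream weights from HBM
    compute_s = flops / (peak_tflops * 1e12)      # time for the matrix math
    return max(mem_s, compute_s), mem_s > compute_s

# Example: a 70B-parameter model in 8-bit weights, batch size 1.
params_bytes = 70e9          # one byte per weight
flops = 2 * 70e9             # roughly 2 FLOPs per weight per token
t, memory_bound = decode_step_time(params_bytes, flops,
                                   hbm_gbps=3000, peak_tflops=800)
print(f"{t * 1e3:.1f} ms/token, memory bound: {memory_bound}")
```

With these assumed numbers, streaming 70 GB at 3 TB/s takes about 23 ms while the compute finishes in well under a millisecond, which is why doubling HBM bandwidth helps latency far more than doubling peak FLOPS at low batch sizes.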

Software compatibility and operational flexibility are central to the design. Meta says the MTIA software stack runs natively on common machine learning frameworks, so existing production models can be deployed on both GPUs and MTIA without major rewrites, simplifying adoption in live services. Multiple MTIA generations are engineered to share the same chassis, rack, and networking, which allows upgrades through module swaps rather than full data center retrofits and helps explain a fast release cadence across an infrastructure that spans millions of chips. MTIA chips are already running at kilowatt-class power budgets and petaFLOPS-scale compute, positioning the accelerators to compete directly with leading solutions from NVIDIA, AMD, and other hyperscale providers.

Impact Score: 68

Samsung to deploy 2 nm process for HBM4E base die

Samsung plans to manufacture the base die of its next generation HBM4E memory on a 2 nm process, aiming to boost performance and efficiency while tightening its grip on the high bandwidth memory market.

Most EU businesses rely on US cloud, exposing data to foreign surveillance

More than 80% of EU businesses rely on US-based cloud and analytics services, exposing customer data to American surveillance laws and intensifying compliance risks under GDPR and the EU Artificial Intelligence Act. Italian startup Regolo pitches a fully European, zero data retention platform as a way to keep Artificial Intelligence workloads compliant and sovereign.
