Enfabrica Corporation has introduced its Elastic Memory Fabric System, EMFASYS, marking the first commercial deployment of a memory fabric that integrates high-performance RDMA Ethernet networking with a large array of parallel Compute Express Link (CXL) DDR5 memory channels. The standalone appliance is engineered to boost compute efficiency for large-scale, memory-bound artificial intelligence inference workloads, and is accessible to any GPU server at low, predictable latency over existing network infrastructure.
Driven by surging demand for generative, agentic, and reasoning-intensive artificial intelligence workloads, which require 10 to 100 times the compute of earlier large language model deployments, EMFASYS addresses the bottleneck around GPU and high-bandwidth memory (HBM) utilization in modern compute racks. The system offloads data from HBM to commodity DRAM through a tiered caching hierarchy, load-balances token generation across distributed artificial intelligence servers, and minimizes the risk of underutilized, stranded GPU capacity. The result is a system that scales elastically with growing user, agent, and context volumes while delivering efficiency gains at the memory layer.
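A minimal sketch of the tiering idea, in Python rather than the appliance's actual software: KV-cache blocks for each inference context live in a small fast tier (standing in for GPU HBM), and least-recently-used blocks spill to a large, cheaper tier (standing in for the CXL-attached DDR5 pool) instead of being discarded and recomputed. All class and variable names here are hypothetical; Enfabrica has not published this interface.

```python
from collections import OrderedDict

class TieredKVCache:
    """Two-tier cache: a small 'HBM' tier backed by a large 'remote DRAM' tier.

    LRU entries are evicted from the fast tier into the capacious tier rather
    than dropped, mimicking how an inference server might spill KV-cache
    blocks to a CXL/RDMA-attached memory pool.
    """

    def __init__(self, hbm_capacity: int):
        self.hbm_capacity = hbm_capacity
        self.hbm = OrderedDict()   # fast, scarce tier (models GPU HBM)
        self.dram = {}             # large, cheaper tier (models remote DDR5 pool)

    def put(self, key: str, value: bytes) -> None:
        # New data lands in the fast tier; spill LRU entries when full.
        self.hbm[key] = value
        self.hbm.move_to_end(key)
        while len(self.hbm) > self.hbm_capacity:
            cold_key, cold_val = self.hbm.popitem(last=False)
            self.dram[cold_key] = cold_val   # spill instead of discarding

    def get(self, key: str) -> bytes | None:
        if key in self.hbm:                  # HBM hit: no network round trip
            self.hbm.move_to_end(key)
            return self.hbm[key]
        if key in self.dram:                 # miss: fetch from the remote pool
            value = self.dram.pop(key)       # and promote it back into HBM
            self.put(key, value)
            return value
        return None                          # true miss: caller must recompute


cache = TieredKVCache(hbm_capacity=2)
for ctx in ("ctx-1", "ctx-2", "ctx-3"):      # third insert spills ctx-1 to DRAM
    cache.put(ctx, b"kv-block")
assert "ctx-1" in cache.dram                 # spilled, not lost
assert cache.get("ctx-1") is not None        # promoted back on reuse
```

The design point the sketch captures is that a spilled context is refetched at DRAM-over-fabric latency rather than regenerated on the GPU, which is what keeps GPU capacity from sitting stranded while caches are rebuilt.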
When deployed with Enfabrica's proprietary remote memory software stack, EMFASYS can deliver up to 50 percent lower cost per token per user. This enables large artificial intelligence and foundational large language model providers to offer more compelling price-to-performance ratios while accommodating the swelling volume of inference calls within cloud infrastructures. With this launch, Enfabrica establishes a new architectural standard for efficiently scaling memory resources in high-density artificial intelligence applications.
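To make the headline number concrete, here is a back-of-the-envelope calculation with purely hypothetical figures (the announcement publishes no underlying pricing): if offloading cold KV-cache state to commodity DRAM lets each GPU server sustain roughly twice the concurrent users at a modest added memory cost, cost per token per user falls by close to half.

```python
# Hypothetical illustration only: none of these figures come from Enfabrica.
GPU_SERVER_COST_PER_HOUR = 100.0     # assumed all-in hourly cost of one GPU server
DRAM_POOL_COST_PER_HOUR = 10.0       # assumed per-server share of a DDR5 pool appliance
TOKENS_PER_USER_PER_HOUR = 50_000    # assumed steady per-user generation rate

def cost_per_token_per_user(hourly_cost: float, users: int) -> float:
    """Hourly infrastructure cost divided across users and their tokens."""
    return hourly_cost / (users * TOKENS_PER_USER_PER_HOUR)

# Baseline: KV caches confined to HBM cap the server at, say, 8 concurrent users.
baseline = cost_per_token_per_user(GPU_SERVER_COST_PER_HOUR, users=8)

# With offload: spilling cold KV blocks to the DRAM pool frees HBM,
# so the same server (plus its pool share) might host 16 users.
offloaded = cost_per_token_per_user(
    GPU_SERVER_COST_PER_HOUR + DRAM_POOL_COST_PER_HOUR, users=16
)

print(f"baseline:  ${baseline:.8f} per token per user")
print(f"offloaded: ${offloaded:.8f} per token per user")
print(f"reduction: {1 - offloaded / baseline:.0%}")   # ~45% with these assumptions
```

Under these assumed inputs the reduction lands near the quoted "up to 50 percent"; the actual savings would depend on real hardware costs, concurrency gains, and workload shape.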