Chinese researchers let LLMs share meaning through internal memory instead of text

A team in China unveiled cache-to-cache, a method that lets language models exchange their internal KV cache rather than text, improving speed and accuracy. The code is open source and targets faster, more scalable Artificial Intelligence systems.

A research team in China introduced cache-to-cache, a communication method that lets large language models share meaning by transmitting their internal memory rather than final text. The authors argue that text-based exchange between models creates three problems: a bandwidth bottleneck, the ambiguity of natural language, and latency from token-by-token generation. By transferring the key-value (KV) cache directly, models can convey richer intermediate representations and context, avoiding misunderstandings that arise when instructions are phrased only in text.
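To make the "internal memory" concrete, here is a minimal sketch of what a KV cache holds and why it carries far more state than the text it corresponds to. The dimensions are illustrative, not taken from the paper:

```python
import numpy as np

# Hypothetical dimensions for a small transformer (illustrative only).
n_layers, n_heads, seq_len, head_dim = 4, 8, 16, 64

# A KV cache stores, for every layer, the key and value tensors computed
# for the tokens seen so far: shape (n_heads, seq_len, head_dim) each.
kv_cache = [
    {
        "keys": np.random.randn(n_heads, seq_len, head_dim),
        "values": np.random.randn(n_heads, seq_len, head_dim),
    }
    for _ in range(n_layers)
]

# Text-based communication collapses all of this into seq_len tokens;
# cache-to-cache instead transmits the tensors themselves.
floats_in_cache = sum(
    layer["keys"].size + layer["values"].size for layer in kv_cache
)
tokens_in_text = seq_len  # roughly one token id per position
print(floats_in_cache, tokens_in_text)  # → 65536 16
```

The gap between those two numbers is the bandwidth bottleneck the authors describe: decoding to text discards almost everything the cache encodes about token relationships.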

The KV cache functions as a model’s scratchpad, storing mathematical snapshots of tokens and their relationships. In one example, a programmer model asks a writer model to place content correctly in an HTML structure. With text-only instructions, the writer can misinterpret tags and position elements incorrectly. With cache-to-cache, the writer receives the programmer’s internal representation of the page structure and places content precisely. The system fuses a source model’s cache into a target model via a neural component called Cache Fuser, which includes a projection module to align cache formats, a dynamic weighting mechanism to control how much transferred information is used, and an adaptive gate that selects which layers should be enriched. The researchers also align tokenization and layer mappings so different models can synchronize their internal states.
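The three fuser components described above can be sketched as follows. This is a toy NumPy illustration of the idea (random weights, invented sizes), not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d_src, d_tgt, seq_len = 96, 128, 10  # illustrative sizes, not from the paper

# One layer's cache entries for the source and target models (keys only,
# for brevity; the same fusion would apply to values).
src_cache = rng.standard_normal((seq_len, d_src))
tgt_cache = rng.standard_normal((seq_len, d_tgt))

# 1) Projection module: map the source cache into the target's format.
W_proj = rng.standard_normal((d_src, d_tgt)) / np.sqrt(d_src)
projected = src_cache @ W_proj                        # (seq_len, d_tgt)

# 2) Dynamic weighting: a learned per-position scalar controlling how much
#    transferred information is mixed in (sigmoid keeps it in (0, 1)).
w_gate = rng.standard_normal(d_tgt)
weight = 1.0 / (1.0 + np.exp(-(projected @ w_gate)))  # (seq_len,)

# 3) Fuse the projected source cache into the target cache.
fused = tgt_cache + weight[:, None] * projected

# 4) Adaptive gate: a per-layer decision on whether to enrich at all.
layer_gate_open = True  # learned per layer in the real system
out = fused if layer_gate_open else tgt_cache
print(out.shape)  # → (10, 128): same shape, so the cache is not enlarged
```

Note that the fused cache has exactly the target's shape, which matches the finding that the approach does not enlarge the cache itself.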

In benchmarks, cache-to-cache outperformed text-based coordination by 3 to 5 percent and improved accuracy by 8.5 to 10.5 percent over single models, while roughly doubling speed. Tests covered combinations of Qwen2.5, Qwen3, Llama 3.2, and Gemma 3 across sizes from 0.6 billion to 14 billion parameters. Larger source models yielded greater gains. Technical analyses showed higher information density in fused caches, indicating that additional knowledge was successfully transferred, and the approach did not enlarge the cache itself.

The authors emphasize efficiency because only the connection module is trained while the source and target models remain unchanged. They highlight potential uses in privacy-sensitive collaboration between cloud and edge devices, pairing with acceleration methods, and integration into multimodal systems spanning language, images, and actions. The team open sourced its implementation and positions cache-to-cache as a practical alternative to text for building faster, more scalable Artificial Intelligence systems.
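The efficiency claim comes down to which parameters receive gradient updates. A toy sketch, with a made-up scalar loss, of training only the connection module while both base models stay frozen:

```python
# Frozen base models (weights stand in for full parameter sets).
source_model = {"w": 1.0}
target_model = {"w": 2.0}
# Only the fuser's parameter is trainable (names are illustrative).
fuser = {"w_proj": 0.5}

def loss_grad(w):
    # Stand-in gradient of some loss with respect to the fuser weight.
    return 2.0 * (w - 1.0)

lr = 0.1
for _ in range(100):
    fuser["w_proj"] -= lr * loss_grad(fuser["w_proj"])  # fuser updates only

# Base model weights are untouched; the fuser converges on its own.
print(round(fuser["w_proj"], 3), source_model["w"], target_model["w"])  # → 1.0 1.0 2.0
```

Because gradients flow only into the small connection module, training cost scales with the fuser's size rather than with the billions of frozen parameters on either side.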
