PaTH Attention boosts large language model positional reasoning

Researchers at MIT and the MIT-IBM Watson AI Lab have introduced PaTH Attention, a new positional encoding method that makes transformers more context-aware and better at tracking state over long sequences. The technique adapts position information based on token content and can be combined with forgetting mechanisms to improve long-context reasoning and efficiency.

Researchers at MIT and the MIT-IBM Watson AI Lab have developed a new attention and positional encoding technique for large language models called PaTH Attention, which addresses key weaknesses in how transformers track word order, state changes, and long-range dependencies. Standard attention lets a model look back over an input sequence to determine which tokens matter most, but it has no inherent sense of order, so transformers rely on positional encodings such as rotary position embeddings, known as RoPE. RoPE applies fixed mathematical rotations determined solely by the relative distance between tokens, independent of their content, which limits its ability to handle complex, evolving structures in language, code, or conditional instructions.
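To make the contrast concrete, the minimal sketch below (not taken from the paper; the function name and angle parameter are illustrative) rotates toy two-dimensional query and key vectors the way RoPE does: the rotation is fixed entirely by the position index, so the resulting attention score depends only on how far apart two tokens sit, never on what they contain.

```python
import numpy as np

def rotate(vec, position, theta=0.1):
    """Rotate a 2D query/key vector by an angle fixed entirely by its position index."""
    angle = position * theta          # depends only on position, not on token content
    rot = np.array([[np.cos(angle), -np.sin(angle)],
                    [np.sin(angle),  np.cos(angle)]])
    return rot @ vec

# Tokens at positions 2 and 7: the attention logit depends only on the relative
# offset 7 - 2, because the two rotations compose into a single rotation by
# (7 - 2) * theta no matter what the token vectors themselves contain.
q = rotate(np.array([1.0, 0.0]), position=7)
k = rotate(np.array([0.3, 0.9]), position=2)
print(q @ k)
```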

PaTH Attention makes positional information adaptive and context-aware by treating the sequence between two words as a path composed of many small, data-dependent transformations. Each transformation is based on a Householder reflection, described as a tiny mirror that adjusts according to the content of each token it passes, so that every step in the sequence can influence how later information is interpreted. The cumulative effect allows the model to track how entities and relationships evolve along the path between words, giving transformers a form of positional memory rather than just a notion of distance. To make this practical at scale, the team also designed a hardware-efficient algorithm that compresses the cumulative PaTH transformation and breaks it into smaller computations compatible with fast processing on GPUs, preserving efficiency while increasing expressivity.
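The sketch below is a simplified, assumption-laden illustration of that mechanism: each token derives a reflection direction from its own hidden state through a projection (random stand-ins replace the learned weights here), and the product of these reflections along the path between two positions transforms the key before it meets the query. The paper's exact parameterization and its blocked, GPU-friendly algorithm are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
d, seq_len = 4, 6
hidden = rng.normal(size=(seq_len, d))          # stand-in token representations
W_proj = rng.normal(size=(d, d))                # stand-in for a learned projection

def householder(h):
    """Data-dependent reflection H = I - 2 w w^T, with w derived from the token."""
    w = W_proj @ h
    w = w / np.linalg.norm(w)
    return np.eye(d) - 2.0 * np.outer(w, w)

def path_transform(i, j):
    """Cumulative transform along the path from key position i to query position j."""
    P = np.eye(d)
    for t in range(i + 1, j + 1):               # every intermediate token contributes
        P = householder(hidden[t]) @ P
    return P

# The attention logit between query position 5 and key position 1 uses the
# path-transformed key, so the score reflects what happened in between.
q, k = hidden[5], hidden[1]
print(q @ (path_transform(1, 5) @ k))
```

Because every intermediate token contributes its own reflection, two token pairs separated by the same distance can receive different effective positional transforms whenever the content between them differs, which is precisely what fixed rotations cannot express.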

The MIT-IBM team evaluated PaTH Attention on synthetic and real-world benchmarks, including reasoning tasks, long-context evaluations, and full large language model training, to test whether it improves tracking of information over time. They examined how well the method handled tasks such as following the most recent write command amid many distracting steps and multi-step recall problems that are challenging for fixed schemes like RoPE, and they trained mid-size large language models to compare against alternative encodings. PaTH Attention achieved lower perplexity than the alternative encodings, outperformed them on reasoning benchmarks it was not explicitly trained on, and showed strong content awareness on retrieval, reasoning, and stability tests with inputs spanning tens of thousands of tokens.

The researchers then combined PaTH Attention with the Forgetting Transformer, or FoX, to create PaTH-FoX, which selectively down-weights less relevant information in a data-dependent way, yielding strong performance across reasoning, long-context understanding, and language modeling tasks while maintaining transformer scalability. Senior author Yoon Kim situates this work within a broader push for new general-purpose building blocks for AI architectures that improve accuracy, expressivity, flexibility, and hardware scalability, and he suggests that data-dependent positional encodings like PaTH could be especially impactful in structured domains such as biology.
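As a rough illustration of the forgetting mechanism that PaTH-FoX layers on top of attention, the sketch below applies a data-dependent decay to causal attention logits. The per-token gate values would normally come from a learned sigmoid over each token's representation; random numbers stand in for them here, and this is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
seq_len = 6
logits = rng.normal(size=(seq_len, seq_len))     # raw query-key attention logits
f = rng.uniform(0.7, 1.0, size=seq_len)          # per-token forget gates in (0, 1]

# Down-weight attention from query position i to key position j by the product
# of the forget gates of the tokens in between: long spans of "forgettable"
# content shrink the contribution of older tokens in a data-dependent way.
log_f = np.log(f)
cum = np.cumsum(log_f)
decay = cum[:, None] - cum[None, :]              # sum of log f over positions j+1..i
causal = np.tril(np.ones((seq_len, seq_len), dtype=bool))
scores = np.where(causal, logits + decay, -np.inf)
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)
print(weights.round(3))
```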

Impact Score: 55

China reportedly tests domestically built EUV lithography prototype

China has reportedly built and begun testing a domestically developed EUV lithography prototype assembled from second-hand components and reverse-engineered designs. Huawei is leading a broader effort to create a fully domestic artificial intelligence semiconductor supply chain, spanning everything from chip design to advanced manufacturing tools.
