Large Sign Language Model translates 3D American Sign Language into English text

Researchers at Rutgers University and Qualcomm introduce a Large Sign Language Model that translates three-dimensional American Sign Language into English text by combining motion encoding with a large language model.

Researchers Sen Zhang, Xiaoxiao He, and colleagues at Rutgers University, together with Chaowei Tan from Qualcomm, present a new framework for translating three-dimensional American Sign Language into English text. The work moves beyond traditional two-dimensional video analysis by directly using 3D skeletal and hand motion data to capture the spatial and gestural complexity of signing. The team calls the approach the Large Sign Language Model and demonstrates both direct translation to text and an instruction-guided translation mode that lets prompts control the output.

Technically, the system encodes continuous 3D motion with a vector-quantized variational autoencoder, converting gestures into a discrete token sequence suitable as input to a pretrained large language model. The motion representation builds on recent 3D sign language datasets and the SMPL-X human body model, which represents poses with 52 joints alongside downsampled gesture features. Multilayer perceptrons then project the quantized gesture tokens into the embedding space of the language model, so the model can learn correspondences between motion and text.
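To make the tokenization concrete, here is a minimal PyTorch sketch of a VQ-style gesture tokenizer with an MLP projection into a language model's embedding space. This is an illustration, not the authors' code: the codebook size, the 52-joints-times-3 flattening of SMPL-X poses, and the 4096-dimensional LLM hidden size are all assumptions.

```python
import torch
import torch.nn as nn

class PoseEncoder(nn.Module):
    """Encodes raw SMPL-X pose vectors into the codebook feature space.
    The 52 joints x 3 rotation values = 156 input dims are an assumed
    flattening of the pose, not the paper's exact layout."""
    def __init__(self, in_dim=52 * 3, code_dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 512), nn.GELU(),
                                 nn.Linear(512, code_dim))

    def forward(self, poses):          # poses: (batch, frames, in_dim)
        return self.net(poses)

class GestureVQ(nn.Module):
    """Minimal vector-quantization layer: snaps each per-frame feature
    to its nearest codebook entry, yielding a discrete gesture token.
    Straight-through gradients and VQ losses are omitted for brevity."""
    def __init__(self, num_codes=512, code_dim=256):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, code_dim)

    def forward(self, z):              # z: (batch, frames, code_dim)
        # Squared distance from every frame feature to every code.
        d = (z.unsqueeze(-2) - self.codebook.weight).pow(2).sum(-1)
        tokens = d.argmin(dim=-1)      # (batch, frames) token indices
        return tokens, self.codebook(tokens)

class GestureProjector(nn.Module):
    """MLP lifting quantized gesture embeddings into the LLM's
    token-embedding space (hidden size 4096 is an assumption)."""
    def __init__(self, code_dim=256, llm_dim=4096):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(code_dim, llm_dim), nn.GELU(),
                                  nn.Linear(llm_dim, llm_dim))

    def forward(self, quantized):
        return self.proj(quantized)

# Toy forward pass: 2 clips of 64 frames each.
poses = torch.randn(2, 64, 52 * 3)
encoder, vq, projector = PoseEncoder(), GestureVQ(), GestureProjector()
tokens, quantized = vq(encoder(poses))
llm_inputs = projector(quantized)      # consumed by the LLM with text embeddings
print(tokens.shape, llm_inputs.shape)  # torch.Size([2, 64]) torch.Size([2, 64, 4096])
```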

The training pipeline is staged: first the sign language tokenizer is trained, then modality-alignment pretraining aligns gesture tokens with textual representations, and finally instruction fine-tuning improves the model’s ability to follow prompts and produce accurate English translations. The researchers used the large-scale SignAvatar dataset, which includes multiple sign languages such as American Sign Language and German Sign Language, to pretrain and evaluate the alignment between visual and linguistic modalities.
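The staging can be pictured as three phases that differ mainly in which components are trainable and which loss is optimized. The PyTorch skeleton below sketches only that gating: the stand-in modules, dimensions, and losses are dummies chosen for illustration, not the published training objectives.

```python
import torch
import torch.nn as nn

def run_stage(trainable, frozen, steps, loss_fn, lr=1e-4):
    """Runs one training stage: only the listed trainable modules are
    optimized; everything else stays frozen."""
    for m in frozen:
        m.requires_grad_(False)
    for m in trainable:
        m.requires_grad_(True)
    opt = torch.optim.AdamW([p for m in trainable for p in m.parameters()], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss_fn().backward()
        opt.step()

# Stand-ins for the real components; shapes and losses are dummies.
tokenizer = nn.Linear(156, 256)     # stage 1: sign language tokenizer
projector = nn.Linear(256, 4096)    # stage 2: gesture-to-LLM projection
llm       = nn.Linear(4096, 4096)   # stage 3: the LLM (or its adapters)

poses    = torch.randn(8, 156)      # dummy motion features
text_emb = torch.randn(8, 4096)     # dummy paired text embeddings

# Stage 1: train the tokenizer alone (e.g. on motion reconstruction).
run_stage([tokenizer], [projector, llm], steps=3,
          loss_fn=lambda: tokenizer(poses).pow(2).mean())

# Stage 2: modality-alignment pretraining -- fit the projector so
# gesture tokens land near their paired text embeddings.
run_stage([projector], [tokenizer, llm], steps=3,
          loss_fn=lambda: (projector(tokenizer(poses)) - text_emb).pow(2).mean())

# Stage 3: instruction fine-tuning -- unfreeze the LLM together with
# the projector on prompt/translation pairs.
run_stage([projector, llm], [tokenizer], steps=3,
          loss_fn=lambda: (llm(projector(tokenizer(poses))) - text_emb).pow(2).mean())
```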

Results reported by the team indicate that integrating 3D gesture features with a language model improves translation accuracy and robustness compared with approaches restricted to 2D video. The work highlights instruction-guided translation as a flexible mechanism for controlling outputs, and points to expanding 3D sign language datasets and adopting linguistically aware learning strategies as future directions for further enhancing accessibility for deaf and hard-of-hearing communities.


JEDEC outlines LPDDR6 expansion for data centers

JEDEC has previewed planned updates to LPDDR6 aimed at pushing the memory standard beyond mobile devices and into selected data center and accelerated computing use cases. The roadmap includes higher-capacity packaging options, flexible metadata support, 512 GB densities, and a new SOCAMM2 module standard.

TSMC debuts A13 process technology

TSMC has introduced its A13 process at its 2026 North America Technology Symposium as a tighter version of A14 aimed at next-generation artificial intelligence, high-performance computing, and mobile designs. The company positions the node as a more compact and efficient option with backward-compatible design rules for faster migration.

Google unveils eighth-generation tensor processing units

Google introduced its eighth generation of custom tensor processing units, with separate designs for training and inference. The new TPU 8t and TPU 8i are aimed at large-scale model training, serving, and agentic workloads.
