A new set of community recipes outlines how to run various models from the Qwen3.5 family on Jetson Thor. The guidance builds on NVIDIA’s latest vllm repository for Thor and is published in a linked GitHub repository intended for model deployment and experimentation.
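For readers who want a sense of what such a recipe boils down to, the sketch below loads a Qwen3.5 checkpoint through vLLM’s offline Python API. It is a minimal illustration under assumptions, not a copy of the shared recipes: the model ID, memory settings, and context length are placeholders, and the actual recipes target NVIDIA’s Thor vllm container rather than a bare Python environment.

```python
# Minimal sketch: serving a Qwen3.5 model with vLLM's offline API.
# All values below are illustrative assumptions, not taken from the recipes.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3.5-122B-A10B",  # hypothetical Hugging Face model ID
    tensor_parallel_size=1,           # Jetson Thor exposes a single GPU
    gpu_memory_utilization=0.90,      # leave headroom on a shared-memory SoC
    max_model_len=8192,               # conservative context length for an edge device
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain what Jetson Thor is in one sentence."], params)
print(outputs[0].outputs[0].text)
```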
The largest model reported to run is Qwen3.5-122B-A10B. The resharded NVFP4 build is described as somewhat slower because it lacks MTP (multi-token prediction) support, but the contributor reports from experience that its output quality is consistently better than the INT4 version’s. The post frames the recipes as practical instructions for users weighing performance against quality across quantized variants on Thor hardware.
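Since the post centers on speed-versus-quality tradeoffs between quantized variants, a small timing harness can make the speed side of the comparison concrete. The sketch below is an assumption-laden illustration, not part of the shared recipes: it loads whichever checkpoint path is passed on the command line and reports decode throughput, so the NVFP4 and INT4 variants can be probed in separate runs (loading two 100B-class models in one process would not fit in memory).

```python
# Rough throughput probe for comparing quantized variants on one machine.
# Run once per variant, passing a local checkpoint path (paths are assumptions).
import sys
import time
from vllm import LLM, SamplingParams

model_path = sys.argv[1]  # e.g. a local NVFP4 or INT4 checkpoint directory

llm = LLM(model=model_path, gpu_memory_utilization=0.90)
params = SamplingParams(temperature=0.0, max_tokens=512)

start = time.perf_counter()
outputs = llm.generate(["Summarize the benefits of quantization."], params)
elapsed = time.perf_counter() - start

generated = len(outputs[0].outputs[0].token_ids)
print(f"{model_path}: {generated} tokens in {elapsed:.1f}s "
      f"({generated / elapsed:.1f} tok/s)")
```

A throughput number alone will not settle the NVFP4-versus-INT4 question the contributor raises, since the reported advantage of NVFP4 is output quality; that side of the comparison still requires task-level evaluation.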
The forum exchange is brief and focused on sharing working configurations rather than benchmarking details or broader documentation. A reply from an NVIDIA forum participant thanks the contributor for sharing the recipes, signaling community interest in reproducible ways to run large language models on Jetson Thor.
The surrounding forum activity shows sustained attention to generative model deployment on Thor, including discussions about Qwen variants, Nemotron, vllm containers, compatibility issues, and performance comparisons. That context positions the Qwen3.5 recipes as part of a larger effort by developers to tune local inference workflows for edge and robotics computing systems built on Jetson Thor.
