Qwen3.5-27B brings multimodal, long-context AI to open-source developers

Qwen3.5-27B is a 27B-parameter multimodal language model that combines vision, long-context reasoning, and tool use, with detailed guidance for deployment across major inference frameworks.

Qwen3.5-27B is a post-trained 27B-parameter causal language model with a vision encoder, released in Hugging Face Transformers format and compatible with transformers, vLLM, SGLang, KTransformers, and other popular inference stacks. The model adopts a unified vision-language foundation with early-fusion multimodal training, targeting parity with earlier Qwen3 models while outperforming Qwen3-VL variants across reasoning, coding, agentic-workflow, and visual-understanding benchmarks. Its architecture combines Gated DeltaNet layers with a sparse Mixture-of-Experts design to achieve high-throughput inference at low latency and cost, while scaled reinforcement learning in million-agent environments is used to improve generalization on complex real-world tasks.

The language backbone features a hidden dimension of 5120, 64 layers, and a token embedding size of 248,320 (padded), arranged in a stacked layout of 16 × (3 × (Gated DeltaNet → FFN) → 1 × (Gated Attention → FFN)). Gated DeltaNet uses 48 linear-attention heads for V and 16 for QK with a head dimension of 128, while Gated Attention uses 24 attention heads for Q and 4 for KV with a head dimension of 256 and a rotary position embedding dimension of 64. The feed-forward network has an intermediate dimension of 17,408, the LM output size is 248,320 (padded), and the model is trained with multi-step multi-token prediction. The native context length is 262,144 tokens and is extensible up to 1,010,000 tokens using RoPE scaling methods such as YaRN, with explicit configuration examples provided for transformers, vLLM, KTransformers, and SGLang.
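The stacked layout and the YaRN context-extension arithmetic above can be sketched in a few lines. The layer names and the `rope_scaling` field names below are illustrative assumptions following the common Hugging Face config convention, not the model's actual identifiers; verify against the released config.json before relying on them.

```python
# Stacked layout described above: 16 repetitions of
# (3 x Gated DeltaNet block, then 1 x Gated Attention block) = 64 layers.
layout = []
for _ in range(16):
    layout += ["gated_deltanet"] * 3 + ["gated_attention"]

assert len(layout) == 64
assert layout.count("gated_deltanet") == 48   # linear-attention layers
assert layout.count("gated_attention") == 16  # full-attention layers

# YaRN factor needed to stretch the native 262,144-token context
# to the stated 1,010,000-token maximum.
native_ctx = 262_144
target_ctx = 1_010_000
factor = target_ctx / native_ctx
print(round(factor, 2))  # -> 3.85

# A transformers-style rope_scaling entry might then look like this
# (field names are an assumption based on the usual YaRN config shape):
rope_scaling = {
    "rope_type": "yarn",
    "factor": factor,
    "original_max_position_embeddings": native_ctx,
}
```

The scaling factor of roughly 3.85 is why the extended maximum lands at 1,010,000 rather than a rounder number: it is the native window multiplied by the configured factor.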

Across benchmarks, Qwen3.5-27B posts competitive scores against larger proprietary and open-source systems. On knowledge and instruction-following tests, MMLU-Pro is reported at 86.1, C-Eval at 90.5, SuperGPQA at 65.6, IFEval at 95.0, and IFBench at 76.5, while long-context evaluations show AA-LCR at 66.1 and LongBench v2 at 60.6. Reasoning and STEM scores include HLE w/ CoT at 24.3, GPQA Diamond at 85.5, HMMT Feb 25 at 92.0, and HMMT Nov 25 at 89.8. Coding highlights include SWE-bench Verified at 72.4, Terminal Bench 2 at 41.6, LiveCodeBench v6 at 80.7, and a CodeForces rating of 1899, with additional full-stack benchmarks and OJBench results enumerated for both English and Chinese. General-agent and search-agent metrics cover BFCL-V4 at 68.5, TAU2-Bench at 79.0, VITA-Bench at 41.9, DeepPlanning at 22.6, HLE w/ tool at 48.5, BrowseComp at 61.0, and WideSearch at 61.1, alongside multilingual scores such as MMMLU at 85.9, MMLU-ProX at 82.2, and MAXIFE at 88.0.

The vision and video side is tested on a wide suite of multimodal benchmarks, with Qwen3.5-27B achieving MMMU at 82.3, MathVision at 86.0, MathVista (mini) at 87.8, DynaMath at 87.7, ZEROBench at 10, and VlmsAreBlind at 96.9. General visual question answering scores include RealWorldQA at 83.7, MMStar at 81.0, MMBenchEN-DEV-v1.1 at 92.6, SimpleVQA at 56.0, and HallusionBench at 70.0, while document understanding tests show OmniDocBench 1.5 at 88.9, MMLongBench-Doc at 60.2, and OCRBench at 89.4. Spatial and embodied tasks are covered with metrics such as ERQA at 60.5, CountBench at 97.8, EmbSpatialBench at 84.5, and LingoQA at 82.0, along with multiple 3D and scene datasets, and video understanding results include VideoMME (w/ sub.) at 87.0, VideoMME (w/o sub.) at 82.8, VideoMMMU at 82.3, and MLVU at 85.9. Tool calling and medical visual question answering are also evaluated, with TIR-Bench at 59.8 / 42.3, V* at 93.7 / 89.0, SLAKE at 80.0, PMC-VQA at 62.4, and MedXpertQA-MM at 62.4.

Qwen3.5-27B operates in a “thinking mode” by default, where outputs include a <think>…</think> reasoning segment before the final answer; examples show how to disable this via API parameters for an instruct-style experience. Detailed serving recipes are provided for SGLang, vLLM, and Hugging Face Transformers, including commands that stand up OpenAI-compatible endpoints at http://localhost:8000/v1 with a tensor parallel size of 8 and a context length of 262,144 tokens, as well as configurations for tool calling, multi-token prediction, and text-only operation. Recommended sampling settings are specified for several modes; for thinking mode on general tasks, these are temperature=1.0, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=1.5, and repetition_penalty=1.0, with suggested output lengths of 32,768 tokens for most queries and 81,920 tokens for math and programming competitions. The model integrates with Qwen-Agent for tool-rich agentic use cases and with Qwen Code for terminal automation. Best-practice guidance covers standardizing prompts, avoiding storing historical thinking content in the conversation, tuning presence_penalty between 0 and 2, and adjusting video preprocessing parameters, such as setting longest_edge to 469,762,048 in video_preprocessor_config.json to enable higher frame-rate sampling on long videos.
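A minimal client-side sketch of the two practical details above: packaging the recommended thinking-mode sampling settings into a request for an OpenAI-compatible endpoint, and stripping the <think>…</think> segment from a response. The model name and endpoint behavior are assumptions (match them to whatever your server registers), and fields like top_k, min_p, and repetition_penalty are vendor extensions accepted by servers such as vLLM and SGLang rather than standard OpenAI parameters.

```python
import re

# Recommended thinking-mode sampling settings from the model card.
THINKING_SAMPLING = {
    "temperature": 1.0,
    "top_p": 0.95,
    "top_k": 20,                 # vLLM/SGLang extension
    "min_p": 0.0,                # vLLM/SGLang extension
    "presence_penalty": 1.5,
    "repetition_penalty": 1.0,   # vLLM/SGLang extension
    "max_tokens": 32_768,        # raise to 81_920 for math/coding competitions
}

def build_chat_payload(prompt: str, model: str = "Qwen3.5-27B") -> dict:
    """Build a request body for an OpenAI-compatible /v1/chat/completions
    endpoint (e.g. http://localhost:8000/v1). Model name is hypothetical."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        **THINKING_SAMPLING,
    }

def strip_thinking(text: str) -> str:
    """Remove the <think>...</think> reasoning segment that thinking mode
    emits before the final answer, e.g. before storing history (the card
    advises against keeping historical thinking content)."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

print(strip_thinking("<think>2+2 is trivially 4.</think>The answer is 4."))
# -> The answer is 4.
```

Stripping the reasoning segment before appending a turn to the conversation history implements the best-practice note above without changing what the user sees in the final answer.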
