Qwen3.5-27B brings multimodal, long-context AI to open-source developers

Qwen3.5-27B is a 27B-parameter multimodal language model that combines vision, long-context reasoning, and tool use, with detailed guidance for deployment across major inference frameworks.

Qwen3.5-27B is a post-trained 27B-parameter causal language model with a vision encoder, released in Hugging Face Transformers format and compatible with transformers, vLLM, SGLang, KTransformers, and other popular inference stacks. The model adopts a unified vision-language foundation with early-fusion multimodal training, targeting parity with earlier Qwen3 models while outperforming Qwen3-VL variants across reasoning, coding, agentic-workflow, and visual-understanding benchmarks. Its architecture combines Gated DeltaNet layers with a sparse Mixture-of-Experts design to achieve high-throughput inference at low latency and cost, while scaled reinforcement learning in million-agent environments is used to improve generalization on complex real-world tasks.

The language backbone features a hidden dimension of 5120, 64 layers, and a token embedding size of 248,320 (padded), arranged in a stacked layout of 16 × (3 × (Gated DeltaNet → FFN) → 1 × (Gated Attention → FFN)). Gated DeltaNet uses 48 linear-attention heads for V and 16 for QK with a head dimension of 128, while Gated Attention uses 24 attention heads for Q and 4 for KV with a head dimension of 256 and a rotary position embedding dimension of 64. The feed-forward network has an intermediate dimension of 17,408, the LM output size is 248,320 (padded), and the model is trained with multi-step multi-token prediction. The native context length is 262,144 tokens and is extensible up to 1,010,000 tokens using RoPE scaling methods such as YaRN, with explicit configuration examples provided for transformers, vLLM, KTransformers, and SGLang.
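The stacked layout and the YaRN context-extension arithmetic above can be sketched in a few lines. The layer names and the `rope_scaling` field names below are illustrative assumptions following the common Hugging Face config convention, not the model's actual identifiers; verify against the released config.json before relying on them.

```python
# Stacked layout described above: 16 repetitions of
# (3 x Gated DeltaNet block, then 1 x Gated Attention block) = 64 layers.
layout = []
for _ in range(16):
    layout += ["gated_deltanet"] * 3 + ["gated_attention"]

assert len(layout) == 64
assert layout.count("gated_deltanet") == 48   # linear-attention layers
assert layout.count("gated_attention") == 16  # full-attention layers

# YaRN factor needed to stretch the native 262,144-token context
# to the stated 1,010,000-token maximum.
native_ctx = 262_144
target_ctx = 1_010_000
factor = target_ctx / native_ctx
print(round(factor, 2))  # -> 3.85

# A transformers-style rope_scaling entry might then look like this
# (field names are an assumption based on the usual YaRN config shape):
rope_scaling = {
    "rope_type": "yarn",
    "factor": factor,
    "original_max_position_embeddings": native_ctx,
}
```

The scaling factor of roughly 3.85 is why the extended maximum lands at 1,010,000 rather than a rounder number: it is the native window multiplied by the configured factor.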

Across benchmarks, Qwen3.5-27B posts competitive scores against larger proprietary and open-source systems. On knowledge and instruction-following tests, MMLU-Pro is reported at 86.1, C-Eval at 90.5, SuperGPQA at 65.6, IFEval at 95.0, and IFBench at 76.5, while long-context evaluations show AA-LCR at 66.1 and LongBench v2 at 60.6. Reasoning and STEM scores include HLE w/ CoT at 24.3, GPQA Diamond at 85.5, HMMT Feb 25 at 92.0, and HMMT Nov 25 at 89.8. Coding highlights include SWE-bench Verified at 72.4, Terminal Bench 2 at 41.6, LiveCodeBench v6 at 80.7, and a CodeForces rating of 1899, with additional full-stack benchmarks and OJBench results enumerated for both English and Chinese. General-agent and search-agent metrics cover BFCL-V4 at 68.5, TAU2-Bench at 79.0, VITA-Bench at 41.9, DeepPlanning at 22.6, HLE w/ tool at 48.5, BrowseComp at 61.0, and WideSearch at 61.1, alongside multilingual scores such as MMMLU at 85.9, MMLU-ProX at 82.2, and MAXIFE at 88.0.

The vision and video side is tested on a wide suite of multimodal benchmarks, with Qwen3.5-27B achieving MMMU at 82.3, MathVision at 86.0, MathVista (mini) at 87.8, DynaMath at 87.7, ZEROBench at 10, and VlmsAreBlind at 96.9. General visual question answering scores include RealWorldQA at 83.7, MMStar at 81.0, MMBenchEN-DEV-v1.1 at 92.6, SimpleVQA at 56.0, and HallusionBench at 70.0, while document understanding tests show OmniDocBench 1.5 at 88.9, MMLongBench-Doc at 60.2, and OCRBench at 89.4. Spatial and embodied tasks are covered with metrics such as ERQA at 60.5, CountBench at 97.8, EmbSpatialBench at 84.5, and LingoQA at 82.0, along with multiple 3D and scene datasets, and video understanding results include VideoMME (w/ sub.) at 87.0, VideoMME (w/o sub.) at 82.8, VideoMMMU at 82.3, and MLVU at 85.9. Tool calling and medical visual question answering are also evaluated, with TIR-Bench at 59.8 / 42.3, V* at 93.7 / 89.0, SLAKE at 80.0, PMC-VQA at 62.4, and MedXpertQA-MM at 62.4.

Qwen3.5-27B operates in a “thinking mode” by default, where outputs include a <think>…</think> reasoning segment before the final answer; examples show how to disable this via API parameters for an instruct-style experience. Detailed serving recipes are provided for SGLang, vLLM, and Hugging Face Transformers, including commands that stand up OpenAI-compatible endpoints at http://localhost:8000/v1 with a tensor parallel size of 8 and a context length of 262,144 tokens, as well as configurations for tool calling, multi-token prediction, and text-only operation. Recommended sampling settings are specified for several modes; for thinking mode on general tasks, these are temperature=1.0, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=1.5, and repetition_penalty=1.0, with suggested output lengths of 32,768 tokens for most queries and 81,920 tokens for math and programming competitions. The model integrates with Qwen-Agent for tool-rich agentic use cases and with Qwen Code for terminal automation. Best-practice guidance covers standardizing prompts, avoiding storing historical thinking content in the conversation, tuning presence_penalty between 0 and 2, and adjusting video preprocessing parameters, such as setting longest_edge to 469,762,048 in video_preprocessor_config.json to enable higher frame-rate sampling on long videos.
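A minimal client-side sketch of the two practical details above: packaging the recommended thinking-mode sampling settings into a request for an OpenAI-compatible endpoint, and stripping the <think>…</think> segment from a response. The model name and endpoint behavior are assumptions (match them to whatever your server registers), and fields like top_k, min_p, and repetition_penalty are vendor extensions accepted by servers such as vLLM and SGLang rather than standard OpenAI parameters.

```python
import re

# Recommended thinking-mode sampling settings from the model card.
THINKING_SAMPLING = {
    "temperature": 1.0,
    "top_p": 0.95,
    "top_k": 20,                 # vLLM/SGLang extension
    "min_p": 0.0,                # vLLM/SGLang extension
    "presence_penalty": 1.5,
    "repetition_penalty": 1.0,   # vLLM/SGLang extension
    "max_tokens": 32_768,        # raise to 81_920 for math/coding competitions
}

def build_chat_payload(prompt: str, model: str = "Qwen3.5-27B") -> dict:
    """Build a request body for an OpenAI-compatible /v1/chat/completions
    endpoint (e.g. http://localhost:8000/v1). Model name is hypothetical."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        **THINKING_SAMPLING,
    }

def strip_thinking(text: str) -> str:
    """Remove the <think>...</think> reasoning segment that thinking mode
    emits before the final answer, e.g. before storing history (the card
    advises against keeping historical thinking content)."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

print(strip_thinking("<think>2+2 is trivially 4.</think>The answer is 4."))
# -> The answer is 4.
```

Stripping the reasoning segment before appending a turn to the conversation history implements the best-practice note above without changing what the user sees in the final answer.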
