Qwen QwQ-32B reasoning model overview

Qwen positions QwQ-32B as a medium-sized reasoning model built for stronger downstream performance on difficult tasks. The release highlights architecture details, deployment guidance, and recommended inference settings for long-context and multi-turn use.

Qwen presents QwQ-32B as the reasoning model in the Qwen series, designed to outperform conventional instruction-tuned models on hard downstream tasks through stronger thinking and reasoning capabilities. It is described as a medium-sized reasoning model with competitive performance against state-of-the-art reasoning models, including DeepSeek-R1 and o1-mini. The model is a causal language model trained through pretraining and post-training, including supervised finetuning and reinforcement learning.

The technical profile includes a transformer architecture with RoPE, SwiGLU, RMSNorm, and Attention QKV bias. Number of Parameters: 32.5B. Number of Parameters (Non-Embedding): 31.0B. Number of Layers: 64. Number of Attention Heads (GQA): 40 for Q and 8 for KV. Context Length: Full 131,072 tokens. For prompts exceeding 8,192 tokens in length, YaRN must be enabled. Qwen also notes that QwQ is based on Qwen2.5 and recommends using the latest version of transformers, warning that with transformers<4.37.0, users will encounter the error KeyError: 'qwen2'.
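A minimal sketch of how the YaRN settings described here would appear in the model's config.json, using the factor and original context values Qwen gives; treat the exact key names (notably "type": "yarn") as an assumption to verify against the Qwen documentation:

```json
{
  "rope_scaling": {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768
  }
}
```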

Qwen recommends several inference settings to improve output quality and reduce repetition. The model should begin its output with "<think>\n" to avoid empty thinking content, a behavior already handled when apply_chat_template is used with add_generation_prompt=True. Sampling parameters: use Temperature=0.6, TopP=0.95, MinP=0 instead of greedy decoding to avoid endless repetitions, and TopK between 20 and 40 to filter out rare token occurrences while maintaining output diversity. For supported frameworks, `presence_penalty` can be adjusted between 0 and 2, though higher values may introduce language mixing and a slight drop in performance.
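How these filters interact can be sketched in plain Python. This toy top-k plus min-p filter over a probability distribution illustrates the idea only; it is not Qwen's decoding code, and the function name is hypothetical:

```python
def filter_probs(probs, top_k=20, min_p=0.0):
    """Toy top-k + min-p filter over a token probability distribution.

    Keeps at most `top_k` tokens, drops tokens whose probability falls
    below `min_p * max(probs)`, then renormalizes. Illustrative only.
    """
    # Rank token indices by probability, highest first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = ranked[:top_k]
    # min-p: discard tokens far below the most likely token.
    threshold = min_p * probs[ranked[0]]
    kept = [i for i in kept if probs[i] >= threshold]
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

# With MinP=0 (the recommended setting) only the top-k cutoff applies.
dist = [0.5, 0.3, 0.1, 0.05, 0.05]
print(filter_probs(dist, top_k=2, min_p=0.0))  # {0: 0.625, 1: 0.375}
```

A real sampler would then draw from the renormalized distribution at the chosen temperature rather than taking the argmax, which is what avoids the repetition loops greedy decoding can fall into.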

For multi-turn conversations, the history should include only the final output of each turn and exclude thinking content, which apply_chat_template already implements. Qwen also recommends prompt standardization for benchmarking, including step-by-step reasoning with a boxed final answer for math problems and a fixed JSON answer field for multiple-choice tasks. For inputs exceeding 8,192 tokens, YaRN can be enabled through rope_scaling with "factor": 4.0 and "original_max_position_embeddings": 32768. Qwen recommends vLLM for deployment, while noting that current vLLM support is limited to static YaRN, which may affect performance on shorter texts.
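The multi-turn guidance, keep only the final answer in history, can be sketched with a small helper that strips <think>…</think> blocks before a reply is appended to the conversation (the helper name is hypothetical, and apply_chat_template already handles this for you):

```python
import re

def strip_thinking(message: str) -> str:
    """Remove <think>...</think> reasoning blocks from a model reply,
    leaving only the final answer for the conversation history."""
    cleaned = re.sub(r"<think>.*?</think>", "", message, flags=re.DOTALL)
    return cleaned.strip()

reply = "<think>\nLet me reason step by step...\n</think>\nThe answer is 42."
history_entry = {"role": "assistant", "content": strip_thinking(reply)}
print(history_entry["content"])  # The answer is 42.
```

Keeping reasoning traces out of the history both saves context-window budget and matches the format the model was trained on for follow-up turns.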

Impact Score: 50

Anu Bradford on tech sovereignty and regulatory fragmentation

Anu Bradford argues that Europe is wavering in its role as the world’s digital rule-setter just as governments everywhere move toward more state control over technology. Global companies are being pushed to treat geopolitical risk, data sovereignty, and Artificial Intelligence governance as core strategic issues.

Mistral launches text-to-speech model

Mistral has expanded its Voxtral family with a text-to-speech system aimed at enterprise voice applications. The company is positioning the open-weights model as a flexible alternative for organizations that want more control over deployment, cost and customization.

UK Parliament opens workforce inquiry on Artificial Intelligence

A UK Parliament committee is examining how Artificial Intelligence is changing business and work, with a focus on both economic opportunity and labour disruption. The inquiry is seeking evidence on government priorities as adoption expands across the economy.

Windows 11 tightens kernel trust for older drivers

Microsoft is changing Windows 11 kernel policy so new drivers must be signed through the Windows Hardware Compatibility Program. Older trusted drivers will still be allowed in some cases to preserve compatibility during the transition.
