InternLM has released Intern-S1, an open-source multimodal reasoning model designed for both general-purpose and scientific applications. The system couples a 235-billion-parameter mixture-of-experts language model (based on Qwen3) with a 6-billion-parameter vision encoder (InternViT), and has been further pretrained on 5 trillion tokens, more than 2.5 trillion of which come from scientific domains. The model targets tasks such as interpreting chemical structures, understanding protein sequences, and planning compound synthesis routes, while retaining broad general reasoning capabilities. A dynamic tokenizer lets it natively parse molecular formulas, protein sequences, and seismic signals. The model is distributed under the Apache-2.0 license.
Intern-S1 reports strong results across a wide set of general and science-focused benchmarks. Highlights include MMLU-Pro 83.5 (best among open-source models), MMMU 77.7 (best among open-source models), MathVista 81.5 (best overall), MMStar 74.9 (best among open-source models), MathVision 62.5 (best among open-source models), and SFE 44.3 (best overall). On scientific benchmarks it posts SmolInstruct 51.0 (best overall), ChemBench 83.4 (best overall), MatBench 75.0 (best overall), MicroVQA 63.9 (best overall), MSEarthMCQ 65.7 (best overall), and XLRS-Bench 55.0 (best overall). Additional reported results include GPQA 77.3, AIME 2025 86.0, IFEval 86.7, and Physics 44.0. Evaluations were conducted with OpenCompass and VLMEvalKit, and comparisons reference a mix of open and closed models.
The model runs through Transformers' image-text-to-text pipeline and supports text, image, and video inputs. The documentation recommends Transformers 4.53.0 or newer and suggests the sampling parameters top_p = 1.0, top_k = 50, temperature = 0.7, and min_p = 0.0; video input additionally requires the decord library. For deployment, Intern-S1 can be served via LMDeploy 0.9.2 or later to expose an OpenAI-compatible endpoint; vLLM support is noted as coming soon, and SGLang support is in progress. An Ollama recipe is provided for local use. Intern-S1 supports tool calling through OpenAI-style interfaces, and the model’s “thinking mode” is enabled by default for enhanced reasoning, with a switch to disable it through the chat template or request parameters.
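For concreteness, the following is a minimal sketch of local inference through the Transformers pipeline using the recommended sampling parameters. The Hugging Face model id internlm/Intern-S1 reflects the release, while the image URL and the remaining generation settings are illustrative placeholders; in practice the full 235B MoE model requires substantial multi-GPU memory.

```python
# Minimal sketch: Intern-S1 through the Transformers image-text-to-text pipeline.
# Assumes transformers >= 4.53.0; the image URL below is a placeholder.
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",
    model="internlm/Intern-S1",
    device_map="auto",        # shard across available GPUs (requires accelerate)
    torch_dtype="auto",
    trust_remote_code=True,   # may be unnecessary on newer Transformers releases
)

# Chat-style input: one user turn combining an image and a text prompt.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/molecule.png"},  # placeholder
            {"type": "text", "text": "Describe the chemical structure shown in the image."},
        ],
    }
]

# Sampling parameters recommended in the documentation.
outputs = pipe(
    text=messages,
    max_new_tokens=1024,
    generate_kwargs={
        "do_sample": True,
        "temperature": 0.7,
        "top_p": 1.0,
        "top_k": 50,
        "min_p": 0.0,
    },
)
print(outputs[0]["generated_text"])  # includes the assistant's reply
```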
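For serving, the sketch below shows an OpenAI-compatible client call against an LMDeploy-hosted endpoint (started with something like lmdeploy serve api_server internlm/Intern-S1). The port, the API key handling, and in particular the enable_thinking field used to switch off thinking mode are assumptions here; consult the LMDeploy and Intern-S1 documentation for the exact request parameter.

```python
# Minimal sketch: querying an OpenAI-compatible Intern-S1 endpoint.
# The base_url/port and the `enable_thinking` field are assumptions, not confirmed API.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:23333/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="internlm/Intern-S1",
    messages=[{"role": "user", "content": "Outline a synthesis route for aspirin."}],
    temperature=0.7,
    top_p=1.0,
    # Thinking mode is on by default; a chat-template flag passed via extra_body
    # is one common way such a switch is exposed (hypothetical field name here).
    extra_body={"enable_thinking": False},
)
print(response.choices[0].message.content)
```

The same client pattern extends to tool calling by supplying an OpenAI-style tools list in the request.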