As compute costs surge across AI data centers, DeepSeek is leaning on model efficiency with a newly announced, open-source OCR context compression system. The DeepSeek-OCR approach uses optical mapping to convert lengthy text documents into images, achieving 97 percent recognition precision at compression ratios below 10x. By pairing advanced encoder and decoder components, the system can condense more than nine text tokens into a single visual token, sharply cutting the number of tokens that downstream models must process and, in turn, the compute required for training and inference.
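To make the token arithmetic concrete, here is a minimal back-of-the-envelope sketch. The per-page token count and the set of compression ratios are illustrative assumptions chosen to match the range discussed above, not figures from DeepSeek's implementation.

```python
# Illustrative token arithmetic for optical context compression.
# page_text_tokens and the ratios below are assumptions, not DeepSeek's numbers.

def vision_tokens_needed(text_tokens: int, compression_ratio: float) -> int:
    """Visual tokens required to represent a given number of text tokens."""
    return max(1, round(text_tokens / compression_ratio))

page_text_tokens = 1_000                 # assumed length of a typical document page
for ratio in (5, 10, 20):                # compression ratios in the range discussed here
    vt = vision_tokens_needed(page_text_tokens, ratio)
    print(f"{ratio:>2}x compression: {page_text_tokens} text tokens -> {vt} vision tokens")
```

At a 10x ratio, a roughly 1,000-token page collapses into about 100 vision tokens, which is where the claimed compute savings for training and inference come from.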
The efficiency gains translate into notable throughput on widely deployed accelerators. DeepSeek reports that a single Nvidia A100 can process roughly 200,000 document pages per day, while a 20-node A100 cluster can handle about 33 million pages daily. Even at a 20x compression ratio, the system maintains 60 percent optical recognition accuracy. On the OmniDocBench benchmark, DeepSeek-OCR outperforms established alternatives such as GOT-OCR2.0 and MinerU2.0 while using fewer vision tokens per page, underscoring its token efficiency. The company positions this work as part of its broader push to deliver open-source models with lower training costs than offerings like OpenAI’s ChatGPT or Google’s Gemini.
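As a rough sanity check on how the single-GPU and cluster figures relate, the sketch below assumes 8 A100s per node; that node size is an assumption on our part, since only the per-GPU and 20-node totals are reported.

```python
# Rough throughput arithmetic relating the reported figures.
# gpus_per_node = 8 is an assumption; DeepSeek reports ~200,000 pages/day per A100
# and ~33 million pages/day for the 20-node cluster.

pages_per_gpu_per_day = 200_000          # reported single-A100 throughput
gpus_per_node = 8                        # assumed number of A100s per node
nodes = 20

cluster_pages_per_day = pages_per_gpu_per_day * gpus_per_node * nodes
print(f"Estimated cluster throughput: {cluster_pages_per_day:,} pages/day")  # ~32,000,000
```

With that assumed node size, the estimate lands near the reported 33 million pages per day.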
Under the hood, DeepEncoder algorithms allow the system to adapt to diverse document sizes and resolutions without sacrificing speed or accuracy. The decoder, named DeepSeek3B-MoE-A570M, employs a mixture-of-experts architecture that distributes work across specialized experts for different OCR subtasks. This setup enables parsing of complex, multilingual documents that include graphs, scientific formulas, diagrams and images. To reach its current accuracy and scale, DeepSeek trained on 30 million PDF pages spanning nearly 100 languages and covering categories from newspapers and scientific handwriting to textbooks and PhD dissertations. While the gains in visual tokenization speed and efficiency are clear, it remains uncertain how much these improvements will translate into better reasoning performance compared with today’s text-token paradigms.
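For readers unfamiliar with mixture-of-experts decoding, the following is a minimal sketch of the general routing mechanism; the expert count, top-k value, and dimensions are illustrative assumptions and do not reflect DeepSeek3B-MoE-A570M's actual configuration.

```python
# Minimal sketch of mixture-of-experts routing, the general mechanism behind
# MoE decoders. All sizes below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 8, 2

# Each "expert" is a tiny feed-forward weight matrix; a router scores experts per token.
experts = [rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(n_experts)]
router_w = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_layer(tokens: np.ndarray) -> np.ndarray:
    """Route each token to its top-k experts and mix their outputs by router weight."""
    logits = tokens @ router_w                      # (n_tokens, n_experts)
    out = np.zeros_like(tokens)
    for i, tok in enumerate(tokens):
        top = np.argsort(logits[i])[-top_k:]        # indices of the k highest-scoring experts
        weights = np.exp(logits[i][top])
        weights /= weights.sum()                    # softmax over the selected experts only
        for w, e in zip(weights, top):
            out[i] += w * (tok @ experts[e])
    return out

vision_tokens = rng.standard_normal((100, d_model)) # e.g. a compressed page representation
decoded = moe_layer(vision_tokens)
print(decoded.shape)                                # (100, 64)
```

Routing each token through only a couple of experts is what lets a model with billions of total parameters activate far fewer per token, which is presumably what the "A570M" in the decoder's name refers to: roughly 570 million activated parameters out of about 3 billion total.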