DeepSeek OCR Artificial Intelligence model processes 200,000 pages a day on one Nvidia A100

DeepSeek introduced an open-source OCR context compression model that converts long documents into compact visual tokens for faster model training. The system processes about 200,000 pages per day on a single Nvidia A100 while maintaining up to 97 percent recognition precision at sub-10x compression.

As compute costs surge across Artificial Intelligence data centers, DeepSeek is betting on model efficiency with a newly announced, open-source OCR context compression system. The DeepSeek-OCR approach uses optical mapping to convert lengthy text documents into images, achieving 97 percent recognition precision at compression ratios below 10x. By pairing advanced encoder and decoder components, the system can represent more than nine text tokens with a single visual token, sharply cutting the number of tokens that downstream models must process and, in turn, the compute required for training and inference.
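The token savings implied by that ratio can be sketched with some quick arithmetic. The "more than nine text tokens per visual token" figure comes from the article; the document size below is an illustrative assumption, not a DeepSeek number.

```python
# Back-of-the-envelope sketch of the compression described above.
# Ratio is from the article ("more than nine text tokens into a single
# visual token"); the document length is a hypothetical example.

TEXT_TOKENS_PER_VISUAL_TOKEN = 9  # article: "more than nine"

def compressed_tokens(text_tokens: int,
                      ratio: int = TEXT_TOKENS_PER_VISUAL_TOKEN) -> int:
    """Visual tokens needed to represent `text_tokens` text tokens."""
    return -(-text_tokens // ratio)  # ceiling division

doc = 100_000  # hypothetical long document, measured in text tokens
visual = compressed_tokens(doc)
print(visual)                # 11112 visual tokens
print(1 - visual / doc)      # ~0.889 -> roughly 89% fewer tokens to process
```

At this ratio a downstream model sees roughly a ninth of the original token count, which is where the training and inference savings come from.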

The efficiency gains translate into notable throughput on standard data-center accelerators. DeepSeek reports that a single Nvidia A100 can process roughly 200,000 document pages per day, while a 20-node A100 cluster can handle about 33 million pages daily. Even at a 20x compression ratio, the system maintains 60 percent optical recognition accuracy. On the OmniDocBench benchmark, DeepSeek-OCR outperforms established alternatives such as GOT-OCR2.0 and MinerU2.0 while using fewer vision tokens per page, underscoring its token efficiency. The company positions this work as part of its broader push to deliver open-source models with lower training costs than offerings like OpenAI's ChatGPT or Google's Gemini.
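The per-GPU and cluster figures are consistent if each node carries several GPUs. A minimal sketch, assuming the common 8x-A100 server configuration (the GPUs-per-node count is an assumption, not stated in the article):

```python
# Sanity check on the reported throughput figures. The per-GPU rate and
# node count are from the article; GPUS_PER_NODE is an assumption
# (8x A100 servers are typical) that makes the numbers line up.

PAGES_PER_GPU_PER_DAY = 200_000  # single A100, per the article
NODES = 20
GPUS_PER_NODE = 8                # assumed typical A100 node

cluster_pages_per_day = PAGES_PER_GPU_PER_DAY * NODES * GPUS_PER_NODE
print(f"{cluster_pages_per_day:,}")  # 32,000,000 -- close to the reported ~33 million
```

The small gap between 32 million and the reported 33 million could reflect a different node configuration or rounding in DeepSeek's figures.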

Under the hood, the DeepEncoder component lets the system adapt to diverse document sizes and resolutions without sacrificing speed or accuracy. The decoder, named DeepSeek3B-MoE-A570M, employs a mixture-of-experts architecture that distributes knowledge across specialized components for different OCR subtasks. This setup enables parsing of complex, multilingual documents that include graphs, scientific formulas, diagrams and images. To reach its current accuracy and scale, DeepSeek trained on 30 million Portable Document Format pages spanning nearly 100 languages and covering categories from newspapers and scientific handwriting to textbooks and PhD dissertations. While the gains in visual tokenization speed and efficiency are clear, it remains uncertain how much these improvements will translate into better reasoning performance compared with today's text-token paradigms.

Impact Score: 55

Saudi Artificial Intelligence startup launches Arabic LLM

Misraj Artificial Intelligence unveiled Kawn, an Arabic large language model, at AWS re:Invent and launched Workforces, a platform for creating and managing Artificial Intelligence agents for enterprises and public institutions.

Introducing Mistral 3: open artificial intelligence models

Mistral 3 is a family of open, multimodal and multilingual Artificial Intelligence models that includes three Ministral edge models and a sparse Mistral Large 3 trained with 41B active and 675B total parameters, released under the Apache 2.0 license.

NVIDIA and Mistral Artificial Intelligence partner to accelerate new family of open models

NVIDIA and Mistral Artificial Intelligence announced a partnership to optimize the Mistral 3 family of open-source multilingual, multimodal models across NVIDIA supercomputing and edge platforms. The collaboration highlights Mistral Large 3, a mixture-of-experts model designed to improve efficiency and accuracy for enterprise artificial intelligence deployments starting Tuesday, Dec. 2.
