DeepSeek OCR Artificial Intelligence model processes 200,000 pages a day on one Nvidia A100

DeepSeek introduced an open-source OCR context compression model that converts long documents into compact visual tokens for faster model training. The system processes about 200,000 pages per day on a single Nvidia A100 while maintaining up to 97 percent recognition precision at sub-10x compression.

As compute costs surge across Artificial Intelligence data centers, DeepSeek is leaning on model efficiency with a newly announced, open-source OCR context compression system. The DeepSeek-OCR approach uses optical mapping to convert lengthy text documents into images, achieving 97 percent recognition precision at compression ratios below 10x. By pairing advanced encoder and decoder components, the system can represent more than nine text tokens with a single visual token, sharply cutting the number of tokens that downstream models must process and, in turn, the compute required for training and inference.
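The token-savings arithmetic behind those figures can be sketched in a few lines. This is an illustrative estimate only, not DeepSeek's code; the roughly 10-to-1 ratio of text tokens to visual tokens is taken from the reported sub-10x compression figures.

```python
import math

def compressed_tokens(text_tokens: int, compression_ratio: float = 10.0) -> int:
    """Estimate the visual tokens needed to carry `text_tokens` of text
    at a given compression ratio (ceiling so partial tokens still count)."""
    return math.ceil(text_tokens / compression_ratio)

# A 5,000-token document shrinks to about 500 visual tokens at 10x,
# shortening the sequence a downstream model must attend over.
doc_tokens = 5_000
print(doc_tokens, "->", compressed_tokens(doc_tokens))  # 5000 -> 500
```

Since attention cost grows with sequence length, cutting tokens roughly tenfold compounds into much larger savings during training and inference.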

The efficiency gains translate into notable throughput on widely deployed accelerators. DeepSeek reports that a single Nvidia A100 can process roughly 200,000 document pages per day, while a 20-node A100 cluster can handle about 33 million pages daily. Even at a 20x compression ratio, the system maintains 60 percent optical recognition accuracy. On the OmniDocBench benchmark, DeepSeek-OCR outperforms established alternatives such as GOT-OCR2.0 and MinerU2.0 while using fewer vision tokens per page, underscoring its token efficiency. The company positions this work as part of its broader push to deliver open-source models with lower training costs than offerings like OpenAI's ChatGPT or Google's Gemini.
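The per-GPU and cluster figures are consistent under a simple linear-scaling assumption. The sketch below checks the arithmetic; the 8-GPUs-per-node figure is an assumption (a common A100 server configuration), not something the article states.

```python
# Back-of-envelope check of the reported cluster throughput.
# Assumptions (not from the article): 8 A100 GPUs per node, and
# throughput that scales linearly across GPUs.
pages_per_gpu_per_day = 200_000
gpus_per_node = 8   # assumed configuration
nodes = 20

cluster_pages_per_day = pages_per_gpu_per_day * gpus_per_node * nodes
print(f"{cluster_pages_per_day:,} pages/day")  # 32,000,000 pages/day
```

Under those assumptions the cluster lands at 32 million pages per day, close to the roughly 33 million DeepSeek reports.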

Under the hood, DeepEncoder algorithms allow the system to adapt to diverse document sizes and resolutions without sacrificing speed or accuracy. The decoder, named DeepSeek3B-MoE-A570M, employs a mixture-of-experts architecture that distributes knowledge across specialized components for different OCR subtasks. This setup enables parsing of complex, multilingual documents that include graphs, scientific formulas, diagrams and images. To reach its current accuracy and scale, DeepSeek trained on 30 million Portable Document Format pages spanning nearly 100 languages and covering categories from newspapers and scientific handwriting to textbooks and PhD dissertations. While the gains in visual tokenization speed and efficiency are clear, it remains uncertain how much these improvements will translate into better reasoning performance compared with today's text-token paradigms.

Impact Score: 55

U.S. and China revisit Artificial Intelligence emergency talks

Washington and Beijing are exploring renewed talks on an emergency communication channel for Artificial Intelligence as fears grow over the capabilities of Anthropic’s Mythos model. The shift reflects rising concern in both capitals that competitive pressure is outpacing safeguards.

Artificial Intelligence divides employers as hiring and headcount shift

U.S. hiring beat expectations in April, but employers remain split on whether Artificial Intelligence should drive layoffs, productivity gains, or internal redeployment. At the same time, candidate use of Artificial Intelligence is outpacing employer adoption in hiring, adding new pressure to screening and entry-level recruiting.

What businesses need to know about the EU Cyber Resilience Act

The EU Cyber Resilience Act is turning product cybersecurity into a legal requirement for companies that sell digital products into the European Union. A key compliance milestone arrives in September 2026, well before the full regulation takes effect in 2027.

Claude Mythos and cyber insurance’s next inflection point

Claude Mythos is being treated by governments and regulators as a potential systemic cyber risk with implications for financial stability and insurance markets. Its emergence is intensifying pressure on insurers to clarify whether Artificial Intelligence-enabled cyber losses are covered, excluded, or require new stand-alone products.
