Bangla, with its complex grammar and numerous dialects, poses unique challenges for writers and language technologists. A surge of poorly written Bangla content online compounds the problem, because it degrades the ability of international artificial intelligence (AI) systems such as ChatGPT and Gemini to generate accurate Bangla. In response, local innovators have stepped in. Shuddhi, developed by Sagor Sarker of Hishab Technologies, is a tool for everyday Bangla writing: it corrects grammar and spelling, supports Banglish, and offers translation. Sarker envisions it as a “Grammarly for Bangla,” provided he can secure the funding needed to develop additional features such as paraphrasing, plagiarism detection, and fact-checking.
Hishab Technologies is also experimenting with homegrown large language models (LLMs). Its bilingual “titulm-1b-bn-v1,” which has over a billion parameters and was trained on billions of Bangla tokens, handles both Bangla and English tasks while drawing on popular model architectures such as Llama, MPT, and Gemma. The company has also produced public evaluation datasets for Bangla LLMs to spur broader development within the ecosystem. Meanwhile, Intelsense AI has launched “Ekush LLM,” which it bills as Bangladesh’s first model trained specifically on Bangla. Beyond text generation, Ekush draws on the company’s proprietary voice AI, with applications in telemedicine, agriculture, and other fields. These startups collaborate with local enterprises, aiming not only to improve workflows but also to foster understanding of, and funding for, AI research in the country.
Despite visible progress, experts suggest that Bangladesh’s path to homegrown, general-purpose LLMs on the scale of GPT-4 remains constrained by infrastructure and resource limitations. Training state-of-the-art models requires thousands of advanced GPUs, vast datasets, and reliable electricity, which makes it financially and technically formidable for local players. Most Bangladeshi companies instead fine-tune open-source models for narrow, domain-specific use cases or rely on API offerings from global leaders. Data privacy is also a rising concern, as the nation lacks a robust framework akin to Europe’s GDPR. The deepest challenge may be cultural: while multilingual models increasingly “understand” Bangla, experts argue that these systems lack the subtlety to capture local nuances and identity. As OpenAI’s models improve their Bangla, Dr Farig Sadeque of BRAC University suggests that the focus should shift to ensuring LLMs authentically reflect Bangladesh’s unique linguistic and cultural character, rather than trying to replicate a Western-style foundation model from scratch.