The article traces Alkira’s effort to build an internal enterprise assistant capable of answering complex questions across scattered corporate knowledge, from wikis and tickets to design docs and support channels. Early attempts focused on fine-tuning open-source models such as Llama3-8B, Qwen3-8B, and Mistral-7B on question-and-answer pairs generated from documentation. These models struggled with vague queries and only produced reliable answers when prompts were phrased very specifically, highlighting the limitations of fine-tuning as a substitute for queryable memory. Without direct grounding in source documents, responses were brittle and prone to hallucination, making this strategy unsuitable for a broad, dynamic knowledge base that also carried strict data sovereignty requirements.
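To make the fine-tuning stage concrete, here is a minimal sketch of what generated question-and-answer training data of this kind could look like. The chat-message layout, field names, and file paths are assumptions for illustration, not Alkira's actual format.

```python
# Hypothetical sketch of Q&A fine-tuning data generated from documentation.
# Field names and the chat-message layout are assumptions, not Alkira's schema.
import json

qa_pairs = [
    {
        "messages": [
            {"role": "user", "content": "How do I rotate the API key for a connector?"},
            {"role": "assistant", "content": "Open the connector settings page, ..."},
        ],
        "source_doc": "wiki/connectors/api-keys.md",  # provenance kept for auditing
    },
]

# One JSON object per line, the layout most open-source fine-tuning tools accept.
with open("train.jsonl", "w") as f:
    for pair in qa_pairs:
        f.write(json.dumps(pair) + "\n")
```

The weakness described above follows from this setup: once the pairs are baked into the weights, there is no way to query the underlying documents at answer time.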
Moving beyond fine-tuning, the team turned to retrieval-augmented generation (RAG), initially experimenting with existing open-source frameworks. Prototype v0.1 adopted LightRAG with Neo4j and Qdrant and saw encouraging results on a small corpus. Prototype v0.2 customized LightRAG with semantic chunking, contextual embeddings, and dynamic entity extraction, but when the corpus grew from ~100 to ~3,000 documents, retrieval quality collapsed due to context pollution: too many irrelevant or tangential documents diluted the context fed to the generator model. This experience led Alkira to conclude that generic frameworks lacked the granular control over ingestion and retrieval needed for complex enterprise environments. A subsequent proof of concept, v0.3, combined FalkorDB with graphrag-sdk, Qdrant, and a FastAPI backend using gemini-2.5-flash, along with dual dense and sparse vectors. It delivered outstanding quality at ~5,000 documents, validating the hybrid graph-plus-vector approach and suggesting that a single high-quality dense vector inside FalkorDB would be sufficient.
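As a rough illustration of the dual dense-plus-sparse retrieval the v0.3 proof of concept ran against Qdrant, the sketch below uses the Qdrant Python client. The collection name, vector size, placeholder query vectors, and the choice of reciprocal rank fusion are assumptions, not details confirmed by the article.

```python
# Minimal sketch of dual dense + sparse retrieval with Qdrant (qdrant-client >= 1.10).
# Collection name, vector size, and query vectors are placeholders.
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# One named dense vector and one named sparse vector per document chunk.
client.create_collection(
    collection_name="alkira_docs",
    vectors_config={"dense": models.VectorParams(size=1024, distance=models.Distance.COSINE)},
    sparse_vectors_config={"sparse": models.SparseVectorParams()},
)

# Hybrid query: prefetch candidates from both vector spaces, then fuse the two
# ranked lists with reciprocal rank fusion before handing results to the generator.
dense_query = [0.0] * 1024  # placeholder embedding of the user question
sparse_query = models.SparseVector(indices=[17, 923], values=[0.8, 0.4])

hits = client.query_points(
    collection_name="alkira_docs",
    prefetch=[
        models.Prefetch(query=dense_query, using="dense", limit=50),
        models.Prefetch(query=sparse_query, using="sparse", limit=50),
    ],
    query=models.FusionQuery(fusion=models.Fusion.RRF),
    limit=10,
)
```

The finding that a single high-quality dense vector inside FalkorDB would suffice is what lets the production design drop the separate sparse index and the standalone vector store altogether.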
These lessons informed the production-ready architecture, AKGPT v0.4, built around FalkorDB as a single in-memory engine that stores both the knowledge graph and the vector embeddings, minimizing latency and operational complexity. A manually triggered ingestion pipeline, orchestrated via a Redis queue, uses a language model to filter out non-technical noise and personally identifiable information, performs semantic chunking, extracts entities and relationships under a defined schema, generates 4096-dimensional Qwen3-Embedding-8B vectors, and creates hybrid links between conceptual nodes and their source text (a storage sketch appears below).

At query time, an enhancement step reframes the user's prompt with a rich system prompt before dispatching it to two parallel retrieval paths over FalkorDB: precise graph traversal based on extracted entities, and semantic vector search (a retrieval sketch also appears below). Results from each path are scored by a dedicated Qwen3-Reranker-8B model, which keeps the top 10 items per path, and the resulting top 20 unique chunks are synthesized into an answer by gemini-2.5-flash. The system now handles both vague conceptual questions and specific operational commands while keeping all data inside Alkira's infrastructure.

The article notes that this approach demands significant custom development and careful management of FalkorDB's RAM costs, multi-model latency, and schema evolution. The roadmap includes automated ingestion from tools like Jira, Confluence, and Slack, temporal and multi-hop graph retrieval, more agentic orchestration for live data and actions, corrective RAG loops, and caching for frequently asked questions.
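The first sketch illustrates the hybrid storage model: one chunk node with its embedding, one entity node, and the link between them in FalkorDB. The labels, property names, relationship type, and exact vector-index syntax are assumptions and may differ from AKGPT's schema and from the FalkorDB version in use.

```python
# Minimal sketch of the hybrid graph-plus-vector layout, using the FalkorDB
# Python client. Labels, property names, and the relationship type are
# illustrative assumptions, not AKGPT's actual schema.
from falkordb import FalkorDB

db = FalkorDB(host="localhost", port=6379)
graph = db.select_graph("akgpt")

# Vector index over chunk embeddings (4096 dims to match Qwen3-Embedding-8B).
# Exact index syntax can differ between FalkorDB versions.
graph.query(
    "CREATE VECTOR INDEX FOR (c:Chunk) ON (c.embedding) "
    "OPTIONS {dimension: 4096, similarityFunction: 'cosine'}"
)

# Store one semantic chunk, one extracted entity, and the link between them,
# so graph traversal and vector search land on the same source text.
graph.query(
    """
    MERGE (e:Entity {name: $entity})
    CREATE (c:Chunk {text: $text, source: $source, embedding: vecf32($embedding)})
    CREATE (e)-[:MENTIONED_IN]->(c)
    """,
    params={
        "entity": "BGP peering",
        "text": "To establish BGP peering with the cloud exchange point ...",
        "source": "design-docs/routing.md",
        "embedding": [0.0] * 4096,  # placeholder for the real Qwen3 embedding
    },
)
```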
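The second sketch outlines the two-path retrieval and rerank flow. The Cypher queries, the vector procedure call, and the rerank_top_k helper are stand-ins for Alkira's entity extraction, FalkorDB calls, and Qwen3-Reranker-8B; the two paths are shown sequentially here rather than in parallel.

```python
# Sketch of the two-path retrieval described above: graph traversal plus vector
# search over FalkorDB, each reranked to its top 10, then merged for synthesis.
from falkordb import FalkorDB

db = FalkorDB(host="localhost", port=6379)
graph = db.select_graph("akgpt")


def graph_path(entities: list[str]) -> list[str]:
    """Precise path: walk from extracted entities to the chunks that mention them."""
    res = graph.query(
        "MATCH (e:Entity)-[:MENTIONED_IN]->(c:Chunk) "
        "WHERE e.name IN $entities RETURN c.text LIMIT 50",
        params={"entities": entities},
    )
    return [row[0] for row in res.result_set]


def vector_path(query_embedding: list[float]) -> list[str]:
    """Semantic path: k-nearest-neighbour search over the chunk embeddings."""
    res = graph.query(
        "CALL db.idx.vector.queryNodes('Chunk', 'embedding', 50, vecf32($q)) "
        "YIELD node RETURN node.text",
        params={"q": query_embedding},
    )
    return [row[0] for row in res.result_set]


def retrieve_context(question, entities, query_embedding, rerank_top_k):
    # rerank_top_k(question, chunks, k) is a hypothetical wrapper around the
    # Qwen3-Reranker-8B model; each path keeps its 10 best-scoring chunks.
    graph_hits = rerank_top_k(question, graph_path(entities), k=10)
    vector_hits = rerank_top_k(question, vector_path(query_embedding), k=10)
    # Deduplicate while preserving order; up to 20 unique chunks go on to the
    # gemini-2.5-flash synthesis step.
    return list(dict.fromkeys(graph_hits + vector_hits))[:20]
```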
