Building a new operating model for enterprise retrieval augmented generation

Alkira details how it evolved from fine-tuning language models and off-the-shelf retrieval augmented generation tools to a custom hybrid architecture that combines a knowledge graph and vector search inside FalkorDB to reduce context pollution and preserve data sovereignty.

The article traces Alkira’s effort to build an internal enterprise assistant capable of answering complex questions across scattered corporate knowledge, from wikis and tickets to design docs and support channels. Early attempts focused on fine-tuning open source models such as Llama3-8B, Qwen3-8B, and Mistral-7B on question and answer pairs generated from documentation. These models struggled with vague queries and produced reliable answers only when prompts were phrased very specifically, exposing the limits of fine-tuning as a substitute for queryable memory. Without direct grounding in source documents, responses were brittle and prone to hallucination, making the strategy unsuitable for a broad, dynamic knowledge base with strict data sovereignty requirements.

Moving beyond fine-tuning, the team turned to retrieval augmented generation, initially experimenting with existing open source frameworks. They adopted LightRAG with Neo4j and Qdrant in prototype v0.1 and saw encouraging results on a small corpus. In prototype v0.2 they customized LightRAG with semantic chunking, contextual embeddings, and dynamic entity extraction, but when the corpus grew from ~100 to ~3,000 documents, retrieval quality collapsed due to context pollution: too many irrelevant or tangential documents diluted the context fed to the generator model. This experience led Alkira to conclude that generic frameworks lacked the granular control over ingestion and retrieval necessary for complex enterprise environments. A subsequent proof of concept, v0.3, combined FalkorDB with graphrag-sdk, Qdrant, and a FastAPI backend using gemini-2.5-flash, along with dual dense and sparse vectors. It delivered outstanding quality at ~5,000 documents, validating the hybrid graph plus vector approach and showing that a single high-quality dense vector within FalkorDB would be sufficient.
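The context pollution the v0.2 prototype hit, and the dual-path design that replaced it, ultimately come down to how hits from the two retrievers are combined. The sketch below is illustrative only, not Alkira’s code: the function name and chunk identifiers are hypothetical. It interleaves graph and vector hits, deduplicates them, and caps the merged context size so that one noisy path cannot flood the generator.

```python
from itertools import zip_longest

def merge_paths(graph_hits, vector_hits, limit=20):
    """Alternate between the graph and vector retrieval paths,
    keeping each chunk id at most once and capping the total
    number of chunks handed to the generator model."""
    seen, merged = set(), []
    for pair in zip_longest(graph_hits, vector_hits):
        for chunk in pair:
            if chunk is not None and chunk not in seen:
                seen.add(chunk)
                merged.append(chunk)
            if len(merged) == limit:
                return merged
    return merged

# Overlapping hits ("b") appear only once in the merged context.
print(merge_paths(["a", "b", "c"], ["b", "d"], limit=4))
```

Capping the merged list rather than each path separately is one simple way to keep the generator’s context bounded as the corpus grows; a production system would typically also apply relevance-score thresholds.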

These lessons informed the production-ready architecture, AKGPT v0.4, built around FalkorDB as a single, in-memory engine that stores both a knowledge graph and vector embeddings to minimize latency and complexity. A manually triggered ingestion pipeline, orchestrated via a Redis queue, uses a language model to filter out non-technical noise and personally identifiable information, performs semantic chunking, extracts entities and relationships under a defined schema, generates Qwen3-embedding-8b vectors with 4096 dimensions, and creates hybrid links between conceptual nodes and their source text.

At query time, an enhancement step reframes user prompts using a rich system prompt before dispatching them to two parallel retrieval paths: precise graph traversal based on extracted entities, and semantic vector search, both running over FalkorDB. Results from each path are passed to a dedicated Qwen3-reranker-8b model, which scores relevance and returns the top 10 items per path; the top 20 unique chunks are then synthesized into an answer by gemini-2.5-flash. The system now handles both vague conceptual questions and specific operational commands while keeping all data inside Alkira’s infrastructure.

The article notes that this approach demands significant custom development, careful management of FalkorDB’s RAM costs, multi-model latency, and schema evolution. The roadmap includes automated ingestion from tools like Jira, Confluence, and Slack, temporal and multi-hop graph retrieval, more agentic orchestration for live data and actions, corrective retrieval augmented generation loops, and caching for frequently asked questions.
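The query-time flow described above can be sketched as a short pipeline. This is a minimal illustration under stated assumptions, not Alkira’s implementation: the retrievers, the relevance scores, and the generate step are stand-ins for FalkorDB graph traversal, FalkorDB vector search, Qwen3-reranker-8b, and gemini-2.5-flash, and all names here are hypothetical.

```python
def rerank(hits, scores, k=10):
    """Stand-in for the reranker: sort hits by relevance score
    (higher is better) and keep the top k per retrieval path."""
    return sorted(hits, key=lambda h: scores.get(h, 0.0), reverse=True)[:k]

def answer(query, graph_retrieve, vector_retrieve, scores, generate):
    """Enhance the prompt, run both retrieval paths, rerank each,
    dedupe to at most 20 unique chunks, and synthesize an answer."""
    enhanced = f"[system context] {query}"            # prompt enhancement step
    graph_top = rerank(graph_retrieve(enhanced), scores)
    vector_top = rerank(vector_retrieve(enhanced), scores)
    unique = list(dict.fromkeys(graph_top + vector_top))[:20]  # order-preserving dedupe
    return generate(enhanced, unique)

# Toy run with fixed hits and scores; generate just echoes the chunks.
result = answer(
    "how are tenants isolated?",
    graph_retrieve=lambda q: ["g1", "g2"],
    vector_retrieve=lambda q: ["v1", "g1"],
    scores={"g1": 0.9, "g2": 0.5, "v1": 0.8},
    generate=lambda q, chunks: chunks,
)
print(result)
```

Running both paths and reranking each independently, as the article describes, lets a precise graph hit survive even when vector search returns only loosely related chunks; deduplication then prevents the same chunk from being counted twice toward the 20-chunk budget.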

Impact Score: 52

Indiana launches Artificial Intelligence business portal

Indiana is rolling out IN AI, a statewide portal meant to help employers adopt Artificial Intelligence with practical guidance, workshops and peer support. State leaders and business groups are positioning the effort as a way to raise productivity, wages and job growth while keeping workers at the center.

Goodfire launches model debugging tool for large language models

Goodfire has introduced Silico, a mechanistic interpretability platform designed to let developers inspect and adjust model behavior during development. The company is positioning it as a way to give smaller teams deeper control over open-source models and more trustworthy outputs.

Nvidia launches Nemotron 3 Nano Omni for enterprise agents

Nvidia has introduced Nemotron 3 Nano Omni, a multimodal open model designed to support enterprise agents that reason across vision, speech and language. The launch extends Nvidia’s push beyond hardware into models and services while targeting more efficient agentic workflows.

Intel 18A-P node improves performance and efficiency

Intel plans to present new results for its 18A-P process at the VLSI 2026 Symposium, highlighting gains in performance, power efficiency, and manufacturing predictability. The updated node is positioned as a stronger option for customers seeking 18A density with better operating characteristics.

EA CEO defends broader Artificial Intelligence use in game development

EA CEO Andrew Wilson defended the company’s internal use of Artificial Intelligence after employee claims that the tools were slowing work rather than helping. He framed the technology as an aid for repetitive quality assurance tasks, even as concerns persist over its broader impact on development.
