Organisations are sitting on large quantities of structured data in relational databases and spreadsheets that remain underused, because specialists still spend much of their time on repetitive work such as cleaning tables, extracting features and linking datasets. Dutch researcher Madelon Hulsebos, based at the Centrum Wiskunde & Informatica (CWI) in the Netherlands, is tackling this by developing “table representation learning”, a method that enables artificial intelligence to interpret what tables mean rather than simply search them by column names. After a PhD at the University of Amsterdam and postdoctoral work at the University of California, Berkeley, she now leads the Table Representation Learning Lab at CWI, guiding a team of three PhD students, two postdocs and six master’s students.
Backed by an NWO AiNed Fellowship Grant under the National Growth Fund programme, Hulsebos launched the DataLibra project, which runs from 2024 to 2029 and aims to build practical tools that make querying organisational data as simple as a web search. She argues that artificial intelligence can lower the barrier by letting users ask questions in natural language rather than having to master programming, business intelligence tools and relational database concepts. The challenge is that each system uses different column names and logic, which limits traditional techniques such as SQL and pattern matching. Her work therefore focuses on models that generalise from context to identify and combine relevant tables, moving from basic information retrieval to what she terms “insight retrieval”. Hulsebos stresses that full automation is not the goal, because users must be able to understand and explain why a specific answer was produced, making transparency, iteration and robustness central design requirements.
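To make the idea concrete, the minimal sketch below shows what context-based table retrieval could look like: a question in plain language is matched against tables by their context (column names plus sample values) rather than by exact column-name lookup, and the best matches would then be handed to a query generator whose draft remains visible to the user. The catalogue, token-overlap scoring and function names are illustrative assumptions, not DataLibra code.

```python
# Illustrative sketch (not the DataLibra implementation): match a natural
# language question to candidate tables using their context, then pass the
# top matches on for query generation.

from dataclasses import dataclass, field

@dataclass
class Table:
    name: str
    columns: list[str]
    sample_values: list[str] = field(default_factory=list)

    def context_tokens(self) -> set[str]:
        # Context = table name, schema and a few cell values, as lowercase tokens.
        text = " ".join([self.name, *self.columns, *self.sample_values])
        return set(text.lower().replace("_", " ").split())

def rank_tables(question: str, tables: list[Table]) -> list[tuple[Table, float]]:
    """Score tables by overlap between question tokens and table context.
    A real system would use learned table representations instead."""
    q_tokens = set(question.lower().split())
    scored = []
    for t in tables:
        overlap = len(q_tokens & t.context_tokens())
        scored.append((t, overlap / max(len(q_tokens), 1)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

if __name__ == "__main__":
    catalogue = [
        Table("orders", ["order_id", "customer_id", "total_amount", "order_date"],
              ["2024-03-01", "129.90"]),
        Table("hr_employees", ["emp_id", "department", "salary"]),
    ]
    question = "What was the total order amount per customer in March?"
    for table, score in rank_tables(question, catalogue):
        print(f"{table.name}: {score:.2f}")
    # The top-ranked tables and their schemas would then be passed to a
    # language model to draft the SQL, keeping the answer explainable.
```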
Hulsebos sees table representation learning as a way to automate the commonly cited “80% data work and 20% modelling” split in data science, freeing experts to focus on more critical questions while also empowering non-specialists to query relational databases directly in plain language. She is sceptical of many current vendor claims about artificial intelligence-powered analytics, pointing to benchmarks where success rates are often zero and highlighting the need for systems that can justify their outputs rather than simply generate confident responses. The importance of context and explanation came into focus in a recent collaboration with the United Nations Humanitarian Data Centre on detecting sensitive data in humanitarian datasets. Together with master’s student Liang Telkamp, she developed two mechanisms: one that reasons over the full data context to reduce false positives, and a “retrieve then detect” approach that dynamically links datasets to relevant policies and protocols so that assessments change as conflicts or situations evolve. Quality Assessment Officers at the UN found the contextualised explanations from large language models particularly valuable for navigating long information-sharing protocols, and Telkamp’s work was recognised with the Amsterdam AI Thesis Award.
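The retrieve-then-detect pattern can be sketched in a few lines: first retrieve the protocol passages most relevant to a dataset, then run the sensitivity check with that retrieved context attached, so the verdict shifts when the applicable policies change. The policy snippets, keyword check and function names below are invented for illustration; in the actual work a large language model performs the reasoning and produces the explanation.

```python
# Illustrative "retrieve then detect" sketch (names and policy text invented,
# not taken from the UN project): retrieve relevant protocol passages first,
# then detect sensitivity with that context attached.

def retrieve_policies(dataset_description: str, policies: dict[str, str],
                      top_k: int = 2) -> list[tuple[str, str]]:
    """Rank policy passages by token overlap with the dataset description;
    a production system would use embedding-based retrieval."""
    d_tokens = set(dataset_description.lower().split())
    scored = sorted(
        policies.items(),
        key=lambda item: len(d_tokens & set(item[1].lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def detect_sensitivity(dataset_description: str, policies: dict[str, str]) -> dict:
    """Attach retrieved policy context to the detection step. The keyword
    check below is a placeholder; in the approach described above a large
    language model reasons over the dataset and the retrieved protocol text
    and explains its verdict."""
    context = retrieve_policies(dataset_description, policies)
    flagged = any(word in dataset_description.lower()
                  for word in ("location", "household", "gps"))
    return {
        "sensitive": flagged,
        "applied_policies": [name for name, _ in context],
        "explanation": "Placeholder for an LLM-generated, policy-grounded rationale.",
    }

if __name__ == "__main__":
    policies = {
        "geodata_protocol": "Precise location and GPS coordinates require aggregation before sharing.",
        "survey_protocol": "Household survey microdata must be reviewed for re-identification risk.",
    }
    print(detect_sensitivity(
        "Household survey with GPS location of respondents in a conflict area",
        policies,
    ))
```

Because the policies are retrieved at assessment time rather than baked into the detector, updating a protocol changes the outcome without retraining anything, which is what allows assessments to track evolving situations.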
For Hulsebos, the UN project illustrates the broader organisational problem of making data both accessible and comprehensible, including understanding sensitivities before information is published on data-sharing portals that could feed model training sets. She wants to surface unknown datasets and combinations so people can uncover insights they did not realise were possible, reducing the need to route every question through business intelligence or data science teams. In her view, dependence on dashboards and SQL queries introduces delays that can leave an insight stale by the time it is delivered, so she focuses on artificial intelligence-powered systems that shorten “speed to insight” by allowing everyone from sales staff to CEOs to query data directly. Concrete tools are in development: one PhD student is building components to automate dataset retrieval and support SQL generation, with the first open-source versions expected within the next two months. An earlier tool, DataScout, created during her time at the University of California, Berkeley, already showed in user studies that task-based search with large language models helped data scientists find relevant datasets faster than traditional keyword-based data platforms, addressing situations where gathering the right data for a machine learning model could otherwise take two weeks to a month.
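As a rough illustration of task-based dataset search of the kind DataScout demonstrated, the sketch below expands a task description into the attributes a suitable dataset should contain and ranks catalogued datasets by how well they cover them. The hard-coded expansion table, dataset names and scoring are assumptions standing in for the large language model step, not DataScout’s actual implementation.

```python
# Illustrative task-based dataset search (not the DataScout code): expand the
# machine learning task into needed attributes, then rank datasets by coverage
# instead of matching the user's raw keywords.

def expand_task(task: str) -> set[str]:
    """Stand-in for LLM-based task expansion: map a task description to the
    kinds of columns and entities a suitable dataset should contain."""
    expansions = {
        "churn": {"customer", "subscription", "cancellation", "tenure", "usage"},
        "demand": {"sales", "date", "store", "product", "quantity"},
    }
    needed = set(task.lower().split())
    for keyword, attrs in expansions.items():
        if keyword in task.lower():
            needed |= attrs
    return needed

def rank_datasets(task: str, datasets: dict[str, set[str]]) -> list[tuple[str, int]]:
    """Rank catalogued datasets by how many task-relevant attributes they cover."""
    needed = expand_task(task)
    return sorted(
        ((name, len(needed & attrs)) for name, attrs in datasets.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )

if __name__ == "__main__":
    catalogue = {
        "crm_customers": {"customer", "tenure", "plan", "cancellation"},
        "web_logs": {"session", "page", "timestamp"},
    }
    print(rank_datasets("predict customer churn next quarter", catalogue))
```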
