Dutch researcher advances artificial intelligence for hidden structured data insights

Dutch researcher Madelon Hulsebos is developing table representation learning techniques that allow artificial intelligence systems to understand what tables mean, aiming to unlock long-overlooked structured data and democratise access to insights across organisations.

Organisations are sitting on large quantities of structured data in relational databases and spreadsheets that remain underused because specialists still spend much of their time on repetitive work such as cleaning tables, extracting features and linking datasets. Dutch researcher Madelon Hulsebos, based at the Centrum Wiskunde & Informatica (CWI) in the Netherlands, is tackling this by developing “table representation learning”, a method that enables artificial intelligence to interpret what tables mean rather than simply search them by column names. After a PhD at the University of Amsterdam and postdoctoral work at the University of California, Berkeley, she now leads the Table Representation Learning Lab at CWI, guiding a team of three PhD students, two postdocs and six master’s students.

Backed by an NWO AiNed Fellowship Grant under the National Growth Fund programme, Hulsebos launched the DataLibra project, which runs from 2024 to 2029 and aims to build practical tools that make querying organisational data as simple as a web search. She argues that artificial intelligence can lower the barrier by allowing users to ask questions in natural language rather than learning programming, business intelligence tools and relational database concepts. The challenge is that each system uses different column names and logic, which limits traditional techniques such as SQL and pattern matching, so her work focuses on models that generalise from context in order to identify and combine relevant tables, moving from basic information retrieval to what she terms “insight retrieval”. Hulsebos stresses that full automation is not the goal, because users must be able to understand and explain why a specific answer was produced, making transparency, iteration and robustness central design requirements.
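The difference between matching on column names and generalising from context can be illustrated with a toy sketch. This is not Hulsebos’s method: fuzzy token overlap over pooled column names and sample values is a crude stand-in for learned table representations, and every table name and value below is invented for illustration.

```python
# Toy contrast between column-name lookup and context-aware table
# retrieval. Fuzzy token overlap is a crude stand-in for learned table
# representations; all table names and data here are invented.

def tokens(text):
    """Lowercase word tokens of length >= 3 (drops noise like 'nm', '&')."""
    return {t for t in str(text).lower().replace("_", " ").split() if len(t) >= 3}

def table_context(table):
    """Pool column names AND sample cell values into one token bag."""
    bag = set()
    for col, values in table.items():
        bag |= tokens(col)
        for v in values:
            bag |= tokens(v)
    return bag

def score(question, bag):
    """Count question tokens that fuzzily match any context token."""
    return sum(any(b in q or q in b for b in bag) for q in tokens(question))

def retrieve(question, tables):
    """Return the table whose pooled context best matches the question."""
    return max(tables, key=lambda name: score(question, table_context(tables[name])))

tables = {
    "tbl_a": {"cust_nm": ["Acme BV", "Jansen & Co"], "rev_eur": [120000, 87000]},
    "tbl_b": {"emp_id": [101, 102], "dept": ["sales", "hr"]},
}

# Cryptic names like "cust_nm" and "rev_eur" defeat exact keyword
# search; fuzzy matching over the pooled context still finds the table.
print(retrieve("which customer had the highest revenue", tables))  # tbl_a
```

Real table representation learning replaces the token heuristic with embeddings trained on tables, but the retrieval shape, question in, relevant tables out, is the same.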

Hulsebos sees table representation learning as a way to automate the commonly cited “80% data work and 20% modelling” split in data science, freeing experts to focus on more critical questions while also empowering non-specialists to query relational databases directly in plain language. She is sceptical of many current vendor claims about artificial intelligence powered analytics, pointing to benchmarks where success rates are often zero and highlighting the need for systems that can justify their outputs rather than simply generate confident responses. The importance of context and explanation came into focus in a recent collaboration with the United Nations Humanitarian Data Centre on detecting sensitive data in humanitarian datasets. Together with master’s student Liang Telkamp, she developed two mechanisms: one that reasons over full data context to reduce false positives, and a “retrieve then detect” approach that dynamically links datasets to relevant policies and protocols so that assessments change as conflicts or situations evolve. Quality Assessment Officers at the UN found the contextualised explanations from large language models particularly valuable for navigating long information sharing protocols, and Telkamp’s work was recognised with the Amsterdam AI Thesis Award.
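The “retrieve then detect” idea can be sketched in miniature: first retrieve the policy rules that apply to a dataset’s context, then run detection only against those rules, so the same column is judged differently as the situation changes. This is an illustrative reconstruction, not the UN system; the policy rules, contexts and column names below are all invented.

```python
# Minimal "retrieve then detect" sketch: link a dataset to the policy
# rules relevant to its context, then flag columns against only those
# rules. Policies, contexts and columns are invented for illustration.

POLICIES = [
    {"context": "active_conflict",
     "sensitive": {"gps", "location", "village", "phone"}},
    {"context": "stable",
     "sensitive": {"phone"}},
]

def retrieve_rules(dataset_context):
    """Retrieval step: fetch the rule set matching the dataset's context."""
    for policy in POLICIES:
        if policy["context"] == dataset_context:
            return policy["sensitive"]
    return set()

def detect(columns, dataset_context):
    """Detection step: flag columns named in the retrieved rules, with
    the matched context attached as a minimal explanation."""
    rules = retrieve_rules(dataset_context)
    return {c: f"restricted under '{dataset_context}' policy"
            for c in columns if c in rules}

cols = ["household_id", "village", "phone"]
print(detect(cols, "active_conflict"))  # village and phone flagged
print(detect(cols, "stable"))           # only phone flagged
```

Because the rules are retrieved at query time rather than baked into the detector, reclassifying a region from “stable” to “active_conflict” changes the assessment without retraining anything, which is the dynamic behaviour the article describes.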

For Hulsebos, the UN project illustrates the broader organisational problem of making data both accessible and comprehensible, including understanding sensitivities before information is published on data sharing portals that could feed model training sets. She wants to surface unknown datasets and combinations so people can uncover insights they did not realise were possible, reducing the need to route every question through business intelligence or data science teams. In her view, dependence on dashboards and SQL queries introduces delays until a delivered insight is no longer timely, so she focuses on artificial intelligence powered systems that shorten “speed to insight” by allowing everyone from sales staff to CEOs to query data directly. Concrete tools are in development: one PhD student is building components to automate dataset retrieval and support structured query language generation, with first open source versions expected within the next two months. An earlier tool, DataScout, created during her time at the University of California, Berkeley, already showed in user studies that task based search with large language models helped data scientists find relevant datasets faster than traditional keyword based data platforms, addressing situations where gathering the right data for a machine learning model could otherwise take two weeks to a month.


