Research · Jun 25, 2026

Researchers propose automated benchmark generation for neural relational reasoning using LLMs

Method uses LLM-driven evolutionary search and autonomous agentic workflows to produce increasingly challenging problem instances and improve reasoning evaluators.

Trust score: 79 · Hype: Low · 1 source, cross-referenced

TL;DR
  • Researchers propose a method to automate the generation of challenging benchmark instances for neural relational reasoning using LLMs.
  • The approach combines LLM-driven evolutionary search and autonomous agentic search to discover sampling functions that yield hard problem instances.
  • The same machinery can be applied to novel worlds proposed by LLMs, which the authors present as a step toward autonomous research on neural relational reasoning.
  • The work uses an Edge Transformer as the reasoning evaluator and shows that, trained on the generated data, it generalizes better to unseen data perturbations.

A new arXiv preprint introduces Project Auto-World, a framework that uses large language models (LLMs) to automate the generation of challenging benchmark instances for neural relational reasoning. The work targets a persistent challenge in evaluating neural models: determining what makes a problem instance hard and ensuring models can generalize to instances beyond their training distribution.

The authors frame the problem as a search for sampling functions that produce increasingly difficult problem instances. To discover these functions, they employ LLM-driven evolutionary search, based on the FunSearch paradigm, alongside autonomous agentic search. The approach is applied within worlds defined by Datalog rules, with an Edge Transformer serving as the reasoning evaluator.
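To make the search framing concrete, here is a minimal Python sketch under heavy assumptions. The toy world (a recursive `ancestor` rule over `parent` facts), the two-knob sampler, the depth-limited `shallow_reasoner` standing in for the Edge Transformer, and every function name are hypothetical illustrations, not the authors' code. FunSearch evolves sampling *programs* by asking an LLM to rewrite them; here that proposal step is replaced by random integer mutation, so the loop is a plain (1+1) hill climber that keeps any sampler at least as hard for the evaluator.

```python
import random
from collections import defaultdict

MAX_CHAIN, MAX_NOISE = 12, 10  # hypothetical bounds on the sampler's knobs


def closure(parents, max_hops=None):
    """Semi-naive bottom-up evaluation of the toy Datalog rules
         ancestor(X, Y) :- parent(X, Y).
         ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z).
    max_hops=None runs to the fixpoint; a finite value stops once chains of
    that length have been derived, mimicking a depth-limited reasoner."""
    into = defaultdict(list)  # y -> every x with parent(x, y)
    for x, y in parents:
        into[y].append(x)
    derived, frontier, hops = set(parents), set(parents), 1
    while frontier and (max_hops is None or hops < max_hops):
        step = {(x, z) for (y, z) in frontier for x in into[y]}
        frontier = step - derived  # keep only facts not derived before
        derived |= frontier
        hops += 1
    return derived


def sample_instance(params, rng, positive):
    """Hypothetical sampling function: a parent chain 0 -> 1 -> ... -> L plus
    `noise` random distractor edges. Queries ancestor(0, L) or its reverse."""
    length, noise = params["chain"], params["noise"]
    n_people = length + 1 + noise
    facts = {(i, i + 1) for i in range(length)}
    for _ in range(noise):
        a, b = rng.sample(range(n_people), 2)
        facts.add((a, b))
    query = (0, length) if positive else (length, 0)
    return facts, query, query in closure(facts)  # gold label from the fixpoint


def shallow_reasoner(facts, query):
    """Stand-in evaluator (hypothetical): only finds chains of at most 3 hops."""
    return query in closure(facts, max_hops=3)


def fitness(params, n=60, seed=0):
    """Evaluator error rate on n instances, alternating positive/negative.
    A fixed seed makes the score a deterministic function of params."""
    rng = random.Random(seed)
    errors = 0
    for i in range(n):
        facts, query, label = sample_instance(params, rng, positive=(i % 2 == 0))
        errors += int(shallow_reasoner(facts, query) != label)
    return errors / n


def mutate(params, rng):
    """Random local edit; FunSearch would instead ask an LLM to rewrite
    the sampling program itself."""
    child = dict(params)
    child["chain"] = min(MAX_CHAIN, max(1, child["chain"] + rng.choice([-1, 1])))
    child["noise"] = min(MAX_NOISE, max(0, child["noise"] + rng.choice([-1, 1])))
    return child


def search(start, generations=100, seed=1):
    """(1+1) hill climber: accept a child if it is at least as hard."""
    rng = random.Random(seed)
    best, best_fit = dict(start), fitness(start)
    for _ in range(generations):
        child = mutate(best, rng)
        child_fit = fitness(child)
        if child_fit >= best_fit:
            best, best_fit = child, child_fit
    return best, best_fit


if __name__ == "__main__":
    best, score = search({"chain": 1, "noise": 0})
    print(best, score)
```

Because gold labels come from exact fixpoint evaluation while the stand-in evaluator stops at three hops, the search is rewarded for longer chains and penalized when distractor edges create shortcuts. A real run would score candidate samplers against a trained neural model rather than this hand-written reasoner.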

The paper demonstrates that the Edge Transformer can be improved using data generated by this process, yielding better generalization to unseen data perturbations. The authors also show that the same machinery can be extended to novel worlds proposed by LLMs, suggesting a path toward autonomous research workflows in neural relational reasoning.
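The article does not describe the Edge Transformer itself. As we understand the original design (Bergen et al., 2021), it keeps a state vector for every ordered pair of entities and updates edge (i, j) by attending over intermediate entities l, combining the states of edges (i, l) and (l, j). The pure-Python sketch below shows one single-head triangular attention step under that reading; it omits the output projection, residual connections, layer normalization, and feed-forward sublayer, and the helper names (`matvec`, `random_matrix`, `triangular_attention`) are our own.

```python
import math
import random


def matvec(W, x):
    """Multiply a d x d matrix (list of rows) by a length-d vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def random_matrix(d, rng):
    """Illustrative random initialization for a d x d weight matrix."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(d)]


def triangular_attention(X, Wq, Wk, Wv1, Wv2):
    """One single-head triangular attention step over edge states X[i][j]
    (an n x n grid of length-d vectors). Edge (i, j) attends over every
    intermediate node l, scoring query(i, l) against key(l, j)."""
    n, d = len(X), len(X[0][0])

    def project(W):
        return [[matvec(W, X[i][j]) for j in range(n)] for i in range(n)]

    Q, K, V1, V2 = project(Wq), project(Wk), project(Wv1), project(Wv2)
    Y = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            scores = [sum(q * k for q, k in zip(Q[i][l], K[l][j])) / math.sqrt(d)
                      for l in range(n)]
            top = max(scores)  # subtract the max for a numerically stable softmax
            exps = [math.exp(s - top) for s in scores]
            total = sum(exps)
            weights = [e / total for e in exps]
            # value for the path i -> l -> j: elementwise product of both edge values
            Y[i][j] = [sum(weights[l] * V1[i][l][c] * V2[l][j][c] for l in range(n))
                       for c in range(d)]
    return Y


if __name__ == "__main__":
    rng = random.Random(0)
    n, d = 4, 3
    X = [[[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)] for _ in range(n)]
    Wq, Wk, Wv1, Wv2 = (random_matrix(d, rng) for _ in range(4))
    Y = triangular_attention(X, Wq, Wk, Wv1, Wv2)
    print(len(Y), len(Y[0]), len(Y[0][0]))  # 4 4 3
```

With identical edge states every intermediate node receives equal weight, so each output reduces to the elementwise product of the two value projections, which makes a quick sanity check for any reimplementation.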

The preprint has been submitted to the NeurIPS 2026 Exposition & Demonstrations track and links to an associated code repository. The authors include Anirban Das, Joanne Boisson, Irtaza Khalid, Sumita Garai, and Steven Schockaert.

Sources
  1. arXiv cs.AI: Project Auto-World: Towards Automated Benchmarking of Neural Relational Reasoners

Stories may contain errors. Dispatch is assembled with AI assistance and curated by human editors; despite the trust-score filter, mistakes happen. We correct publicly, and every article links to its revision history. Nothing here is financial, legal, or medical advice. Verify before relying on any claim.

© 2026 Dispatch. No ads. No sponsorships. No paid placement. Reader-supported via Ko-fi.
