Research · May 22, 2026

New Method Improves LLM Reasoning About Conflicting Beliefs in Complex Social Scenarios

Researchers introduce OSCToM, combining reinforcement learning and domain-specific languages to help large language models handle nested belief conflicts and information asymmetries—core challenges in social reasoning tasks.

Trust: 79

Hype: Low

1 source · single source


TL;DR

OSCToM, a new approach combining RL and compositional surrogate models, addresses gaps in how LLMs reason about recursive beliefs and conflicting perspectives in complex social settings.
On the information-asymmetric FANToM benchmark, OSCToM-8B achieves 76% accuracy, versus 0.2% reported by the prior ExploreToM system.
The method uses an extended domain-specific language to generate observer-self conflict scenarios, where one agent's perspective conflicts with another's belief state.
Code and a 15-page paper with detailed experiments are publicly available; the data-synthesis approach is reported to be 6 times more efficient than prior methods.

Large language models perform well on many language understanding tasks, yet their ability to reason about beliefs remains inconsistent, particularly for nested, recursive beliefs in situations involving conflicting information. Existing theory-of-mind resources such as ExploreToM have not fully captured scenarios in which an observer's model of another agent's belief contradicts the observer's own belief state, a fundamental aspect of genuine social reasoning.

Researchers have now introduced OSCToM (Observer-Self Conflict Theory of Mind), a framework designed to model and improve LLM reasoning about these nested belief conflicts. The approach combines reinforcement learning, a purpose-built domain-specific language for encoding conflict scenarios, and compositional surrogate models to systematically generate training cases that expose and address these reasoning gaps.
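To make the idea of an observer-self conflict concrete, the toy Python sketch below tracks nested beliefs in a classic false-belief setup. It is an illustration only and is not the OSCToM domain-specific language or training pipeline: the names `Move`, `belief`, and `has_observer_self_conflict` are hypothetical, and the sketch assumes the simplification that a chain of agents shares a belief only about events that every agent in the chain witnessed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Move:
    """An object is placed at a location, seen only by the listed witnesses."""
    obj: str
    location: str
    witnesses: frozenset


def belief(events, chain, obj):
    """Where chain[0] thinks chain[1] thinks ... the object is.

    Simplifying assumption (ours, not the paper's): a nested belief is
    updated only by events that every agent in the chain witnessed.
    Returns None if no such event exists.
    """
    observers = set(chain)
    location = None
    for event in events:
        if event.obj == obj and observers <= event.witnesses:
            location = event.location
    return location


def has_observer_self_conflict(events, observer, other, obj):
    """True when the observer's own belief differs from the belief
    the observer attributes to the other agent."""
    own = belief(events, [observer], obj)
    attributed = belief(events, [observer, other], obj)
    return own != attributed


# Both agents see the ball go into the basket; only Anne sees it moved.
events = [
    Move("ball", "basket", frozenset({"Anne", "Sally"})),
    Move("ball", "box", frozenset({"Anne"})),
]

anne_own = belief(events, ["Anne"], "ball")
anne_about_sally = belief(events, ["Anne", "Sally"], "ball")
conflict = has_observer_self_conflict(events, "Anne", "Sally", "ball")
print(anne_own, anne_about_sally, conflict)
```

Here Anne holds one belief herself while attributing a different, outdated belief to Sally, which is the kind of information asymmetry the article describes. Longer chains passed to `belief` give higher-order cases under the same simplifying assumption.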

In controlled experiments, an 8-billion-parameter model trained on OSCToM-generated data (OSCToM-8B) achieved 76% accuracy on FANToM, a benchmark designed to test reasoning under information asymmetry, compared with the 0.2% accuracy previously reported for ExploreToM on the same benchmark. The model remained competitive on other theory-of-mind benchmarks, including Hi-ToM and BigToM, suggesting that the gain on FANToM does not come at the expense of breadth.

A notable practical finding is that the OSCToM data-synthesis procedure is reported to be six times more efficient than prior methods. Together with the 8B model's results, this suggests that carefully targeted synthetic training data can help smaller models acquire advanced reasoning capabilities rather than relying solely on scale. The authors have released both the code and the full paper, enabling further research and validation of the method.

Sources

01 arXiv cs.AI: "OSCToM: RL-Guided Adversarial Generation for High-Order Theory of Mind"
