Neuro-Symbolic Drive framework improves driving VLA reasoning with rule-grounded traces
Researchers propose converting classical rule-based planner logic into structured supervision for vision-language-action driving models, reducing trajectory errors and miss rates in simulation.
1 source · cross-referenced
- A neuro-symbolic framework called Neuro-Symbolic Drive supervises driving VLAs using rule-grounded reasoning traces extracted from classical rule-based planners.
- The method instruments rule-based planners in simulation to capture decision traces, which are then paired with trajectories to fine-tune Qwen3.5-4B.
- On a simulator-generated benchmark, rule-grounded reasoning reduced ADE@3s from 0.47 to 0.26 and miss rate from 8.30% to 6.40% with three-camera perception.
- Under eight-camera perception, ADE@3s improved from 0.54 to 0.26 and the miss rate from 10.13% to 5.99%.
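The reported figures can also be read as relative reductions. This is plain arithmetic on the numbers above, not a result stated by the authors. In absolute terms the miss rate fell by 1.90 and 4.14 percentage points. Because the eight-camera baseline started from a worse ADE@3s (0.54 vs. 0.47) and both settings end at 0.26, its relative gain is larger.

```latex
\begin{aligned}
\text{ADE@3s, 3 cameras:} \quad & \frac{0.47 - 0.26}{0.47} = \frac{0.21}{0.47} \approx 44.7\% \\
\text{ADE@3s, 8 cameras:} \quad & \frac{0.54 - 0.26}{0.54} = \frac{0.28}{0.54} \approx 51.9\% \\
\text{Miss rate, 3 cameras:} \quad & \frac{8.30 - 6.40}{8.30} = \frac{1.90}{8.30} \approx 22.9\% \\
\text{Miss rate, 8 cameras:} \quad & \frac{10.13 - 5.99}{10.13} = \frac{4.14}{10.13} \approx 40.9\%
\end{aligned}
```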
Researchers from multiple institutions introduce Neuro-Symbolic Drive, a neuro-symbolic framework designed to supervise driving vision-language-action (VLA) models using rule-grounded reasoning traces extracted from classical rule-based planners.
The core idea is to treat the decision-making logic of rule-based planners as a source of structured supervision for VLA models. Such planners explicitly reason about safety constraints, search over candidate maneuvers, and select trajectories.
To implement this, the team instruments rule-based planners within a simulation environment to capture both the executed trajectory and the internal decision trace at each rule-evaluation step. These traces are then serialized into structured, rule-grounded reasoning and paired with the corresponding trajectories to fine-tune the Qwen3.5-4B model as a driving VLA.
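The instrumentation step can be pictured with a toy example. Everything in this sketch is hypothetical: the planner, its three maneuvers, the two safety rules (a 2 s headway for staying in lane, and a free target lane for lane changes), the cost function, and the text format of the trace are illustrative assumptions. They are not the authors' planner or serialization scheme. The sketch shows only the structural idea: every rule evaluation the planner performs is logged, and the same log that determined the chosen maneuver becomes the reasoning text paired with the trajectory target.

```python
# Hypothetical sketch: a toy rule-based planner instrumented to log every rule
# evaluation, plus a serializer that pairs the log with the planned trajectory.
# Maneuvers, rules, thresholds, and the text format are illustrative assumptions.
import json
from dataclasses import dataclass, field


@dataclass
class RuleCheck:
    maneuver: str
    rule: str
    passed: bool
    detail: str


@dataclass
class Trace:
    checks: list = field(default_factory=list)  # one RuleCheck per rule evaluation
    selected: str = ""


# Hypothetical candidates: name -> (final lateral offset [m], target speed [m/s]).
CANDIDATES = {
    "keep_lane": (0.0, 10.0),
    "change_left": (3.5, 10.0),
    "slow_down": (0.0, 5.0),
}


def plan(gap_ahead_m, left_lane_free, trace, horizon_s=3.0, dt=0.5):
    """Filter maneuvers by safety rules, pick the cheapest, return [x, y] waypoints."""
    best = None
    for name, (offset, speed) in CANDIDATES.items():
        # Rule 1: staying in lane requires at least a 2 s headway to the lead car.
        headway_s = gap_ahead_m / speed
        ok_headway = offset != 0.0 or headway_s >= 2.0
        trace.checks.append(
            RuleCheck(name, "min_headway_2s", ok_headway, f"headway={headway_s:.1f}s"))
        # Rule 2: a lane change is allowed only into a free lane.
        ok_lane = offset == 0.0 or left_lane_free
        trace.checks.append(
            RuleCheck(name, "target_lane_free", ok_lane, f"left_lane_free={left_lane_free}"))
        if ok_headway and ok_lane:
            # Prefer higher speed; lightly penalize lateral motion.
            cost = -speed + 0.1 * abs(offset)
            if best is None or cost < best[0]:
                best = (cost, name)
    # Fallback when every maneuver fails a rule (an assumption of this sketch).
    trace.selected = best[1] if best else "slow_down"
    offset, speed = CANDIDATES[trace.selected]
    steps = int(round(horizon_s / dt))
    # Constant speed along x; lateral offset reached linearly over the horizon.
    return [[round(speed * k * dt, 2), round(offset * k * dt / horizon_s, 2)]
            for k in range(1, steps + 1)]


def to_training_example(trace, trajectory):
    """Serialize the trace into reasoning text paired with the trajectory target."""
    lines = [f"[{c.maneuver}] {c.rule}: {'PASS' if c.passed else 'FAIL'} ({c.detail})"
             for c in trace.checks]
    lines.append(f"Selected maneuver: {trace.selected}")
    return {"reasoning": "\n".join(lines), "trajectory": trajectory}


if __name__ == "__main__":
    trace = Trace()
    traj = plan(gap_ahead_m=15.0, left_lane_free=True, trace=trace)
    print(json.dumps(to_training_example(trace, traj), indent=2))
```

With a 15 m gap and a free left lane, `keep_lane` fails the headway rule (1.5 s), and `change_left` is selected over the slower `slow_down`. The resulting dict is one hypothetical (reasoning, trajectory) supervision example.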
By deriving reasoning traces directly from the planner states that determine actions, the approach ensures that the VLA’s intermediate rationales are causally connected to motion generation by construction, rather than relying on post-hoc alignment techniques.
The researchers evaluate the method on a simulator-generated benchmark, reporting improvements in trajectory accuracy and reliability. Specifically, under three-camera perception, average displacement error at 3 seconds (ADE@3s) decreased from 0.47 to 0.26, and the miss rate dropped from 8.30% to 6.40%. With eight-camera perception, ADE@3s improved from 0.54 to 0.26, and the miss rate fell from 10.13% to 5.99%.
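The two reported metrics can be computed roughly as shown here. This is a minimal sketch under assumptions the source does not state. It assumes waypoints at 2 Hz (so 3 s is six steps), errors in meters, one predicted trajectory per sample, and a miss defined as a final-step L2 error above 2.0 m, a threshold some prediction benchmarks use. The paper's benchmark may define either metric differently.

```python
# Minimal sketch of ADE@3s and miss rate for single-trajectory predictions.
# Assumed (not from the source): 2 Hz waypoints, meters, 2.0 m miss threshold.
import math

HZ = 2                   # assumed waypoint rate (samples per second)
STEPS_3S = 3 * HZ        # waypoints covering the 3 s horizon
MISS_THRESHOLD_M = 2.0   # assumed threshold on final-step displacement


def l2(p, q):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def ade(pred, gt, steps=STEPS_3S):
    """Average displacement error over the first `steps` waypoints."""
    if len(pred) < steps or len(gt) < steps:
        raise ValueError("trajectory shorter than evaluation horizon")
    return sum(l2(pred[i], gt[i]) for i in range(steps)) / steps


def is_miss(pred, gt, steps=STEPS_3S, threshold=MISS_THRESHOLD_M):
    """True if the error at the last evaluated waypoint exceeds the threshold."""
    return l2(pred[steps - 1], gt[steps - 1]) > threshold


def evaluate(preds, gts):
    """Return (mean ADE@3s, miss rate) over a set of samples."""
    n = len(preds)
    mean_ade = sum(ade(p, g) for p, g in zip(preds, gts)) / n
    miss_rate = sum(is_miss(p, g) for p, g in zip(preds, gts)) / n
    return mean_ade, miss_rate
```

For example, against a straight ground truth, a prediction offset laterally by 0.3 m scores an ADE of 0.3 with no miss. A prediction drifting to 3.0 m by the last step scores 1.75 and counts as a miss. Over the pair, this gives a mean ADE of 1.025 and a miss rate of 0.5.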
The authors describe the approach as converting symbolic planning logic into structured supervision, offering a path toward more interpretable and reliable autonomous driving systems.
- Jun 24, 2026 · arXiv cs.AI