LLMs show strong performance on text-only statics problems but struggle with diagrams and multi-step reasoning
A study finds that large language models perform strongly on text-only statics problems but lose accuracy when diagrams are introduced, with multi-step reasoning and the integration of visual information cited as the key challenges.
- LLMs perform well on text-only statics problems but see accuracy drop when diagrams are introduced.
- The decline is attributed to difficulty with multi-step reasoning and with consistently applying extracted visual information, not primarily to image recognition.
- Study used 25 text-only statics questions and two additional datasets with diagrams and modified numerical values.
- Authors used a model distillation process to extract the questions from ChatGPT, focusing on mechanical engineering statics.
A new arXiv preprint evaluates large language model (LLM) capabilities in solving statics problems, a domain within mechanical engineering that often requires multi-step reasoning and visual interpretation. The study, titled *Investigating LLM's Problem Solving Capability -- a Study on Statics Questions*, was authored by Tanner Culleton and Hung-Fu Chang and submitted to the Engineering and Technology Symposium 2026.
Rather than relying on traditional textbook-style prompts, the researchers adopted a model distillation process to extract 25 text-only statics questions from ChatGPT. They then constructed two additional datasets by adding diagrams to the original questions and modifying their numerical values. This approach allowed for a controlled evaluation of how LLMs handle variations in problem presentation and complexity.
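To make the idea of a numerical variant concrete, here is a minimal Python sketch. It is not taken from the paper: the beam problem, the `BeamProblem` class, and the `with_new_values` helper are all hypothetical, chosen only because a simply supported beam is a standard statics exercise that takes two reasoning steps. The preprint's actual questions and its procedure for changing values may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BeamProblem:
    """Simply supported beam with one point load (hypothetical template, not from the study)."""
    span_m: float      # distance between supports A and B
    load_n: float      # downward point load
    load_pos_m: float  # distance of the load from support A

    def prompt(self) -> str:
        return (
            f"A simply supported beam of span {self.span_m} m carries a "
            f"{self.load_n} N downward point load {self.load_pos_m} m from "
            "support A. Find the vertical reactions at A and B."
        )

    def reactions(self) -> tuple[float, float]:
        # Step 1: sum moments about A to get the reaction at B.
        r_b = self.load_n * self.load_pos_m / self.span_m
        # Step 2: sum vertical forces to get the reaction at A.
        r_a = self.load_n - r_b
        return r_a, r_b


def with_new_values(base: BeamProblem, load_n: float, load_pos_m: float) -> BeamProblem:
    """Return a variant with modified numbers; the problem structure is unchanged."""
    if not 0 <= load_pos_m <= base.span_m:
        raise ValueError("load must sit between the supports")
    return BeamProblem(base.span_m, load_n, load_pos_m)


original = BeamProblem(span_m=10.0, load_n=100.0, load_pos_m=4.0)
variant = with_new_values(original, load_n=250.0, load_pos_m=6.0)
```

Because the reference answer is recomputed from the new numbers, a variant like this tests whether a model carries out the two steps rather than recalling a memorized result.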
Experimental results indicate that LLMs achieve strong performance on text-only statics problems. However, their accuracy declines when diagrams are introduced, particularly when problems require multi-step reasoning. The study further analyzes the cause of this performance drop, concluding that limitations in image recognition are not the primary issue. Instead, the challenges stem from difficulties in multi-step reasoning and in consistently applying extracted visual information across successive stages of problem-solving.
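A comparison of this kind reduces to scoring the same reference answers under each presentation condition. The sketch below shows one way that could look. It rests on assumptions: the 1% relative tolerance, the function names, and the condition labels are illustrative, and the numbers are placeholders rather than results from the study.

```python
import math


def is_correct(predicted: float, expected: float, rel_tol: float = 0.01) -> bool:
    """Count an answer as correct if it is within rel_tol of the reference (assumed 1%)."""
    return math.isclose(predicted, expected, rel_tol=rel_tol)


def accuracy(predictions: list[float], references: list[float]) -> float:
    """Fraction of questions answered correctly."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must align")
    if not references:
        raise ValueError("need at least one question")
    hits = sum(is_correct(p, r) for p, r in zip(predictions, references))
    return hits / len(references)


# Placeholder numbers for illustration only; they are NOT results from the study.
references = [60.0, 40.0, 150.0, 100.0]
by_condition = {
    "text_only": [60.0, 40.0, 150.0, 100.0],
    "with_diagram": [60.0, 40.2, 100.0, 150.0],
}
scores = {name: accuracy(preds, references) for name, preds in by_condition.items()}
```

Scoring only the final number cannot tell a misread diagram apart from a value that was read correctly but applied wrongly in a later step, so separating those causes would require inspecting the intermediate steps as well.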
The authors suggest that these findings point to specific technical gaps in current LLM architectures when addressing visually complex, multi-step engineering tasks. The work contributes to a growing body of research examining the educational applications of LLMs, while also identifying areas for improvement in multimodal reasoning capabilities.