Research · Jun 26, 2026

LLMs show strong performance on text-only statics problems but struggle with diagrams and multi-step reasoning

A study finds large language models excel at text-only engineering problems but face accuracy drops when diagrams are introduced, with multi-step reasoning and visual information integration cited as key challenges.

TL;DR
  • LLMs perform well on text-only statics problems but see accuracy drop when diagrams are introduced.
  • The decline is attributed to difficulty with multi-step reasoning and with consistently applying extracted visual information, not to image recognition itself.
  • Study used 25 text-only statics questions and two additional datasets with diagrams and modified numerical values.
  • The authors built their evaluation datasets by extracting questions from ChatGPT through a model distillation process, focusing on mechanical engineering statics.

A new arXiv preprint evaluates large language model (LLM) capabilities in solving statics problems, a domain within mechanical engineering that often requires multi-step reasoning and visual interpretation. The study, titled *Investigating LLM's Problem Solving Capability -- a Study on Statics Questions*, was authored by Tanner Culleton and Hung-Fu Chang and submitted to the Engineering and Technology Symposium 2026.

Rather than relying on traditional textbook-style prompts, the researchers adopted a model distillation process to extract 25 text-only statics questions from ChatGPT. They then constructed two additional datasets by adding diagrams to the original questions and modifying their numerical values. This approach allowed for a controlled evaluation of how LLMs handle variations in problem presentation and complexity.
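To make the idea of a numerically modified variant concrete, here is a minimal Python sketch. It is not the authors' code or data: the `BeamProblem` class, the beam scenario, the chosen numbers, and the 1% tolerance are all hypothetical, picked only to illustrate how a question can keep its structure while its values change, and how a reference answer can be recomputed for grading.

```python
from dataclasses import dataclass, replace
import math


@dataclass(frozen=True)
class BeamProblem:
    """Hypothetical example: simply supported beam with one vertical point load."""
    span_m: float      # distance between supports A and B
    load_n: float      # downward point load
    load_pos_m: float  # distance of the load from support A

    def prompt(self) -> str:
        # Text-only statement of the problem, as it might be shown to a model.
        return (
            f"A simply supported beam of span {self.span_m} m carries a "
            f"{self.load_n} N downward load {self.load_pos_m} m from support A. "
            "Find the vertical reactions at A and B."
        )

    def reactions(self) -> tuple[float, float]:
        # Step 1: sum of moments about A gives the reaction at B.
        r_b = self.load_n * self.load_pos_m / self.span_m
        # Step 2: vertical force balance gives the reaction at A.
        r_a = self.load_n - r_b
        return r_a, r_b


def is_correct(problem: BeamProblem, answer: tuple[float, float],
               rel_tol: float = 0.01) -> bool:
    """Compare a submitted (R_A, R_B) pair to the recomputed reference (assumed 1% tolerance)."""
    return all(
        math.isclose(got, want, rel_tol=rel_tol)
        for got, want in zip(answer, problem.reactions())
    )


# Base question and a variant with modified numerical values only.
base = BeamProblem(span_m=4.0, load_n=100.0, load_pos_m=1.0)
variant = replace(base, load_n=250.0, load_pos_m=3.0)
```

In this sketch, an answer memorized for `base` fails `is_correct(variant, ...)`, which is the kind of check that changing numerical values makes possible.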

Experimental results indicate that LLMs achieve strong performance on text-only statics problems. However, their accuracy declines when diagrams are introduced, particularly when problems require multi-step reasoning. The study further analyzes the cause of this performance drop, concluding that limitations in image recognition are not the primary issue. Instead, the challenges stem from difficulties in multi-step reasoning and in consistently applying extracted visual information across successive stages of problem-solving.
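One way to separate a recognition failure from a reasoning failure is to score the values a model reads from a diagram separately from its final answer. The Python sketch below illustrates that distinction only; the function `classify_attempt`, its labels, the input names, and the tolerance are assumptions for illustration and are not taken from the paper.

```python
import math
from typing import Callable, Mapping


def classify_attempt(
    true_inputs: Mapping[str, float],
    extracted_inputs: Mapping[str, float],
    final_answer: float,
    solve: Callable[[Mapping[str, float]], float],
    rel_tol: float = 0.01,
) -> str:
    """Label one attempt as 'correct', 'extraction_error', or 'reasoning_error'."""
    # Did the model read every quantity off the diagram correctly?
    # A missing key becomes NaN, which never compares as close.
    extraction_ok = all(
        math.isclose(extracted_inputs.get(name, math.nan), value, rel_tol=rel_tol)
        for name, value in true_inputs.items()
    )
    # Does the final answer match the reference computed from the true inputs?
    answer_ok = math.isclose(final_answer, solve(true_inputs), rel_tol=rel_tol)

    if answer_ok:
        return "correct"
    # Wrong answer despite correct readings points to the reasoning steps.
    return "reasoning_error" if extraction_ok else "extraction_error"


def moment_about_pivot(inputs: Mapping[str, float]) -> float:
    # Hypothetical reference solver: moment = force * perpendicular arm.
    return inputs["force_n"] * inputs["arm_m"]


truth = {"force_n": 50.0, "arm_m": 2.0}
label = classify_attempt(truth, {"force_n": 50.0, "arm_m": 2.0}, 25.0, moment_about_pivot)
```

Here `label` is `"reasoning_error"`: both values were read correctly, yet the final answer is wrong, which mirrors the failure pattern the study describes.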

The authors suggest that these findings point to specific technical gaps in current LLM architectures when addressing visually complex, multi-step engineering tasks. The work contributes to a growing body of research examining the educational applications of LLMs, while also identifying areas for improvement in multimodal reasoning capabilities.

Sources
  1. arXiv cs.CL: Investigating LLM's Problem Solving Capability -- a Study on Statics Questions