Research · May 11, 2026

Chain-of-Thought Reasoning Amplifies Position Bias in Multiple-Choice Questions

A new study challenges the assumption that extended reasoning reduces model bias, finding instead that longer reasoning trajectories correlate with increased susceptibility to answer-position preferences across DeepSeek-R1 and other reasoning-tuned models.

TL;DR

Chain-of-thought reasoning in models like DeepSeek-R1 does not eliminate position bias in multiple-choice tasks; instead, longer reasoning chains correlate with stronger bias toward certain answer positions (r = 0.11–0.41, p<0.05 in twelve of thirteen configurations).
Across thirteen configurations tested on MMLU, ARC-Challenge, and GPQA, twelve showed a statistically significant positive correlation between reasoning length and position bias, and every open-weight reasoning variant showed position bias rising monotonically across reasoning-length quartiles.
A truncation intervention supports a causal reading: resuming reasoning from later points in a chain increased the likelihood of shifting toward position-preferred answers (from 16% to 32% for R1-Qwen-7B).
DeepSeek-R1 at 671B shows lower aggregate position bias (0.019), but the length effect persists in its longest reasoning quartile, suggesting accuracy may mask rather than eliminate the underlying mechanism.
The authors propose diagnostic tools for auditing position bias in reasoning models and argue that multiple-choice evaluation pipelines should not assume reasoning-tuned models are order-robust.

Chain-of-thought (CoT) reasoning has become a foundational technique for improving model performance on complex tasks, on the assumption that extended thinking reduces reliance on shallow heuristics and spurious correlations. A new preprint challenges this premise by examining how reasoning models handle position bias, a persistent tendency to favor answers in specific locations within a multiple-choice set. Across DeepSeek-R1 and its distilled variants, plus base models prompted with CoT instructions, the researchers found a counterintuitive pattern: the longer the reasoning chain, the more pronounced the position bias.

The study analyzed thirteen distinct reasoning configurations on three major benchmarks (MMLU, ARC-Challenge, GPQA) using a Position Bias Score (PBS) metric. Twelve of the thirteen configurations showed statistically significant positive correlations between reasoning trajectory length and position bias (r ranging 0.11–0.41, p<0.05), even after controlling for accuracy. All open-weight reasoning variants displayed monotonically increasing PBS across length quartiles, suggesting a systematic rather than random effect.
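The quartile analysis can be illustrated with a small sketch. The article does not give the PBS formula, so the code below substitutes a stand-in: the total variation distance between the distribution of positions a model chose and the distribution of correct positions. The function names and the record layout `(trajectory_length, chosen_index, gold_index)` are assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter

def position_bias_score(chosen, gold, n_options=4):
    """Stand-in PBS (an assumption, not the paper's definition):
    total variation distance between chosen-position and gold-position frequencies."""
    if not chosen or len(chosen) != len(gold):
        raise ValueError("chosen and gold must be non-empty and the same length")
    n = len(chosen)
    chosen_counts = Counter(chosen)
    gold_counts = Counter(gold)
    return 0.5 * sum(
        abs(chosen_counts[k] / n - gold_counts[k] / n) for k in range(n_options)
    )

def pbs_by_length_quartile(records, n_options=4):
    """Sort records by trajectory length, split into four equal-sized groups,
    and return the stand-in PBS for each group, shortest first."""
    if len(records) < 4:
        raise ValueError("need at least four records to form quartiles")
    ordered = sorted(records, key=lambda r: r[0])
    n = len(ordered)
    scores = []
    for q in range(4):
        chunk = ordered[q * n // 4:(q + 1) * n // 4]
        chosen = [r[1] for r in chunk]
        gold = [r[2] for r in chunk]
        scores.append(position_bias_score(chosen, gold, n_options))
    return scores

def is_non_decreasing(values):
    """True if each value is at least as large as the one before it."""
    return all(a <= b for a, b in zip(values, values[1:]))
```

Under this stand-in, a model whose chosen positions match the answer key's distribution scores 0 in a quartile, and a model that always picks one position regardless of the key scores close to 1.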

To test whether the relationship is causal rather than merely correlational, the researchers ran truncation interventions: they resumed partially completed reasoning chains from intermediate points and observed whether the model's final answer shifted. Continuations started from later points in the trajectory shifted toward position-preferred answers more often (from 16% to 32% for R1-Qwen-7B, depending on position bucket). The authors take this as evidence that the reasoning process itself drives the bias, rather than the bias being an artifact of model initialization.
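A minimal sketch of how shift rates per truncation bucket might be tallied follows. The trial layout `(cut_fraction, original_answer, resumed_answer, preferred_position)` is hypothetical, and counting a "shift" only when the resumed answer lands on the preferred position while the original answer did not is one reading of the article's description, not the paper's stated procedure.

```python
def shift_rate_by_bucket(trials, n_buckets=4):
    """Group trials by where the chain was cut (0.0 = start, 1.0 = end) and
    return, per bucket, the fraction that moved onto the position-preferred answer.
    Buckets with no trials yield None."""
    shifted = [0] * n_buckets
    total = [0] * n_buckets
    for cut_fraction, original, resumed, preferred in trials:
        if not 0.0 <= cut_fraction <= 1.0:
            raise ValueError("cut_fraction must lie in [0, 1]")
        # A cut at exactly 1.0 is placed in the last bucket.
        bucket = min(int(cut_fraction * n_buckets), n_buckets - 1)
        total[bucket] += 1
        if original != preferred and resumed == preferred:
            shifted[bucket] += 1
    return [s / t if t else None for s, t in zip(shifted, total)]
```

A rising sequence of rates from early buckets to late ones would match the pattern the article reports for R1-Qwen-7B.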

Notably, DeepSeek-R1 at 671B parameters showed substantially lower aggregate position bias (PBS 0.019) than smaller models, but the length effect did not disappear; it remained visible in the longest reasoning quartile (PBS 0.071). The authors interpret this as evidence that the higher accuracy of larger models may gate or suppress the expression of bias rather than eliminate the underlying mechanism. This distinction matters: a bias that manifests only at certain accuracy thresholds could still distort evaluations within particular task domains or model scales.

The research also identifies a separate direct-answer position bias, present when models skip reasoning and answer directly. Its pattern differs across architectures (strong in Llama-Instruct-direct, weak in Qwen-Instruct-direct) and is uncorrelated with trajectory length. CoT reasoning does not reduce this baseline bias but replaces it with a length-accumulated form, suggesting the two biases may have independent mechanisms.
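For evaluation pipelines that want to check order-robustness rather than assume it, one generic test is to present the same question under every cyclic rotation of its options and see whether the model keeps choosing the same option text. The sketch below is a common-sense check of that kind, not the diagnostic tooling the authors propose; `answer_fn` is a hypothetical callable that takes a question and an option list and returns the index it picks.

```python
def rotation_consistency(answer_fn, question, options):
    """Ask the same question under every cyclic rotation of the options and
    return the fraction of rotations that agree with the most common pick,
    compared by option text rather than by position."""
    n = len(options)
    if n == 0:
        raise ValueError("options must not be empty")
    picks = []
    for shift in range(n):
        rotated = list(options[shift:]) + list(options[:shift])
        index = answer_fn(question, rotated)
        picks.append(rotated[index])
    most_common = max(set(picks), key=picks.count)
    return picks.count(most_common) / n
```

A score of 1.0 means the choice followed the content through every rotation; a score near 1/n means the choice followed the position instead.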

Sources

01 arXiv cs.AI, "More Thinking, More Bias: Length-Driven Position Bias in Reasoning Models"
