Apple study finds RL-finetuned vision-language models vulnerable to textual perturbations and CoT inconsistencies
Researchers show RL-tuned VLMs degrade sharply under misleading captions or incorrect chain-of-thought traces, with closed models more robust than open-source counterparts.
1 source · cross-referenced
- RL-finetuned vision-language models (VLMs) suffer large robustness drops under simple textual perturbations like misleading captions or incorrect chain-of-thought traces.
- The drops are larger when chain-of-thought consistency is scored alongside answer accuracy, a pattern seen across open-source multimodal reasoning models.
- Closed models show similar failure modes but maintain greater robustness and reasoning consistency than open-source RL-finetuned models.
- RL finetuning improves benchmark accuracy but can erode chain-of-thought faithfulness and robustness; faithfulness-aware rewards help but may introduce new vulnerabilities.
Apple Machine Learning Research describes a study examining the robustness and chain-of-thought (CoT) consistency of reinforcement learning (RL)-finetuned vision-language models (VLMs). The work argues that while RL finetuning improves performance on visual reasoning benchmarks, RL-tuned VLMs still suffer from weak visual grounding, hallucinations, and over-reliance on textual cues.
The researchers report that simple, controlled textual perturbations, such as misleading captions or incorrect CoT traces, cause substantial drops in both robustness and model confidence. The drops are more pronounced when CoT consistency is scored alongside answer accuracy, and the pattern holds across open-source multimodal reasoning models.
In contrast, closed models exhibit similar failure modes but maintain markedly greater robustness and reasoning consistency, suggesting the gap reflects a shortcoming in current open-source RL finetuning rather than an inherent limitation of the task.
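The kind of measurement described above can be sketched in a few lines. This is a minimal illustration, not the paper's evaluation code: the `Record` fields, the exact-match scoring, and the definition of CoT consistency as "final answer equals the answer implied by the model's own reasoning" are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One evaluation item (hypothetical schema, not from the paper)."""
    gold: str          # reference answer
    clean_answer: str  # model answer on the unperturbed prompt
    pert_answer: str   # model answer with a misleading caption added
    cot_answer: str    # answer implied by the model's CoT on the perturbed prompt


def accuracy(preds, golds):
    """Exact-match accuracy over paired predictions and references."""
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)


def robustness_report(records):
    """Summarize accuracy drop under perturbation and CoT-answer agreement."""
    if not records:
        raise ValueError("records must be non-empty")
    golds = [r.gold for r in records]
    clean = accuracy([r.clean_answer for r in records], golds)
    pert = accuracy([r.pert_answer for r in records], golds)
    # Assumed consistency measure: the final answer matches the CoT's conclusion.
    consistent = sum(r.pert_answer == r.cot_answer for r in records) / len(records)
    return {
        "clean_acc": clean,
        "perturbed_acc": pert,
        "drop": clean - pert,
        "cot_consistency": consistent,
    }


# Toy data, invented purely to exercise the functions.
records = [
    Record("A", "A", "A", "A"),
    Record("B", "B", "C", "C"),
    Record("C", "C", "A", "C"),
    Record("D", "A", "B", "D"),
]
report = robustness_report(records)
```

Reporting the accuracy drop and the consistency rate side by side makes it visible when a model keeps its answer but its stated reasoning no longer supports it, which accuracy alone would hide.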
The paper further analyzes RL finetuning dynamics and uncovers an accuracy–faithfulness trade-off: finetuning raises benchmark accuracy but can simultaneously erode the reliability of the accompanying CoT and its robustness to contextual shifts.
The authors find that while adversarial augmentation improves robustness, it does not by itself prevent faithfulness drift. Incorporating a faithfulness-aware reward can restore alignment between answers and reasoning, but when it is paired with augmentation, training risks collapsing onto shortcut strategies, and robustness remains elusive.
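As a rough illustration of what a faithfulness-aware reward could look like, the sketch below adds a weighted agreement bonus to a correctness term. The summary does not give the paper's reward formulation, so the function name `shaped_reward`, the additive form, and the `faith_weight` parameter are hypothetical.

```python
def shaped_reward(answer, gold, cot_answer, faith_weight=0.5):
    """Correctness reward plus a bonus when the answer matches the CoT's conclusion.

    Hypothetical formulation for illustration; not taken from the paper.
    """
    correct = 1.0 if answer == gold else 0.0
    faithful = 1.0 if answer == cot_answer else 0.0
    return correct + faith_weight * faithful


# Correct and consistent with the reasoning trace.
r_both = shaped_reward("A", gold="A", cot_answer="A")
# Correct, but the reasoning trace concluded something else.
r_unfaithful = shaped_reward("A", gold="A", cot_answer="B")
# Wrong, yet consistent with the reasoning trace.
r_wrong_consistent = shaped_reward("B", gold="A", cot_answer="B")
```

In this toy form a wrong answer that merely agrees with its own reasoning still earns a partial reward, so the weight on the faithfulness term has to be chosen with care.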
The findings motivate training and assessment protocols that jointly emphasize correctness, robustness, and the faithfulness of visually grounded reasoning, rather than relying solely on accuracy metrics.
- Jul 4, 2026 · Apple — Machine Learning Research