Research · Jun 17, 2026

Researchers propose RepSelect method to make LLM unlearning more robust against reversal attacks

RepSelect isolates forget-set-specific representations and achieves a 4–50x larger reduction in post-relearning answer accuracy than the strongest baseline, while maintaining general capabilities.

Trust: 79
Hype: Low

1 source · cross-referenced

TL;DR
  • RepSelect targets selective representations to achieve deep and robust forgetting in LLMs.
  • Evaluated on four model families (Llama 3, Qwen 3.5, Gemma 4 E4B, DeepSeek V2 Lite) across biohazardous knowledge and abusive tendencies.
  • Achieves a 4–50x larger reduction in post-relearning answer accuracy than the strongest of five baselines (GradDiff, NPO, SimNPO, RMU, UNDIAL).
  • Near-perfectly robust to few-shot prompting attacks.

Researchers from an unnamed institution propose RepSelect, a method to make large language model (LLM) unlearning more robust against reversal attacks. Their premise is that existing unlearning techniques target representations that are shared with the retain set and with the subspace a fine-tuning attacker can recover, which makes forgetting shallow and easily reversible.

RepSelect isolates forget-set-specific representations by collapsing the top principal components of weight gradients before each update. This approach aims to leave general capabilities intact while limiting what fine-tuning can recover after unlearning.
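To make the gradient step concrete, here is a minimal sketch of one way "collapsing the top principal components" of a gradient could look. It is an illustration under stated assumptions, not the authors' implementation: it assumes "collapsing" means zeroing the largest singular directions of a single 2-D weight gradient, uses an uncentered SVD, and introduces a hypothetical function name (`collapse_top_components`), a hypothetical parameter `k`, and synthetic data.

```python
import numpy as np

def collapse_top_components(grad: np.ndarray, k: int = 1) -> np.ndarray:
    """Zero the k largest singular directions of a 2-D gradient matrix.

    Assumption: this stands in for the "collapse" step described in the
    article; the paper's exact procedure may differ.
    """
    # Thin SVD: grad = U @ diag(S) @ Vt, with S sorted in descending order.
    U, S, Vt = np.linalg.svd(grad, full_matrices=False)
    S_collapsed = S.copy()
    S_collapsed[:k] = 0.0  # drop the dominant directions
    # Scale the columns of U by the remaining singular values and recombine.
    return (U * S_collapsed) @ Vt

rng = np.random.default_rng(0)
# Hypothetical gradient: a strong rank-1 "shared" direction plus a small residual.
shared = 10.0 * np.outer(rng.standard_normal(8), rng.standard_normal(6))
specific = 0.1 * rng.standard_normal((8, 6))
grad = shared + specific

filtered = collapse_top_components(grad, k=1)
print("gradient norm before:", np.linalg.norm(grad))
print("gradient norm after: ", np.linalg.norm(filtered))
```

In this toy setup the dominant direction plays the role of a representation shared with general capabilities, so removing it leaves only the small residual for the update. How many components to collapse, and over which gradients the components are computed, are details the article does not specify.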

The method was evaluated across two forget categories—biohazardous knowledge and abusive tendencies—and four model families spanning dense and Mixture-of-Experts architectures: Llama 3, Qwen 3.5, Gemma 4 E4B, and DeepSeek V2 Lite.

Compared to five popular baselines—GradDiff, NPO, SimNPO, RMU, and UNDIAL—RepSelect achieves a 4–50x larger reduction in post-relearning answer accuracy than the strongest baseline. It is also near-perfectly robust to few-shot prompting attacks, a common method for reversing unlearning.

The authors argue that targeting selective representations is an important step toward deep and robust LLM forgetting, addressing a central challenge in making models forget specific knowledge and values without sacrificing general capabilities.

Sources
  1. arXiv cs.CL · RepSelect: Robust LLM Unlearning via Representation Selectivity

Stories may contain errors. Dispatch is assembled with AI assistance and curated by human editors; despite the trust-score filter, mistakes happen. We correct publicly — every article links to its revision history. Nothing here is financial, legal, or medical advice. Verify before relying on any claim.

© 2026 Dispatch. No ads. No sponsorships. No paid placement. Reader-supported via Ko-fi.
