Research · Jun 17, 2026

Researchers propose RepSelect method to make LLM unlearning more robust against reversal attacks

RepSelect isolates forget-set-specific representations and achieves a 4–50x larger reduction in post-relearning answer accuracy than the strongest baseline, while maintaining general capabilities.

TL;DR

RepSelect targets forget-set-specific representations, aiming for forgetting in LLMs that is deep and hard to reverse.
Evaluated on four model families (Llama 3, Qwen 3.5, Gemma 4 E4B, DeepSeek V2 Lite) across biohazardous knowledge and abusive tendencies.
Achieves a 4–50x larger reduction in post-relearning answer accuracy than the strongest of five baselines (GradDiff, NPO, SimNPO, RMU, UNDIAL).
Near-perfectly robust to few-shot prompting attacks, a common way of reversing unlearning.

Researchers from an unnamed institution propose RepSelect, a method to make large language model (LLM) unlearning more robust against reversal attacks. The core idea is that existing unlearning techniques target representations shared with both the retain set and the subspace recoverable by a fine-tuning attacker, which makes forgetting shallow and easily reversible.

RepSelect isolates forget-set-specific representations by collapsing the top principal components of weight gradients before each update. This approach aims to leave general capabilities intact while limiting what fine-tuning can recover after unlearning.
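
To make "collapsing the top principal components" concrete, here is a minimal NumPy sketch of one plausible reading: take the gradient of a single weight matrix, compute its singular value decomposition, and zero the largest singular directions before the update is applied. This is an illustration under stated assumptions, not the authors' implementation. The function name `collapse_top_components`, the parameter `k`, and the toy gradient are all hypothetical, and the paper may compute its principal components over a different object (for example, per-sample gradients).

```python
import numpy as np


def collapse_top_components(grad: np.ndarray, k: int) -> np.ndarray:
    """Return `grad` with its k largest singular directions zeroed.

    Hypothetical helper: one way to read "collapsing the top principal
    components of weight gradients"; not taken from the RepSelect paper.
    """
    # Thin SVD: grad = U @ diag(S) @ Vt, with S sorted in descending order.
    U, S, Vt = np.linalg.svd(grad, full_matrices=False)
    S_collapsed = S.copy()
    S_collapsed[:k] = 0.0  # drop the dominant directions
    # Rebuild the gradient from the remaining components only.
    return (U * S_collapsed) @ Vt


# Toy gradient (assumed data): one strong rank-1 direction standing in for
# broadly shared structure, plus a small residual.
rng = np.random.default_rng(0)
shared = 10.0 * np.outer(rng.standard_normal(8), rng.standard_normal(6))
specific = 0.1 * rng.standard_normal((8, 6))
grad = shared + specific

filtered = collapse_top_components(grad, k=1)
print("gradient norm before:", np.linalg.norm(grad))
print("gradient norm after: ", np.linalg.norm(filtered))
```

In this toy setup the filtered gradient keeps only the small residual, so an update built from it leaves the dominant direction untouched. Whether the dominant directions correspond to shared representations in a real model is the paper's claim to establish, not something this sketch demonstrates.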

The method was evaluated across two forget categories—biohazardous knowledge and abusive tendencies—and four model families spanning dense and Mixture-of-Experts architectures: Llama 3, Qwen 3.5, Gemma 4 E4B, and DeepSeek V2 Lite.

Against five popular baselines (GradDiff, NPO, SimNPO, RMU, and UNDIAL), RepSelect achieves a 4–50x larger reduction in post-relearning answer accuracy than the strongest of them. It is also near-perfectly robust to few-shot prompting attacks, a common method for reversing unlearning.

The authors argue that targeting selective representations is an important step toward deep and robust LLM forgetting, addressing a central challenge in making models forget specific knowledge and values without sacrificing general capabilities.

Sources

1. arXiv cs.CL: "RepSelect: Robust LLM Unlearning via Representation Selectivity"
