Research · Jul 4, 2026

Apple proposes amortized maximum inner product search to speed up vector retrieval

Regression-based approach trains neural networks to predict MIPS solutions directly, cutting compute cost for repeated queries over fixed databases.

Trust: 79 · Hype: Low · 1 source, cross-referenced

TL;DR

Apple researchers propose amortized MIPS, a regression-based method that trains models to predict maximum inner product search solutions directly instead of recomputing them for every query.
Two complementary models are introduced: SupportNet regresses the support function to route queries, and KeyNet regresses the optimal key for direct use in indexing pipelines.
On the BEIR benchmark with document embeddings, the approach improves IVF match rates while reducing compute measured in FLOPs, probes, or wall-clock time.

Apple’s Machine Learning Research team describes a method to amortize the cost of maximum inner product search (MIPS). MIPS is a core subroutine in many machine learning pipelines: given a query vector, it returns the database vector (key) whose inner product with the query is largest.
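As a point of reference, exact MIPS can be written as a brute-force scan. The minimal sketch below (Python with NumPy, toy data chosen for illustration) shows that each query costs one inner product per key, which is the per-query work that amortization aims to avoid. MIPS also differs from nearest-neighbor search: a key with a large norm can win even when it is not the closest vector to the query.

```python
import numpy as np

def mips(query: np.ndarray, keys: np.ndarray) -> tuple[int, float]:
    """Exact maximum inner product search by brute force.

    keys has shape (n, d) and query has shape (d,). Returns the index of the
    key with the largest inner product and the value of that inner product.
    """
    scores = keys @ query          # (n,) inner products <q, k_i>: O(n * d) work
    best = int(np.argmax(scores))  # index of the maximizing key
    return best, float(scores[best])

keys = np.array([[1.0, 0.0], [0.0, 2.0], [-1.0, 1.0]])
query = np.array([0.5, 1.0])
idx, score = mips(query, keys)
print(idx, score)  # 1 2.0
```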

Rather than solving MIPS from scratch for every query, the proposed approach trains neural networks to predict the solution directly. It assumes that the key database is fixed and that queries are drawn from a known distribution, so the up-front cost of training is spread over many subsequent queries.
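The amortized setting can be illustrated with a minimal sketch under stated assumptions: a hypothetical random key database, a Gaussian query distribution, and a linear least-squares map standing in for the neural network. The source uses neural regressors; the linear model here only keeps the example dependency-free. Exact MIPS is paid once per training query, offline, to build regression targets; afterwards a prediction costs a single small matrix product.

```python
import numpy as np

rng = np.random.default_rng(0)
n_keys, dim, n_train = 64, 8, 5000

# Fixed key database: hypothetical random vectors standing in for document embeddings.
keys = rng.normal(size=(n_keys, dim))

def sample_queries(n: int) -> np.ndarray:
    """Hypothetical query distribution: isotropic Gaussian."""
    return rng.normal(size=(n, dim))

def exact_targets(queries: np.ndarray):
    """Exact MIPS for a batch: optimal keys k*(q) and support values max_k <q, k>."""
    scores = queries @ keys.T     # (n, n_keys) inner products
    best = scores.argmax(axis=1)  # index of the maximizing key per query
    return keys[best], scores.max(axis=1)

# Offline phase: run exact MIPS once per sampled query to build regression targets.
# k_train would be a KeyNet-style target; h_train a SupportNet-style target (unused here).
q_train = sample_queries(n_train)
k_train, h_train = exact_targets(q_train)

# Stand-in regressor: linear least squares from query to optimal key, with a bias column.
X = np.hstack([q_train, np.ones((n_train, 1))])
W, *_ = np.linalg.lstsq(X, k_train, rcond=None)
k_pred = X @ W  # online cost per query: one (dim + 1) x dim product, no key scan

# Compare with the trivial baseline of always predicting the mean optimal key.
mse_model = float(((k_pred - k_train) ** 2).mean())
mse_const = float(((k_train.mean(axis=0) - k_train) ** 2).mean())
print(f"linear stand-in MSE {mse_model:.3f} vs constant baseline {mse_const:.3f}")
```

Because the constant predictor is a special case of a linear model with a bias term, the stand-in's training error can never exceed the baseline. A real KeyNet would instead be judged on held-out queries and retrieval metrics.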

The authors frame MIPS as evaluating the support function of the key set, which motivates two models. SupportNet is an input-convex neural network that regresses the support function; it can act as a cluster router that directs queries to the relevant database partitions. KeyNet is a vector-valued network that regresses the optimal key; its output can replace the original query as a drop-in input to off-the-shelf indexing pipelines.
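The support-function framing can be checked numerically. For a finite key set K, h_K(q) = max_{k ∈ K} ⟨q, k⟩ is a maximum of linear functions and is therefore convex, which is consistent with modeling it with an input-convex network. Away from ties, its gradient is the maximizing key k*(q); this is a standard convex-analysis fact, not a claim about how the paper trains its models. The sketch below uses hypothetical random keys, verifies the gradient identity with finite differences, and defines a tiny input-convex network (TinyICNN, a hypothetical name and layout) whose non-negative hidden-to-hidden and output weights keep it convex in its input.

```python
import numpy as np

rng = np.random.default_rng(1)
keys = rng.normal(size=(32, 4))  # hypothetical key set K: 32 keys in 4 dimensions

def support(q: np.ndarray) -> float:
    """Support function h_K(q) = max_k <q, k>, i.e. the MIPS value for q."""
    return float((keys @ q).max())

def argmax_key(q: np.ndarray) -> np.ndarray:
    """The MIPS solution k*(q): the quantity a KeyNet-style model regresses."""
    return keys[np.argmax(keys @ q)]

# Central finite differences of h_K; away from ties this recovers k*(q).
q = rng.normal(size=4)
eps = 1e-6
grad_fd = np.array([(support(q + eps * e) - support(q - eps * e)) / (2 * eps)
                    for e in np.eye(4)])
grad_matches_key = np.allclose(grad_fd, argmax_key(q), atol=1e-5)

class TinyICNN:
    """Minimal two-layer input-convex network with hypothetical sizes."""

    def __init__(self, dim: int, hidden: int, rng: np.random.Generator):
        self.W0 = rng.normal(size=(hidden, dim))
        self.b0 = rng.normal(size=hidden)
        # Hidden-to-hidden weights are non-negative so convexity is preserved.
        self.Wz = np.abs(rng.normal(size=(hidden, hidden)))
        # Direct input weights may have any sign: they only add an affine term.
        self.Wx = rng.normal(size=(hidden, dim))
        self.b1 = rng.normal(size=hidden)
        self.a = np.abs(rng.normal(size=hidden))  # non-negative output weights

    def __call__(self, x: np.ndarray) -> float:
        z1 = np.maximum(self.W0 @ x + self.b0, 0.0)  # ReLU of affine: convex
        # Non-negative mix of convex terms plus an affine term, then ReLU: still convex.
        z2 = np.maximum(self.Wz @ z1 + self.Wx @ x + self.b1, 0.0)
        return float(self.a @ z2)

net = TinyICNN(dim=4, hidden=16, rng=rng)
x, y = rng.normal(size=4), rng.normal(size=4)
midpoint_ok = net((x + y) / 2) <= (net(x) + net(y)) / 2 + 1e-9
print(grad_matches_key, midpoint_ok)
```

A SupportNet in this spirit would be trained so that its output approximates support(q) over the query distribution; the untrained network here only demonstrates the architectural constraint.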

In experiments on the BEIR benchmark using document embeddings, SupportNets and KeyNets improved match rates for inverted-file (IVF) indexes while reducing compute, measured in FLOPs, number of probes, or wall-clock time.
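To make the routing role concrete, here is a sketch of an IVF search with a single probe. It compares standard routing by query-centroid inner product with routing by each list's exact support value, the quantity a SupportNet-style router would learn to approximate cheaply. Everything here is an assumption for illustration: random keys, random centroids in place of k-means, and "match rate" taken to mean the fraction of queries for which the IVF result equals the exact MIPS result. The exact per-list support values are an oracle that costs as much as brute force; they only show what a perfect router achieves.

```python
import numpy as np

rng = np.random.default_rng(2)
n_keys, dim, n_lists = 2000, 16, 20

keys = rng.normal(size=(n_keys, dim))
# Hypothetical coarse partition: assign each key to its nearest of n_lists centroids
# sampled from the keys (a real IVF index would learn centroids with k-means).
centroids = keys[rng.choice(n_keys, n_lists, replace=False)]
assign = ((keys[:, None, :] - centroids[None, :, :]) ** 2).sum(-1).argmin(axis=1)

def exact_mips(q: np.ndarray) -> int:
    return int(np.argmax(keys @ q))

def search_ivf(q: np.ndarray, route_scores: np.ndarray, nprobe: int = 1) -> int:
    """Probe the nprobe lists with the highest routing scores, then scan only those."""
    probed = np.argsort(-route_scores)[:nprobe]
    cand = np.flatnonzero(np.isin(assign, probed))
    return int(cand[np.argmax(keys[cand] @ q)])

def centroid_scores(q: np.ndarray) -> np.ndarray:
    return centroids @ q  # standard IVF routing by query-centroid inner product

def oracle_support_scores(q: np.ndarray) -> np.ndarray:
    # Per-list support value max_{k in list} <q, k>, computed exactly (oracle).
    s = np.full(n_lists, -np.inf)
    np.maximum.at(s, assign, keys @ q)
    return s

queries = rng.normal(size=(500, dim))
truth = np.array([exact_mips(q) for q in queries])
match_centroid = np.mean([search_ivf(q, centroid_scores(q)) == t
                          for q, t in zip(queries, truth)])
match_support = np.mean([search_ivf(q, oracle_support_scores(q)) == t
                         for q, t in zip(queries, truth)])
print(f"nprobe=1 match rate: centroid {match_centroid:.2f}, support {match_support:.2f}")
```

With exact support values, the list holding the global maximizer always scores highest, so a single probe already matches exact MIPS, whereas centroid routing can miss it. A learned router aims to approach the oracle while costing only a small network evaluation.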

Sources

01 Apple Machine Learning Research: Amortizing Maximum Inner Product Search with Learned Support Functions

Also on Research

Apple study finds RL-finetuned vision-language models vulnerable to textual perturbations and CoT inconsistencies

Apple proposes MemoryLLM to decouple feed-forward modules from self-attention for interpretable memory retrieval

Apple proposes VideoFlexTok for flexible-length, coarse-to-fine video tokenization