Research · Apr 24, 2026

Text embeddings replace domain knowledge in algorithm selection across seven problem classes

Researchers propose ZeroFolio, a feature-free method that uses pretrained text embeddings to select algorithms without manual feature engineering, outperforming hand-crafted approaches on 10 of 11 tested scenarios.

TL;DR

ZeroFolio uses pretrained text embeddings instead of hand-crafted features to select algorithms across diverse problem domains including SAT, MaxSAT, QBF, ASP, CSP, MIP, and graph problems.
The method outperformed random forest baselines trained on domain-specific features in 10 of 11 test scenarios with a single configuration, and in all 11 scenarios with two-seed voting.
An ablation study identified inverse-distance weighting, line shuffling, and Manhattan distance as the key design choices.
Combining embeddings with traditional hand-crafted features via soft voting yielded further improvements on competitive scenarios.

A research team led by Stefan Szeider has proposed ZeroFolio, a domain-agnostic approach to algorithm selection that eliminates the need for hand-engineered features. Rather than extracting problem-specific characteristics, the method treats raw instance files as plain text, encodes them with a pretrained embedding model, and applies weighted k-nearest neighbors to select a solver.

The core innovation rests on an empirical observation: pretrained language model embeddings capture structural distinctions between problem instances without explicit domain knowledge or task-specific fine-tuning. This lets the same three-step pipeline (serialize, embed, select) work across unrelated problem classes.
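The serialize, embed, select steps can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code: `toy_embed` is a hypothetical stand-in that hashes character trigrams so the example runs without downloading a model, whereas the method described here uses a pretrained text-embedding model. The instance texts, solver names, and the plain nearest-neighbor rule in `select` are likewise assumptions made for the example.

```python
import random
import zlib
from typing import List

import numpy as np

DIM = 64  # size of the toy embedding; a real embedding model fixes its own size


def serialize(raw_text: str, seed: int = 0) -> str:
    """Treat the instance file as plain text and shuffle its lines."""
    lines = raw_text.splitlines()
    random.Random(seed).shuffle(lines)
    return "\n".join(lines)


def toy_embed(text: str) -> np.ndarray:
    """Hypothetical stand-in for a pretrained text-embedding model.

    Hashes character trigrams into DIM buckets and normalizes the counts.
    """
    vec = np.zeros(DIM)
    for i in range(len(text) - 2):
        vec[zlib.crc32(text[i:i + 3].encode("utf-8")) % DIM] += 1.0
    total = vec.sum()
    return vec / total if total > 0 else vec


def select(query: np.ndarray, train_vecs: np.ndarray, train_best: List[str]) -> str:
    """Return the best-known solver of the closest training instance (Manhattan distance)."""
    dists = np.abs(train_vecs - query).sum(axis=1)
    return train_best[int(np.argmin(dists))]


# Made-up training instances, each labeled with a made-up best solver.
train_texts = [
    "p cnf 3 2\n1 -2 0\n2 3 0",
    "p cnf 4 2\n1 -3 0\n2 4 0",
    "Minimize\n obj: x + y\nSubject To\n c1: x + y >= 1\nEnd",
]
train_best = ["solver_a", "solver_a", "solver_b"]
train_vecs = np.stack([toy_embed(serialize(t)) for t in train_texts])

query = "p cnf 3 2\n1 -2 0\n2 3 0"
choice = select(toy_embed(serialize(query)), train_vecs, train_best)
```

Nothing in the sketch inspects the file format: the CNF and LP instances pass through the same functions, which is the property that lets one pipeline serve unrelated problem classes.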

The authors evaluated ZeroFolio on 11 scenarios spanning seven distinct combinatorial optimization domains: satisfiability, maximum satisfiability, quantified Boolean formulas, answer set programming, constraint satisfaction, mixed-integer programming, and graph problems. Against random forest classifiers built on conventional hand-crafted features, ZeroFolio outperformed baselines in 10 of 11 scenarios using a single fixed hyperparameter set, and in all 11 scenarios when ensemble voting with two random seeds was applied.
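The article does not say what the two seeds control or how their votes are merged. One plausible reading, sketched below purely as an assumption, is that each seed drives a separate run of the selector (for example, a different line shuffle) and that per-solver scores are summed across runs. The names `vote_across_seeds` and `fake_scores`, and the score values, are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable


def vote_across_seeds(
    score_fn: Callable[[str, int], Dict[str, float]],
    instance_text: str,
    seeds: Iterable[int],
) -> str:
    """Sum per-solver scores over several seeded runs and return the top solver."""
    totals: Dict[str, float] = defaultdict(float)
    for seed in seeds:
        for solver, score in score_fn(instance_text, seed).items():
            totals[solver] += score
    return max(totals, key=totals.get)


def fake_scores(text: str, seed: int) -> Dict[str, float]:
    """Made-up scores standing in for one seeded run of the selector."""
    table = {
        0: {"solver_a": 0.55, "solver_b": 0.45},
        1: {"solver_a": 0.30, "solver_b": 0.70},
    }
    return table[seed]


single = vote_across_seeds(fake_scores, "p cnf 1 1\n1 0", seeds=(0,))
winner = vote_across_seeds(fake_scores, "p cnf 1 1\n1 0", seeds=(0, 1))
```

In this toy case the single-seed run narrowly prefers `solver_a`, while the two-seed vote picks `solver_b` because the second run favors it by a wider margin.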

Ablation analysis identified three critical design decisions: inverse-distance weighting for neighbor contribution, random line shuffling during text preprocessing, and Manhattan distance as the similarity metric. On datasets where both approaches showed comparable performance, combining embeddings with traditional features through soft voting produced measurable gains.
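To make two of these choices concrete, the sketch below shows an inverse-distance-weighted k-nearest-neighbor vote under Manhattan distance, followed by soft voting with a second model. It rests on stated assumptions: each neighbor votes for its best-known solver, `feature_probs` stands in for the output of a feature-based classifier, and the two models are weighted equally. The article does not confirm any of these details, and all vectors and solver names are made up.

```python
import numpy as np


def knn_solver_scores(query, train_vecs, train_best, solvers, k=3, eps=1e-9):
    """Inverse-distance-weighted k-NN vote over solvers, using Manhattan distance."""
    dists = np.abs(train_vecs - query).sum(axis=1)  # L1 distance to each training instance
    nearest = np.argsort(dists)[:k]
    scores = {s: 0.0 for s in solvers}
    for idx in nearest:
        scores[train_best[idx]] += 1.0 / (dists[idx] + eps)  # closer neighbors count more
    total = sum(scores.values())
    return {s: v / total for s, v in scores.items()}


def soft_vote(probs_a, probs_b, weight_a=0.5):
    """Average two per-solver probability tables and return the top solver."""
    combined = {s: weight_a * probs_a[s] + (1.0 - weight_a) * probs_b[s] for s in probs_a}
    return max(combined, key=combined.get), combined


solvers = ["solver_a", "solver_b"]
train_vecs = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])  # toy 2-D "embeddings"
train_best = ["solver_a", "solver_b", "solver_b"]
query = np.array([0.5, 0.0])

embedding_probs = knn_solver_scores(query, train_vecs, train_best, solvers, k=3)
feature_probs = {"solver_a": 0.2, "solver_b": 0.8}  # made-up feature-model output
choice, combined = soft_vote(embedding_probs, feature_probs)
```

Two of the three neighbors favor `solver_b`, so an unweighted majority would pick it, but the single very close neighbor gives `solver_a` the larger weighted share. Soft voting then averages that distribution with the feature-based one rather than taking either model's top pick alone.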

Sources

01 arXiv cs.AI: Algorithm Selection with Zero Domain Knowledge via Text Embeddings
