Zyphra releases ZAYA1-8B, a reasoning model with 700M active parameters that is competitive with larger alternatives
The mixture-of-experts model trained on AMD infrastructure demonstrates strong performance on mathematics and coding benchmarks while introducing a new test-time compute method called Markovian RSA.
- Zyphra published a technical report on ZAYA1-8B, a reasoning-focused mixture-of-experts model with 700M active parameters and 8B total parameters trained entirely on AMD compute infrastructure.
- The model matches or exceeds DeepSeek-R1-0528 on mathematics and coding benchmarks despite using fewer than 1B active parameters, according to the authors' evaluation.
- Post-training uses a four-stage reinforcement learning cascade: a reasoning warmup on math and puzzles, a 400-task RLVE-Gym curriculum, combined math and code RL, and a final behavioral RL stage for chat and instruction following.
- Zyphra introduces Markovian RSA, a test-time compute method that aggregates parallel reasoning traces while maintaining only a 4K-token tail between rounds.
- The model achieves 91.9% on AIME'25 and 89.6% on HMMT'25 benchmarks using test-time compute, narrowing performance gaps to much larger models like Gemini-2.5 Pro and DeepSeek-V3.2.
Zyphra has published a technical report on ZAYA1-8B, a mixture-of-experts language model designed specifically for reasoning tasks. The model has 8B total parameters, of which 700M are active, and is built on Zyphra's MoE++ architecture. The training pipeline (pretraining, midtraining, and supervised fine-tuning) ran entirely on AMD's compute, networking, and software platform, making the model a product of a full AMD stack.
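The gap between active and total parameters comes from how mixture-of-experts models route each token through only a few experts. The sketch below is a generic illustration of that arithmetic, not a description of MoE++: the split into shared and per-expert parameters, the expert count, and the top-k value are all hypothetical numbers chosen only so the totals land on the 8B and 700M figures reported above.

```python
from typing import Tuple


def moe_param_counts(
    shared: int, per_expert: int, num_experts: int, top_k: int
) -> Tuple[int, int]:
    """Return (total, active) parameter counts for a generic top-k MoE.

    shared      -- parameters every token uses (attention, embeddings, ...)
    per_expert  -- parameters in one expert
    num_experts -- experts stored in the model
    top_k       -- experts actually used per token
    """
    total = shared + num_experts * per_expert
    active = shared + top_k * per_expert
    return total, active


# Hypothetical configuration, NOT ZAYA1-8B's real layout: the values are
# picked only to reproduce the headline 8B-total / 700M-active figures.
total, active = moe_param_counts(
    shared=300_000_000, per_expert=100_000_000, num_experts=77, top_k=4
)
active_fraction = active / total  # share of the weights used per token
```

Under these assumed numbers, fewer than a tenth of the stored weights take part in any single forward pass, which is why per-token compute tracks the active count rather than the total.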
The authors report that ZAYA1-8B matches or exceeds the performance of DeepSeek-R1-0528 on several challenging mathematics and coding benchmarks, despite operating with substantially fewer active parameters. The model was trained for reasoning from the start: reasoning-focused data is included from pretraining onward, using an answer-preserving trimming scheme. Post-training employs a four-stage reinforcement learning cascade. The first stage is a reasoning warmup on mathematical and puzzle tasks. The second is a 400-task curriculum via RLVE-Gym. The third combines math and code reinforcement learning, using test-time compute traces and synthetic environments derived from competitive programming references. The fourth is behavioral RL to improve performance on chat and general instruction-following tasks.
The report introduces Markovian RSA, a test-time compute method that recursively aggregates multiple parallel reasoning traces while limiting the reasoning state carried forward between rounds to 4K tokens. With this method, ZAYA1-8B reaches 91.9% accuracy on AIME'25 and 89.6% on HMMT'25, results that narrow the gap to substantially larger models, including Gemini-2.5 Pro, DeepSeek-V3.2, and GPT-5-High. The report, which lists 18 named authors affiliated with Zyphra, documents the training process, architectural choices, and evaluation methodology.
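The summary above gives only two properties of Markovian RSA: parallel traces are aggregated recursively, and only a 4K-token tail survives between rounds. The following is a minimal sketch of a loop with those two properties, not Zyphra's implementation. The `generate` callable, the population and subset sizes, the prompt format, and the whitespace tokenization are all assumptions made for illustration.

```python
import random
from typing import Callable, List

TAIL_TOKENS = 4096  # the "4K-token" carry-over limit described in the article


def tail(trace: str, max_tokens: int = TAIL_TOKENS) -> str:
    """Keep the last max_tokens tokens of a trace.

    Whitespace splitting stands in for a real tokenizer (assumption).
    """
    tokens = trace.split()
    return " ".join(tokens[-max_tokens:])


def markovian_rsa(
    question: str,
    generate: Callable[[str], str],  # hypothetical: prompt -> reasoning trace
    population: int = 4,             # assumed number of parallel traces
    subset_size: int = 2,            # assumed traces aggregated per new trace
    rounds: int = 3,                 # assumed number of aggregation rounds
    max_tail_tokens: int = TAIL_TOKENS,
    seed: int = 0,
) -> List[str]:
    rng = random.Random(seed)
    # Round 0: independent parallel traces, each cut down to its tail.
    state = [tail(generate(question), max_tail_tokens) for _ in range(population)]
    for _ in range(rounds):
        next_state = []
        for _ in range(population):
            # Aggregate a random subset of the previous round's tails.
            picked = rng.sample(state, subset_size)
            prompt = (
                question
                + "\n\nCandidate reasoning tails:\n"
                + "\n---\n".join(picked)
            )
            # Only the tail of the new trace is carried into the next round,
            # so the next round depends on nothing but this bounded state.
            next_state.append(tail(generate(prompt), max_tail_tokens))
        state = next_state
    return state
```

Because each round sees only the truncated tails from the round before, the context needed per aggregation step stays bounded no matter how many rounds are run. That bounded carry-over is the sense in which this sketch is "Markovian".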