Models · May 8, 2026

Zyphra releases ZAYA1-8B, a reasoning model with 700M active parameters that competes with larger alternatives

The mixture-of-experts model trained on AMD infrastructure demonstrates strong performance on mathematics and coding benchmarks while introducing a new test-time compute method called Markovian RSA.

TL;DR

Zyphra published a technical report on ZAYA1-8B, a reasoning-focused mixture-of-experts model with 700M active and 8B total parameters, pretrained, midtrained, and supervised fine-tuned on AMD compute infrastructure.
The model matches or exceeds DeepSeek-R1-0528 on mathematics and coding benchmarks despite using less than 1B active parameters, according to the authors' evaluation.
Post-training uses a four-stage reinforcement learning cascade: a reasoning warmup on math and puzzles, a 400-task RLVE-Gym curriculum, combined math and code RL, and a final behavioral RL stage for chat and instruction following.
Zyphra introduces Markovian RSA, a test-time compute method that aggregates parallel reasoning traces while maintaining only a 4K-token tail between rounds.
Using this test-time compute method, the model scores 91.9% on AIME'25 and 89.6% on HMMT'25, narrowing the gap to much larger models such as Gemini-2.5 Pro and DeepSeek-V3.2.

Zyphra has published a technical report on ZAYA1-8B, a mixture-of-experts language model designed specifically for reasoning tasks. The model activates 700M of its 8B total parameters and is built on Zyphra's MoE++ architecture. Pretraining, midtraining, and supervised fine-tuning were all executed on AMD's compute, networking, and software platform, making ZAYA1-8B a product of a full-stack AMD implementation.

The authors report that ZAYA1-8B matches or exceeds DeepSeek-R1-0528 on several challenging mathematics and coding benchmarks despite operating with far fewer active parameters. The model was built for reasoning from the outset: reasoning-focused data was incorporated from pretraining onward through an answer-preserving trimming scheme. Post-training uses a four-stage reinforcement learning cascade: (1) a reasoning warmup on mathematical and puzzle tasks; (2) a 400-task curriculum via RLVE-Gym; (3) combined math and code reinforcement learning that uses test-time compute traces and synthetic environments derived from competitive programming references; and (4) behavioral RL to optimize performance on chat and general instruction-following tasks.
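The article does not explain how the answer-preserving trimming works. One plausible reading, used here purely as an assumption, is that long reasoning traces are shortened to fit a token budget while the final answer is always kept intact. The sketch below implements that reading; the function name `answer_preserving_trim`, the `<trimmed>` marker token, and the choice to keep the start of the reasoning are all hypothetical and not taken from the report.

```python
from typing import List

TRIM_MARKER = "<trimmed>"  # hypothetical marker token, not from the report


def answer_preserving_trim(reasoning: List[str], answer: List[str], budget: int) -> List[str]:
    """Fit reasoning + answer into `budget` tokens without ever cutting the answer.

    Hypothetical sketch: keeps the beginning of the reasoning, inserts a marker
    where tokens were dropped, and always appends the full answer.
    """
    # Short enough already: return the example unchanged.
    if len(reasoning) + len(answer) <= budget:
        return reasoning + answer
    # Need room for the whole answer plus the trim marker.
    if len(answer) + 1 > budget:
        raise ValueError("budget too small to keep the full answer")
    keep = budget - len(answer) - 1  # reserve one slot for the marker
    return reasoning[:keep] + [TRIM_MARKER] + answer
```

The key property is that only reasoning tokens are ever dropped, so every trimmed training example still ends with its complete, correct answer.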

The report introduces Markovian RSA, a new test-time compute method that recursively aggregates multiple parallel reasoning traces while limiting the reasoning state carried forward between rounds to a 4K-token tail. With this method, ZAYA1-8B reaches 91.9% on AIME'25 and 89.6% on HMMT'25, narrowing the gap to substantially larger models including DeepSeek-V3.2, Gemini-2.5 Pro, and GPT-5-High. The report, which lists 18 authors affiliated with Zyphra, documents the training process, architectural choices, and evaluation methodology.
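The core idea of Markovian RSA, as the article describes it, is that each aggregation round sees only a bounded tail of the previous round's traces rather than their full text. The following is a minimal sketch of that pattern under stated assumptions: `generate` is a hypothetical stand-in for a model call, tokens are plain strings, and the population size, subset size, round count, and `<candidate>` delimiters are illustrative choices, not values from the report.

```python
import random
from typing import Callable, List

# Hypothetical interface: a generator takes a prompt (list of tokens) and
# returns one new reasoning trace (list of tokens). Stands in for an LLM call.
Generator = Callable[[List[str]], List[str]]

TAIL_TOKENS = 4096  # "4K-token tail" per the article; exact count is an assumption


def tail(trace: List[str], k: int = TAIL_TOKENS) -> List[str]:
    """Return the last k tokens of a trace: the only state carried forward."""
    return trace[-k:] if k > 0 else []


def markovian_rsa(
    problem: List[str],
    generate: Generator,
    population: int = 8,   # hypothetical: parallel traces per round
    subset_size: int = 4,  # hypothetical: candidates shown to each aggregation call
    rounds: int = 3,       # hypothetical: number of aggregation rounds
    seed: int = 0,
) -> List[List[str]]:
    """Recursively aggregate parallel traces, keeping only tails between rounds."""
    if not 0 < subset_size <= population:
        raise ValueError("subset_size must be in 1..population")
    rng = random.Random(seed)
    # Round 0: independent parallel traces conditioned on the problem alone.
    traces = [generate(problem) for _ in range(population)]
    for _ in range(rounds):
        # Markovian step: discard everything except each trace's tail.
        states = [tail(t) for t in traces]
        next_traces = []
        for _ in range(population):
            # Each aggregation call sees the problem plus a random subset of tails,
            # so its prompt length stays bounded however long the traces grew.
            prompt = list(problem)
            for state in rng.sample(states, subset_size):
                prompt += ["<candidate>"] + state + ["</candidate>"]
            next_traces.append(generate(prompt))
        traces = next_traces
    return traces
```

Because only tails cross round boundaries, every aggregation prompt is at most `len(problem) + subset_size * (TAIL_TOKENS + 2)` tokens, independent of the number of rounds. How the final answer is selected from the last population (for example, by voting) is not specified in the article and is left out here.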

Sources

01 arXiv cs.AI: ZAYA1-8B Technical Report
