Tools · Apr 29, 2026

Granite 4.1 LLMs use five-stage pre-training and multi-stage reinforcement learning to achieve dense model efficiency

IBM's Granite 4.1 family includes dense models at 3B, 8B, and 30B parameters trained on roughly 15 trillion tokens, with the 8B variant matching the performance of the prior 32B mixture-of-experts model through refined data curation and GRPO-based reinforcement learning.

Trust: 79 · Hype: Low

1 source · cross-referenced

TL;DR
  • Granite 4.1 comprises three dense decoder-only models (3B, 8B, 30B) trained on ~15T tokens across five pre-training phases that progressively shift from broad web data to curated, domain-specific content.
  • The five-phase pipeline includes general pre-training (10T tokens), math/code pre-training (2T tokens), two stages of high-quality data annealing (2.5T tokens total), and long-context training that extends the window to 512K tokens.
  • Supervised fine-tuning uses an LLM-as-Judge framework with multi-dimensional quality rubrics to filter ~4.1M high-quality samples, rejecting hallucinations, false premises, and incorrect computations automatically.
  • The 8B instruct variant matches or exceeds Granite 4.0-H-Small (32B MoE) performance despite using fewer parameters and a simpler dense architecture, trained with multi-stage RL via on-policy GRPO with DAPO loss.
  • All models released under Apache 2.0 license with publicly available code, weights, and documentation on Hugging Face and GitHub.

IBM's Granite 4.1 family comprises three decoder-only models—sized at 3B, 8B, and 30B parameters—all trained on approximately 15 trillion tokens using a five-phase pre-training strategy. The approach prioritizes data quality and composition changes over simple scale expansion, with each phase employing a distinct data mixture and learning-rate schedule.

The first two phases establish foundational language understanding: Phase 1 draws on general web data (59% CommonCrawl), code (20%), and math (7%), while Phase 2 sharpens reasoning capability by increasing math to 35% and code to 30%. Phases 3 and 4 shift toward mid-training with progressively higher-quality data, incorporating chain-of-thought reasoning trajectories and instruction data. Phase 5 extends the context window from 4K to 512K tokens through staged long-context extension on books and code repositories.
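The staged mixture schedule above can be written down as a small config. This is a minimal sketch, not IBM's actual training configuration: only the figures stated in the article (phase token budgets and the Phase 1/2 web, code, and math shares) are sourced; the "other" shares, the split of the 2.5T annealing budget across phases 3 and 4, and the Phase 5 budget (assumed here as the remainder to ~15T) are assumptions.

```python
# Hedged sketch of the five-phase data-mixture schedule. Unstated shares
# ("other", the phase 3/4 split, the phase 5 budget) are assumptions.
PHASES = [
    {"name": "general",      "tokens_T": 10.0,   # Phase 1: broad web data
     "mix": {"web": 0.59, "code": 0.20, "math": 0.07, "other": 0.14}},
    {"name": "math_code",    "tokens_T": 2.0,    # Phase 2: reasoning-heavy
     "mix": {"math": 0.35, "code": 0.30, "other": 0.35}},
    {"name": "anneal_1",     "tokens_T": 1.5,    # Phases 3+4 total 2.5T;
     "mix": {"curated": 1.0}},                   # the split is assumed
    {"name": "anneal_2",     "tokens_T": 1.0,
     "mix": {"curated": 1.0}},
    {"name": "long_context", "tokens_T": 0.5,    # Phase 5: assumed remainder
     "mix": {"books": 0.5, "code_repos": 0.5}},  # books + code repositories
]

def validate(phases):
    """Check each mixture sums to 1 and return the total token budget."""
    for p in phases:
        assert abs(sum(p["mix"].values()) - 1.0) < 1e-9, p["name"]
    return sum(p["tokens_T"] for p in phases)

total = validate(PHASES)
print(f"total pre-training tokens: ~{total}T")  # ~15T per the article
```

Expressing the curriculum as data rather than code makes the article's central claim concrete: each phase is just a different mixture and budget, with quality rising and breadth narrowing as training proceeds.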

Supervised fine-tuning relies on an LLM-as-Judge framework that automatically filters ~4.1M samples against structural, semantic, and behavioral criteria. The system rejects hallucinations, false premises, and incorrect computations while maintaining contextual information such as retrieved documents and tool outputs as non-evaluated context.
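The filtering loop described above can be sketched in a few lines. The rubric dimensions mirror the article (hallucinations, false premises, incorrect computations), but the `judge` function here is a toy stand-in for an actual LLM-as-Judge call, and the rule that context fields are carried along but never scored is the one structural detail taken from the source.

```python
from dataclasses import dataclass, field

@dataclass
class Sample:
    prompt: str
    response: str
    # Retrieved documents / tool outputs: passed through as context,
    # deliberately excluded from judging (per the article).
    context: list = field(default_factory=list)

def judge(sample):
    """Toy stand-in for an LLM-as-Judge call returning per-dimension verdicts."""
    return {
        "no_hallucination": "unicorn" not in sample.response,
        "no_false_premise": not sample.prompt.startswith("Given that 2+2=5"),
        "computation_ok": "2+2=5" not in sample.response,
    }

def filter_sft(samples):
    """Keep only samples that pass every rubric dimension."""
    return [s for s in samples if all(judge(s).values())]

pool = [
    Sample("What is 2+2?", "2+2=4"),
    Sample("What is 2+2?", "2+2=5"),  # incorrect computation -> rejected
]
print(len(filter_sft(pool)))  # 1
```

The key design choice is reject-on-any-failure: a sample survives only if every dimension passes, which is how a pool is distilled down to the ~4.1M high-quality samples the article cites.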

The 8B instruct variant matches or exceeds the performance of Granite 4.0-H-Small, a 32B mixture-of-experts model, despite operating with fewer parameters and a simpler dense architecture. This result was achieved through multi-stage reinforcement learning using on-policy GRPO with DAPO loss (as documented in Yu et al., 2025). All three model sizes employ identical training procedures, differing only in dimensional parameters such as embedding size and layer counts.
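At the heart of GRPO is a group-relative advantage: sample a group of responses per prompt, score each with a reward, and normalize rewards within the group rather than against a learned value model. The sketch below shows only that normalization step; the DAPO-specific loss details (asymmetric clipping, token-level aggregation, dynamic sampling) from Yu et al., 2025 are omitted.

```python
import statistics

def group_advantages(rewards):
    """Group-relative advantages as used in GRPO: (r - mean) / std
    over a group of rewards for the same prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        # A uniform group (all right or all wrong) carries no learning signal.
        return [0.0] * len(rewards)
    return [(r - mean) / std for r in rewards]

# Two of four sampled responses earned reward 1.0:
advs = group_advantages([1.0, 0.0, 0.0, 1.0])
print(advs)  # [1.0, -1.0, -1.0, 1.0]
```

Because the baseline comes from the group itself, no separate critic network is needed, which keeps the RL stage cheap enough to apply identically across all three model sizes.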

The models are released under the Apache 2.0 license with code available on GitHub and model weights accessible via Hugging Face, enabling external validation and reproducibility of the training methodology.

Sources
  1. Hugging Face — Granite 4.1 LLMs: How They're Built

Stories may contain errors. Dispatch is assembled with AI assistance and curated by human editors; despite the trust-score filter, mistakes happen. We correct publicly — every article links to its revision history. Nothing here is financial, legal, or medical advice. Verify before relying on any claim.

© 2026 Dispatch. No ads. No sponsorships. No paid placement. Reader-supported via Ko-fi.

Built by a person who cares about honest AI news.