Tools · May 6, 2026

Transformers 5.8.0 adds DeepSeek-V4, Gemma 4 Assistant, and four more models

The popular open-source library released version 5.8.0 with support for multiple new language and multimodal models, including DeepSeek's next-generation MoE architecture and IBM's enterprise document extraction model.

Trust: 79 · Hype: Low

1 source · cross-referenced

TL;DR
  • Transformers 5.8.0 adds official support for DeepSeek-V4, a next-generation MoE language model with hybrid attention and manifold-constrained hyper-connections, available in Flash, Pro, and Base variants.
  • Gemma 4 Assistant model integrated for speculative decoding using multi-token prediction, enabling faster inference on Gemma 4 base models through KV cache reuse.
  • New models added include Granite Speech Plus for multimodal speech-to-text transcription, Granite Vision 4.1 for document data extraction, the EXAONE-4.5 vision language model, and PP-FormulaNet for formula and table structure recognition.
  • Breaking change: Apex integration removed; users should migrate to PyTorch native equivalents for mixed precision and fused operations.
  • Tokenization fixes addressed DeepSeek R1 distilled model mapping and resolved a regression in special token handling, making convert_ids_to_tokens approximately 300x faster.

Hugging Face released Transformers version 5.8.0 on May 5, 2026, adding support for six new model families. The release spans language models, specialized assistants, and multimodal systems, underscoring both the pace of new model releases and the library's role as a central distribution point for emerging architectures.

DeepSeek-V4 represents a step forward in mixture-of-experts design. The model departs from DeepSeek-V3's multi-head latent attention in favor of hybrid local and long-range attention. It replaces residual connections with manifold-constrained hyper-connections and uses a static token-id to expert-id hash table for its first MoE layers. The release covers Flash, Pro, and Base variants, spanning different scale and performance trade-offs.
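To illustrate what a static token-id to expert-id hash table means in practice, here is a minimal sketch. The modulo hash, vocabulary size, and expert count are placeholders for illustration, not DeepSeek-V4's actual parameters; the point is that routing depends only on the token id, so it needs no learned gating network and is fully deterministic.

```python
# Illustrative sketch of static hash-based token-to-expert routing.
# Hash function, vocab size, and expert count are assumptions, not
# the model's real configuration.

def build_expert_table(vocab_size: int, num_experts: int) -> list[int]:
    """Precompute a static token-id -> expert-id lookup table."""
    return [token_id % num_experts for token_id in range(vocab_size)]

def route(token_ids: list[int], table: list[int]) -> list[int]:
    """Look up the expert for each token in O(1) per token."""
    return [table[t] for t in token_ids]

table = build_expert_table(vocab_size=32, num_experts=4)
print(route([0, 5, 13, 31], table))  # [0, 1, 1, 3]
```

Compared with a learned router, a fixed table trades adaptivity for zero routing compute and perfectly predictable expert load given the token distribution.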

Gemma 4 Assistant is purpose-built for speculative decoding on top of Gemma 4 base models. It uses multi-token prediction and KV cache sharing with the target model to draft tokens faster, reducing latency for inference pipelines. The architecture includes cross-attention to improve prediction accuracy.
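The draft-and-verify loop behind assistant-based speculative decoding can be sketched in a few lines. This toy version replaces both models with plain next-token functions so the control flow is visible; it also verifies drafts one at a time, whereas a real target model scores all drafted tokens in a single forward pass, which is where the latency win comes from.

```python
# Toy sketch of speculative decoding's accept loop. Both "models" are
# deterministic next-token functions, not neural networks.

def speculative_step(draft_next, target_next, prefix, k=4):
    """Draft k tokens cheaply, then keep the longest prefix the target
    agrees with; on the first mismatch, take the target's token instead."""
    ctx = list(prefix)
    drafted = []
    for _ in range(k):
        t = draft_next(ctx)
        drafted.append(t)
        ctx.append(t)

    accepted = []
    ctx = list(prefix)
    for t in drafted:
        expected = target_next(ctx)
        if expected != t:
            accepted.append(expected)  # target overrides the bad draft
            break
        accepted.append(t)
        ctx.append(t)
    return accepted

# Toy models: the target counts up by one; the cheap draft agrees until
# it sees a token >= 10, where it guesses wrong.
target_next = lambda ctx: ctx[-1] + 1
draft_next = lambda ctx: ctx[-1] + 1 if ctx[-1] < 10 else 0

print(speculative_step(draft_next, target_next, [7]))  # [8, 9, 10, 11]
```

Note that even the rejected draft is not wasted: the target's verification yields its own next token, so every step emits at least one accepted token.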

Four additional models round out the release: Granite Speech Plus adds multimodal speech-to-text transcription with speaker annotation; Granite Vision 4.1 targets document extraction for enterprise use with chart-to-code and table recognition; EXAONE-4.5 is LG AI's first open-weight vision language model with 33 billion parameters and 256K context support; PP-FormulaNet handles mathematical formula and table structure recognition from images.

The release includes a breaking change: removal of Apex integration across the library and affected models such as T5. Users relying on Apex for mixed precision or fused operations must switch to PyTorch native equivalents. Tokenization improvements addressed mapping inconsistencies for DeepSeek R1 Distilled and resolved a special-token handling regression, making convert_ids_to_tokens approximately 300x faster.
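For the Apex migration, the PyTorch-native replacement for `apex.amp` is the built-in `torch.autocast` context (plus `GradScaler` for fp16 training on GPU). A minimal sketch, using a stand-in linear layer and CPU bfloat16 autocast so it runs without a GPU:

```python
# Minimal sketch: replacing apex.amp mixed precision with PyTorch-native
# autocast. The model is a stand-in; on GPU you would use
# device_type="cuda" and wrap optimizer steps with torch.amp.GradScaler.
import torch

# Before (removed path):
#   model, optimizer = apex.amp.initialize(model, optimizer, opt_level="O1")

model = torch.nn.Linear(8, 4)
x = torch.randn(2, 8)

with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    out = model(x)  # the linear op runs in bfloat16 under autocast

print(out.dtype)  # torch.bfloat16
```

Unlike Apex's opt levels, autocast decides per-operator which ops run in reduced precision, so there is no equivalent of `opt_level` to configure.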

Sources
  1. GitHub · huggingface/transformers releases: Release 5.8.0

Stories may contain errors. Dispatch is assembled with AI assistance and curated by human editors; despite the trust-score filter, mistakes happen. We correct publicly — every article links to its revision history. Nothing here is financial, legal, or medical advice. Verify before relying on any claim.

© 2026 Dispatch. No ads. No sponsorships. No paid placement. Reader-supported via Ko-fi.

Built by a person who cares about honest AI news.