Researchers propose Wiola, a new small language model architecture with five novel components
The Wiola architecture introduces Spiral Rotary Positional Encoding, Gated Cross-Layer Attention, Adaptive Token Merging, Dual Stream Feed-Forward, and WiolaRMSNorm, with four model sizes released under an open license.
- Wiola is a new small language model architecture built from first principles, sharing no structural lineage with existing model families such as GPT, LLaMA, Mistral, or Falcon.
- The architecture includes five novel components: Spiral Rotary Positional Encoding, Gated Cross-Layer Attention, Adaptive Token Merging, Dual Stream Feed-Forward, and WiolaRMSNorm.
- Wiola is released in four sizes (120M, 360M, 700M, and 1.5B parameters) and is fully compatible with the HuggingFace Transformers ecosystem.
- The paper provides mathematical derivations, architectural diagrams, complexity analyses, and comparisons against GPT-2, LLaMA-2, and Mistral.
Researchers have proposed Wiola, a new Small Language Model (SLM) architecture designed from first principles and explicitly positioned as having no structural lineage with existing model families such as GPT, LLaMA, Mistral, or Falcon. The architecture introduces five independently novel components intended to improve efficiency and representational capacity in small models.
The first component, Spiral Rotary Positional Encoding (SRPE), embeds token positions on a three-dimensional helical manifold that integrates absolute, relative, and hierarchical positional signals. The second, Gated Cross-Layer Attention (GCLA), gives each decoder layer soft cross-attention access to compressed summaries of the two preceding layers to improve inter-layer coherence. The third, Adaptive Token Merging (ATM), dynamically merges semantically redundant adjacent tokens in the middle layers of the network, which the authors say reduces attention complexity without information loss. The fourth, Dual Stream Feed-Forward (DSFF), replaces the conventional MLP with two parallel streams fused by a learned per-dimension gate. The fifth, WiolaRMSNorm, adds a learned per-dimension offset vector to RMS normalization, a change intended to prevent representation collapse.
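The last two components are simple enough to illustrate in a few lines. The NumPy sketch below is only one plausible reading of the descriptions above, not the paper's implementation: the placement of the offset (added after scaling), the choice of stream activations (ReLU and tanh), the sigmoid gate, and every function and parameter name are assumptions made for illustration.

```python
import numpy as np


def rms_norm_with_offset(x, gain, offset, eps=1e-6):
    """RMS-normalize the last axis, then apply a learned scale and offset.

    ASSUMPTION: the per-dimension offset is added after scaling; the
    summary does not say where the paper places it.
    """
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return gain * (x / rms) + offset


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def dual_stream_ffn(x, w_a, w_out_a, w_b, w_out_b, gate_logits):
    """Two parallel feed-forward streams fused by a per-dimension gate.

    ASSUMPTIONS: stream A uses ReLU, stream B uses tanh, and the gate is
    an input-independent learned vector squashed by a sigmoid.
    """
    stream_a = np.maximum(x @ w_a, 0.0) @ w_out_a   # (batch, d_model)
    stream_b = np.tanh(x @ w_b) @ w_out_b           # (batch, d_model)
    gate = sigmoid(gate_logits)                     # (d_model,), values in (0, 1)
    return gate * stream_a + (1.0 - gate) * stream_b


# Toy usage with random weights (hypothetical sizes).
rng = np.random.default_rng(0)
d_model, d_hidden = 8, 16
x = rng.normal(size=(2, d_model))

h = rms_norm_with_offset(x, gain=np.ones(d_model), offset=np.zeros(d_model))
y = dual_stream_ffn(
    h,
    rng.normal(size=(d_model, d_hidden)),
    rng.normal(size=(d_hidden, d_model)),
    rng.normal(size=(d_model, d_hidden)),
    rng.normal(size=(d_hidden, d_model)),
    gate_logits=np.zeros(d_model),  # gate = 0.5: equal mix of both streams
)
print(y.shape)  # (2, 8)
```

In this reading, each output dimension gets its own mixing weight between the two streams, and the offset lets the normalized output sit away from zero even when the input is all zeros.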
The authors provide complete mathematical derivations, architectural block diagrams, complexity analyses, and systematic comparisons against GPT-2, LLaMA-2, and Mistral. They also release Wiola in four sizes (120M, 360M, 700M, and 1.5B parameters) and report full compatibility with the HuggingFace Transformers ecosystem, with all 22 architectural unit tests passing.
The preprint is dated July 1, 2026, and is available on arXiv under the cs.AI category.