Research · May 3, 2026

Apple researchers develop pseudo-annotation pipeline to expand sign language datasets

A new machine learning approach addresses the annotation bottleneck limiting AI-driven sign language interpretation, using sparse model predictions and few-shot learning to scale labeling of video datasets.

TL;DR
  • Apple researchers created a pseudo-annotation pipeline that automatically generates ranked annotations for sign language video, including glosses, fingerspelled words, and classifiers, reducing manual annotation costs at scale.
  • The team established baseline models for fingerspelling recognition (6.7% character error rate on FSBoard) and isolated sign recognition (74% top-1 accuracy on ASL Citizen), achieving state-of-the-art performance.
  • A professional interpreter manually annotated nearly 500 videos from ASL STEM Wiki; Apple is releasing these human annotations and over 300 hours of pseudo-annotations as supplemental material alongside their CVPR paper.
  • The work targets a documented problem in sign language AI: existing datasets like ASL STEM Wiki and FLEURS-ASL contain hundreds of hours of video from professional interpreters but remain only partially annotated due to prohibitive annotation costs.

Apple's machine learning research team developed a pseudo-annotation system designed to reduce the annotation burden that has constrained sign language AI development. The approach combines sparse predictions from fingerspelling and isolated-sign recognition models with a few-shot language-model strategy to automatically label video sequences with glosses, fingerspelled words, and sign classifiers; the candidate annotations are then ranked by confidence.
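
In rough code terms, the flow looks something like the sketch below. This is an illustrative outline of the description above, not Apple's implementation; the recognizer interfaces, the `llm_refine` hook, and all field names are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    start: float       # segment start time in seconds
    end: float         # segment end time in seconds
    kind: str          # "gloss" | "fingerspelling" | "classifier"
    label: str         # predicted gloss, spelled word, or classifier tag
    confidence: float  # model score in [0, 1]

def pseudo_annotate(video, sign_model, fs_model, llm_refine):
    """Label a video with ranked candidate annotations (hypothetical)."""
    candidates = []
    # Sparse predictions: the recognizers emit only segments they are
    # confident about, leaving gaps rather than guessing everywhere.
    for seg in sign_model.predict(video):   # isolated-sign recognizer
        candidates.append(Candidate(seg.start, seg.end, "gloss",
                                    seg.label, seg.score))
    for seg in fs_model.predict(video):     # fingerspelling recognizer
        candidates.append(Candidate(seg.start, seg.end, "fingerspelling",
                                    seg.label, seg.score))
    # Few-shot language-model pass: propose additional labels (e.g.
    # classifiers) using the confident segments as in-context examples.
    candidates.extend(llm_refine(video, candidates))
    # Rank by confidence so downstream users can trade coverage for precision.
    return sorted(candidates, key=lambda c: c.confidence, reverse=True)
```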

The researchers first established baseline models for two foundational tasks. Their fingerspelling recognizer achieved a 6.7% character error rate on the FSBoard benchmark, and their isolated-sign recognizer reached 74% top-1 accuracy on the ASL Citizen dataset; both were state-of-the-art results on those benchmarks at the time of publication.
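
For context, character error rate (CER) is the character-level edit distance between a predicted and a reference spelling, divided by the reference length. A minimal implementation follows; the example strings are invented, not drawn from the paper.

```python
def cer(ref: str, hyp: str) -> float:
    """Character error rate: Levenshtein distance / reference length."""
    d = list(range(len(hyp) + 1))  # row 0: distance from empty ref prefix
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1,         # delete r from ref
                                   d[j - 1] + 1,     # insert h into ref
                                   prev + (r != h))  # substitute r -> h
    return d[-1] / len(ref)

print(cer("QUANTUM", "QUANTOM"))  # 1 substitution / 7 chars ≈ 0.143
```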

To validate the pipeline's output, a professional interpreter manually annotated nearly 500 videos from the ASL STEM Wiki dataset at the sequence level, creating a gold-standard benchmark. Apple is releasing this professional annotation set, along with over 300 hours of pseudo-annotations, as supplemental material accompanying its CVPR paper, making both resources available to the broader research community.
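
A gold-standard set like this is typically used to check how many of the top-ranked pseudo-annotations agree with a human label. The sketch below shows one plausible scoring scheme, matching on label plus temporal overlap; the data layout, overlap threshold, and protocol are assumptions, not the paper's published evaluation.

```python
def precision_at_k(pseudo, gold, k=100, iou_min=0.5):
    """Fraction of the k most confident pseudo-annotations matching gold.

    pseudo: (start, end, label, confidence) tuples, sorted by confidence desc
    gold:   (start, end, label) tuples from the professional interpreter
    """
    def iou(p, g):
        # Temporal intersection-over-union of two segments.
        inter = max(0.0, min(p[1], g[1]) - max(p[0], g[0]))
        union = max(p[1], g[1]) - min(p[0], g[0])
        return inter / union if union > 0 else 0.0

    hits = sum(any(p[2] == g[2] and iou(p, g) >= iou_min for g in gold)
               for p in pseudo[:k])
    return hits / max(1, min(k, len(pseudo)))
```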

Sources
  1. Apple Machine Learning Research — Bootstrapping Sign Language Annotations with Sign Language Models
