Models · Jun 17, 2026

AllenAI releases MolmoMotion, a language-guided 3D motion forecasting model with new dataset and benchmark

MolmoMotion predicts future 3D trajectories of object points from video frames and language instructions, outperforming prior methods. AllenAI also releases MolmoMotion-1M dataset and PointMotionBench benchmark.

Trust: 79
Hype: Low

1 source · cross-referenced

TL;DR
  • MolmoMotion predicts future 3D trajectories of object points from video frames and language instructions, outperforming prior motion forecasting methods.

Allen Institute for AI (AllenAI) released MolmoMotion, a model that forecasts future 3D trajectories of object points given a video frame, a sparse set of 3D points on an object, and a language instruction describing the intended action. The model outperforms existing motion forecasting methods on downstream tasks such as robotics planning and controllable video generation.
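The inputs and outputs described above can be pictured as plain data structures. The following is a minimal sketch, not AllenAI's API: the names `ForecastQuery`, `ForecastResult`, and `constant_velocity_baseline` are hypothetical, and the predictor is a trivial stand-in that moves every point at a fixed velocity so the shapes can be inspected without a model.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point3D = Tuple[float, float, float]


@dataclass
class ForecastQuery:
    """Hypothetical container for the three inputs the article lists."""
    frame_path: str               # placeholder path to the observed video frame
    object_points: List[Point3D]  # sparse 3D points on the object
    instruction: str              # language description of the intended action


@dataclass
class ForecastResult:
    """trajectories[t][i] is the predicted position of point i at future step t."""
    trajectories: List[List[Point3D]]


def constant_velocity_baseline(
    query: ForecastQuery, velocity: Point3D, num_steps: int
) -> ForecastResult:
    """Stand-in predictor (NOT MolmoMotion): shift every point by `velocity` per step."""
    trajectories = []
    for step in range(1, num_steps + 1):
        trajectories.append([
            (x + velocity[0] * step, y + velocity[1] * step, z + velocity[2] * step)
            for (x, y, z) in query.object_points
        ])
    return ForecastResult(trajectories)


# Illustrative values only; none of them come from the source.
query = ForecastQuery(
    frame_path="frame.png",
    object_points=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    instruction="lift the mug",
)
result = constant_velocity_baseline(query, velocity=(0.0, 0.0, 0.5), num_steps=3)
```

The point of the sketch is the shape of the output: one 3D position per tracked point per future step, which is the form a planner or video generator would consume.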

To support training and evaluation, AllenAI introduced MolmoMotion-1M, a dataset containing 1.16 million videos paired with 3D point trajectories and action descriptions, and PointMotionBench, a human-validated benchmark with 2.7K video clips designed to measure object-centric 3D motion forecasting accuracy.
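The source does not say which metric PointMotionBench reports. As an illustration of how 3D forecasting accuracy is often scored, the sketch below computes average displacement error: the mean Euclidean distance between predicted and ground-truth points over all future steps. The function name and the choice of metric are assumptions, not details from AllenAI.

```python
import math
from typing import List, Tuple

Point3D = Tuple[float, float, float]
Trajectory = List[List[Point3D]]  # trajectory[t][i] = position of point i at step t


def average_displacement_error(predicted: Trajectory, ground_truth: Trajectory) -> float:
    """Mean Euclidean distance over every (step, point) pair.

    Assumed metric for illustration; the benchmark's real metric may differ.
    """
    if len(predicted) != len(ground_truth):
        raise ValueError("trajectories must cover the same number of steps")
    total, count = 0.0, 0
    for pred_step, true_step in zip(predicted, ground_truth):
        if len(pred_step) != len(true_step):
            raise ValueError("each step must contain the same number of points")
        for p, q in zip(pred_step, true_step):
            total += math.dist(p, q)
            count += 1
    if count == 0:
        raise ValueError("trajectories are empty")
    return total / count


# One tracked point over two steps; the second prediction is off by a 3-4-5 triangle.
predicted = [[(0.0, 0.0, 0.0)], [(3.0, 4.0, 0.0)]]
ground_truth = [[(0.0, 0.0, 0.0)], [(0.0, 0.0, 0.0)]]
error = average_displacement_error(predicted, ground_truth)
```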

MolmoMotion uses Molmo 2 as its backbone and represents motion as object-attached 3D points in world space, enabling class-agnostic, view-stable, and downstream-usable trajectory predictions. The model is available in two variants: an autoregressive model (MolmoMotion-AR) for well-defined future paths and a flow-matching model (MolmoMotion-FM) for representing uncertainty in ambiguous scenarios.
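To see why a flow-matching model can represent several plausible futures, consider a toy one-dimensional sampler. Flow-matching samplers generally start from random noise and integrate a velocity field from t = 0 to t = 1. The sketch below is not MolmoMotion-FM: `toy_velocity` is a hand-written stand-in for a learned network, and the two "futures" at -1 and +1 are invented for illustration.

```python
import random


def toy_velocity(x: float, t: float) -> float:
    """Hand-written stand-in for a learned velocity network.

    Pulls the sample along a straight line toward +1 or -1, depending on
    which side of zero it currently sits.
    """
    target = 1.0 if x >= 0.0 else -1.0
    return (target - x) / (1.0 - t)


def sample_endpoint(noise: float, num_steps: int = 10) -> float:
    """Euler-integrate the velocity field from t=0 to t=1, starting at `noise`."""
    x = noise
    for k in range(num_steps):
        t = k / num_steps  # t stays below 1, so the division above is safe
        x += toy_velocity(x, t) / num_steps
    return x


# Different noise draws can land on different outcomes.
rng = random.Random(0)
endpoints = [sample_endpoint(rng.gauss(0.0, 1.0)) for _ in range(5)]
```

Each draw ends near one of the two modes, whereas an autoregressive rollout with greedy decoding would commit to a single path. That contrast is the intuition behind offering one variant for well-defined futures and another for ambiguous ones.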

MolmoMotion-1M was built with an automated pipeline that extracts object-grounded 3D trajectories from unconstrained videos, filters out noisy tracks, and segments clips so that each one covers a meaningful motion interval. PointMotionBench complements it as a human-validated evaluation set for forecasting accuracy.
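The source gives no detail on how tracks are filtered or clips segmented. A minimal sketch of the general idea, assuming simple displacement thresholds, might look like this; the function names and the `max_jump` and `min_motion` values are hypothetical and are not taken from AllenAI's pipeline.

```python
import math
from typing import List, Tuple

Point3D = Tuple[float, float, float]


def step_displacements(track: List[Point3D]) -> List[float]:
    """Distance moved between each pair of consecutive frames."""
    return [math.dist(a, b) for a, b in zip(track, track[1:])]


def is_noisy(track: List[Point3D], max_jump: float = 0.5) -> bool:
    """Flag a track whose largest frame-to-frame jump exceeds `max_jump` (assumed threshold)."""
    return any(d > max_jump for d in step_displacements(track))


def motion_intervals(track: List[Point3D], min_motion: float = 0.01) -> List[Tuple[int, int]]:
    """Return inclusive (start_frame, end_frame) spans where the point keeps moving."""
    intervals = []
    start = None
    for i, d in enumerate(step_displacements(track)):
        if d >= min_motion:
            if start is None:
                start = i  # motion begins at frame i
        elif start is not None:
            intervals.append((start, i))  # frame i is the last frame that moved
            start = None
    if start is not None:
        intervals.append((start, len(track) - 1))
    return intervals


# A point that rests, moves for two frames, then rests again.
smooth_track = [(0.0, 0, 0), (0.0, 0, 0), (0.1, 0, 0), (0.2, 0, 0), (0.2, 0, 0)]
# A point that teleports, as a tracking failure might produce.
jumpy_track = [(0.0, 0, 0), (2.0, 0, 0)]

kept = [t for t in (smooth_track, jumpy_track) if not is_noisy(t)]
spans = motion_intervals(smooth_track)
```

A real pipeline would likely work on many points per object and use more robust statistics, but the two stages are the same: drop tracks that behave implausibly, then trim clips to the frames where something moves.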

Sources
  1. Hugging Face, "MolmoMotion: Language-guided 3D motion forecasting"

Stories may contain errors. Dispatch is assembled with AI assistance and curated by human editors; despite the trust-score filter, mistakes happen. We correct publicly — every article links to its revision history. Nothing here is financial, legal, or medical advice. Verify before relying on any claim.

© 2026 Dispatch. No ads. No sponsorships. No paid placement. Reader-supported via Ko-fi.
