Culture · Jun 20, 2026

The Atlantic publishes searchable database of music used to train AI models

Reporter uncovers four datasets totaling over 21 million tracks, some of which Google and Stability AI have confirmed using, and makes them publicly searchable.

Trust: 79
Hype: Low

1 source · cross-referenced

TL;DR
  • A reporter at The Atlantic identified and made searchable four datasets containing over 21 million music tracks used to train AI models.

The Atlantic’s Alex Reisner documented four datasets containing music used to train AI models and published a searchable interface for the public. Two of the datasets contain 12 million and 9 million tracks, respectively, while the other two each contain more than 100,000 tracks.

Reisner reports the datasets have been downloaded thousands of times, and Google and Stability AI have confirmed using at least some of them in research papers. Some sources, such as the Free Music Archive dataset, are intended for personal use with separate licensing for commercial applications.

The datasets are distributed as lists of links to songs on YouTube or Spotify. Developers typically download audio using automated tools that can bypass platform safeguards like logins or ads, which may violate the platforms’ terms of service.
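To make the "list of links" format concrete, here is a minimal Python sketch that tallies which platform each entry in such a list points to. It only inspects URLs and downloads nothing. The host-to-platform mapping, the function names, and the sample links are illustrative assumptions, not details taken from the datasets Reisner describes.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical mapping from hostnames to platforms (an assumption for
# illustration; the real datasets may use other hosts or URL shapes).
PLATFORM_HOSTS = {
    "youtube.com": "YouTube",
    "youtu.be": "YouTube",
    "music.youtube.com": "YouTube",
    "open.spotify.com": "Spotify",
}


def platform_of(url: str) -> str:
    """Return the platform a link points to, or "other" if unrecognized."""
    host = (urlparse(url.strip()).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return PLATFORM_HOSTS.get(host, "other")


def count_platforms(links):
    """Count links per platform, skipping blank entries."""
    return Counter(platform_of(link) for link in links if link.strip())


# Placeholder links, not real tracks.
sample_links = [
    "https://www.youtube.com/watch?v=EXAMPLE_ID_1",
    "https://youtu.be/EXAMPLE_ID_2",
    "https://open.spotify.com/track/EXAMPLE_ID_3",
    "",
]

counts = count_platforms(sample_links)
print(counts)
```

A dataset in this form contains no audio itself; it records only where each track can be found, which is why the download step described above happens separately.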

The searchable database is hosted on The Atlantic’s AI Watchdog site, where users can look up individual songs, albums, or artists used in AI training data.

Sources
  1. The Verge (AI), "The Atlantic created a searchable database of the music used to train AI"

Stories may contain errors. Dispatch is assembled with AI assistance and curated by human editors; despite the trust-score filter, mistakes happen. We correct publicly — every article links to its revision history. Nothing here is financial, legal, or medical advice. Verify before relying on any claim.

© 2026 Dispatch. No ads. No sponsorships. No paid placement. Reader-supported via Ko-fi.
