The Atlantic publishes searchable database of music used to train AI models
Reporter uncovers four datasets totaling over 21 million tracks, some confirmed used by Google and Stability AI, and makes them publicly searchable.
- A reporter at The Atlantic identified and made searchable four datasets containing over 21 million music tracks used to train AI models.
The Atlantic’s Alex Reisner documented four datasets containing music used to train AI models and published a searchable interface for the public. Two of the datasets contain 12 million and 9 million tracks, respectively, while the other two each contain more than 100,000 tracks.
Reisner reports the datasets have been downloaded thousands of times, and Google and Stability AI have confirmed using at least some of them in research papers. Some sources, such as the Free Music Archive dataset, are intended for personal use with separate licensing for commercial applications.
The datasets are distributed as lists of links to songs on YouTube or Spotify. Developers typically download audio using automated tools that can bypass platform safeguards like logins or ads, which may violate the platforms’ terms of service.
The searchable database is hosted on The Atlantic’s AI Watchdog site, where users can look up individual songs, albums, or artists used in AI training data.