Authors Sue Apple Over Alleged Use of Pirated Books in AI Training

Apple faces a class action lawsuit claiming it trained its AI systems with pirated books. Authors Grady Hendrix and Jennifer Roberson filed the suit in the U.S. District Court for the Northern District of California. They accuse Apple of using a dataset called “Books3,” which allegedly contained thousands of pirated works sourced from the shadow library Bibliotik.

The Core of the Complaint

The plaintiffs argue that Books3 was included in the RedPajama dataset, which Apple used for its OpenELM models in 2024. Since those models supported Apple Intelligence, the lawsuit suggests that Apple likely trained its Foundation Language Models with similar data. The authors stress that Apple never sought permission or offered compensation for using their copyrighted material.

The lawsuit demands a jury trial, financial damages, restitution, attorney’s fees, and even the destruction of Apple Intelligence models trained on the disputed data.

Industry Parallels and Precedents

The case echoes a recent $1.5 billion settlement between Anthropic and authors over AI book piracy. While Apple has not been accused of directly scraping works, the suit highlights its use of datasets with questionable origins.

Apple’s Ethical Claims

Apple has repeatedly insisted that its AI training is ethical. The company has previously paid publishers millions of dollars for access to their content and secured image licenses from Shutterstock in 2024. In July, Apple reaffirmed that it respects publishers' choices by honoring robots.txt restrictions on websites, a contrast with some competitors who bypass such safeguards.
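For context on what honoring robots.txt means in practice, the sketch below shows how a web crawler typically checks a site's robots.txt before fetching a page, using Python's standard urllib.robotparser module. It is a generic illustration of the convention, not Apple's actual crawler or configuration; the user-agent string and URLs are placeholders.

```python
from urllib import robotparser

# Generic illustration of robots.txt compliance -- not Apple's implementation.
# The user-agent string and URLs below are placeholders.
USER_AGENT = "ExampleCrawler"

def is_allowed(page_url: str, robots_url: str) -> bool:
    """Return True if the site's robots.txt permits USER_AGENT to fetch page_url."""
    parser = robotparser.RobotFileParser()
    parser.set_url(robots_url)
    parser.read()  # download and parse the site's robots.txt
    return parser.can_fetch(USER_AGENT, page_url)

if __name__ == "__main__":
    url = "https://example.com/some-article"
    robots = "https://example.com/robots.txt"
    if is_allowed(url, robots):
        print(f"{url} may be crawled")
    else:
        print(f"{url} is disallowed by robots.txt")
```

A crawler that respects the protocol simply declines to fetch any URL for which this check returns False.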

What’s Next

As the case unfolds, the outcome could reshape how tech companies source data for AI training. Apple’s strong public stance on ethical practices will now be tested in court.
