Nvidia allegedly trained its AI on 500 terabytes of pirated books, says court filing


Nvidia is facing a class action lawsuit brought by several authors, who allege copyright infringement in the training of the company's large language models (LLMs). New documents from that case have come to light, showing that Nvidia employees directly requested access to 500 terabytes of book archives known to contain pirated data.

The documents come from the plaintiffs in this case and show emails from Nvidia employees requesting access to the Anna's Archive repository of books and other online works. The documents then suggest that it was made clear to the Nvidia employee that this archive contained "millions of pirated books" and that, despite this, "the green light" was given to access the data.

What's more, the documents, which were shared by TorrentFreak, allege that Anna's Archive also offered Nvidia access to "several million books from Internet Archive," which are normally accessible only through the Internet Archive's digital lending system. The filing concludes this section by saying that "by downloading Anna's Archive, Nvidia pirated additional copies of Plaintiff's Infringed Works."

The authors also accuse Nvidia of using other pirated sources, such as the Books3 database, LibGen, Sci-Hub, and Z-Library.

[Image: Nvidia Anna's Archive lawsuit document, excerpt 1]

Anna's Archive is an open-source search engine that some also consider a shadow library: an online repository that makes freely available material that is normally paywalled or access-restricted. These repositories tend to focus on scientific papers and scholarly journals, but can also extend to general interest books, audiobooks, comics, and more.

Anna's Archive proclaims itself the "largest truly open library in human history" and aggregates several other shadow libraries, such as LibGen, Sci-Hub, and Z-Library. These sites claim to be preserving online data, but do so by providing open access to copyrighted material.

[Image: Nvidia Anna's Archive lawsuit document, excerpt 2]

The documents show no proof that the data was actually used, and make no mention of Nvidia paying for it. Nvidia has also yet to comment directly on this particular filing.

However, it has previously admitted to using the Books3 dataset, which includes many copyrighted works. Defending this use, Nvidia claimed that it is not liable under copyright law, as AI models don't read in the way that humans do, but simply "measure statistical correlations in the aggregate, across a vast body of data."

"Plaintiffs cannot use copyright to preclude access to facts and ideas, and the highly transformative training process is protected entirely by the well-established fair-use doctrine. […] Indeed, to accept Plaintiffs' theory would mean that an author could copyright the rules of grammar or basic facts about the world. That has never been the law, for good reason," the company concluded in this previous response.
