Landmark Ruling on AI and Copyright: Judge Upholds Anthropic's Training as ‘Fair Use’ but Condemns Its ‘Pirate Library’
In a 32-page decision, a federal court in California rules that the process of training the Claude AI was 'transformative', but that the company's initial acquisition of millions of books from pirate sites was not a fair use. The ruling also approves the controversial method of 'destructive scanning' of lawfully purchased print books.
In one of the most significant and detailed judicial decisions to date on copyright in the age of artificial intelligence, a federal judge has drawn a critical line in the legal sand, issuing a nuanced ruling that could redefine practices across an entire industry. In the case brought by authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson against the AI firm Anthropic, Judge William Alsup ruled that, while using copyrighted works to train an AI model is a “fair use,” building the data library for that training through mass piracy is not.
The ruling, which partially denies Anthropic's motion for summary judgment, lays bare the company's practices for feeding its language model Claude, which generates more than one billion dollars in annual revenue. Those practices included downloading millions of books from pirate repositories and, later, a multimillion-dollar operation to buy and “destructively scan” print books, a process in which the books were unbound, cut, and digitized, then discarded.
The decision means Anthropic must face trial over its use of pirated material, while simultaneously validating key aspects of its training methods, creating a complex precedent that lawyers across the technology and publishing sectors will scrutinize closely.
Anatomy of an Infringement: From Piracy to a Permanent Library
The court's order details how Anthropic, founded in January 2021, embarked from the outset on an aggressive data-acquisition campaign. To avoid what cofounder and CEO Dario Amodei described as the “legal/practice/business slog” of acquiring books through legitimate channels, the company turned directly to piracy.
Between 2021 and 2022, Anthropic downloaded massive collections of books from notorious sites:
- Books3: An online library of 196,640 books, which Anthropic knew had been assembled from unauthorized copies.
- Library Genesis (LibGen): At least five million copies of books were downloaded from this well-known pirate site.
- Pirate Library Mirror (PiLiMi): At least two million further copies of books were acquired from this repository.
In total, Anthropic amassed “over seven million copies of books” from pirate sources, including multiple works by the plaintiff authors, such as Bartz's “The Herd,” Graeber's “The Good Nurse,” and Johnson's “The Feather Thief.” These digital copies, in formats such as .pdf and .epub, were folded into a centralized “research library” or “generalized data area.” According to internal evidence, the company's plan was to “store everything forever,” retaining the copies even if it decided never to use them for AI training.
The Strategic Pivot: The Mission to Obtain “All the Books in the World”
As legal concerns about using pirated data grew internally, Anthropic changed strategy. In February 2024, the company hired Tom Turvey, the former head of partnerships for Google's book-scanning project, with an ambitious directive: obtain “all the books in the world.”
Rather than negotiate licenses, which Turvey attempted briefly before letting those conversations “wither,” Anthropic embarked on a campaign of bulk-purchasing print books, often used, spending “many millions of dollars.” A process of “destructive scanning” followed:
- Anthropic's service providers “stripped the books from their bindings.”
- The pages were cut to size.
- The loose pages were scanned, producing for each book a PDF file with machine-readable text.
- The “paper originals” were discarded.
This method, although lawful in its initial acquisition, generated millions of digital copies that joined Anthropic's central library alongside the pirated copies obtained earlier.
Judge Alsup's Scalpel: Separating the “Uses” for the Fair Use Analysis
The heart of Judge Alsup's decision lies in his refusal to treat Anthropic's actions as a single monolithic event. Instead, he dissected the process into three distinct “uses” and applied the four-factor fair use analysis to each.
- 1. Training as “Exceedingly Transformative”: The judge found that using the books within the AI training process was a fair use. The purpose was not to reproduce or replace the original books, but to extract statistical patterns of language in order to teach a machine to read and write. The result was an entirely new product (an AI model) that did not offer users copies of the authors' works. The judge called the use “exceedingly transformative,” and that purpose outweighed the creative nature of the works and the fact that they were used in their entirety.
- 2. Format Conversion as “Fair Use”: The ruling also upheld the “destructive scanning” of purchased books as a fair use, but for a different reason. The judge determined that this act was transformative because it changed the format of the work for a distinct, non-exploitative purpose: storage efficiency and searchability. Citing precedents such as Sony Corp. v. Universal City Studios, Inc. (Sony Betamax), the judge reasoned that, much like recording a television program to watch later, changing the format of a lawfully owned book for internal use does not usurp the author's rights, provided the original is replaced (here, destroyed) and the new copy is not distributed.
- 3. The Pirated Library as Flagrant Infringement: The court was unyielding in ruling that the initial acquisition of pirated copies was not a fair use. The purpose of that action was simply to build a research library without paying for it, directly substituting for the legitimate market for the books. The judge rejected Anthropic's argument that a later fair use (training) could excuse the initial infringement. In a blunt passage, the order states that the Copyright Act contains no exception for AI companies. The judge concluded that this practice, if permitted, would destroy the publishing market.
A Ruling with Aftershocks: Implications for the Industry
Judge Alsup's decision sends shockwaves through multiple industries:
- For the AI Industry: The path forward is risky but clear. Using training datasets obtained through piracy, a practice suspected to be widespread, is confirmed as an infringement carrying serious legal liability. However, the “buy and destructively scan” model emerges as a legally viable, though immensely expensive, alternative to complex and often fruitless licensing negotiations.
- For Authors and Publishers: The ruling is a significant victory against piracy, reaffirming that AI companies cannot simply take content from the internet without consequences. It strengthens their position to demand compensation for the use of pirated works. Even so, the validation of scanning purchased copies limits their ability to control or monetize that specific channel of data acquisition.
- For the Future of Copyright: The ruling is a master class in applying a doctrine more than a century old to a twenty-first-century technology. By meticulously separating the “uses,” Judge Alsup avoids a one-size-fits-all answer to a multifaceted problem and affirms that every link in the AI data supply chain must be legally justifiable on its own.
Anthropic now prepares for a trial focused on damages, whether actual or statutory, for the deliberate infringement of millions of copyrights. The legal battle is far from over, but Judge Alsup has provided a detailed legal roadmap that will guide future skirmishes on the frontier between human creativity and artificial intelligence.
Did They Tear Up Books?
Yes. The artificial intelligence company Anthropic tore apart and destroyed millions of print books as part of its process for training its AI models, such as Claude.
This process, called “destructive scanning” in the court documents, consisted of several steps:
- First, Anthropic spent millions of dollars buying print books, often in bulk and in used condition.
- Next, its service providers stripped the books from their bindings.
- They cut the pages to size so they could be scanned efficiently.
- They scanned the loose pages to create digital PDF files with machine-readable text.
- Finally, once digitization was complete, they discarded the paper originals.
The goal was to create a vast digital “research library” to feed and train its language models. Surprisingly, the judge ruled that this specific act of buying a book, destroying it in order to scan it, and then discarding the original qualified as a “fair use” under copyright law, chiefly because the original print copy was lawfully purchased and was replaced by a single digital copy for internal use, without being redistributed.
The Order
UNITED STATES DISTRICT COURT
NORTHERN DISTRICT OF CALIFORNIA
ANDREA BARTZ, CHARLES GRAEBER,
and KIRK WALLACE JOHNSON,
Plaintiffs,
v.
ANTHROPIC PBC,
Defendant.
No. C 24-05417 WHA
ORDER ON FAIR USE
Case 3:24-cv-05417-WHA Document 231 Filed 06/23/25 Page 1 of 32
INTRODUCTION
An artificial intelligence firm downloaded for free millions of copyrighted books in
digital form from pirate sites on the internet. The firm also purchased copyrighted books
(some overlapping with those acquired from the pirate sites), tore off the bindings, scanned
every page, and stored them in digitized, searchable files. All the foregoing was done to amass
a central library of “all the books in the world” to retain “forever.” From this central library,
the AI firm selected various sets and subsets of digitized books to train various large language
models under development to power its AI services. Some of these books were written by
plaintiff authors, who now sue for copyright infringement. On summary judgment, the issue is
the extent to which any of the uses of the works in question qualify as “fair uses” under
Section 107 of the Copyright Act.
STATEMENT
Defendant Anthropic PBC is an AI software firm founded by former OpenAI employees
in January 2021. Its core offering is an AI software service called Claude. When a user
prompts Claude with text, Claude quickly responds with text — mimicking human reading and
writing. Claude can do so because Anthropic trained Claude — or rather trained large
language models or LLMs underlying various versions of Claude — using books and other
texts selected from a central library Anthropic had assembled. Claude was first released
publicly in March 2023. Seven successive versions of Claude have been released since. Users
may ask Claude some questions for free. Demanding users and corporate clients pay to use
Claude, generating over one billion dollars in annual revenue (Opp. Exh. 18).
Plaintiffs Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson are authors of books
that Anthropic copied from pirated and purchased sources. Anthropic assembled these copies
into a central library of its own, copied further various sets and subsets of those library copies
to include in various “data mixes,” and used these mixes to train various LLMs. Anthropic
kept the library copies in place as a permanent, general-purpose resource even after deciding it
would not use certain copies to train LLMs or would never use them again to do so. All of
Anthropic’s copying was without plaintiffs’ authorization.
Author Bartz wrote four novels Anthropic copied and used: The Lost Night: A Novel,
The Herd, We Were Never Here, and The Spare Room. Author Graeber wrote two non-fiction
books likewise at issue: The Good Nurse: A True Story of Medicine, Madness, and Murder,
and The Breakthrough: Immunotherapy and the Race to Cure Cancer. And, Author Johnson
penned three non-fiction books also copied and used: To Be A Friend Is Fatal: The Fight to
Save the Iraqis America Left Behind, The Feather Thief: Beauty, Obsession, and the Natural
History Heist of the Century, and The Fishermen and the Dragon: Fear, Greed, and a Fight
for Justice on the Gulf Coast. Plaintiffs Bartz Inc. and MJ + KJ Inc. are corporate entities that
Author Bartz and Author Johnson respectively set up to market their works. Between them,
these five plaintiffs (“Authors”) own all the copyrights in the above-listed works.
From the start, Anthropic “ha[d] many places from which” it could have purchased
books, but it preferred to steal them to avoid “legal/practice/business slog,” as cofounder and
chief executive officer Dario Amodei put it (see Opp. Exh. 27). So, in January or February
2021, another Anthropic cofounder, Ben Mann, downloaded Books3, an online library of
196,640 books that he knew had been assembled from unauthorized copies of copyrighted
books — that is, pirated. Anthropic’s next pirated acquisitions involved downloading
distributed, reshared copies of other pirate libraries. In June 2021, Mann downloaded in this
way at least five million copies of books from Library Genesis, or LibGen, which he knew had
been pirated. And, in July 2022, Anthropic likewise downloaded at least two million copies of
books from the Pirate Library Mirror, or PiLiMi, which Anthropic knew had been pirated
(Opp. Exh. 6 at 4; Opp. Expert Zhao ¶¶ 17–29; see Class Cert. (“CC”) Opp. Expert Iyyer
¶¶ 45–46). Although what was downloaded and later duplicated from these sources was
sometimes referred to as data or datasets, at bottom they contained full-text “ebooks or scans of
books” saved in individual files in formats like .pdf, .txt, and .epub (see, e.g., Opp. Exh. 12 at –
0391318). For Books3, most filenames identified the book inside. For LibGen and PiLiMi,
Anthropic downloaded a separate catalog of bibliographic metadata for each collection, with
fields like title, author, and ISBN (see, e.g., ibid.; Opp. Exh. 16 -0533972–73). Anthropic
thereby pirated over seven million copies of books, including copies of at least two works at
issue for each Author.1
1 Specifically, those works were (see Opp. Expert Zhao ¶ 36; CC Br. Expert Zhao ¶ 66):
Author Bartz’s The Herd (five copies total) (in LibGen and PiLiMi);
Author Bartz’s The Lost Night (three copies total) (in Books3, LibGen, and PiLiMi);
Author Graeber’s The Breakthrough (four copies) (in Books3, LibGen, and PiLiMi);
Author Graeber’s The Good Nurse (five copies total) (in Books3 and LibGen);
Author Johnson’s To Be A Friend Is Fatal (one copy) (in Books3); and
Author Johnson’s The Feather Thief (four copies total) (in Books3, LibGen, PiLiMi).
Some evidence suggests Anthropic downloaded still more copies before culling empty files,
duplicates, and so on to reach the numbers kept in the central library and counted here.
As Anthropic trained successive LLMs, it became convinced that using books was the
most cost-effective means to achieve a world-class LLM. During this time, however,
Anthropic became “not so gung ho about” training on pirated books “for legal reasons” (Opp.
Exh. 19). It kept them anyway (e.g., Opp. Exh. 17 at 93–94; CC Opp. Exh. 35 at -0273474).
To find a new way to get books, in February 2024, Anthropic hired the former head of
partnerships for Google’s book-scanning project, Tom Turvey. He was tasked with obtaining
“all the books in the world” while still avoiding as much “legal/practice/business slog” as
possible (Opp. Exhs. 21, 27). So, in spring 2024, Turvey sent an email or two to major
publishers to inquire into licensing books for training AI. Had Turvey kept up those
conversations, he might have reached agreements to license copies for AI training from
publishers — just as another major technology company soon did with one major publisher
(e.g., Opp. Expert Malackowski ¶¶ 50, 64). But Turvey let those conversations wither.
Instead, Turvey and his team emailed major book distributors and retailers about bulk-
purchasing their print copies for the AI firm’s “research library” (Opp. Exh. 22 at 145; Opp.
Exh. 31 at -035589). Anthropic spent many millions of dollars to purchase millions of print
books, often in used condition. Then, its service providers stripped the books from their
bindings, cut their pages to size, and scanned the books into digital form — discarding the
paper originals. Each print book resulted in a PDF copy containing images of the scanned
pages with machine-readable text (including front and back cover scans for softcover books).
Anthropic created its own catalog of bibliographic metadata for the books it was acquiring. It
acquired copies of millions of books, including of all works at issue for all Authors.2
Anthropic may have copied portions of Authors’ books on other occasions, too — such
as while copying book reviews, academic papers, internet blogposts, or the like for its central
library. And, Anthropic’s scanning service providers may have copied Authors’ print books
along the way to delivering the final digital copies to Anthropic. But neither side here
specifically raises legal issues implicated by any such copies. Nor will this order.
2 In other words, within the scanned books were one or more copies of the following works:
Author Bartz’s The Herd;
Author Bartz’s The Lost Night;
Author Bartz’s We Were Never Here;
Author Bartz’s The Spare Room;
Author Graeber’s The Breakthrough;
Author Graeber’s The Good Nurse;
Author Johnson’s To Be A Friend Is Fatal;
Author Johnson’s The Feather Thief; and,
Author Johnson’s The Fishermen.
From all the above sources, Anthropic created a general “research library” or
“generalized data area.” What was this for? As Turvey said, this was a “way of creating
information that would be voluminous and that we would use for research,” or otherwise to
“inform our — our products” (Opp. Exh. 22 at 145–46, 194). The copies were kept in the
original “version of the underlying” book files Anthropic had “obtained or created,” that is,
pirated or scanned (Opp. Exh. 30 at 3, 4). Anthropic planned to “store everything forever; we
might separate out books into categories[, but t]here [wa]s no compelling reason to delete a
book” — even if not used for training LLMs. Over time, Anthropic invested in building more
tools for searching its “general purpose” library and for accessing books or sets of books for
further uses (see CC Br. Exh. 12 at -0144509; CC Reply Exh. 45 at -0365931–32, -0365939–
42 (reviewing and seeking to improve “[w]hat [ ] researchers do today if they want to search
for a book,” including improving bibliographic metadata and consolidating varied resources)).
One further use was training LLMs. As a preliminary step towards training, engineers
browsed books and bibliographic metadata to learn what languages the books were written in,
what subjects they concerned, whether they were by famous authors or not, and so on —
sometimes by “open[ing] any of the books” and sometimes using software. From the library
copies, engineers copied the sets or subsets of books they believed best for training and
“iterate[d]” on those selections over time. For instance, two different subsets of print-sourced
books were included in “data mixes” for training two different LLMs. Each was just a fraction
of all the print-sourced books. Similarly, different sets or “subsets” or “parts of” or “portions”
of the collections sourced from Books3, LibGen, and PiLiMi were used to train different
LLMs. Anthropic analyzed the consequences of using more books, fewer books, different
books. The goal was to improve the “data mix” to improve each LLM and, ultimately,
Claude’s performance for paying customers.3
3 (See, e.g., Opp. Exh. 12 at -0391318 (engineers were able to “open any of the books”); CC
Reply Exh. 45 at -0365941 (some engineers “want[ed] to search for a book” and get its “scanned
book file[ ]”); Opp. Exh. 30 at 3 (made copies of “each such dataset or portions thereof” for
training); Opp. Exh. 6 at 3–4 (trained on “portions of datasets,” with at least two such portions
from LibGen and four from PiLiMi); Opp. Expert Zhao ¶¶ 27–28, 30–31 (plus two more from
PiLiMi, and at least three from scanned books); CC Opp. Exh. 35 at -0273477–82 (tested subsets
of pirated and purchased-and-scanned books to see consequences for training); CC Br. Exh. 12 at –
0144508–09 (“iterate[d]” selections from library and “train[ed] new models on the best data”); Br.
Expert Kaplan ¶¶ 42–45 (explained goals of improving data mixes); Br. Expert Peterson ¶ 14
(similar)).
Over time, Anthropic came to value most highly for its data mixes books like the ones
Authors had written, and it valued them because of the creative expressions they contained.
Claude’s customers wanted Claude to write as accurately and as compellingly as Authors. So,
it was best to train the LLMs underlying Claude on works just like the ones Authors had
written, with well-curated facts, well-organized analyses, and captivating fictional
narratives — above all with “good writing” of the kind “an editor would approve of” (Opp.
Exh. 3 at -03433). Anthropic could have trained its LLMs without using such books or any
books at all. That would have required spending more on, say, staff writers to create
competing exemplars of good writing, engineers to revise bad exemplars into better ones,
energy bills to power more rounds of training and fine-tuning, and so on. Having canonical
texts to draw upon helped (e.g., Opp. Expert Zhao ¶ 81).
Each work selected for training any given LLM was copied in four main ways — and in
fact so many times that Anthropic admits it would be impractical even to estimate.
First, each work selected was copied from the central library to create a working copy for
the training set.
Second, each work was cleaned to remove a small amount of lower-valued or repeating
text (like headers, footers, or page numbers), with a “cleaned” copy resulting. If the same book
appeared twice, or if while looking across the entire provisional training set it became clear
there was some other reason to cull a book or category, Anthropic had the capability to delete
relevant copy(ies) from the set at this step (see CC Br. Expert Zhao ¶¶ 71–72).
Third, each cleaned copy was translated into a “tokenized” copy. Some words were
“stemmed” or “lemmatized” into simpler forms (e.g., “studying” to “study”). And, all
characters were grouped into short sequences and translated into corresponding number
sequences or “tokens” according to an Anthropic-made dictionary. The resulting tokenized
copies were then copied repeatedly during training. By one account, this process involved the
iterative, trial-and-error discovery of contingent statistical relationships between each word
fragment and all other word fragments both within any work and across trillions of word
fragments from other copied books, copied websites, and the like. Other steps in training are
not at issue here (id. ¶¶ 73–76; see Opp. Expert Zhao ¶ 38 & n.6).
Fourth, each fully trained LLM itself retained “compressed” copies of the works it had
trained upon, or so Authors contend and this order takes for granted. In essence, each LLM’s
mapping of contingent relationships was so complete it mapped or indeed simply “memorized”
the works it trained upon almost verbatim. So, if each completed LLM had been asked to
recite works it had trained upon, it could have done so (e.g., Opp. Expert Zhao ¶ 74). Further
steps refining the LLM are not at issue here.
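The first three copying steps described above (working copy, cleaning, tokenizing) can be sketched in a few lines of Python. This is a toy illustration only, not Anthropic's pipeline: the order does not describe the actual code, and every name here (`clean`, `stem`, `build_vocab`, `tokenize`), the majority-of-pages rule for spotting running headers, the "-ing" stemming rule, and the three-character chunking are assumptions made for the example. Production tokenizers typically use a learned subword vocabulary rather than fixed-size chunks.

```python
import re
from collections import Counter


def clean(pages: list[str]) -> str:
    """Drop page-number lines and lines repeated on a majority of pages
    (a crude stand-in for removing headers, footers, and page numbers)."""
    per_page = [[ln.strip() for ln in p.splitlines() if ln.strip()] for p in pages]
    # Count on how many distinct pages each line appears.
    seen_on = Counter(ln for lines in per_page for ln in set(lines))
    threshold = max(2, len(pages) // 2 + 1)
    kept = []
    for lines in per_page:
        for ln in lines:
            if ln.isdigit() or seen_on[ln] >= threshold:
                continue  # page number or running header/footer
            kept.append(ln)
    return "\n".join(kept)


def stem(word: str) -> str:
    """Toy stemming rule (assumption): strip a trailing 'ing' from longer words."""
    return word[:-3] if word.endswith("ing") and len(word) > 5 else word


def build_vocab(words: list[str], size: int = 3) -> dict[str, int]:
    """Assign an integer ID to each short character sequence, in order of first use."""
    vocab: dict[str, int] = {}
    for w in words:
        for i in range(0, len(w), size):
            vocab.setdefault(w[i:i + size], len(vocab))
    return vocab


def tokenize(words: list[str], vocab: dict[str, int], size: int = 3) -> list[int]:
    """Translate each word into the IDs of its character chunks."""
    return [vocab[w[i:i + size]] for w in words for i in range(0, len(w), size)]


# Step 1: a working copy of the library file (here, three short "pages").
pages = [
    "The Title\nIt was studying time.\n1",
    "The Title\nShe kept studying.\n2",
    "The Title\nThe end.\n3",
]
# Step 2: the cleaned copy.
cleaned = clean(pages)
# Step 3: the tokenized copy.
words = [stem(w) for w in re.findall(r"[a-z]+|[^a-z\s]", cleaned.lower())]
vocab = build_vocab(words)
tokens = tokenize(words, vocab)
```

Each stage yields a new copy of the text (`pages`, `cleaned`, `tokens`), which mirrors why the order counts the working, cleaned, and tokenized versions as separate copies of each work.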
However, that was as far as the training copies propagated towards the outside world.
When each LLM was put into a public-facing version of Claude, it was complemented by other
software that filtered user inputs to the LLM and filtered outputs from the LLM back to the
user (id. ¶¶ 75–77). As a result, Authors do not allege that any infringing copy of their works
was or would ever be provided to users by the Claude service. Yes, Claude could help less
capable writers create works as well-written as Authors’ and competing in the same categories.
But Claude created no exact copy, nor any substantial knock-off. Nothing traceable to
Authors’ works. Such allegations are simply not part of plaintiffs’ amended complaint, nor in
our record.
Neither side puts directly at issue any copies of any works that might have been used for
the filtering software. Nor will this order.
In sum, the copies of books pirated or purchased-and-destructively-scanned were placed
into a central “research library” or “generalized data area,” sets or subsets were copied again to
create training copies for data mixes, the training copies were successively copied to be
cleaned, tokenized, and compressed into any given trained LLM, and once trained an LLM did
not output through Claude to the public any further copies. Finally, once Anthropic decided a
copy of a pirated or scanned book in the library would not be used for training at all or ever
again, Anthropic still retained that work as a “hard resource” for other uses or future uses. At
least one work from each Author was present in every phase described above.
* * *
In August 2024, the three individual authors brought this putative class action
complaining that Anthropic had infringed its federal copyrights by pirating copies for its
library and by reproducing them to train its LLMs (Compl. ¶¶ 45–46, 71; see Amd. Compl.
¶¶ 47–48, 75). In October 2024, a scheduling order required that any motion for class
certification be brought by March 6, 2025 (Dkt. No. 49).
The individual authors soon amended their complaint to include affiliated corporate
entities as named plaintiffs, with consent. And, Anthropic chose not to move to dismiss the
amended complaint, as it earlier had planned (see Dkt. No. 37). Instead, Anthropic moved to
allow an early motion for summary judgment on fair use, even before class certification
(Dkt. No. 88; see Feb. 25, 2025 Tr. 15). Permission was granted.
Anthropic now moves for summary judgment on fair use only. Fair use is a legal
question for the judge with underlying fact questions, if any, for the jury. To prevail on
summary judgment, Anthropic must rely on undisputed facts and/or factual inferences favoring
the opposing side. Anthropic thus bears the burdens of production and persuasion in this
motion. See Google LLC v. Oracle Am., Inc., 593 U.S. 1, 23–24 (2021); Andy Warhol Found.
for the Visual Arts, Inc. v. Goldsmith, 598 U.S. 508, 547 n.21 (2023); Campbell v. Acuff-Rose
Music, Inc., 510 U.S. 569, 590 & n.20, 594 (1994); see also Nissan Fire & Marine Ins. Co. v.
Fritz Cos., 210 F.3d 1099, 1102–03 (9th Cir. 2000).
Notably, in its motion, Anthropic argues that pirating initial copies of Authors’ books and
millions of other books was justified because all those copies were at least reasonably
necessary for training LLMs — and yet Anthropic has resisted putting into the record what
copies or even sets of copies were in fact used for training LLMs. For example, at oral
argument, Anthropic asserted that if a purported fair user had retained pirated copies for uses
beyond the fair use, then her piracy would not be excused by the fair use (Tr. 53, 56). But
when Authors earlier interrogated Anthropic in discovery about what library copies (the
original copies “obtained or created” by Anthropic) Anthropic had recopied for further uses,
Anthropic responded that providing information about any copies made for uses beyond
training commercially released LLMs would be overbroad, and that it could not count up all its
copying even for LLMs in any case (e.g., Opp. Exh. 30 at 3). We know that Anthropic has
more information about what it in fact copied for training LLMs (or not). Anthropic earlier
produced a spreadsheet that showed the composition of various data mixes used for training
various LLMs — yet it clawed back that spreadsheet in April (Opp. Fredricks Decl. ¶¶ 2–3). A
discovery dispute regarding that spreadsheet remains pending. But Anthropic did not need a
court order to offer up what it possessed in support of its motion. All deficiencies must be held
against Anthropic and not the other way around.
This is the first substantive order in this case. A contemporaneous motion for class
certification remains pending. It proposes one class related to works that were pirated
(whether or not used to train LLMs), and a second class related to works that were purchased,
scanned, and used in training LLMs. This order follows full briefing, a hearing, and
supplemental briefing.
To summarize the analysis that now follows, the use of the books at issue to train Claude
and its precursors was exceedingly transformative and was a fair use under Section 107 of the
Copyright Act. And, the digitization of the books purchased in print form by Anthropic was
also a fair use but not for the same reason as applies to the training copies. Instead, it was a
fair use because all Anthropic did was replace the print copies it had purchased for its central
library with more convenient space-saving and searchable digital copies for its central
library — without adding new copies, creating new works, or redistributing existing copies.
However, Anthropic had no entitlement to use pirated copies for its central library. Creating a
permanent, general-purpose library was not itself a fair use excusing Anthropic’s piracy.
ANALYSIS
Section 107 of the Copyright Act identifies four factors for determining whether a given
use of a copyrighted work is a fair use:
[T]he fair use of a copyrighted work . . . for purposes such as
criticism, comment, news reporting, teaching (including multiple
copies for classroom use), scholarship, or research, is not an
infringement of copyright. In determining whether the use made
of a work in any particular case is a fair use the factors to be
considered shall include —
(1) the purpose and character of the use, including whether such
use is of a commercial nature or is for nonprofit educational
purposes;
(2) the nature of the copyrighted work;
(3) the amount and substantiality of the portion used in relation to
the copyrighted work as a whole; and
(4) the effect of the use upon the potential market for or value of
the copyrighted work.
These factors presuppose a “use.” So, at the threshold, a court must decide whether a
“copyrighted [work] has been used in multiple ways,” then evaluate each. Warhol, 598 U.S. at
533. Uses do not turn on “the subjective intent of the user” but on “an objective inquiry into
what use was made, i.e., what the user d[id] with the original work.” Id. at 544–45. A “use”
should be construed narrowly enough to not “swallow” distinguishable infringing uses, much
less categories of exclusive rights in toto. Id. at 541, 543 n.18, 546–48. Sometimes, the
challenged copying involves just one use: In Perfect 10, Inc. v. Amazon.com, Inc., Google
visited websites having full-sized images, made only reduced-sized copies, and incorporated
those directly into its search engine — the sole use of the thumbnails being as “pointer[s]” to
the images themselves. 508 F.3d 1146, 1157, 1160, 1165 (9th Cir. 2007). Sometimes, the
copying involves many uses: In the Google Books cases, Google borrowed books from
libraries, made both full-image and text-only copies, and incorporated different copies into
different tools — one use being to reveal information “about those books,” another use being
to provide the books to print-disabled patrons, and still another being to back up the print
books if lost. Authors Guild v. Google, Inc., 804 F.3d 202, 217 (2d Cir. 2015) (quoted);
Authors Guild, Inc. v. HathiTrust, 755 F.3d 87, 97, 101, 103 (2d Cir. 2014) (other cited uses).
Our parties debate an instructive decision. In American Geophysical Union v. Texaco
Inc., Texaco employees used scientific articles in a central library, used copies of them in
personal desk libraries, and used selected copies again in the scientific laboratory — the first
use paid for, the second infringing, and the third plausibly fair but in fact a rare occurrence.
802 F. Supp. 1, 4–5, 14 (S.D.N.Y. 1992) (Judge Pierre Leval), aff’d, 60 F.3d 913, 918–19, 926
(2d Cir. 1994).
Here, our parties contest what use or uses are at issue. Anthropic contends it copied
Authors’ books only for one use: Only to train LLMs. By contrast, Authors contend it did so
for at least two uses: First to build a vast, central library of potentially useful content, and
second to train specific LLMs using shifting sets and subsets of that content — over time
selecting the more well-organized and well-expressed works for training. Authors also
complain that the print-to-digital format change was itself an infringement not abridged as a
fair use (Opp. 15, 25). Authors do not allege, however, that any LLM outputs infringing upon
their works ever reached users of the public-facing Claude service.
This order addresses each of the four factors in turn, pointing out how each applies to the
training copies and to the purchased and pirated library copies. It concludes with an integrated
analysis.
1. THE PURPOSE AND CHARACTER OF THE USE.
For a given use at issue, the first factor addresses “the purpose and character of th[at] use,
including whether [it] is of a commercial nature or is for nonprofit educational purposes.” 17
U.S.C. § 107(1).
A. THE COPIES USED TO TRAIN SPECIFIC LLMS.
All agree that one use at issue was training LLMs to receive text inputs and return text
outputs. More specifically, Anthropic used copies of Authors’ copyrighted works to iteratively
map statistical relationships between every text-fragment and every sequence of text-fragments
so that a completed LLM could receive new text inputs and return new text outputs as if it were
a human reading prompts and writing responses. Authors further argue — and this order takes
for granted — that such training entailed “memoriz[ing]” works by “compress[ing]” copies of
those works into the LLM (Opp. 16–17; see Opp. Expert Zhao ¶ 74). The LLMs “memorize[d]
A LOT, like A LOT” (Opp. Exh. 35 at -029109). Regardless, the “purpose and character” of
using works to train LLMs was transformative — spectacularly so.
To repeat and be clear: Authors do not allege that any LLM output provided to users
infringed upon Authors’ works. Our record shows the opposite. Users interacted only with the
Claude service, which placed additional software between the user and the underlying LLM to
ensure that no infringing output ever reached the users. This was akin to the limits Google
imposed on how many snippets of text from any one book could be seen by any one user
through its Google Books service, preventing its search tool from devolving into a reading tool.
Google, 804 F.3d at 222. Here, if the outputs seen by users had been infringing, Authors
would have a different case. And, if the outputs were ever to become infringing, Authors
could bring such a case. But that is not this case.
Instead, Authors challenge only the inputs, not the outputs, of these LLMs. They point to
the fully trained LLMs and the Claude service only to shed light on how training itself uses
copies of their works and the ways the Claude service could be used to produce still other
works that would compete with their works. This order does the same. Authors’ arguments
that the training use is not transformative are unavailing.
First, Authors argue that using works to train Claude’s underlying LLMs was like using
works to train any person to read and write, so Authors should be able to exclude Anthropic
from this use (Opp. 16). But Authors cannot rightly exclude anyone from using their works for
training or learning as such. Everyone reads texts, too, then writes new texts. They may need
to pay for getting their hands on a text in the first instance. But to make anyone pay
specifically for the use of a book each time they read it, each time they recall it from memory,
each time they later draw upon it when writing new things in new ways would be unthinkable.
For centuries, we have read and re-read books. We have admired, memorized, and internalized
their sweeping themes, their substantive points, and their stylistic solutions to recurring writing
problems.
Second, to that last point, Authors further argue that the training was intended to
memorize their works’ creative elements — not just their works’ non-protectable ones (Opp.
17). But this is the same argument. Again, Anthropic’s LLMs have not reproduced to the
public a given work’s creative elements, nor even one author’s identifiable expressive style
(assuming arguendo that these are even copyrightable). Yes, Claude has outputted grammar,
composition, and style that the underlying LLM distilled from thousands of works. But if
someone were to read all the modern-day classics because of their exceptional expression,
memorize them, and then emulate a blend of their best writing, would that violate the
Copyright Act? Of course not. Copyright does not extend to “method[s] of operation,
concept[s], [or] principle[s]” “illustrated[ ] or embodied in [a] work.” 17 U.S.C. § 102(b); see,
e.g., Nichols v. Universal Pictures Corp., 45 F.2d 119, 120–22 (2d Cir. 1930) (Judge Learned
Hand) (stage properties and storytelling elements); Apple Comput., Inc. v. Microsoft Corp., 35
F.3d 1435, 1445 (9th Cir. 1994) (“user-friendly” design principles and elements); Swirsky v.
Carey, 376 F.3d 841, 848 (9th Cir. 2004) (music theory principles and chord progressions).
Third, Authors next argue that computers nonetheless should not be allowed to do what
people do.
Authors cite a decision seeming to say as much (Opp. 16–17). But the judge there twice
emphasized while discussing “purpose and character” of the use that what was trained was “not
generative AI (AI that writes new content itself).” Rather, what was trained — using a
proprietary system for finding court opinions in response to a given legal topic — was a
competing AI tool for finding court opinions in response to a given legal topic. That was not
transformative. Thomson Reuters Enter. Centre GmbH v. Ross Intell. Inc., 765 F. Supp. 3d
382, 398 (D. Del. 2025) (Judge Stephanos Bibas), appeal docketed, No. 25-8018 (3d Cir. Apr.
14, 2025).
A better analogue to our facts would be an AI tool trained — using court opinions, and
briefs, law review articles, and the like — to receive legal prompts and respond with fresh legal
writing. And, on facts much like those, a different court came out the other way. It found fair
use. White v. W. Pub. Corp., 29 F. Supp. 3d 396, 400 (S.D.N.Y. 2014) (Judge Jed Rakoff).
The latter use stood sufficiently “orthogonal” to anything that any copyright owner
rightly could expect to control. See Warhol, 598 U.S. at 538–40. It could thus be freed up for
the copyist to use, “promot[ing] the progress of science and the arts, without diminishing the
incentive to create.” Id. at 531 (emphasis added); see U.S. CONST. art. I, § 8, cl. 8.
In short, the purpose and character of using copyrighted works to train LLMs to generate
new text was quintessentially transformative. Like any reader aspiring to be a writer,
Anthropic’s LLMs trained upon works not to race ahead and replicate or supplant them — but
to turn a hard corner and create something different. If this training process reasonably
required making copies within the LLM or otherwise, those copies were engaged in a
transformative use.
The first factor favors fair use for the training copies.
B. THE COPIES USED TO BUILD A CENTRAL LIBRARY.
But that is not the only use at issue. Recall that Anthropic purchased millions of print
books for its central library and pirated millions of digital books for its central library, too. It
used specific sets and subsets of books for training specific LLMs. And, it then retained all the
copies in its central library for other uses that might arise even after deciding it would not use
them to train any LLM (at all or ever again). Anthropic seems to believe that because some of
the works it copied were sometimes used in training LLMs, Anthropic was entitled to take for
free all the works in the world and keep them forever with no further accounting. There is no
carveout, however, from the Copyright Act for AI companies.
Because the legal issues differ between the library copies Anthropic purchased and
pirated, this order takes them in turn.
(i) The Purchased Library Copies Converted from Print to Digital.
Anthropic purchased millions of print copies to “build a research library” (Opp. Exh. 22
at 145, 148). It destroyed each print copy while replacing it with a digital copy for use in its
library (not for sharing nor sale outside the company). As to these copies, Authors do not
complain that Anthropic failed to pay to acquire a library copy. Authors only complain that
Anthropic changed each copy’s format from print to digital (see Opp. 15, 25 & n.14). On the
facts here, that format change itself added no new copies, eased storage and enabled
searchability, and was not done for purposes trenching upon the copyright owner’s rightful
interests — it was transformative.
Anthropic purchased its print copies fair and square. With each purchase came
entitlement for Anthropic to “dispose[ ]” each copy as it saw fit. 17 U.S.C. § 109(a). So,
Anthropic was entitled to keep the copies in its central library for all the ordinary uses. Yes,
Anthropic changed the format of these library copies from print to digital — giving rise to the
issue here.
All agree on the facts of the format change. Anthropic “destructively scan[ned]” the
print copies to create the digital ones. Anthropic or its vendors stripped the bindings from the
print books, cut the pages to workable dimensions, and scanned those pages — discarding each
print copy while creating a digital one in its place. The digital copy was then housed in the
“research library” or “generalized data area” in place of the print copy (Opp. Exh. 22 at 145–
46, 193–94). Authors do not allege and our record does not show that Anthropic provided its
converted digital copies of print books to anyone outside Anthropic.
The parties disagree about the legal consequences of the format change. Was scanning
the print copies to create digital replacements transformative? Anthropic argues it was because
it was reasonably necessary to training LLMs. Authors argue it was a distinguishable step
requiring independent justification.
Here, for reasons narrower than Anthropic offers, the mere format change was a fair use.
Storage and searchability are not creative properties of the copyrighted work itself but
physical properties of the frame around the work or informational properties about the work.
See Texaco, 802 F. Supp. at 14 (physical), aff’d, 60 F.3d at 919; Google, 804 F.3d at 225
(informational); Sony Corp. of Am. v. Universal City Studios, Inc. (“Sony Betamax”), 464 U.S.
417, 447 (1984) (rightful interests). In Texaco, the court reasoned that if a purchased scientific
journal article had been copied “onto microfilm to conserve space, this might [have been] a
persuasive transformative use.” 802 F. Supp. at 14 (Judge Pierre Leval), aff’d, 60 F.3d at 919
(reducing “bulk[ ]” “might suffice to tilt the first fair use factor in favor of Texaco if these
purposes were dominant”). In Google Books, the court reasoned that a print-to-digital change
to expose information about the work was transformative. Google, 804 F.3d at 225 (Judge
Pierre Leval). And, in Sony Betamax, the Supreme Court held that making a recording of a
television show in order to instead watch it at a later time was copying but did not usurp any
rightful interest of the copyright owner. 464 U.S. at 447, 455. Important to the Supreme
Court’s reasoning was the expectation that most such copiers would not distribute the
permanent copies of the work. Finally, in A&M Records, Inc. v. Napster, Inc., our court of
appeals recognized the reasoning just explained, and therefore rejected by contrast a
digitization effort that was touted as space-shifting but in fact resulted in the multiplication of
copies shared with outsiders through a file-sharing service. 239 F.3d 1004, 1019 (9th Cir.
2001), aff’g in this part 114 F. Supp. 2d 896, 912–13, 915–16 (N.D. Cal. 2000) (Judge Marilyn
Hall Patel) (citing Sony Betamax and Texaco).
Here, every purchased print copy was copied in order to save storage space and to enable
searchability as a digital copy. The print original was destroyed. One replaced the other. And,
there is no evidence that the new, digital copy was shown, shared, or sold outside the company.
This use was even more clearly transformative than those in Texaco, Google, and Sony
Betamax (where the number of copies went up by at least one), and, of course, more
transformative than those uses rejected in Napster (where the number went up by “millions” of
copies shared for free with others).
Yes, Anthropic is a commercial outfit. And, this order takes for granted that Anthropic in
fact benefited from the print-to-digital format change — or it would not have gone to all the
trouble. But the crux of the first fair use factor’s concern for “commercial” use is in protecting
the copyright owners and their entitlements to exploit their copyright as they see fit (or not).
See, e.g., Harper & Row, Publishers, Inc. v. Nation Enters., 471 U.S. 539, 562 (1985). That
the accused is a commercial entity is indicative, not dispositive. That the accused stands to
benefit is likewise indicative. But what matters most is whether the format change exploits
anything the Copyright Act reserves to the copyright owner. Anthropic already had purchased
permanent library copies (print ones). It did not create new copies to share or sell outside.
Yes, Authors also might have wished to charge Anthropic more for digital than for print
copies. And, this order takes for granted that Authors could have succeeded if Anthropic had
been barred from the format change. “But the Constitution’s language [in Clause 8] nowhere
suggests that [the copyright owner’s] limited exclusive right should include a right to divide
markets or a concomitant right to charge different purchasers different prices for the same
book, [merely] say to increase or to maximize gain.” See Kirtsaeng v. John Wiley & Sons,
Inc., 568 U.S. 519, 552 (2013); see also U.S. CONST. art. I, § 8, cl. 8. Nor does the Copyright
Act itself. Section 106 sets out exclusive rights that fair uses under Section 107 abridge.
Section 106(1) reserves to the copyright owner the right to make reproductions. But on our
facts we face the unusual situation where one copy entirely replaced the other. And,
Section 106(2) reserves to the copyright owner the right to make derivative works that add or
subtract creative material — as occurs in a “translation, musical arrangement, dramatization,
fictionalization, motion picture version, sound recording, art reproduction, abridgment, [or]
condensation” of a book, 17 U.S.C. § 101 (definitions). For some “other modification[ ]” of a
book to constitute a “derivative work,” it must itself “represent an original work of
authorship.” Ibid. But on our facts the format was changed but no content was added or
subtracted. See Mirage Editions, Inc. v. Albuquerque A.R.T. Co., 856 F.2d 1341, 1342, 1343–
44 (9th Cir. 1988) (yes where elements added to create new decorative ceramic).4
Section 106(3) further reserves to the copyright owner the right to distribute copies. But again,
the replacement copy here was kept in the central library, not distributed. Cf. Fox News
Network, LLC v. TVEyes, Inc., 883 F.3d 169, 176–78 (2d Cir. 2018) (enabling searching for
“information about the material” can be transformative use, even if some distribution results);
Lewis Galoob Toys, Inc. v. Nintendo of Am., Inc., 964 F.2d 965, 968, 971 (9th Cir. 1992)
(using nifty converter to “merely enhance[ ]” audiovisual displays emitted from purchased
videogame cartridge was fair use of those displays partly because no surplus copies of
cartridge or displays were ever created).
As a result, Anthropic’s format-change from print library copies to digital library copies
was transformative under fair use factor one. Anthropic was entitled to retain a copy of these
works in a print format. It retained them instead in a digital format, easing storage and
searchability. And, the further copies made therefrom for purposes of training LLMs were
themselves transformative for that further reason, as above.
4 Even if print-to-digital format change did infringe the right to prepare derivative works,
Authors have conceded that “Plaintiffs’ infringement claims are predicated on Anthropic’s
unauthorized reproduction (17 U.S.C. § 106(1)); Plaintiffs are not alleging infringement by
Anthropic of any right to prepare derivative works (id. at § 106(2))” (Dkt. No. 203 at 2 (citations
original)). Whether this concession had consequence for copies tokenized and used for training or
“compressed” into the trained LLMs is not reached by this order because Anthropic does not rely
on Authors’ concession and those copies were here used transformatively.
To be clear, this print-to-digital conversion involved a different and narrower form of
transformative use than the broader one advanced by Anthropic. Anthropic argues that the
central library use was part and parcel of the LLM training use and therefore transformative.
This order disagrees. However, this order holds that the mere conversion of a print book to a
digital file to save space and enable searchability was transformative for that reason alone.
Therefore, the digital copy should be treated just as if the purchased print copy had been placed
in the central library.
In sum, the first fair use factor favors fair use for the digital library copies converted from
purchased print library copies — but these do not excuse the pirated library copies.
(ii) The Pirated Library Copies.
Before buying books for its central library, Anthropic downloaded over seven million
pirated copies of books, paid nothing, and kept these pirated copies in its library even after
deciding it would not use them to train its AI (at all or ever again). Authors argue Anthropic
should have paid for these pirated library copies (e.g., Tr. 24–25, 65; Opp. 7, 12–13). This
order agrees.
The basic problem here was well-stated by Anthropic at oral argument: “You can’t just
bless yourself by saying I have a research purpose and, therefore, go and take any textbook you
want. That would destroy the academic publishing market if that were the case” (Tr. 53). Of
course, the person who purchases the textbook owes no further accounting for keeping the
copy. But the person who copies the textbook from a pirate site has infringed already, full
stop. This order further rejects Anthropic’s assumption that the use of the copies for a central
library can be excused as fair use merely because some will eventually be used to train LLMs.
This order doubts that any accused infringer could ever meet its burden of explaining
why downloading source copies from pirate sites that it could have purchased or otherwise
accessed lawfully was itself reasonably necessary to any subsequent fair use. There is no
decision holding or requiring that pirating a book that could have been bought at a bookstore
was reasonably necessary to writing a book review, conducting research on facts in the book,
or creating an LLM. Such piracy of otherwise available copies is inherently, irredeemably
infringing even if the pirated copies are immediately used for the transformative use and
immediately discarded.
But this order need not decide this case on that rule. Anthropic did not use these copies
only for training its LLM. Indeed, it retained pirated copies even after deciding it would not
use them or copies from them for training its LLMs ever again. They were acquired and
retained, as a central library of all the books in the world.
Building a central library of works to be available for any number of further uses was
itself the use for which Anthropic acquired these copies. One further use was making further
copies for training LLMs. But not every book Anthropic pirated was used to train LLMs.
And, every pirated library copy was retained even if it was determined it would not be so used.
Pirating copies to build a research library without paying for it, and to retain copies should they
prove useful for one thing or another, was its own use — and not a transformative one (see
Tr. 24–25, 35, 65; Opp. 4–10, 12 n.6; CC Br. Exh. 12 at -0144509 (“everything forever”)).
Napster, 239 F.3d at 1015; BMG Music v. Gonzalez, 430 F.3d 888, 890 (7th Cir. 2005).
Anthropic’s briefing contains other reasons why it believes its pirated library copies are
irrelevant to our fair use analysis, notwithstanding its own statements at our oral argument.
First, Anthropic accepts in this posture that it acted in bad faith but argues that its bad
faith in pirating copies cannot “somehow short-circuit[ ]” the fair use analysis (Reply 6
(downplaying Atari Games Corp. v. Nintendo of Am., Inc., 975 F.2d 832, 843 (Fed. Cir. 1992)
(applying law of Ninth Circuit))). But its bad faith is not the basis for this decision. Each use
of a work must be analyzed objectively. Warhol, 598 U.S. at 544–45. The objective analysis
here shows the initial copies were pirated to create a central, general-purpose library, as a
substitute for paid copies to do the same thing. (Of course, if infringement is found, bad faith
would matter for determining willfulness. 17 U.S.C. § 504(c)(2).)
Second, Anthropic argues that its goal to put the copies eventually “to a highly
transformative use” requires that each copy and use along the way be justified as having a
transformative use, too (Reply 14). But now Anthropic seeks to take the shortcut Anthropic
just said cannot be taken. Again, the Supreme Court tasks us with looking past the “subjective
intent of the user” to the objective use made of each copy. Warhol, 598 U.S. at 544–45
(emphasis added). Put another way, what a copyist says or thinks or feels matters only to the
extent it shows what a copyist in fact does with the work. Indeed, the same copy can be used
one way, then another, each with a different result. Id. at 533. Here, what Anthropic said
about its acquisitions at the time — that they were made to “build[ ] a research library” while
avoiding a “huge legal/practice/business slog” — is relevant in this regard. And, Anthropic’s
actual use of these pirated copies was to create its central library of texts that, like any
university or corporate library, stored the works’ well-organized facts, analyses, and expressive
examples for various contingent uses, one being training.5
Third, Anthropic argues that Texaco — the case involving copies used in a central
library, copies used in desk libraries, and copies used in the laboratory — is inapposite.
Anthropic argues that the disputed copies in Texaco were never used in the laboratory but
instead in personal desk libraries for a use “identical to the original purpose and use” of the
central library copies, and so not for a transformative use (Reply 8 (summarizing 60 F.3d at
922–23)). By contrast, says Anthropic, here it did use copies in the laboratory to train
LLMs — a very transformative use. But this is a fast glide over thin ice. Like Texaco,
Anthropic possessed copies it did not put into use in the laboratory and it kept those copies in a
central library even after its transformative use had been completed. But, unlike Texaco,
which bought those copies, Anthropic never paid for the central library copies stolen off the
internet. Texaco also shows why Anthropic is wrong to suppose that so long as you create an
exciting end product, every “back-end step, invisible to the public,” is excused (Br. 10).
5 Our court of appeals has not yet reappraised how bad faith (or good faith) figures in fair use
after Warhol. Its prior appraisal applied the Supreme Court’s statement that “[f]air use
presupposes good faith and fair dealing,” Harper & Row, 471 U.S. at 562 (cleaned up). See
Perfect 10, 508 F.3d at 1164 n.8. Since then, the Supreme Court has renewed its “skepticism about
whether bad faith has any role.” Oracle, 593 U.S. at 32–33 (reiterating doubts of Campbell, 510
U.S. at 585 n.18). And, recently, the Supreme Court has held squarely that it is not the “subjective
intent” of a copyist that counts, but the “objective . . . use” of the copy. Warhol, 598 U.S. at 544–
45. This order applies this most recent analysis. Miller v. Gammie, 335 F.3d 889, 900 (9th Cir.
2003) (en banc).
Notably, this is not a case where source copies were unavailable for separate purchase or
loan. See, e.g., NXIVM Corp. v. Ross Inst., 364 F.3d 471, 475–76, 478–79 (2d Cir. 2004)
(using selections of training manual — otherwise available only to cult’s trainees subject to
NDAs — to expose cult in critical review); Time Inc. v. Bernard Geis Assocs., 293 F. Supp.
130, 135–36, 138, 146 (S.D.N.Y. 1968) (Judge Inzer Bass Wyatt) (making charcoal drawings
of photographs taken of originals otherwise not on sale or loan out to illustrate a history
book).6 Nor were the copies made only incidentally and necessarily from pirated copies. See,
e.g., Perfect 10, 508 F.3d at 1164 n.8 (copies of images that had been pirated by third-party
websites were used to index those same websites while indexing the entire web). Here, piracy
was the point: To build a central library that one could have paid for, just as Anthropic later
did, but without paying for it.
Nor were the initial copies made immediately transformed into a significantly altered
form. In Perfect 10, images were copied by the search engine in thumbnail form only and
deployed immediately into the transformative use of identifying the full-sized images and the
pages from which they came. 508 F.3d at 1160, 1165, 1167. And, in Kelly v. Arriba Software
Corp., images were copied at full size and then into thumbnails for immediate use in building a
search engine, after which the full-sized copies were immediately deleted. 336 F.3d 811, 815
(9th Cir. 2003). Not here. The full-text copies of books were downloaded and maintained
“forever.”
6 Anthropic repeats the misleading characterization of the copyright holder in Oracle that the
initial copies were there purloined (Reply 5). Not so. “All agree[d] that Google was and
remain[ed] free to use the Java language itself. All agree[d] that Google’s virtual machine [wa]s
free of any copyright issues. All agree[d] that the six-thousand-plus method implementations by
Google [we]re free of copyright issues. The copyright issue, rather,” was the use of Java for
purposes of creating competing software having the same familiar, functional schema. Oracle
Am., Inc. v. Google Inc., 872 F. Supp. 2d 974, 978 (N.D. Cal. 2012), aff’d and rev’d in part, 750
F.3d 1339 (Fed. Cir. 2014).
Nor does the initial copying here even resemble the full-text copying in the Google Books
cases. There, libraries of authorized copies already had been assembled, and all copies
therefrom were made for direct employment in a one-to-one further fair use — whether the
transformative use of pointing to the works themselves, the use of providing the works in
formats for print-disabled patrons, or the use of insuring against going out of print, getting lost,
and becoming otherwise unavailable. HathiTrust, 755 F.3d at 97, 101, 103; Google, 804 F.3d
at 206, 216–18, 228 (further distinguishing search and snippet uses, which “test[ed] the
boundaries of fair use”). Not so here concerning the pirated copies. No authorized copies
existed from which Anthropic made its first copies. No full-text copy therefrom was put
immediately into use training LLMs. Not every copy was even necessary nor used for training
LLMs. No initial copy was ever deleted, even if never used or no longer used.7 The university
libraries and Google went to exceedingly great lengths to ensure that all copies were secured
against unauthorized uses — both through technical measures and through legal agreements
among all participants. Not so here. The library copies lacked internal controls limiting access
and use.
Nor do the decisions on intermediate copying require anything less than the analysis
applied here. Anthropic argues that our court of appeals in Sega Enterprises Ltd. v. Accolade,
Inc. looked only at the “ultimate use” and “did not analyze a series of atomized acts of
‘infringement’ distinct from that overall purpose” (Reply 3). To the contrary, the appeals court
examined the initial, intermediate, and ultimate copies used by the copyist. The court
explained that the copyist initially purchased commercially available copies of game
cartridges and then made further copies necessarily and “solely in order to discover the
functional requirements for compatibility.” 977 F.2d 1510, 1522 (9th Cir. 1992). Thus, it
reached only one result because on those facts there was only one “overall purpose” for the
unauthorized copies. Indeed, the court reaffirmed prior caselaw holding that “intermediate
copying of [a work] may infringe the exclusive rights granted to the copyright owner in
[S]ection 106 of the Copyright Act regardless of whether the end product of the copying also
infringes those rights.” Id. at 1518–19 (reaffirming Walker v. Univ. Books, 602 F.2d 859, 864
(9th Cir. 1979)).
7 Training LLMs was not a use where perpetually maintaining a library copy was intrinsic to the
proffered fair use (e.g., for a plagiarism-checker service). Nor is this an instance where retaining
at least one copy was authorized by contract with the copyright owners (e.g., by agreement to
express terms upon submission to a plagiarism-checker service, notwithstanding proposed terms
scrawled on a paper prior to submission). A.V. ex rel. Vanderhye v. iParadigms, LLC, 562 F.3d
630, 635–36 & n.5, 645 n.8 (4th Cir. 2009), aff’g in relevant parts 544 F. Supp. 2d 473, 480 (E.D.
Va. 2008) (Judge Claude Hilton). Anthropic mischaracterizes this case.
Similarly, in Sony Computer Entertainment, Inc. v. Connectix Corp., our appeals court
applied the same law to similarly focused conduct. Another copyist allegedly had purchased
an authorized copy and then made further copies solely and necessarily to reverse-engineer
compatibility requirements. 203 F.3d 596, 601, 602–03 (9th Cir. 2000).
Both Sega and Sony avoided imposing an “artificial hurdle” to fair use by generously
construing the intermediate copying necessary to the fair use. As one example, Sega stated
that an engineer should be permitted to reboot her computer while undertaking to reverse-
engineer software loaded onto it — even if doing so creates another digital copy of the
software and is not strictly necessary to reverse-engineering. Id. at 605. But neither Sega nor
Sony fathomed gifting an “artificial head start” to a fair user, either, by treating even the initial
copy as an intermediate one.
And, yes, some courts have “not inquire[d]” into intermediate or initial copying at all
(Reply 2 (citing Campbell as not inquiring into surplus copies in the studio)). But if a “close
reading of those cases [ ] reveals that in none of them was the legality of the [initial or]
intermediate copying at issue,” then it was not raised and not necessarily decided. Sega, 977
F.2d at 1519; see Webster v. Fall, 266 U.S. 507, 511 (1925). It was expressly decided
elsewhere: Our analysis must attend to different uses of different copies, and even to different
uses of the same copies. Warhol, 598 U.S. at 533.
Finally, Anthropic argues that even if the initial copies served a different use than the
intermediate and ultimate copies, it was not a use for which Anthropic necessarily would have
needed to pay Authors for a copy. In theory, argues Anthropic, it could have done as Google
did in Google Books — find an existing reference library willing to loan its copies for free as
source copies. Or, in theory, it could have done as Anthropic did later — go buy used copies
without having to pay Authors at all. See 17 U.S.C. § 109(a). But Anthropic did not do those
things — instead it stole the works for its central library by downloading them from pirated
libraries.
In sum, the first factor points against fair use for the central library copies made from
pirated sources — and no damages from pirating copies could be undone by later paying for
copies of the same works.
2. THE NATURE OF THE COPYRIGHTED WORK.
The second fair use factor is “the nature of the copyrighted work.” 17 U.S.C. § 107(2).
This factor “calls for recognition that some works are closer to the core of intended copyright
protection than others, with the consequence that fair use is more difficult to establish when the
former works are copied.” Campbell, 510 U.S. at 586. For one thing, less protection is due
published works than unpublished ones. For another, less protection is due “factual works than
works of fiction or fantasy.” Harper & Row, 471 U.S. at 563. But less protection is not no
protection. Even the arrangement of otherwise unprotectable facts surpasses the low bar for a
protectable original work of authorship. Google, 804 F.3d at 220.
Here, Anthropic accepts that all of Authors’ books — all published, whether non-fiction
or fiction — contained expressive elements (Reply 9). And, as set out above, this order
accepts Authors’ view of the evidence that their works were chosen for their expressive
qualities in building a central library and then in training specific LLMs (Opp. 11, 17 (citing,
e.g., Opp. Exh. 3 at -03433)).
The main function of the second factor is to help assess the other factors: to reveal
differences between the nature of the works at issue and the nature of their secondary use
(above), and to reveal any relation between the amount and substantiality of each work taken
and the secondary use (next). E.g., Campbell, 510 U.S. at 586; Kelly, 336 F.3d at 820; Google,
804 F.3d at 220; HathiTrust, 755 F.3d at 98; Bill Graham Archives v. Dorling Kindersley Ltd.,
448 F.3d 605, 612–13 (2d Cir. 2006).
The second factor points against fair use for all copies alike.
3. THE AMOUNT AND SUBSTANTIALITY OF THE PORTION USED.
The third fair use factor is “the amount and substantiality of the portion” of the
copyrighted work used by the accused. 17 U.S.C. § 107(3). The crux of this factor is whether
the amount was “reasonable in relation to the purpose of the copying.” Campbell, 510 U.S. at
586. Thus, the amount of copying is considered first against the work itself, then more
importantly against the proposed transformative purpose. See Warhol, 598 U.S. at 543 & n.18.
A. THE COPIES USED TO TRAIN SPECIFIC LLMS.
Copies selected for inclusion in training sets were selected because they were complete
and because they contained rich protectible expression, or so this order accepts the record
shows for Authors. Was all this copying reasonably necessary to the transformative use?
Yes.
“What matters [ ] is not so much ‘the amount and substantiality of the portion used’ in
making a copy, but rather the amount and substantiality of what is thereby made accessible to a
public [in the purported secondary use] for which it may serve as a competing substitute [for
the primary use].” Google, 804 F.3d at 222. Here, once again, there is no allegation of any
traceable connection between the Claude service’s outputs and Authors’ works. The copying
used to train the LLMs underlying Claude was thus especially reasonable.
In response, Authors object primarily that the copying used in training was both
extremely extensive and not strictly necessary.
As to extensive copying, it is true that entire works were copied. And, “copying [ ] entire
work[s] ‘militate[s] against a finding of fair use.’” Worldwide Church of God v. Philadelphia
Church of God, Inc., 227 F.3d 1110, 1118 (9th Cir. 2000) (quoting Hustler Mag. Inc. v. Moral
Majority Inc., 796 F.2d 1148, 1155 (9th Cir. 1986)); see Campbell, 510 U.S. at 587. But we
just addressed why Authors’ argument is misdirected. The copies that count for this factor are
those that would merely serve the same use as the work’s ordinary one. Authors do not allege
such copying. The accused use here of the incremental copies is as orthogonal as can be
imagined to the ordinary use of a book.
As to strict necessity, Authors make a stronger point. When a productive use is made
possible only by borrowing from a specific work, fair use climbs towards its zenith. When a
productive use is possible without that borrowing, fair use falls to its nadir — and the
borrowing deserves a particularly compelling justification. See Warhol, 598 U.S. at 543 &
n.18, 547. Here, it is true that Anthropic could have used some other books or no books at all
for training its LLMs — or so this order accepts the record shows for Authors. But Anthropic
has presented a compelling explanation for why it was reasonably necessary to use them
anyway.
For one thing, all agree Anthropic needed billions of words to train any given LLM. If
using only books, Anthropic would have needed millions of books per model. If using a set
comprising only a small fraction of books and a larger fraction of other texts, Anthropic still
would have needed hundreds of thousands of books. Authors contend that because Anthropic
showed it could use such smaller sets of books, it surely could have used no books at all — or
at least not their books (Opp. 23). But Authors forget that “reasonably necessary” does not
mean “strictly necessary.” Authors do not contest that the volume of text required to train an
LLM is monumental. Because using so many works was reasonably necessary, using any one
work for actually training LLMs was about as reasonable as the next.
For another thing, no output to the public was even alleged to be infringing. So, yes,
Authors’ works were chosen as the strongest examples of writing. But the compelling benefits
of training the LLMs on strong examples were not offset by revelations to the public of any
portion of the works themselves. What was copied was therefore especially reasonable and
compelling.
The third factor thus favors fair use for the training copies.
B. THE COPIES USED TO BUILD A CENTRAL LIBRARY.
But again, there was a separate use — a distinction that makes some difference as to
whether the amount and substantiality of the copying was “reasonable in relation to the
purpose of the copying” for the library copies. Campbell, 510 U.S. at 586.
(i) The Purchased Library Copies Converted from Print to Digital.
For the print library copies that Anthropic purchased and then converted into digital
library copies, Anthropic already enjoyed entitlement to keep the copies in its library. The
purpose of the copying was to keep them in its library but with more favorable storage and
searchability properties. Copying the entire work was exactly what this purpose required.
There was no surplus copying. The source copy was destroyed.
The third fair use factor favors fair use for the purchased library copies converted from
print to digital.
(ii) The Pirated Library Copies.
For the pirated library copies, however, Anthropic lacked any entitlement to hold copies
of the books at all. Its purpose, it says, was to train LLMs. But its objective conduct was to
seek “all the books in the world” and then retain them even after deciding it would not make
further copies from them for training — indicating there were other further uses. Against the
purpose of acquiring all the books one could on the chance some might prove useful for
training LLMs and maybe other stuff too, almost any unauthorized copying would have been
too much. Anthropic copied millions of books in toto, Authors’ works among them.
The third factor points against fair use for the pirated library copies.
4. THE EFFECT OF THE USE UPON THE MARKET FOR OR VALUE OF THE
COPYRIGHTED WORK.
The final factor is “the effect of the use upon the potential market for or value of the
copyrighted work.” 17 U.S.C. § 107(4). This factor points against fair use when a copyist
makes copies available that displace demand for copies the copyright owner already makes
available or readily could. Texaco, 60 F.3d at 926–28 (reproduced copies); Dr. Seuss Enters.,
L.P. v. ComicMix LLC, 983 F.3d 443, 461 (9th Cir. 2020) (derivative copies). “While the first
factor considers whether and to what extent an original work and secondary use [in principle
could] have substitutable purposes, the fourth factor focuses on actual or potential market
substitution.” Warhol, 598 U.S. at 536 n.12 (emphasis added).
A. THE COPIES USED TO TRAIN SPECIFIC LLMS.
The copies used to train specific LLMs did not and will not displace demand for copies
of Authors’ works, or not in the way that counts under the Copyright Act.
Again, Authors concede that training LLMs did not result in any exact copies nor even
infringing knockoffs of their works being provided to the public. If that were not so, this
would be a different case. Authors remain free to bring that case in the future should such
facts develop.
Instead, Authors contend generically that training LLMs will result in an explosion of
works competing with their works — such as by creating alternative summaries of factual
events, alternative examples of compelling writing about fictional events, and so on. This
order assumes that is so (Opp. 22–23 (citing, e.g., Opp. Exh. 38)). But Authors’ complaint is
no different than it would be if they complained that training schoolchildren to write well
would result in an explosion of competing works. This is not the kind of competitive or
creative displacement that concerns the Copyright Act. The Act seeks to advance original
works of authorship, not to protect authors against competition. Sega, 977 F.2d at 1523–24.
Authors next contend that training LLMs displaced (or will) an emerging market for
licensing their works for the narrow purpose of training LLMs (Opp. 21–22). Anthropic
argues that transactional costs would exceed Anthropic’s expected benefit from any such
bargain, prompting it to cease dealing with any rightsholders or else to cease developing such
technology altogether (Br. 22–23). Our record could support either account — so this order
must assume Authors are correct. A market could develop (Opp. 19–21 (citing record)). Even
so, such a market for that use is not one the Copyright Act entitles Authors to exploit.
None of the cases cited by Authors requires a different result. All contemplated losses of
something the Copyright Act properly protected — not the kinds of fair uses for which a
copyright owner cannot rightly expect to control. See TVEyes, Inc., 883 F.3d at 181 (use of a
right legally reserved to and factually already being licensed by copyright owner); Texaco, 60
F.3d at 931 (same); Ringgold v. BET, Inc., 126 F.3d 70, 80–81 (2d Cir. 1997) (use of a right
legally reserved to and factually likely to be marketable by copyright owner — displaying
images of her artistic work in television shows); cf. Seltzer v. Green Day, Inc., 725 F.3d 1170,
1179 (9th Cir. 2013) (no evidence use could be or “was likely to” be marketable).
The fourth factor thus favors fair use for the training copies.
B. THE COPIES USED TO BUILD A CENTRAL LIBRARY.
(i) The Purchased Library Copies Converted from Print to Digital.
For these copies, this order assumes Anthropic’s format change from print to digital
displaced purchases of new digital copies that Anthropic would have made directly from
Authors (had it not been able to purchase print copies in used condition). But for reasons
stated under the first factor, such losses did not relate to something the Copyright Act reserves
for Authors to exploit. It was a format change.
Authors’ next argument, it seems, is that the format change nonetheless exposed them to
usurpation of the opportunity to sell rightful copies because Anthropic might transmit
additional unauthorized digital copies more readily than it could have transmitted additional
unauthorized print copies — and that the same would be true for all format converters (cf. Opp.
25 n.14; Opp. Expert Malackowski ¶ 52). But after much discovery, there is no inkling in our
record of intent to redistribute library copies once acquired nor of inability to secure that
valuable library against outside actors. And, if the internal, central library copies did or do in
fact lead to further reproduction or distribution, those further copies remain redressable
separately by Authors. The format change did not itself usurp the Authors’ rightful
entitlements.
This factor is thus neutral for the purchased library copies converted from print to digital.
(ii) The Pirated Library Copies.
The copies used to build a central library and that were obtained from pirated sources
plainly displaced demand for Authors’ books — copy for copy. Not every person who merely
intends to make a fair use of a work is thereby entitled to a full copy in the meantime, nor even
to steal a copy so that achieving this fair use is especially simple or cost-effective. Here, the
copies employed in training LLMs were one thing, but the copies acquired to assemble a
convenient, general-purpose library of works for whatever uses the company might
have for them, if any, was a different use altogether.
Anthropic has almost no rebuttal on these points. First, Anthropic argues that “Claude’s
services do not reduce [or usurp] the value of Plaintiffs’ works through substitution in their
traditional markets” (see Br. Expert Peterson ¶ 33). But stealing pirated copies of Authors’
works plainly did. Second, Anthropic argues that it may have been able to purchase some
books on the open market (and some other texts), but not other texts it copied (cf. id. ¶ 48 (re
licensing)). But this case does not concern those other texts it could not have purchased. It
could have purchased Authors’ books (and many others). In fact it later did. Finally,
Anthropic argues that the effect on these texts from one book foregone was too small to be
considered (see id. ¶ 77). But the test requires that we contemplate the likely result were the
conduct to be condoned as a fair use — namely to steal a work you could otherwise buy (a
book, millions of books) so long as you at least loosely intend to make further copies for a
purportedly transformative use (writing a book review with excerpts, training LLMs, etc.),
without any accountability. As Anthropic itself suggested, “That would destroy the [entire]
publishing market if that were the case” (see Tr. 53; see also Tr. 32, 41; Opp. Expert
Malackowski ¶¶ 31–34, 38).
The fourth factor points against fair use for the pirated library copies.
5. OVERALL ANALYSIS.
After the four factors and any others deemed relevant are “explored, [ ] the results [are]
weighed together, in light of the purposes of copyright.” Campbell, 510 U.S. at 578.
The copies used to train specific LLMs were justified as a fair use. Every factor but the
nature of the copyrighted work favors this result. The technology at issue was among the most
transformative many of us will see in our lifetimes.
The copies used to convert purchased print library copies into digital library copies were
justified, too, though for a different fair use. The first factor strongly favors this result, and the
third favors it, too. The fourth is neutral. Only the second slightly disfavors it. On balance, as
the purchased print copy was destroyed and its digital replacement not redistributed, this was a
fair use.
The downloaded pirated copies used to build a central library were not justified by a fair
use. Every factor points against fair use. Anthropic employees said copies of works (pirated
ones, too) would be retained “forever” for “general purpose” even after Anthropic determined
they would never be used for training LLMs. A separate justification was required for each
use. None is even offered here except for Anthropic’s pocketbook and convenience.
And, as for any copies made from central library copies but not used for training, this
order does not grant summary judgment for Anthropic. On this record in this posture, the
central library copies were retained even when no longer serving as sources for training copies,
“hundreds of engineers” could access them to make copies for other uses, and engineers did
make other copies. Anthropic has dodged discovery on these points (e.g., Opp. Exh. 17 at 93–
94 (retained); Opp. Exh. 22 at 196 (no limits); Opp. Exh. 30 at 3, 4 (no accounting); see also
Opp. 15). We cannot determine the right answer concerning such copies because the record is
too poorly developed as to them. Anthropic is not entitled to an order blessing all copying
“that Anthropic has ever made after obtaining the data,” to use its words (Opp. Exh. 30 at 3, 4).
CONCLUSION
With respect to the training copies and the print-to-digital converted copies, this order has
drawn all ambiguities and inferences in favor of the opposing side, namely Authors. With
respect to the pirated copies, this order has also accepted the Authors’ version of the facts.
Authors did not move for summary judgment but if they had, then we would have been
obligated to accept all reasonable views given the evidence in defendant’s favor instead.
This order grants summary judgment for Anthropic that the training use was a fair use.
And, it grants that the print-to-digital format change was a fair use for a different reason. But it
denies summary judgment for Anthropic that the pirated library copies must be treated as
training copies.
We will have a trial on the pirated copies used to create Anthropic’s central library and
the resulting damages, actual or statutory (including for willfulness). That Anthropic later
bought a copy of a book it earlier stole off the internet will not absolve it of liability for the
theft but it may affect the extent of statutory damages. Nothing is foreclosed as to any other
copies flowing from library copies for uses other than for training LLMs.
IT IS SO ORDERED.
Dated: June 23, 2025.
WILLIAM ALSUP
UNITED STATES DISTRICT JUDGE