As we previously blogged, multiple generative AI platforms are facing lawsuits alleging that the unauthorized use of copyright-protected material to train artificial intelligence constitutes copyright infringement. A key defense in those cases is fair use. Specifically, AI platforms contend that they don’t need a license to use copyright-protected content, whether scraped from the Internet or obtained from a pirate trove of books, to develop and improve large language models (LLMs), on the theory that such use is transformative and therefore a fair use under the Copyright Act. Whether fair use prevails in this battle is one of the biggest copyright questions of the day.
While many of the generative AI actions are pending in the U.S. District Court for the Northern District of California, a federal court in Delaware recently had the opportunity to opine on the merits of this important fair use question. In Thomson Reuters v. Ross Intelligence, 2023 WL 6210901 (D. Del. Sept. 25, 2023), the owner of Westlaw (Thomson Reuters) claims, among other things, that an AI startup (Ross Intelligence) infringed Thomson Reuters’ copyright by using Westlaw’s headnotes to train Ross’s legal AI model. The parties cross-moved for summary judgment on various grounds, including on Ross’s fair use defense.
Though the decision explores multiple interesting questions of copyright law, including the copyrightability of Westlaw headnotes (maybe) and whether the Copyright Act preempts Thomson Reuters’ claim for tortious interference (yes), its analysis of Ross’s fair use defense is where the court appears to have broken new ground. In particular, the court assessed whether Ross’s alleged use of Westlaw’s headnotes (assuming they are protected by copyright) is “transformative.”
The court begins its fair use analysis by discussing two Ninth Circuit cases that deal with so-called “intermediate copying.” In Sega Enterprises v. Accolade, 977 F.2d 1510 (9th Cir. 1992), the court held that it was fair use for a company to copy Sega’s copyright-protected console code for the purpose of learning the software’s functional components and making new games that were compatible with Sega’s console. Similarly, in Sony Computer Entertainment v. Connectix, 203 F.3d 596 (9th Cir. 2000), the Ninth Circuit held that it was fair use for a company to copy Sony’s software in order to develop a new gaming platform that was compatible with Sony’s games. The Thomson Reuters court noted that the Supreme Court “has cited these intermediate copying cases favorably, particularly in the context of ‘adapting the doctrine of fair use in light of rapid technological change.’” 2023 WL 6210901, at *8 (quoting Google v. Oracle, 141 S. Ct. 1183, 1198 (2021)) (cleaned up).
Thomson Reuters attempted to distinguish the intermediate-copying cases by arguing that, unlike the companies in Sega and Sony that merely sought to “study functionality or create compatibility,” Ross sought to train its AI with Westlaw’s “creative decisions” specifically to “replicate them” in the AI’s output. Ross, on the other hand, contended that “its AI studied the headnotes and opinion quotes only to analyze language patterns, not to replicate Westlaw’s expression,” and thus was lawful “intermediate copying.” The court held that whether Ross’s use was transformative would turn on the “precise nature of Ross’s actions.”
Here’s the key text:
It was transformative intermediate copying if Ross’s AI only studied the language patterns in the headnotes to learn how to produce judicial opinion quotes. But if Thomson Reuters is right that Ross used the untransformed text of headnotes to get its AI to replicate and reproduce the creative drafting done by Westlaw’s attorney-editors, then Ross’s comparisons to cases like Sega and Sony are not apt.
The court concluded that “this is a material question of fact that the jury needs to decide” and thus refused to grant Ross’s motion for summary judgment. However, the court’s application of the intermediate-copying doctrine in the context of machine learning could serve as an important precedent and a roadmap for other courts adjudicating copyright infringement claims related to LLMs and other generative AI models.
To the extent that LLMs are ingesting copyright-protected material solely to understand language patterns and not to replicate its creative expression (which may very well be the case for many LLMs), this opinion suggests that using such material to train AI is transformative. But if the material is being used to train AI to output the “creative drafting” discerned from the original, then the use is likely not transformative. Thus, as the Thomson Reuters court observes, the fair use question in these cases may turn on the exact nature of the AI training process.