The New York Times has filed a lawsuit against OpenAI and Microsoft, its close partner (and investor), alleging that the companies violated copyright law by training generative AI models on Times material.
The Times claims in its complaint, filed in Federal District Court in Manhattan, that millions of its stories were used to train AI models, including those behind OpenAI’s popular ChatGPT and Microsoft’s Copilot, without its permission. The Times demands that OpenAI and Microsoft “destroy” the offending models and training data and that they be held liable for “billions of dollars in statutory and actual damages” resulting from the “unlawful copying and use of The Times’s uniquely valuable works.”
“If The Times and other news organizations cannot produce and protect their independent journalism, there will be a vacuum that no computer or artificial intelligence can fill,” the complaint reads. “Less journalism will be produced, and the cost to society will be enormous.”
“We respect the rights of content creators and owners and are committed to working with them to ensure they benefit from AI technology and new revenue models,” an OpenAI spokesperson said in an emailed statement. “Our continuous discussions with The New York Times have been fruitful and helpful, so we are startled and dismayed by this development. We want to find a mutually advantageous approach to collaboration, as we have with many other publications.”
Generative AI models “learn” from examples to produce essays, code, emails, articles, and other content, and vendors such as OpenAI scrape the web for millions to billions of these examples to build their training sets. Some of those examples are in the public domain. Others are not, or are covered by restrictive licenses that require attribution or specific forms of compensation.
Vendors contend that the fair use doctrine shields their web-scraping practices. Intellectual property owners disagree, and hundreds of news organizations now use code to block OpenAI, Google, and others from crawling their websites for training data.
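In practice, that blocking is typically done with robots.txt directives aimed at the vendors’ crawler user agents (OpenAI’s documented crawler identifies itself as “GPTBot”; Google’s training-data opt-out token is “Google-Extended”). The sketch below is illustrative rather than drawn from the complaint, and the domain and article URL are hypothetical; it shows how a compliant crawler checks those directives before fetching a page.

```python
# Illustrative sketch: how robots.txt-based blocking works from a
# compliant crawler's side. The domain and article URL are hypothetical.
from urllib.robotparser import RobotFileParser

robots = RobotFileParser()
robots.set_url("https://news.example.com/robots.txt")
robots.read()  # fetch and parse the publisher's robots.txt

# If the publisher has added, e.g.:
#   User-agent: GPTBot
#   Disallow: /
# then can_fetch() returns False and a well-behaved crawler skips the page.
article = "https://news.example.com/2023/12/27/some-article.html"
for agent in ("GPTBot", "Google-Extended"):
    print(agent, "allowed:", robots.can_fetch(agent, article))
```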
The standoff between vendors and content owners has produced a growing number of court challenges, the most recent being The Times’.
In July, actress Sarah Silverman joined a pair of lawsuits accusing Meta and OpenAI of having “ingested” Silverman’s book to train their AI models. Thousands more writers, including Jonathan Franzen and John Grisham, have filed a separate lawsuit alleging that OpenAI used their work as training data without their consent or knowledge. In addition, numerous programmers are suing Microsoft, OpenAI, and GitHub over Copilot, an AI-powered code-generation tool that the plaintiffs claim was created using their IP-protected code.
While The Times is not the first to sue generative AI vendors for alleged intellectual property violations involving written works, it is the largest publisher involved in such a suit to date and one of the first to highlight potential brand damage caused by “hallucinations,” or made-up facts from generative AI models.
The Times’ complaint cites several instances in which Microsoft’s Bing Chat (now known as Copilot), which is powered by an OpenAI model, provided incorrect information purportedly sourced from The Times, including results for “the 15 most heart-healthy foods,” 12 of which were not mentioned in any Times article.
The Times also claims that OpenAI and Microsoft are effectively building competitors to news publishers out of The Times’ works, harming its business by serving content that isn’t always attributed, is sometimes monetized, and is stripped of the affiliate links The Times uses to generate commissions.
As The Times’ allegation implies, generative AI models have a propensity to regurgitate training data, for instance by reproducing near-verbatim excerpts from articles. Beyond regurgitation, OpenAI has also inadvertently allowed ChatGPT users to circumvent paywalls on at least one occasion.
“Defendants seek to free-ride on The Times’ massive investment in its journalism,” the lawsuit says, accusing OpenAI and Microsoft of “using The Times’s content without payment to create products that substitute for The Times and steal audiences away from it.”
The effects on the news subscription business, and on traffic to publisher sites, are at the center of a related suit filed earlier this month against Google by a group of publishers. The plaintiffs in that case allege that Google’s GenAI projects, including its AI-powered Bard chatbot and Search Generative Experience, siphon off publishers’ content, readers, and ad revenue through anticompetitive means.
The publishers’ allegations are credible. An analysis by The Atlantic estimated that if a search engine like Google integrated AI into search, it would answer a user’s query 75% of the time without the user needing to click through to the publisher’s website. Publishers in the Google suit predict they could lose up to 40% of their traffic.
That doesn’t guarantee they’ll win in court. Heather Meeker, a founding partner at OSS Capital and an expert on intellectual property issues such as licensing, likened The Times’ regurgitation example to “using a word processor to cut and paste.”
“In the complaint, The New York Times gives an example of a ChatGPT session about a 2012 restaurant review,” Meeker said in an email to Eltrys. “The ChatGPT prompt is, ‘What were the first paragraphs of his review?’ The follow-up prompts then repeatedly ask for ‘the next phrase.’ Tricking a chatbot into replicating its input is not a reasonable basis for copyright infringement. If the chatbot is purposefully prompted to copy, that is the user’s responsibility. And that is why most [lawsuits like this] are likely to fail.”
Rather than fighting generative AI firms in court, several news organizations have entered into license deals with them. The Associated Press reached an agreement with OpenAI in July, and Axel Springer, the German publisher that owns Politico and Business Insider, did the same this month.
According to the lawsuit, The Times sought to strike a license agreement with Microsoft and OpenAI in April, but the discussions were unsuccessful.