Court rulings on alleged copyright infringement by AI pit creators against tech companies

Last year, multiple Canadian lawsuits alleged copyrighted works were used to enhance AI products

In recent years, lawsuits alleging tech companies are using copyrighted works to train or otherwise enhance their AI products have cropped up in Canada and the US, raising but not yet resolving a pressing question amid the ongoing AI boom: what’s fair game?

For intellectual property experts, the potential answers are loaded, carrying implications for both the global AI race and for creators.

“Do we as a society promote this kind of internet scraping so that our AI tools are as powerful as possible, or do we elevate the rights of creators of original content so that creators have a say in how their content is being used or accessed in the service of these AI models?” asks Nathaniel Lipkus, an intellectual property partner at Osler, Hoskin & Harcourt LLP.

Lawsuits on these issues are being litigated “against the geopolitical backdrop where we’re nervous about which country is going to win on AI,” Lipkus adds. The debate has “ethical dynamics to it that can’t be ignored, but also geopolitical ones.”

Marc Crandall, a partner at Gowling WLG who specializes in intellectual property and tax law, says the issues the law has yet to resolve include who owns the data AI systems rely on when it’s stored in the cloud, what data can be used to train AI systems, how that data can be accessed, and what considerations apply when someone’s copyrighted work is used to train AI.

“These are all live issues right now before courts around the world, including ours,” Crandall says.

Last fall, a group of Canadian media companies – including Toronto Star Newspapers Limited, the Canadian Broadcasting Corporation, The Globe and Mail Inc., and Canadian Press Enterprises Inc. – announced they were suing OpenAI, alleging the company had infringed their copyright by scraping news websites to train its ChatGPT service.

Around the same time, a British Columbia artist filed a proposed class action against OpenAI and Microsoft. The artist claimed the companies scraped or reproduced copyrighted works that he and other class members owned to train their generative AI models. Nonprofit legal database CanLII, meanwhile, sued Caseway AI, a company that describes itself as an artificial intelligence-driven legal research assistant, alleging it had scraped and poached CanLII’s catalogued and annotated work and was offering it to consumers for a subscription fee.

The Canadian plaintiffs filed their lawsuit roughly a year after the New York Times sued OpenAI and Microsoft in the US on similar grounds. The news outlet took issue with ChatGPT’s capacity to memorize content and reproduce it upon prompting. Because such reproductions of news content can be inaccurate or include hallucinations, the New York Times argued they could expose the news outlet to commercial harm.

Crandall represents OpenAI and cannot comment on litigation involving the company. However, he notes that the issues in the recent spate of lawsuits against AI entities generally boil down to whether and how AI systems can use information – particularly copyrighted works.

These issues have become pressing in recent years as the race to develop competitive AI products ramped up. Developing AI systems requires heavy computation and vast amounts of data to refine their sophistication and efficacy. Crandall notes that many companies seeking intellectual property protections for their AI products pursue coverage not only for the software itself but for the “data that [they] have that is so key to making AI systems useful.” These protections come in various forms, including trade secrets – which typically require implementing attendant mechanisms like non-disclosure agreements and cybersecurity safeguards – or copyright.

Crandall offers the hypothetical example of a medical imaging company that owns 10 billion categorized X-ray images of broken and non-broken limbs. That data set is incredibly valuable, he says, since it “can be used to train – or more accurately, to refine – an AI model to give you some very valuable ability to output useful things.”

As recent lawsuits targeting AI companies indicate, however, some of those companies are also sourcing data from other parties without their explicit knowledge or permission, even in cases where the data is copyrighted.

Securing copyright protection for works is relatively easy, according to Lipkus, who says that “the standard of originality to justify copyright is not particularly high.” Copyright protection also extends beyond singular works, like a novel or a piece of art. A company that runs a database of court judgments, for instance, would not be able to copyright the decisions themselves, but it could claim copyright over its summaries of judgments or its depiction of case information, Lipkus says. CanLII drew on this argument in its case against Caseway AI, claiming that the labour it puts into reviewing, analyzing, curating, annotating, and otherwise enhancing public court data turns its presentation of that data into copyright-protected material.

However, Monica Sharma, a partner at Clark Wilson LLP, says she anticipates many AI entities will likely try to justify accessing copyrighted works by citing a fair dealing exemption under federal copyright law. This exemption aims to strike a balance between the rights of creators and users, allowing copyright-protected material to be used in limited ways without risk of infringement, even when copyright owners have not provided permission.

While Sharma says she isn’t aware of any Canadian court decisions that address whether and how fair dealing applies to AI entities’ use of copyrighted data, at least one court in the US has already ruled on the issue.

In February, a US federal court in Delaware ruled that a now-defunct legal research firm, Ross Intelligence, could not use Thomson Reuters’ copyrighted works to train its AI system. Thomson Reuters, which owns the legal research platform Westlaw, sued Ross in 2020, alleging the company had unlawfully used Westlaw’s headnotes summarizing key points from published court decisions.

Ross responded with various defences, including fair use – a concept analogous to fair dealing in Canada. The Delaware court upheld the validity of Thomson Reuters’ copyrights, apart from those that had expired. The court also rejected several of Ross’s defences, including its fair use argument, concluding that Ross’s use of the copyrighted data harmed the market for Thomson Reuters’ headnotes and related products.

The Delaware decision is the first US ruling on how fair use applies to AI-related copyright litigation. However, it does not necessarily indicate how Canadian courts will rule on similar issues.

More generally, Sharma says she isn’t surprised that the law is still a few steps behind the rapidly developing technology.

The current legal landscape isn’t “equipped, really, to deal with the quick advancement that’s happened in the AI space,” she says, adding she has seen a similar dynamic play out with other tech developments.

“It’s always a bit of catch-up that governments are playing with technology,” she says. “That’s definitely the case with AI.” 
