Copyright law in the age of AI

Generative AI technology is raising new challenges in the area of copyright law. Several court cases are underway that will have to answer the question of what relationship, if any, content generated from huge datasets of copyrighted work has to the creators of those original works. While the courts wrestle with this dilemma, another technology could potentially hold the answer: blockchain.

Recent innovations in artificial intelligence (AI) are raising as many new questions as they are providing new possibilities, particularly for developers, creators, governments, and lawmakers, not all of whom have a similar stake in the technology.

One area where interests collide is copyright law. AI-generated content has raised issues around authorship, infringement, and fair use, with so-called “generative AI” programs—such as OpenAI’s DALL-E and ChatGPT, and Stability AI’s Stable Diffusion—able to generate new images, texts, and other content in response to a user’s textual prompts.

Many of these generative AI programs are built on large language models (LLMs): AI algorithms that use deep learning techniques and huge datasets to understand, summarize, and generate new content. Such models are trained to create content partly by ingesting large quantities of existing works such as writings, photos, paintings, and other artworks.

Herein lies the problem. Some of this work will be copyrighted material, and using it in AI training datasets without authorization can result in disputes, as it already has.

Two distinct legal issues arise with regard to copyright and IP.

First, who, if anyone, owns the created work and holds the copyright to content created using these programs, given that the AI’s user, the AI’s programmer, and the AI program itself all play a role in creating these works?

The second, perhaps more complex, issue is whether that created work/content violates the copyright of any of the creators whose works make up the vast dataset the AI pooled to create its ‘new’ work.

The AI training process often involves making digital copies of existing works, carrying a risk of copyright infringement. As the U.S. Patent and Trademark Office has described, “the ingestion of copyrighted works for purposes of machine learning will almost by definition involve the reproduction of entire works or substantial portions thereof.”

Then there is the possibility that, depending on the prompts it’s given, the AI reproduces works that are too similar to certain works in its training data, or even exact copies of the originals. This could damage artists’/creators’ profits and infringe copyrights.

Courts are now being forced to wrestle with these difficult questions. One example is a recent class action lawsuit filed by the Authors Guild in the United States against OpenAI for allegedly infringing the copyrights of fiction writers in the training of its generative AI model, ChatGPT.

“By necessity, large language models need millions and millions of words to teach and develop a model that examines the relationships between words and can actually function to generate new texts,” says Joseph Petersen, managing partner of U.S. law firm Kilpatrick Townsend’s Silicon Valley office.

Speaking to CoinGeek, Petersen suggests that the nature of AI learning naturally brings it into conflict with issues around copyright and IP, but that perhaps the ends justify the potentially infringing means.

“It’s going to further the progress of science and the arts. It’s going to result in new works, new insights,” says Petersen. “That’s the heart of the issue that I think the cases are really going to turn on, many different facts including the very nature of the data that was used, how fleeting was the use of that data, what types of insights and works will come out of that data set.”

To begin to get to grips with some of these hotly debated issues, the U.S. Copyright Office has confirmed the start of a public consultation while pledging to consult with key stakeholders, including AI developers, copyright holders, and consumer groups.

Some options would avoid the need for copyright owners to go down the litigious route. If new technology such as AI is the cause of the problem, perhaps another new—or relatively new—technology, blockchain, can be the solution.

The combination of automatically executing smart contracts, micropayments, and the ability to record mass transactions on a ledger could make blockchain technology, or certain particularly scalable blockchains, the ideal method of monetizing the mass transfer, copying, and reproduction of creative works, saving copyright owners and AI users/developers from unnecessary and expensive disputes.

This tantalizing prospect is worth unpacking, but first, the courts are currently doing the legwork of attempting to apply existing laws to this novel situation.

Cases being fought

Generative AI is the subject of several ongoing copyright violation claims in the U.S., with creators and artists across the spectrum taking tech developers to court to protect their IP.

One such case began in January when three visual artists, Sarah Andersen, Kelly McKernan, and Karla Ortiz, sued multiple generative AI platforms, namely Stability AI, Midjourney Inc., and DeviantArt Inc.

The suit alleges that the companies used the artists’ works without consent or compensation to build the training sets that inform their AI algorithms, enabling users to generate artworks and images that may be insufficiently transformative of the artists’ existing, protected works.

“Defendants are using copies of the training images… to generate digital images and other output that are derived exclusively from the Training Images, and that add nothing new,” stated the initial filing. In turn, these generated images “will substantially negatively impact the market for the work of plaintiffs and the class.”

For example, both Andersen and McKernan claim that their art has been used in LAION (Large-Scale Artificial Intelligence Open Network) datasets, which Stability used to create Stable Diffusion, its AI image creation tool that synthesizes images in response to text prompts.

Stability AI was the subject of another complaint in February, this time from image licensing service Getty, which filed a lawsuit accusing the maker of the Stable Diffusion text-to-image program of improperly using its photos, violating copyright and trademark rights Getty holds in its watermarked photograph collection.

Getty accused Stability AI of “brazen infringement of Getty Images’ intellectual property on a staggering scale” in order to build its dataset and train the AI, claiming Stability copied more than 12 million photographs from Getty Images’ “collection, along with the associated captions and metadata,” without permission from, or compensation to, the company.

These two cases both involve the art and images of creators being used to build and train AI datasets, which can then produce ‘new’ works that are potentially—depending on how the court rules—not sufficiently transformative of the originals. In some cases, such as the Getty lawsuit, the AI programs can even reproduce the originals if given the correct prompts, creating a route to bypass Getty’s fee for images and damaging the company’s profits.

But it’s not just image-producing AI that is allegedly infringing copyrights.

A few months after Getty filed suit, the dispute moved into the realm of the written word: U.S.-based comedian and writer Sarah Silverman, along with two other authors, filed a claim against OpenAI, alleging that their copyrights had been infringed in the training of the firm’s AI systems.

“Much of the material in OpenAI’s training datasets… comes from copyrighted works—including books written by Plaintiffs—that were copied by OpenAI without consent, without credit, and without compensation,” said the July complaint.

Most recently, a September 19 court filing saw the U.S. Authors Guild sue OpenAI for allegedly infringing on the copyrights of fiction writers, including John Grisham, George R.R. Martin, Jodi Picoult, and David Baldacci, in the training of ChatGPT.

According to the plaintiffs, OpenAI used existing books to train its AI model without seeking the express consent or permission of the copyright owners, in what the Authors Guild described as “flagrant and harmful infringement.”

“Defendants copied Plaintiffs’ works wholesale, without permission or consideration,” claimed the filing. “These algorithms are at the heart of Defendants’ massive commercial enterprise. And at the heart of these algorithms is systematic theft on a mass scale.”

The Authors Guild also argued that the indiscriminate use of copyrighted material in the training of AI models could put the entire literary sector at risk and potentially impact the earnings of fiction writers, with ChatGPT users able to generate works that imitate established authors, potentially even selling them as original works.

The Silverman and Authors Guild cases both claim that the initial reproduction of the original works to train the AI algorithm amounted to copyright infringement; but the latter’s claim is also concerned with the potential damage to the Authors Guild’s members if the AI, using a dataset built on their work, starts to produce works that take business away from them.

“LLMs allow anyone to generate—automatically and freely (or very cheaply)—texts that they would otherwise pay writers to create. Moreover, Defendants’ LLMs can spit out derivative works: material that is based on, mimics, summarizes, or paraphrases Plaintiffs’ works, and harms the market for them,” claimed the filing.

However, in legal terms, the cases revolve around similar concepts of law, as Petersen explains:

“It really will depend upon an analysis of market harm and of the nature of the work and of the principal uses of the tool. All of those things will be in play… In the current round of AI cases, there’s going to be similar arguments made that this was not done to provide access to the underlying works that were used to train models, but rather create new works and that that type of use is consistent with the copyright clause in the US Constitution.”

The doctrine in question is ‘fair use.’

OpenAI has admitted that its programs are trained on “large, publicly available datasets that include copyrighted works” and that this process “necessarily involves first making copies of the data to be analyzed.”

Creating such copies without permission from the various copyright owners could be seen as infringing the copyright holders’ exclusive right to reproduce their works. However, OpenAI has also argued that using copyrighted material is legal because “under current law, training AI systems constitutes fair use.”

Fair use or theft?

According to Harvard Business School, the outcomes of many of these, or similar, AI cases “hinge on the interpretation of the fair use doctrine, which allows copyrighted work to be used without the owner’s permission.”

Specifically, whether or not copying constitutes “fair use” depends on four key factors set out in the U.S. Copyright Act (17 U.S.C. § 107):

1. The purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
2. The nature of the copyrighted work;
3. The amount and substantiality of the portion used in relation to the copyrighted work as a whole;
4. The effect of the use upon the potential market for or value of the copyrighted work.

Reasons for so-called “fair use” might include criticism (including satire), comment, news reporting, teaching (such as copies for classroom use), scholarship, or research, as well as transformative use of the copyrighted material in a manner for which it was not originally intended. It’s easy to see how ideas around teaching, scholarship, and research could apply to AI learning, but transformative use is also a factor worth considering when it comes to generative AI cases.

“That’s the critical issue, how is what AI does any different than what a human being does in reading background material in connection with assembling something new?” says Petersen, who suggests that there is a thin line between how AI trains on datasets and transforms the work therein to produce new works, and how human creators learn from previous work to create new, transformative works inspired by what has gone before.

“How is that any different than if someone wants to write a work of fiction and use it as kind of source material, in other words, wants to create the same sort of a feel as, say, a Hemingway novel or Salinger novel or any other type of novel,” argues Petersen. “They read it, absorb it, learn from it, and then create a different work that’s non-infringing but has certain elements that result from the author’s learning what was done before.”

So, if the work is materially different or transformative enough from the original works (in the eyes of the court), it could be considered fair use, even if it has a similar feel or tone to the dataset works.

But a work can also be considered fair use if it has a transformative purpose. For example, works of literary fiction being used to create something other than literary fiction could be considered transformative if the intended purpose of the new work is different enough from the intended purpose of the original work/s.

These are the very nuanced and difficult distinctions being fought over in the various ongoing AI cases.

“In these cases challenging the use of copyrighted works in connection with developing AI models, fair use is the cornerstone. It will be interesting to see the outcomes,” says Petersen, who admits that matters are complicated because currently, “there isn’t really clear precedent.”

This is part of the challenge posed by the recency of the technology: until one of these cases revolving around fair use and AI is decided, there is no direct precedent or case law from which to work.

However, there are other non-AI cases that could give an indication of how the courts treat the products of generative AI.

Peripheral precedents

One such peripherally relevant ruling came in 2015, when Google (NASDAQ: GOOGL) successfully defended itself against an earlier Authors Guild lawsuit by arguing that transformative use allowed for the scraping of text from books to create its search engine.

The court held that “Google’s unauthorized digitizing of copyright-protected works, creation of a search functionality, and display of snippets from those works are non-infringing fair uses” due to the “highly transformative” purpose—specifically, allowing users to search terms of interest in books, which they would still need to buy or rent to read the full text, thus not impacting the authors’ profits.

However, while it set a certain precedent in the area of information scraping, this transformative use angle might not fly for generative AI if the intended purpose of the created work is more similar to that of the original work, i.e., if the AI was used to create works of fiction from a dataset made up of existing copyrighted works of fiction, or even used to recreate the entirety of an original work.

Another, more recent ruling could provide a different kind of precedent for the AI debate. In May this year, in a 7-2 vote, the U.S. Supreme Court ruled that Andy Warhol had infringed photographer Lynn Goldsmith’s copyright when he created a series of silkscreen images based on a photograph Goldsmith shot of the late musician Prince in 1981.

Goldsmith sued the Andy Warhol Foundation for the Visual Arts (AWF) for copyright infringement after the foundation licensed an image by Warhol titled “Orange Prince” (based on Goldsmith’s photograph of the musician) to Conde Nast in 2016 for use in its publication, Vanity Fair.

Before the case reached the Supreme Court, a federal district court ruled in favor of the Andy Warhol Foundation. It found Warhol’s work transformative enough in relation to Goldsmith’s original to invoke fair use protection. But that ruling was subsequently overturned by the 2nd U.S. Circuit Court of Appeals.

The Warhol Foundation took it to the Supreme Court, which upheld the Appeals Court’s decision.

“Goldsmith’s original works, like those of other photographers, are entitled to copyright protection, even against famous artists,” wrote Justice Sonia Sotomayor in her opinion, concluding that the work produced by Warhol was not “sufficiently distinct from the original” and therefore Goldsmith’s copyright was violated when it was licensed to Conde Nast—rather than hung in Warhol’s home, as was originally intended.

Goldsmith had an agreement with Warhol allowing him to use her photograph in his work if it was hung in a gallery (in this case, his private gallery), but their agreement didn’t extend to the work being licensed to third parties for a print publication. Since the terms of the original agreement had been exceeded, the court turned to the fair use doctrine to assess the legality of the licensing to Conde Nast, and it deemed Warhol’s work not sufficiently transformative.

“I think the notable thing that the court did in Warhol was it didn’t look at the work and compare it to the prior work, the analysis was much more focused on the particular use at issue in that case, which was the recent article in Vanity Fair, and the licensing models that the original author had developed,” explains Petersen. “Looking at it not so much from a work-focused analysis but from a use-focused analysis and finding that it was not fair use, but opening the door to say that display of that artwork in the museum could be fair use.”

So, again, the use or purpose of the new work was key.

Transposing this argument to AI: the Supreme Court sided with the original artist in a copyright dispute over fair use and its transformative-use requirement. That would seem to favor the likes of the Authors Guild in their claims against generative AI developers using their work to create new works.

Yet, it’s a fine line. Based on these two cases, the plaintiffs in the various generative AI lawsuits will seemingly need to prove both that the works created by the AI are not sufficiently distinct from their original works and that they are being used for the same or a similar purpose, whether that be creative fiction, art, photography, or comedy.

Whether either of these cases is used or referenced in the AI lawsuits will depend on how they play out and the arguments being made, but there is one precedent the Warhol decision at least may have set. Petersen states, “I think it’s highly likely that the decision in terms of fair use and AI ultimately will go to the Supreme Court.”

“Fair use in the US is so incredibly fact-specific and subjective, particularly in the wake of the Warhol decision. I think courts are going to really require an examination of exactly what was done, how the datasets were used, what kind of output is generated, what kind of impact that’s going to have on the market. So it’s a very fluid analysis.”

The stakes for both artists and AI developers are high, and with such fluid and subjective analysis involved, it seems likely that the Supreme Court will be involved again at some point.

However, there is potentially a better way to ensure copyright protection in the age of generative AI, outside of the long and expensive litigious route to the Supreme Court, through the use of blockchain technology.

A possible solution in the blockchain

Blockchain allows users to store and track peer-to-peer transactions securely on a network without the need for an intermediary and—depending on your view on the veracity of “decentralization” claims—without any centralized control or authority. Blockchain also provides an immutable record of transactions, meaning all information about who created what and when can be stored on the public timestamped ledger.
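
To make the immutability point concrete, here is a minimal sketch, in Python, of how a hash-chained, timestamped ledger makes tampering with authorship records detectable. The ProvenanceLedger class, its field names, and the sample entries are illustrative assumptions for this article, not any particular blockchain’s API.

```python
import hashlib
import json
import time

def hash_record(record: dict) -> str:
    """Deterministically hash a record's contents."""
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()

class ProvenanceLedger:
    """Toy hash-chained ledger: each entry commits to the previous entry's hash."""

    def __init__(self):
        self.chain = []

    def register_work(self, creator: str, work_bytes: bytes) -> dict:
        record = {
            "creator": creator,
            "work_hash": hashlib.sha256(work_bytes).hexdigest(),  # fingerprint of the work
            "timestamp": time.time(),                             # public timestamp
            "prev_hash": hash_record(self.chain[-1]) if self.chain else None,
        }
        self.chain.append(record)
        return record

    def verify(self) -> bool:
        """Any edit to an earlier record breaks every later prev_hash link."""
        return all(
            self.chain[i]["prev_hash"] == hash_record(self.chain[i - 1])
            for i in range(1, len(self.chain))
        )

ledger = ProvenanceLedger()
ledger.register_work("Artist A", b"scan of original artwork")
ledger.register_work("Author B", b"manuscript text")
assert ledger.verify()
ledger.chain[0]["creator"] = "someone else"  # attempt to rewrite history...
assert not ledger.verify()                   # ...is immediately detectable
```

Because each record commits to the hash of the one before it, rewriting any earlier entry breaks every subsequent link, which is what gives such a ledger its claim to immutability.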

If creators stored their data on-chain or the datasets that AI algorithms draw from were put on-chain, a relationship could be formed that might solve the copyright issue.

Blockchain could help on several fronts (a hypothetical sketch of the payment flow follows this list):

1. Transparency: blockchain could provide full visibility into how generative AI algorithms use copyrighted material, verifying the authenticity of AI-generated content and detecting unauthorized use.
2. Automatic payments: blockchain networks could be programmed so that each use (transaction) requires a payment, allowing creators to automatically receive payments whenever their content is used by AI algorithms, with gated content timestamped every time the AI draws from it.
3. Tokenization: copyrighted material could be tokenized for control, with specific usage rights granted through tokens.
4. Micropayments at scale: the ability to process huge volumes of microtransactions could allow repeated mass use of an algorithm built on copyrighted work to be fairly monetized for the original creators, without bankrupting the developers or their users.
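
As a thought experiment, the pay-per-use idea sketched in the list above might look something like the following. Everything here, including the UsageLedger class, the flat per-use fee, and the in-memory balances, is a hypothetical stand-in for real smart-contract and micropayment machinery, not any existing chain’s interface.

```python
import time

# Hypothetical pay-per-use flow: every time an AI pipeline reads a registered
# work, a timestamped micropayment is logged in favor of the work's creator.
PER_USE_FEE = 0.00005  # assumed flat micro-fee per access, in dollars

class UsageLedger:
    def __init__(self):
        self.works = {}         # work_id -> creator
        self.transactions = []  # append-only usage log
        self.balances = {}      # creator -> accrued micropayments

    def register(self, work_id: str, creator: str):
        self.works[work_id] = creator

    def record_use(self, work_id: str, consumer: str):
        """Called whenever a training or inference pipeline touches a work."""
        creator = self.works[work_id]
        self.transactions.append({
            "work_id": work_id,
            "consumer": consumer,
            "creator": creator,
            "fee": PER_USE_FEE,
            "timestamp": time.time(),
        })
        self.balances[creator] = self.balances.get(creator, 0.0) + PER_USE_FEE

ledger = UsageLedger()
ledger.register("novel-001", "guild-author")
for _ in range(100_000):  # a model repeatedly drawing on a single work
    ledger.record_use("novel-001", "llm-trainer")
print(f"${ledger.balances['guild-author']:,.2f}")  # ~$5.00 accrued from 100,000 uses
```

In a real deployment, record_use would presumably be an on-chain transaction signed by the consumer and settled in the chain’s native token rather than a Python float, but the accounting logic would be broadly similar.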

First, this would need a scalable blockchain capable of storing that amount of data; second, it would require a blockchain that can handle a huge number of micro- and nano-transactions, as generative AI might frequently and repeatedly draw from certain authors’ work in its dataset.

The BSV blockchain would be ideally placed to offer the infrastructure necessary for the number of micro-transactions that would be required to solve the generative AI copyright dilemma. Scalability is key to the utility of the BSV blockchain in this regard.

In 2022, a gigantic BSV block of 3.65 GB was processed. In March this year, BSV blockchain ecosystem member mintBlue, a Blockchain-as-a-Service (BaaS) platform, processed 18 GB of data over the course of a day at a minimal cost of $325, registering 91% of all transactions across all major blockchains. This was topped in August when, in a 24-hour period, 128.691 million on-chain transactions were processed on the BSV blockchain, showing that so-called scaling limits can be pushed and broken repeatedly. Despite this massive number of transactions, which would have crippled many other blockchains, the fee per transaction remained a tiny $0.000005, according to BSVdata.com, which also showed that over 1 billion transactions were processed on the BSV blockchain between January and August 2023.
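
Taking those figures at face value, a quick back-of-the-envelope check shows why fees at that level matter, assuming, purely hypothetically, that each transaction settled one use of a copyrighted work:

```python
# Back-of-the-envelope check using the figures quoted above.
transactions_per_day = 128_691_000  # the reported 24-hour BSV transaction record
fee_per_transaction = 0.000005      # the reported average fee, in dollars

daily_cost = transactions_per_day * fee_per_transaction
print(f"${daily_cost:,.2f}")  # roughly $643 to settle ~128.7 million uses in a day
```

At those rates, even heavy, repeated access to copyrighted training material would cost fractions of a cent per work, which is the property the micropayment model depends on.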

This kind of scalability shows how blockchain could power applications that rely on micropayments at scale, as would be necessary if artists’ works or generative AI datasets were put on-chain and monetized.

In theory, it’s a solution that suits both parties: the authors/creators get paid for the use of their work, avoiding court costs and a possible loss that would see their work used for free anyway, while the AI users/developers pay cumulative micropayments rather than buying at face value every work pooled into their datasets, and likewise avoid court costs and possible losses resulting in even steeper payouts.

However, the success of such a solution would depend on cooperation between developers, content creators, and legal authorities to establish clear standards and regulations at the intersection of blockchain, copyright, and AI. It is also, at this stage, just a theory.

“I think we’re a long way off from having those sorts of models actually created and implemented,” says Petersen. “It could be a long time before that is developed and implemented, or it could be next month. I never cease to be amazed by the swiftness of progress.”

While we wait for such a solution to become market-ready, and to see whether it is embraced or rejected by creators and AI developers, the courts will continue to wrestle with the copyright law implications of AI technology until a solution that suits everyone can be implemented, or case law catches up.

For artificial intelligence (AI) to work within the law and thrive in the face of growing challenges, it needs to integrate an enterprise blockchain system that ensures data input quality and ownership, allowing it to keep data safe while also guaranteeing the immutability of data. Check out CoinGeek’s coverage on this emerging tech to learn more about why enterprise blockchain will be the backbone of AI.

Watch: AI is for ‘augmenting’ not replacing the workforce
