Artificial intelligence (AI) firms face a new allegation from the News/Media Alliance (NMA) that they illegally scraped data to train their large language models (LLMs).
The news industry group submitted a 77-page white paper and other accompanying documents to the United States Copyright Office, raising alarm over the worrying trend of copyright violation by AI chatbots. The NMA argued that the bulk of data used to train AI models stems from copyrighted news publications.
Beyond the allegedly illegal collection of data, the NMA said the outputs of AI chatbots bring them into direct competition with news outlets through their “narrative answers to search queries.” The NMA noted that these responses eliminate the need for consumers to visit news sources, adversely affecting the revenues of news outlets.
In its submission, the NMA argues that AI developers are netting impressive revenues without bearing any of the risks associated with reporting, which it describes as an anomaly. The report name-checked leading generative AI products like Bing Chat, Bard, Claude, and ChatGPT as offerings that breach the copyrights of news publishers.
“The members of the News/Media Alliance are deeply concerned about this unauthorized and unlawful use of their expressive content by large technology companies,” the paper read. “Such companies do not shoulder the cost or risk of reporting the news or producing creative content but capitalize on that valuable work.”
The NMA points to the soaring valuations of leading AI developers, which it ties to their use of unauthorized third-party content. OpenAI and Anthropic have seen their valuations balloon to new highs, with revenue pouring in from a pivot to paid subscriptions despite starting as non-profit research organizations.
Rather than slug it out in court, the NMA says it will be turning to dialogue to resolve the disputes, stating that generative AI offers several benefits to journalism.
“Notably, NMA members stand ready to come to the table and discuss reasonable licensing solutions to facilitate reliable, updated access to trustworthy expressive content, something that will benefit all interested parties and society at large, rather than engage in litigation to protect their rights,” it said.
Another day, another trouble for AI firms
AI firms have been hit with a wave of lawsuits from copyright holders alleging violations. Meta (NASDAQ: META), Anthropic, and OpenAI have had their fair share of class-action lawsuits, with the companies citing fair use as a shield against the increasing number of legal cases.
Amid growing concerns over AI and intellectual property, experts have theorized that combining blockchain technology with AI may improve how AI firms collect data. They argue that blockchain can be used to identify AI-generated content while providing traceability for the data used to train LLMs.