
Even Davos is calling for blockchain networks to store AI training data

The Davos crowd and its partners in the mainstream media have finally acknowledged it: blockchain and artificial intelligence (AI) technology could be a great combination. This week, CNBC even said it could be the long-awaited “killer use case” for blockchain, addressing concerns over “biases” and “misinformation.”

This isn’t new to readers in the blockchain world (especially the BSV blockchain community), who have long touted it as a solution to a question that has mounted as generative AI use becomes more widespread: How do we know (and trust) that the AI is giving us true, reliable output?

There are definitely differences of opinion over what counts as “bias” and “misinformation,” especially where organizations like the World Economic Forum (WEF) are concerned. Storing training data on a verifiable, timestamped, public ledger could at least provide a neutral record of which sources were being used—and to what extent.
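As a rough illustration of what such a record could look like, here is a minimal Python sketch. The field names and the `make_training_record` helper are hypothetical, not any particular chain’s API: it hashes a source document, attaches a relative weight, and timestamps the entry so anyone holding the original data could later verify which sources were used and to what extent.

```python
import hashlib
import json
import time

def make_training_record(source_url: str, content: bytes, weight: float) -> dict:
    """Build a verifiable record of one training source.

    Only a hash of the content is stored, so the source itself need not
    be published, yet anyone holding the original bytes can confirm it
    matches the ledger entry.
    """
    return {
        "source_url": source_url,
        "content_sha256": hashlib.sha256(content).hexdigest(),
        "weight": weight,                  # relative weight in the training mix
        "timestamp": int(time.time()),     # when the record was created
    }

# The serialized record would then be embedded in a blockchain
# transaction (e.g., in an OP_RETURN-style data output on BSV).
record = make_training_record(
    "https://example.com/article-123",
    b"full text of the source document...",
    weight=0.02,
)
print(json.dumps(record, indent=2))
```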

A panel at the WEF’s Davos conference last week also mentioned AI “hallucinations.” These are confidently wrong answers that become obvious when you prompt an AI chatbot about a topic you know well, e.g., yourself or your own area of expertise. Like GPS navigation systems that lead hapless drivers into rivers or dangerous areas, AI hallucinations could have negative downstream effects if their output is used to make critical decisions. The panel suggested it would be easier to “roll back” an AI found to be hallucinating if the source data causing the problem could be identified and located.
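To make the “roll back” idea concrete, here is a hypothetical sketch of how timestamped provenance records could be queried. The `ledger` structure, field names, and sample hashes are illustrative only: given a model version, an auditor collects the source entries it was trained on, verifies each hash, and isolates the batch suspected of causing the hallucination.

```python
# Hypothetical in-memory stand-in for on-chain provenance records.
# Each entry links a model version to a hashed source it was trained on.
ledger = [
    {"model_version": "v1.3", "content_sha256": "ab12...", "source_url": "https://example.com/a"},
    {"model_version": "v1.3", "content_sha256": "cd34...", "source_url": "https://example.com/b"},
    {"model_version": "v1.4", "content_sha256": "ef56...", "source_url": "https://example.com/c"},
]

def sources_for_model(version: str) -> list[dict]:
    """Return every recorded training source for a given model version."""
    return [entry for entry in ledger if entry["model_version"] == version]

# An auditor investigating hallucinations in v1.3 can enumerate its
# sources, check each hash against the original data, and retrain
# without the offending batch.
for entry in sources_for_model("v1.3"):
    print(entry["source_url"], entry["content_sha256"])
```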

Tracking data integrity, ownership

In a CoinGeek article late last year, Charles Miller asked ChatGPT itself whether blockchain would be beneficial for verifying AI training data. The popular AI chatbot answered that it could indeed, citing transparency and data integrity, though there would likely be issues with scalability and computational demands.

Both concerns can be addressed with a simple two-word reply: use BSV. BSV, which follows the rules of Satoshi Nakamoto’s original Bitcoin protocol released in 2009, has no protocol-imposed limits on scaling, and the network grows more powerful as more transaction-processing capacity joins it.

Some, including George Siosi Samuels, have noted that tokenizing AI training data and storing it on a blockchain anyone in the world can access would save scarce resources. AI companies, even large ones, face massive costs for computing power, data storage and management, and energy consumption.

Ownership of a training dataset’s content has also become an issue, with the New York Times suing OpenAI and Microsoft (NASDAQ: MSFT) for copyright infringement in December 2023. Both companies’ generative AI products, ChatGPT and Copilot, are known to lean heavily on widely read mainstream media operations like NYT for source material. There have been similar rumblings in the visual art and music worlds.

Presumably, the Times would prefer a per-use income stream for its content to having AI models ignore it altogether. Storing training data as on-chain digital tokens would make it far easier to identify the data’s original owners and eventually automate the process of paying them. The early BSV project Codugh used a similar (non-tokenized) model for software developers, offering real-time micropayments whenever their code is used. Applied to data of all kinds, blockchain-based AI training data could create new micropayment income streams for academics, amateur and professional researchers, hobbyists, and even ordinary individuals.
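A minimal sketch of what such an automated per-use payout might look like, assuming usage counts have already been tallied from on-chain records. The rate, owner names, and counts here are hypothetical, in the spirit of the Codugh model described above rather than its actual implementation:

```python
# Hypothetical per-use micropayment tally. In practice, usage counts
# would come from on-chain access records; here they are hard-coded.
RATE_PER_USE_SATOSHIS = 5  # assumed flat rate per training-data access

usage_counts = {
    "owner_nyt": 120_000,        # e.g., a publisher's articles
    "owner_researcher_42": 850,  # an individual researcher's dataset
}

def compute_payouts(counts: dict[str, int], rate: int) -> dict[str, int]:
    """Return satoshis owed to each data owner for the period."""
    return {owner: uses * rate for owner, uses in counts.items()}

for owner, amount in compute_payouts(usage_counts, RATE_PER_USE_SATOSHIS).items():
    print(f"{owner}: {amount} satoshis")
```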

Do AI companies actually want this kind of transparency?

From both a technological and a public-trust point of view, recording timestamped AI training data on a blockchain makes perfect sense. The question, however, is whether those in charge of developing AI for public consumption actually want this kind of transparency.

When asked about the specific source material in its training data and the weight given to each source, ChatGPT responded:

“As an AI developed by OpenAI, I don’t have access to or knowledge of specific sources used in my training data. OpenAI, the organization behind ChatGPT, has not publicly disclosed the specifics of individual datasets or the weight given to certain sources in the training process.”

It added a reassurance that its sources were diverse, broad in scope, and constantly updated while maintaining “ethical and responsible” standards, with no manual intervention in the weight given to specific sources.

Microsoft’s Copilot AI gave a similar answer: “Microsoft has not publicly disclosed the sources it uses as training data for Copilot, nor the weight it gives to certain sources.” It also gave further explanations as to why its output remains reliable despite this non-disclosure.

These explanations might satisfy some, but they aren’t enough for everyone to place full trust in AI output. Blockchain may well add transparency and data verification to the AI development process, but there’s still a big question mark over whether the large corporations building AI systems actually want it.

Hopefully, the notion that blockchain data is the world’s most trustworthy will catch on among mainstream audiences, and in the future, there will be louder demands for its use.

For artificial intelligence (AI) to work within the law and thrive in the face of growing challenges, it needs to integrate an enterprise blockchain system that ensures data input quality and ownership, allowing it to keep data safe while also guaranteeing its immutability. Check out CoinGeek’s coverage on this emerging tech to learn more about why enterprise blockchain will be the backbone of AI.

Watch: Blockchain can bring accountability to AI


New to blockchain? Check out CoinGeek’s Blockchain for Beginners section, the ultimate resource guide to learn more about blockchain technology.