2023 kicks off with the promise of artificial intelligence (AI), bolstered by the success of ChatGPT at holding human-like conversations, generating art and music, and even writing working code. For the first time, developers are wondering whether their jobs might eventually be replaced by AI. While this is not something this author worries about, the scope of what is possible with AI has grown by leaps and bounds in the last year, and for the first time, many questions about the economic feasibility of AI are being contemplated.
While AI sat squarely within the realm of researchers for the better part of the last decade, 2022 saw the release of practical consumer-level AI tools such as StableDiffusion, MidJourney, and ChatGPT, which allow some powerful applications to be publicly realized—and seemingly for free[1].
This, in turn, has brought to the surface some of the more mundane but critical questions surrounding the future economy powered by AI.
Data is everything
Firstly, if you are new to AI, the one thing you need to understand is that the sophistication of an AI model depends, above all, on the data set it was trained on.
Nowhere is the age-old adage of “garbage in, garbage out” more true. Training an AI to successfully simulate intelligence or creativity is all about how large and clean a data set you can give the ‘black box’ to learn from. Thanks to the last 20 years of the internet and the advent of the ‘oversharing generation’ (and folks such as Mark Zuckerberg, who have trained a generation of humans that it is OK to give away their data for free), this data is readily available to anyone with the resources to collect, filter, and curate it.
Personal information privacy laws were late to the scene, and only in recent years have people become more conscious of what they are giving away for free just for the ability to use search engines, social media, and content aggregation portals of the internet.
Still, only a minority of people actually refuse to use a social media platform or chat app because they disagree with its data privacy and ownership terms (assuming they even read the policy at all, instead of blindly clicking “Agree” to the click-wrap agreement). This is a dream come true for the AI industry, which has found a ‘free’ resource that only requires proper harvesting. Even when the data is not free, the sad truth is that most social media and content platforms state clearly in their terms that anything you do or upload on their platform is theirs to exploit, so if an AI researcher wishes to purchase a dataset, it is Facebook, WhatsApp, Flickr, or YouTube that earns money from the sale, not the people who actually contributed the data, content, and images.
This structure is a carryover from the ‘intermediary’ model of the internet. Just as your ISP has historically been your service provider for the internet, it has also been your agent for anything you wanted to do online, such as hosting a website, running an app server, or keeping your email. This intermediary model works when the service provider is solely providing a service for you, the same way the electric utility provides power to your home or the gas company provides gas for your stove and water heater.
However, with social media platforms, and the internet in general, service providers crossed the Rubicon when they started to steal your data for their own profit. At first, they did this just to improve their services—for instance, to identify trends, usage patterns, or preferences that would help them customize your experience on their platform. But soon they realized that advertisers and others would pay millions of dollars for access to this treasure trove of collected data—and the current monetization model of the internet was born.
On the one hand, without this initial monetization model, we wouldn’t have the internet we have today. It is responsible for the explosive growth of social media platforms in general, whose sole model is to capture as many users as possible while charging nothing for their services. The hidden cost, of course, is that your data, your activity, and your ‘meta’ information are farmed and used however the platforms please.
How is this pertinent to AI? Because it is going to significantly increase the market demand for people’s metadata and content. Up to this point, the only buyers of data sets were advertisers and the occasional criminal element looking for potential targets for con games and identity theft, but AI brings the single largest potential demand for data yet. Recall that the quality of an AI model is a direct function of the quality and size of its training dataset. So if you think that AI is going to be used more and more because of its potentially limitless utility, then the demand for (stealing) your data will skyrocket as well.
And you won’t see a cent of this.
Intellectual Property
The other side of the coin is that existing laws, such as copyright laws and intellectual property (IP) rights, may already have been violated en masse by AI. This is a new issue: because the data is used to generate potentially derivative works, an AI model trained on copyrighted images or works could put its creators in gross violation of the laws that protect the rights of IP owners.
A lawsuit in California against Stability.ai and MidJourney over the use of the LAION-5B image data set in the StableDiffusion model, which generates art, could set the precedent for what counts as fair use of copyrighted works, and depending on its outcome, could drastically change how AI models can acquire their data. The providers of the LAION data set do not assert any restrictions over its use, but they also note that the images it contains remain under their owners’ copyright, thus conveying no implied license to use them beyond fair use.
If the use of such copyrighted datasets to produce AI output is considered to create derivative works, then the AI model providers are clearly in violation of copyright and could be liable for massive damage claims. However, if the output is considered fair use or transformative—that is, sufficiently changed in use or context from the source—then it would be in the clear. It is the opinion of this author that art generated with a trained StableDiffusion model will likely be counted as derivative work, but I’m not a lawyer.
In any case, there seems to be a growing need for technology that can meter, report, detect, and audit when a copyrighted work is used in a dataset and, importantly, allow the licensing fee to be paid to the copyright owner in an automated way. The LAION data set contains billions of images and is hundreds of terabytes in size, and it would be completely impractical to track down and pay all the copyright holders by traditional means. What is needed is a way for image owners to embed licensing conditions and terms[2] (metadata) into their images so that when they are posted online, anyone copying and using an image could programmatically pay the licensing fee to the content owner, in a way that makes the payment provable and verifiable.
This would solve the copyright problem and, more importantly, allow content creators to earn passive income from their uploaded content and cut out the (ineffectual) middleman of social media platforms in the upcoming Data-centric economy.
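To make the idea concrete, here is a minimal sketch in Python (using the Pillow imaging library) of what embedded licensing metadata could look like. The metadata key, the terms schema, the fee, and the payment address below are all hypothetical placeholders; a production system would need a standardized, signed schema and an actual payment rail (which is where BSV micropayments could come in). This is only an illustration of the mechanism, not a definitive implementation.

```python
# Sketch: embed machine-readable license terms into an image's metadata so a
# dataset builder can discover them and pay the owner programmatically.
# All field names, fees, and addresses are hypothetical placeholders.
import hashlib
import json

from PIL import Image
from PIL.PngImagePlugin import PngInfo

LICENSE_KEY = "x-license-terms"  # hypothetical metadata key


def embed_license(src_path, dst_path, terms):
    """Write license terms into a PNG text chunk and return a content hash."""
    img = Image.open(src_path)
    meta = PngInfo()
    meta.add_text(LICENSE_KEY, json.dumps(terms))
    img.save(dst_path, "PNG", pnginfo=meta)
    # Hash the pixel data (not the file bytes) so the identifier survives
    # metadata changes and can be referenced in a payment record.
    return hashlib.sha256(img.tobytes()).hexdigest()


def read_license(path):
    """Recover the license terms a crawler or dataset builder should honour."""
    raw = Image.open(path).info.get(LICENSE_KEY)
    return json.loads(raw) if raw else None


if __name__ == "__main__":
    terms = {
        "owner": "alice@example.com",           # hypothetical content owner
        "payment_address": "1ExamplePlaceholder",  # placeholder, not a real address
        "fee_per_training_use_usd": 0.001,      # placeholder price
        "permitted_uses": ["ai-training-with-payment"],
    }
    content_id = embed_license("artwork.png", "artwork_licensed.png", terms)
    print("content id:", content_id)
    print("recovered terms:", read_license("artwork_licensed.png"))
```

Under these assumptions, a crawler assembling a training set could check each image for such terms before ingesting it, settle the stated fee against the content hash, and leave the owner with a verifiable record of payment.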
And this is where Bitcoin SV (BSV) and bitcoin can contribute.
/WallStreetTechnologist
If you want to know more, tune into my podcasts, and follow me @digitsu on Twitter.
[1] In truth, it is not free. It just uses the same monetization model as the current internet: they steal your data in exchange for your use of their service.
[2] If you are a fan of my column and works, you will likely recognize the lightbulb that just popped up above your head as “Sounds like something that is easily done on BSV”—and you would be right.