Big data is dead, long live big data

This is a guest contribution by Dave Mullen-Muhr of Unbounded Capital. If you would like to submit a contribution, please contact Bill Beatty for submission details. Thank you.

Jackson Laskey from Unbounded Capital recently wrote a piece on owning your data with BSV. In it, he describes the problems with current big data business models and outlines how true data ownership and micropayments enabled by Bitcoin will prove to be a powerful solution. Today, we will take this line of thinking one step further. How do we migrate the current big data scheme toward financially incentivized, user-owned data?

What might that system look like? What could it enable?

Perhaps counterintuitively, this improved future will likely include even bigger (and better) data!

Big data misnomer

Is Google better described as a web services company or as a data and information brokerage? Google’s search algorithm, its email service, its cloud storage, its video platform YouTube, and the rest of the Google suite of products used by billions of people every day are not typically direct money makers. Most users don’t pay for access to these products. Instead, users interact with these products and, in doing so, create data that Google then sells to the highest bidder.

Google’s product is not web services; it’s data and information. Accordingly, Google has built an array of enticing products that encourage users to generate data, which Google can harvest, turn into actionable information, and sell through B2B advertising services like AdSense. Advertising makes up the majority of Google’s revenue.

Importantly, it’s this relationship with data buyers and their demand for information that allows Google’s massive sea of raw user data to be refined into something useful, often in the form of targeted advertisements. Google is actually a massive data and information brokerage; its flagship services are popular loss leaders.

Big data whiplash

This shift in framing has many downstream implications, and distinguishing the good of big data from the bad is key to navigating the potential future of the internet.

The problem with big data is not that a lot of data exists. That is an unavoidable reality, and turning this raw data into useful information is, in and of itself, a great thing.

The problem with big data really stems from the business models of the companies dealing in data. A rapidly spreading intuition is that the value the average user gets from Google services is far lower than the value of the data they generate for Google.

This disparity creates a feeling that users are getting the short end of the stick. Add to this the reality that Google’s incentives are often misaligned with those of its users, which leads to poor user experiences and makes the perceived undercompensation even less palatable. If participants in any other exchange felt this dissatisfaction, they would take their business (in this case, their valuable data) elsewhere.

Unfortunately for Google’s users, virtually all of Google’s competitors use the same data scheme, leaving users no substantive alternative. If you want to use the most ubiquitous tools on the internet, you need to hand over your valuable data; opting out has not been a realistic option.

Big babies and bathwater

As Laskey’s piece described, the invention of Bitcoin (today as BSV) allows users to realistically opt out of this system. But just because we can opt out of exchanging data through these existing big data brokers doesn’t mean that big data itself will, or should, go away. Some in the cryptocurrency space have argued for an extreme privacy-oriented future that would do away entirely with the widespread collection of data.

This would be a costly overreaction to the downsides of the current data scheme, one that fails to pragmatically recognize its upsides. With bitcoin enabling an own-your-data internet, we can instead have data exchanged and used in a manner that better compensates users. Contrary to the solitary vision of extreme privacy, this rebalancing of users’ costs and benefits in data creation and exchange will incentivize even bigger data, with a myriad of use cases beyond today’s creepily pinpointed advertising.

The eerie examples of advertisers anticipating your exact desires at the very moment you do yourself have become so commonplace they are cliché. But in the spirit of avoiding the anti-data overreaction, it’s important to recognize that receiving ads for things that you might want rather than things that you absolutely don’t is certainly an improved user experience.

For better or worse, how does today’s big data enable this incredible predictive ability?

– Step 1: Google’s popular services act as data harvesters, collecting and then privately and centrally organizing a vast range of data points. Exact numbers are hard to come by, but it is reasonable to assume that Google holds millions of data points per user.

– Step 2: Google acts as an analyzer to comb through and refine this raw data into actionable information with the help of proprietary algorithms and machine learning.

– Step 3: Google, via its AdSense service, brokers the now-useful information to companies willing to pay, bringing in revenue on the order of 11 to 12 figures (tens to hundreds of billions of US dollars) per year.

Google benefits from monopoly access to this entire pipeline, from raw data to proprietary algorithms, which gives it a favorable negotiating position on price. Consider what would happen if internet users used bitcoin to opt out of the service-for-data exchange and their raw data were no longer owned exclusively by Google. Without monopoly access, Google could not charge its current rates and would instead have to compete on analysis.

With a competitive market for analysis, Google’s margin there would likely come down as well. In a bitcoin-enabled, own-your-data internet, this data-to-information pipeline could start to move away from Google’s monopoly control and more closely resemble a free market that gives end users both greater upside and more choice in how their data is used.

Big data markets

An especially noticeable part of this user upside will come in the form of profit. How much is your individual data worth? Because that data is collected, processed, and sold behind the curtain of Google’s monopoly, it’s hard to know. However, with the ability to opt out of opaque service-for-data contracts with Google and opt in to direct money-for-data exchanges through bitcoin, we will soon find out. Price discovery for user-owned raw data will likely lower the cost to advertisers who wish to purchase this information, as well as incentivize users to generate and sell more high-quality data.

Additionally, through the use of on-chain cryptography, advertisers could reliably reach target markets using information derived from this data without jeopardizing the privacy of the individual.

If you are currently concerned with Google’s data practices, you may have moved from Google search to a privacy-focused search engine like DuckDuckGo. But what if your upside for using Google search wasn’t just the occasionally welcome targeted ad but a monthly income stream? Would moving to DuckDuckGo be as attractive? With this incentive in place, users would be inclined to create more data rather than less.

What about data more personal than shopping habits, which users might prefer to keep private? Medical data comes to mind. If data can be encrypted on-chain, access to it is at the discretion of the user, who both legally and practically owns it. Who receives access to this data, for how long, for what purposes, and at what price are all under the user’s control.

A user may not want Johnson & Johnson publicly targeting ads at them to treat an embarrassing illness, but they may want to privately sell the same personal medical information to a company seeking to cure it, using data analysis that is only possible with a massive sample size. We know that this data in aggregate is hugely valuable; data is now often described as a more valuable commodity than oil. With free markets for data enabled by bitcoin, we will discover the value of our individual data and, as a result, be able to make better-informed decisions about what we want to do with it.
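To make the user-controlled access terms described above a little more concrete (who gets access, for how long, for what purpose, and at what price), here is a minimal Python sketch of a user-owned record that only releases data under its owner’s terms. Every name in it (`AccessGrant`, `UserDataRecord`, `price_sats`) is hypothetical and chosen for illustration; it is not a BSV library API, and it deliberately leaves out the on-chain encryption, storage, and payment, modeling only the owner-controlled access check.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class AccessGrant:
    """Hypothetical terms a user attaches to one grant of access to their data."""
    grantee: str            # who may access the data
    purposes: frozenset     # what the grantee may use the data for
    expires_at: datetime    # how long access lasts
    price_sats: int         # agreed price in satoshis (payment itself is not modeled)


@dataclass
class UserDataRecord:
    """Hypothetical user-owned record; access exists only through owner-issued grants."""
    owner: str
    category: str
    grants: list = field(default_factory=list)

    def grant(self, grantee: str, purposes, days: int, price_sats: int, now: datetime) -> AccessGrant:
        # The owner decides who, for what purposes, for how long, and at what price.
        g = AccessGrant(grantee, frozenset(purposes), now + timedelta(days=days), price_sats)
        self.grants.append(g)
        return g

    def revoke(self, grantee: str) -> None:
        # The owner can withdraw every grant held by a given company at any time.
        self.grants = [g for g in self.grants if g.grantee != grantee]

    def may_access(self, grantee: str, purpose: str, now: datetime) -> bool:
        # Access requires an unexpired grant to this grantee covering this purpose.
        return any(
            g.grantee == grantee and purpose in g.purposes and now < g.expires_at
            for g in self.grants
        )


if __name__ == "__main__":
    now = datetime(2020, 1, 1, tzinfo=timezone.utc)
    record = UserDataRecord(owner="alice", category="medical")
    record.grant("research-lab", {"research"}, days=30, price_sats=5000, now=now)
    print(record.may_access("research-lab", "research", now))     # True
    print(record.may_access("research-lab", "advertising", now))  # False
    print(record.may_access("ad-network", "research", now))       # False
```

In this sketch, the same medical record can be sold to a research lab for research while remaining closed to advertising use and to every other company, and the grant lapses on its own after the agreed period. A real system would presumably also need to enforce these terms cryptographically, for instance by releasing decryption keys only to paying grantees, which is outside the scope of this illustration.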

Another positive aspect of this user upside will be improved outcomes from data analysis. Continuing the medical research example, data that users are incentivized to create and share (sell) will be more plentiful and of higher quality. Because it is stored on bitcoin’s single public database, it will also be easily aggregated and interoperable with any service that is willing to pay for it.

In today’s scheme, Google might own valuable data on user health based on search history for medications. Stored in another database owned by another company, say Apple, is valuable data on user health derived from food purchases made with Apple Pay. Stored in a third database is valuable user health data from a biometrics company like Fitbit. If a medical research company wanted to analyze this array of data, it would need to purchase data from disparate companies, stored on distinct servers and possibly in different formats, and then analyze it. While this is technically possible, the added friction from the data’s fragmentation makes its analysis less likely to be viable and profitable.

To combat this inefficiency and strengthen their monopoly positions, giants like Google regularly buy this useful data for themselves. Just last month, Google agreed to buy biometric data collector Fitbit. Although Google says it will not use Fitbit users’ data for ads, users of the Fitbit platform who clicked “agree” on a complicated terms of service agreement without reading it will now have their data in the hands of another company. While this may be good for the hypothetical medical researcher, who now has one fewer company to negotiate with, it is a negative for the privacy-conscious user. Once Google owns Fitbit, what happens to users’ health data is effectively out of their control.

Now, shift to the own-your-data internet on bitcoin. All of this data, from search history to purchase history to biometrics, is still generated by users, but it is now kept private, selectively decrypted, and put up for sale only if the price is right.

The data is now stored in one place, in the same format, and is natively interoperable, which makes analysis simple. The data is also priced on open markets, where users can be confident they are being properly compensated for their information. If they feel that they are not, they can deny access to any company they want. Users will also be solicited by companies looking to offer the best analysis of raw data (possibly including Google as one competing firm).

These companies will compete to serve as value-add middlemen who refine the raw data into actionable intelligence and sell it to buyers of information. Because they compete on price as well as on analysis, the medical researcher will get higher-quality, cheaper information, which could result in more diseases being cured.

What else could be improved by aggregating more data and developing better methods of analysis? One often-discussed technology that requires a lot of raw data is the self-driving car. Imagine necessary and valuable information like vehicle location, destination, speed, priority, and fuel or charge level all becoming a monetizable income stream for the data provider (the car owner). Once there is a financial incentive to provide this data, users will provide it, and car companies with self-driving software can transform it in real time into the information their software needs. Considering today’s roundabout methods of incentivizing data generation, it is fair to ask whether a sustainable self-driving car ecosystem is even possible without the shift in big data ownership that Bitcoin enables.

Big data future

The concerns with today’s big data paradigm are warranted. It’s easy to see how the current scheme is not optimal for most of the participants involved. However, the solution is not to do away with big data and retreat into extreme solitude. We ought to want big data to get even bigger, so long as it is user-owned. Now that bitcoin has made this possible, we can incentivize the efficient creation of more data and information that will benefit virtually everyone involved in the process.

It’s worth noting that this is not a death sentence for the current primary beneficiaries of the big data scheme. Companies like Google will be well positioned to compete on aggregation and analysis, but they will need to continue to innovate and properly compensate their users (now business partners) if they want to maintain that position. The user experience needs to improve over time rather than regress. With a data and information economy of better-behaved companies, users will want to provide data and be less suspicious of how it is being used behind the scenes.

The future of the Bitcoin-enabled internet will bring profound changes to today’s common business practices. But big data will not die. With Bitcoin, big data will grow bigger and work better for everyone.

Dave Mullen-Muhr is an investor, entrepreneur, writer, and ever-curious learner. A principal at Unbounded Capital, Dave is focused on leveraging Bitcoin to integrate the wisdom of the past with the technology of the present to innovate the future. 
