The provenance of information

This article builds on the ideas and concepts from the article ‘Saving Research’, published by Craig Wright. 

One of the biggest issues holding back data-driven science today is the ownership and monetization of the data upon which that science is based. Scientists are generating more information, more rapidly, than ever before; however, it is often locked away in databases localised to the project or university that generated it.

Schemes exist to enable the sharing and distribution of this data; however, they are isolated, and all too often scientists are forced to produce brand-new datasets comprising almost identical information at great expense.

In addition, scientists are faced with the dilemma that when information is shared, it is duplicated, potentially losing the original data’s provenance, and creating the possibility of disparate versions of the same dataset being created.

As a result, scientists working on research in similar areas often invest money, time and effort gathering results that replicate data already held in other datasets around the world. Not only is this inefficient, but it can lead to anomalous results, as experimental processes vary between projects. It can even create an environment where scientists are encouraged to find results that fit an existing bias within a field, rather than basing research on limited datasets that disagree with common outcomes.

A famous example of this is the ‘Ego Depletion’ research, which appeared to show that people have a finite reserve of willpower that can be depleted over time. The experiment was repeated many times over many years, leading to the publication of books and the creation of ‘self-help’ style programs to improve people’s willpower. Eventually it was revealed that researchers had been repeating the experiments when the desired results weren’t found, and publishing research using only the datasets that supported the theory, believing datasets that didn’t show the effect of food on willpower to be erroneous.

Had researchers recorded all their data on an open and public system, statistical researchers from multiple institutions could have analysed a superset of data gathered from multiple experiments and shown that the effect was a statistical artefact rather than a real aspect of human behaviour. This could have prevented thousands of people from being misled into buying books and medicines designed to exploit a personality trait that did not exist.

With Bitcoin, we now have the option to create and store datasets in such a way that their provenance is guaranteed, while the data itself can be shared and monetized without researchers having to identify themselves to each other.

Using known techniques, encrypted research data can be recorded immutably on the blockchain, at relatively low cost compared with current approaches that require dedicated staff, data centres and more. Researchers can push raw data directly to the Bitcoin network and retrieve it from the blockchain as needed for analysis.
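One known pattern for this is embedding data (or a commitment to it) in a provably unspendable OP_RETURN output. As a minimal sketch, the helper below builds such a script payload carrying the SHA-256 digest of an already-encrypted dataset; the function name and the choice to store only a hash commitment, rather than the full data, are illustrative assumptions, not a prescribed scheme:

```python
import hashlib

def op_return_payload(encrypted_data: bytes) -> bytes:
    """Build a minimal OP_FALSE OP_RETURN script embedding the SHA-256
    commitment of an (already encrypted) dataset.

    The on-chain digest lets anyone later verify that a shared copy of
    the dataset is byte-for-byte identical to what was committed,
    preserving its provenance.
    """
    digest = hashlib.sha256(encrypted_data).digest()  # 32-byte commitment
    OP_FALSE = b"\x00"   # makes the output provably unspendable
    OP_RETURN = b"\x6a"
    PUSH_32 = b"\x20"    # direct push of the 32-byte digest
    return OP_FALSE + OP_RETURN + PUSH_32 + digest

# Verifying a received copy against the on-chain commitment:
payload = op_return_payload(b"example encrypted dataset bytes")
received = b"example encrypted dataset bytes"
assert hashlib.sha256(received).digest() == payload[3:]  # provenance intact
```

Storing only the digest keeps on-chain costs minimal while still anchoring provenance; larger schemes push the full encrypted data into the output instead, trading fees for availability.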

Using a new and as-yet unpublished technique developed by nChain, called dealer-less blinded ECDSA threshold keys, researchers can now establish links between institutions and offer direct access to the same raw data used in their own research, in such a way that they cannot know which institution or organisation is accessing the information. When the raw data generated by every iteration of an experiment is open to peer review from any institution, biased results such as those seen in the ‘Ego Depletion’ debacle are much less likely to occur, and research that produces surprising outcomes can be quickly checked by independent groups or bolstered with similar data from elsewhere.

And because this is Bitcoin, every byte of information can be given a value and monetized, incentivising research institutions to apply themselves to producing the most valuable possible datasets.

By turning data into information and giving that information value, we solve a puzzle that has eluded academics and corporate researchers for decades: how to ensure that results collected from their experiments can be used by third parties in a provably unbiased way. Researchers globally gain not just access to those datasets, but the ability to augment them through their own contributions, helping to ensure that biases can be overcome through diversity.

Blind keys allow much more than the sharing of research. The techniques being developed are applicable to any situation where a registry of contributors must be maintained without knowledge of who contributed or accessed any given piece of information. This can be applied to elections, secret ballots, gaming and more. Additionally, because these techniques build on top of a robust and scaled Bitcoin network and require no changes to the existing protocol, they can be deployed in the real world as soon as the software libraries that define them are ready.

Exciting times indeed!

New to Bitcoin? Check out CoinGeek’s Bitcoin for Beginners section, the ultimate resource guide to learn more about Bitcoin—as originally envisioned by Satoshi Nakamoto—and blockchain.