Blockchains for Big Data


From Data Audit Trails to a Universal Data Exchange

by Trent McConaghy

Big Data is Big Business

Big data technology arose in the early and mid 2000s to meet internet-scale computation needs: ZooKeeper at Yahoo, BigTable and MapReduce at Google, Cassandra at Facebook, and so on. Then came open source projects like the Hadoop Distributed File System (HDFS), Hadoop MapReduce, Cassandra, and more.

By the late 2000s and early 2010s, startups like MongoDB, Cloudera, and DataStax had created businesses to transform the open source successes into enterprise-grade offerings.

Now, big data technology is quietly transforming every enterprise backend on the planet. For example, in many places “data warehouses” of relational databases are getting replaced by “data lakes” running big data software. More than $100B annually is going towards big iron compute clusters, the software on top, and the services to keep it all running smoothly.

Big Data Challenges

But big data has its challenges, which include control, data authenticity and monetization.

First, who controls the infrastructure when there are multiple actors involved? For example:

If you’re a multinational enterprise, how do you share data around the planet? If you have multiple copies, how do you know which one is the most up-to-date? How do you reconcile a different system administrator role at each regional office?
If you’re an industry consortium, how do you share control of the ecosystem infrastructure among the companies in your consortium? This is especially hard if those companies are competitors!
Why can’t there be data just “out there” as a single shared source of truth that no one on the planet owns or controls, per se? Rather, data would be a public utility like electricity or the internet itself.

Second, how well can you trust the data? For example:

If you generate the data yourself, how do you prove you were the originator? If you get data from others, how do you know it was truly them?
What about crashes and malicious behavior? Machines crash, glitches happen, bits flip. Zombie IoT toasters might be inputting garbage. So after all your fancy Spark calculations, is it still just garbage out?

Finally, how do you monetize the data? For example:

How do you transfer the rights of the data, or buy rights from others?
There’s a long-standing dream of a universal data marketplace. How could it be realized?

Full article by Trent is here.


# # # #

Trent McConaghy was raised on a pig farm in Canada, “hacking away on cold winter nights. 3D CAD tool, wordprocessor, dozens of games.” He holds a PhD in EE from KU Leuven, Belgium. His thesis was awarded #1 worldwide in the field.
