As We May Think: Data.world Lays the Traceroutes For A Data Revolution

“There will always be plenty of things to compute in the detailed affairs of millions of people doing complicated things.” — V. Bush

data.world’s founders after closing their $14 million Series A round. From left to right: Bryon Jacob, CTO; Jon Loyens, CPO; Brett Hurt, CEO; Matt Laessig, COO.

Quiet magic happens when an at-scale platform emerges unexpectedly: things previously thought impossible, or more aptly, things never imagined, become commonplace faster than we can get used to them. Think of your first Google search. Your first flush of connection on Facebook. The moment a blue dot first guided you to a red destination. Coding before GitHub. Taxis before Uber. AR before Pokémon Go.

When a platform is built that allows for unexpected adjacencies, magic is unleashed and the world sparkles for a moment or two.

In 1945, well before the advent of personal computers or the Internet, Dr. Vannevar Bush authored a seminal essay, As We May Think, positing a new knowledge platform. Bush, who led US efforts to apply science to the war (including the Manhattan Project), outlined a challenge for the world’s scientists and researchers: Now that the war was over, it was time to harness knowledge for the good of all humankind.

Scientists, he wrote, “have been part of a great team. Now, as peace approaches, one asks where they will find objectives worthy of their best.”

His answer was the Memex, an entirely new kind of device designed to capture, classify, organize, and make available the entirety of human knowledge. As he described his proposed solution, Bush came tantalizingly close to predicting the World Wide Web, digital computing, social networks, machine learning, and many other developments, years before they appeared. What he did nail was the link: the associative connection between two points of data or research. His Memex worked by creating “trails” of associated content, whether data, articles, photographs, or any other knowledge. He then suggested that researchers around the world could share their trails, creating rich associative lines of inquiry that, taken together, could change the course of human history.

For students of today’s media and technology world, Bush’s Memex remains frustratingly outside our grasp: we know such a system is now possible, but so far no one has built a platform capable of producing it. For my part, I believed that early search streams were precursors to a public, Memex-like platform, but the last decade has proven me wrong. Private companies have certainly built splendid systems for their own internal leverage of data-driven associations (Facebook, Palantir, and Google come to mind), but there has not yet been an open, public platform that might propel humanity into the innovative leaps Bush imagined.

Which brings me to the launch of data.world. Founded by an impressive set of Austin’s most experienced entrepreneurs, data.world is, well, a platform for data. On its face, that doesn’t sound particularly unique. There are already plenty of (mostly government-inspired) platforms for open data. Then again, an index for the World Wide Web didn’t sound particularly new when Google showed up, and as for a list of your close friends, well, let’s just say Facebook wasn’t first, or even second. The genius isn’t in the concept, as we all know; it’s in the execution.

I met with Brett Hurt, CEO of data.world, and Jon Loyens, its Chief Product Officer, about a month ago in San Francisco. Data.world was about to launch, and they gave me a preview of what they’ve executed. And while it may seem a bit wonky, stick with me. If this thing tips, Hurt & co. may well have unleashed a blast of magic into the world.

Data.world sets out to solve a huge problem — one most of us haven’t considered very deeply. The world is awash in data, but nearly all of it is confined by policy, storage constraints, or lack of discoverability. Furthermore, one person’s work on a particular dataset is usually lost once that person’s work is published — a researcher may refine raw government census data into deep insights through a process of hygiene, association with other data sets, and clever scripting, but the results are usually confined to a published paper. The sparkling new data sets sit unused and disconnected on the researcher’s hard drive.
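To make that refinement work concrete, here is a minimal sketch of the kind of script a researcher might write: clean a raw table, then associate it with a second one. It is an illustration only, not data.world’s tooling. The place names, columns, and numbers are all invented, and the helper names `clean_census` and `join_libraries` are hypothetical.

```python
import csv
import io

# Hypothetical raw inputs: every name and number here is invented.
RAW_CENSUS = """county,population
 alpha ,1200000
Alpha,1200000
Beta,n/a
Gamma,550000
"""

RAW_LIBRARIES = """county,libraries
Alpha,22
Gamma,11
"""


def clean_census(text):
    """Hygiene: normalize names, drop unparseable rows, collapse duplicates."""
    cleaned = {}
    for row in csv.DictReader(io.StringIO(text)):
        name = row["county"].strip().title()
        try:
            population = int(row["population"])
        except ValueError:
            continue  # skip rows such as "n/a"
        cleaned[name] = population  # a repeated county overwrites the earlier row
    return cleaned


def join_libraries(populations, text):
    """Association: attach library counts and derive residents per library."""
    joined = []
    for row in csv.DictReader(io.StringIO(text)):
        name = row["county"].strip().title()
        if name in populations:
            libraries = int(row["libraries"])
            joined.append({
                "county": name,
                "population": populations[name],
                "libraries": libraries,
                "residents_per_library": populations[name] // libraries,
            })
    return joined


refined = join_libraries(clean_census(RAW_CENSUS), RAW_LIBRARIES)
for record in refined:
    print(record)
```

The `refined` list is exactly the sort of artifact that tends to stay on one hard drive: small, hard-won, and invisible to anyone who did not write the script.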

Woe to the next set of researchers who might want to pursue or build upon a similar line of inquiry — chances are they’ll never benefit from the work of their peers. Even if the original researcher publishes his or her new data to the Web, there’s no platform to unify that work with the work of others.

Data may be to the information economy what oil was to the industrial economy, but without new tools to refine and distribute it, it remains a gooey mess buried in the soil. Eighty percent of the work on data is preparing it for publication. Our information economy remains dependent on those with the capital and scale to privatize data’s insights, far from the reach of mere mortals like academics, journalists, analysts, and students. But what if somehow we could unite every person with a data itch to scratch, on one single platform?

That’s essentially data.world’s mission. If you’re familiar with how GitHub works, it’s a bit like GitHub, but for data, which is a far larger market. One consistently formatted master repository, with social and sharing built in. Once researchers upload their data, they can annotate it, write scripts to manipulate it, combine it with other data sets, and most importantly, share it (they can also keep data sets private). Cognizant of the social capital that drives sites like GitHub, LinkedIn, and Quora, data.world has profiles, ratings, and other “social proofs” that encourage researchers to share and add value to each other’s work.

In short, data.world makes data discoverable, interoperable, and social. And that could mean an explosion of data-driven insights is at hand.

But what really gets me excited about data.world is the decision Hurt and his co-founders have made about the essential purpose of the company. Like Kickstarter and a growing number of other NewCos, data.world is a public benefit corporation, with a duty to its shareholders that goes beyond profit alone. Sure, the company is a for-profit entity, and is backed by an impressive roster of investors. But here’s the company’s purpose statement:

The specific public benefit purposes of the Corporation are to (a) strive to build the most meaningful, collaborative and abundant data resource in the world in order to maximize data’s societal problem-solving utility, (b) advocate publicly for improving the adoption, usability, and proliferation of open data and linked data, and (c) serve as an accessible historical repository of the world’s data.

Ambitious? Yes. But the company is already seeing early signs of success. The site is in an invite-only beta period, but according to the company, applications to join have far exceeded expectations, with applicants ranging from government space agencies to craft beer researchers (who knew?!).

If we are going to solve the world’s biggest problems, we’ll need new approaches to sharing data, insights, and learning. And when it comes to accelerating solutions, as Loyens told me, “Open beats closed.” It’s refreshing, and encouraging, to see a company so dedicated to such lofty principles.
