“We started thinking about what blockchain means for capital markets, and particularly how important high-quality reference data was going to be for feeding into smart contracts that automatically execute, then trigger a whole cascade of other automated execution. That data going in has to be correct.” –Will Janensch, Co-founder of TruSet

In this era of machine learning, computational analysis, and now decentralized databases, data, as many have said, has become the new oil. And it's expensive. Traditional data vendors in the financial services industry charge institutions up to nine times for different business activities around the same set of data. These critical datasets aren't necessarily accurate either. The community often ends up correcting and cleaning this data — which is already publicly available on the EDGAR website — in their own back offices and on their own dime.

The emerging blockchain token ecosystem is also starting to suffer a very similar data dilemma. Critical data — about project teams, token mechanics, token distribution — is unstructured and scattered across white papers, blog posts, websites, backchannels, even exchanges, preventing the kinds of informed analysis that would facilitate wider adoption and innovation.

TruSet, a team of market data veterans and game theoretic software engineers, has made it its mission to liberate reference data so communities in information-intensive industries can self-organize and publish, validate, and consume data at a much faster pace, and at significantly lower costs. I sat down with founders Tim Rice and Will Janensch to get their take on the past, present, and future of data management. They recalled war stories from their past lives as data providers, explored the irony of missing data in the blockchain space, and outlined their vision for helping the community create its own golden record.

What is reference data exactly?

Tim: Reference data is the core underlying facts that are associated with a project and memorialized when it goes to market. They can be as simple as the name of the project, the people involved, where the company is from, and as complex as what jurisdictions and regulatory entities a company is under.

In the crypto space, reference data concerns underlying token mechanics, token sale mechanics, token distributions, vesting schedules, lockup schedules, as well as information about project team members and legal entities. Reference data is the established facts that people will use if they’re investing in or looking to be a part of the project. The data gives them a frame of reference as to what are the known truths about a particular project.
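The fields Tim lists lend themselves to a structured record. As a purely hypothetical illustration (these field names are assumptions, not TruSet's actual schema), a token reference-data record might look like:

```python
from dataclasses import dataclass
from datetime import date
from typing import List

# Hypothetical token reference-data record; field names are
# illustrative assumptions, not TruSet's actual schema.
@dataclass
class TokenReferenceData:
    project_name: str
    token_symbol: str
    team_members: List[str]
    jurisdiction: str          # regulatory frame of reference
    total_supply: int          # token mechanics
    sale_start: date           # token sale mechanics
    vesting_months: int        # vesting schedule
    lockup_months: int         # lockup schedule

record = TokenReferenceData(
    project_name="ExampleDAO",
    token_symbol="EXD",
    team_members=["A. Founder", "B. Engineer"],
    jurisdiction="Switzerland",
    total_supply=1_000_000_000,
    sale_start=date(2018, 6, 1),
    vesting_months=24,
    lockup_months=12,
)
print(record.token_symbol)  # prints: EXD
```

Once the facts live in a typed record like this rather than in a white paper, they become the "known truths" a machine can check against.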

What are the problems and opportunities in the reference data space?

Will: There’s a long-term problem in traditional capital markets around the reference data for financial securities, particularly bonds. The data is needed in machine-readable form, but the quality of the data that vendors provide today is quite low. And so knowing that, every customer of that data spends significant money cleansing the data for themselves to get to a copy they would consider a trusted, golden record. But every customer across the industry repeats that same function, and nobody regards getting to clean data in their back office as strategically differentiating.

An opportunity exists: if you can align incentives in the right way to mutualize that cleansing effort across the industry, then collectively, the entire industry can do it once and come up with a single, trusted, accurate set of data around financial instruments that the entire industry can use. That’s what TruSet is building. We’re building marketplaces that enable business communities to collectively create and maintain trusted, accurate business-critical reference data.

The innovation around blockchain enables separate entities to come together on a platform to collectively do this work, and the tokenized capability of blockchain enables us to create the right financial incentives so that those entities who are contributing to the quality of the data get paid for their contributions and are incentivized to actually do the work.
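The incentive loop Will describes — contributors publish, validators check, and both get paid when data is accepted — can be sketched in miniature. The two-thirds consensus threshold and the equal reward split below are illustrative assumptions, not TruSet's actual mechanism:

```python
# Hypothetical sketch of a publish/validate/reward loop.
# The 2/3 approval threshold and 50/50 reward split are
# assumptions for illustration, not TruSet's real design.
def settle_record(publisher, votes, reward_pool):
    """Accept a record if at least 2/3 of validators approve,
    then split the reward pool between publisher and approvers."""
    approvals = [v for v, approved in votes.items() if approved]
    if len(approvals) * 3 < len(votes) * 2:
        return None  # no consensus reached; no payouts
    payouts = {publisher: reward_pool / 2}      # half to the publisher
    share = (reward_pool / 2) / len(approvals)  # rest split among approvers
    for validator in approvals:
        payouts[validator] = payouts.get(validator, 0) + share
    return payouts

payouts = settle_record(
    publisher="alice",
    votes={"bob": True, "carol": True, "dave": False},
    reward_pool=90.0,
)
print(payouts)  # {'alice': 45.0, 'bob': 22.5, 'carol': 22.5}
```

The point of the sketch is the alignment: the only way anyone earns anything is by contributing data that the rest of the community agrees is correct.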

How would you characterize the state of data in the crypto space?

Will: What we started to see last year with the proliferation of tokens was that the information that exists in smart contracts about the tokens was trusted and accurate, but there’s a lot of associated data about projects that’s not in the blockchain, and therefore harder to trust. There was no consistency in how data was being reported, how much was being reported, in what venues, or in what formats. If you think about the needs of an institutional investor who’s evaluating a token, making buy-sell-hold decisions, or wanting to run typical financial services functions like risk analytics and accounting, you need more information than what is in the blockchain.

So it’s ironic: at its heart, blockchain is about creating a pristine, shared set of accurate data. But the ecosystem still needs data around tokens and projects that people can both trust and use. We’ve realized that this new blockchain-based industry is replicating the same problem that exists in the legacy market, where institutional investors, purchasers, and traders of these tokens need machine-readable data that describes what these tokens are and how they behave so they can run analytics against them.

“It’s ironic: at its heart, blockchain is about creating a pristine, shared set of accurate data. But the ecosystem still needs data around tokens and projects that’s not in the blockchain.”

The origin of that information currently lies in unstructured sources such as white papers, blog posts, and websites. To get from those unstructured sources to a machine-readable data set, the TruSet platform bridges the gap between on-chain data and associated “off-chain” data, using the same innovation of the blockchain to create and validate all the missing data that isn’t in smart contracts already. The industry can then use this source of truth to run all sorts of analytics and create derivative products. Users can trust that the underlying data layer is the accurate truth about what these instruments are.
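In miniature, that bridging amounts to unifying the fields a token contract already exposes with the community-validated off-chain facts into one record. All names and values here are hypothetical:

```python
# Hypothetical sketch: merge on-chain token facts with
# community-validated off-chain facts into one record.
on_chain = {          # readable directly from the token contract
    "symbol": "EXD",
    "decimals": 18,
    "total_supply": 1_000_000_000,
}
off_chain = {         # published and validated by the community
    "project_name": "ExampleDAO",
    "jurisdiction": "Switzerland",
    "team_vesting_months": 24,
}

# One machine-readable record the industry can run analytics against
record = {**on_chain, **off_chain}
print(record["symbol"], record["jurisdiction"])  # prints: EXD Switzerland
```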

What was the turning point that inspired you two to start TruSet?

Will: I used to work in strategy, and Tim ran a pricing and reference data business, and in these roles, we got a chance to work together a few times, mostly on big M&A deals. There was one potential partnership with a very large asset manager who wanted to take the reference data that Tim sold and do something that we thought was actually really exciting and innovative.

But it was a somewhat contentious meeting. This potential partner explained to us how much effort they had to put in — with our data and with other vendors’ data — in order to get it to a level of quality so that they could extend the data and provide analytics on top that made the data much more powerful. They really raked us over the coals a bit for the pain of cleansing this data because of everything that could be unlocked once it was high-quality.

I think that meeting stuck in both our minds over the years. We started thinking about what blockchains mean for the world of capital markets, and particularly how important high-quality reference data was going to be for feeding into things like smart contract-based derivatives that automatically execute, then trigger a whole cascade of other automated execution. That data going in has to be correct.

So here we are, now confronted with an opportunity to make this data even more powerful. But how do we actually now, once and for all, solve this data accuracy challenge to enable this very exciting future? Thinking about the way in which this one potential partner described needing to cleanse our data, we realized we had the opportunity to create a model that actually solved this problem across the industry.

Tim: There are a huge number of pitfalls when you don’t know everything about all the data you’re distributing to people. I have a war story from the credit crisis. Part of my organization was putting end-of-day prices on fixed income securities, particularly in the mortgage-backed security area, and I got a phone call from the CEO saying he got a call from a very senior person at a large asset manager, threatening criminal charges and civil activities because we were so far off the market with prices we were putting on these bonds. You really need to understand everything you’re doing with your organization’s data and how that’s going to impact your clients.

But even more than data accuracy, one of the core things Will and I believed in when creating TruSet was liberating the licensing and restrictions around the use of a data set that is ultimately cleansed and sourced by the community. As a traditional data vendor, we would charge you for any one of nine different business activities for the same set of data. You would pay us up to nine times to use that data in various parts of your organization.

Our main view is that maintaining reference data should be a crowdsourced effort. Not only will data management costs come down significantly, but the community will be paid for the work that it’s already doing. There will be a fee to participate in the TruSet marketplace, but you will have data at your open disposal. License requirements and rights will be lifted. You can build products and do what you want. An open source community — that’s how it should be.

How will you help legacy institutions adopt this new technology?

Tim: We have a firm understanding of what the workflows and risk profiles are within these banks. We’re carving out certain components of building the new data set on the new technology infrastructure, so that legacy institutions can sandbox and test TruSet in areas where we believe they’ll need timely and correct data.

Take the instance where they’re first trading and settling a bond, for example: the full reference data framework has yet to pervade the organization. These are small areas where they can test the speed and the accuracy of TruSet’s data marketplace against what they currently have from the incumbents. As they become more confident in the dataset, institutions can start to build up an inventory of data. We’re finding ways for them to grab bite-size chunks — isolated, factual datasets — and test them while they figure out how to embrace the broader blockchain technology stack across their organization.

How has working in the blockchain space changed your thinking about capital markets?

Tim: Individuals, small institutions, and private groups brought fairly substantial capitalizations to the crypto marketplace, but to get to the next level, we need to bring the institutions on board, both as investors and as those who embrace the technology. I had always thought that with TruSet and fixed income, it didn’t necessarily need to be the legacy financial institutions doing the embracing, publishing, and validating. It could be someone anywhere in the world who knew data and saw data as a way to make money. They could start up a small business out of wherever they live, establish credibility, and partake in the ecosystem.

With regulators, institutions, and everybody else getting involved, crypto’s going to become fairly organized. But hopefully along the way we can still be disruptive, lose some of that centralization and friction, and get costs out of our businesses.

Will: Since the financial crisis, banks and financial services firms generally have been looking for ways to significantly reduce their cost base, particularly around things that aren’t differentiating to them, as a way to improve their profitability and their return on equity. Blockchain creates an opportunity to mutualize non-differentiated cost functions across the industry so banks can essentially reduce the amount they’re spending. TruSet is taking the repeated costs that exist in the back offices at banks — costs that aren’t differentiating them or helping them make money over their competitors — and providing a platform where they can come together to share those costs and increase their efficiency and profits.

What makes TruSet different from the other data marketplaces and libraries emerging in the cryptosphere?

Tim: What we see in the current marketplace around crypto assets is a lack of holistic understanding and experience about what is actually needed to drive an ecosystem around data. The view right now is that data is going to be the new oil. It’s going to drive what goes on. In order to create good, solid data, you need to create excellent metadata. You need to have a deep understanding of creating data with a semantic ontology around what certain things mean. You need to use some of the legacy heuristics in how you create data and build a model for people to use going forward, rather than just create another paragraph in which the facts are buried.

“In order to create good, solid data, you need to have a deep understanding of creating data with a semantic ontology around what certain things mean, so you’re building a model for people to use going forward, rather than just creating another paragraph in which the facts are buried.”

One of our strengths is understanding how to build logical data models and how to pull apart the facts and generate something that a machine can actually interrogate without having to deploy a massive amount of technology against.
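The contrast Tim draws — facts buried in a paragraph versus a model a machine can interrogate — can be shown with a toy example. The field names and values here are hypothetical, purely to illustrate the difference:

```python
# Toy illustration: the same facts as prose vs. as structured data.
prose = ("ExampleDAO sold 40% of its 1B EXD tokens in June 2018, "
         "with a 24-month vesting schedule for the team.")
# Extracting facts from the prose above requires heavy NLP tooling.

# The structured form can be interrogated directly, with no extraction:
tokens = [
    {"symbol": "EXD", "total_supply": 1_000_000_000, "vesting_months": 24},
    {"symbol": "ABC", "total_supply": 500_000_000, "vesting_months": 6},
]

# e.g. screen for projects with long team vesting schedules
long_vesting = [t["symbol"] for t in tokens if t["vesting_months"] >= 12]
print(long_vesting)  # prints: ['EXD']
```

A well-designed logical data model makes this kind of query trivial; a paragraph makes it a research project.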

There are already situations in the crypto data ecosystem where someone will just sit there and think, “Oh, South Korean exchanges are showing a 20% premium to the rest of the market, so I’ll just take that data out one day and tank the markets.” That error should have been caught well before it had that magnitude of impact.

Being at ConsenSys also offers a broad perspective on how the Ethereum ecosystem is evolving and developing. Working with the ConsenSys Legal team and The Brooklyn Project has helped us strategize about what global disclosures will look like and understand the shape of the global marketplace — that less than 30% of the exchanges exist in the US, for example. There’s a localized component to developing for small markets, but we also have a global vision for how reference data management is going to work.

Will: The TruSet team combined has over 50 years of experience in market data and reference data, which is not typical for a startup in this space — in any subject area, much less reference data. Tim ran a $300 million business with one of the leading companies in the world focused on reference data. He has the credibility with our core customer base. He has sold major infrastructure data sets to them in the past, and he knows what their needs are. He can sit across the table from a C-level executive at a bank and talk to them about their data needs in a way that I think few other startups could emulate.

How do you reconcile your past lives as data providers with this new crowdsourced data solution?

Tim: The initial disclosures that regulators required in traditional financial capital markets are all about disclosure to investors and for the well-being of the investors. What we did at these legacy data providers was take this free, publicly available information — which was published to the EDGAR website — and put it in some data model. I created a $300 million business out of it.

That was great in the day, but that put a lot of friction in things. We’d sell that data to multiple institutions in an ecosystem where they relied on one another to do something. I think in order to enable blockchain projects, that whole mindset needs to be shifted. People want to put their money in. Not just the hedge funds. People want to be part of this thing. The data needs to be liberated.

“The whole mindset needs to be shifted … The data needs to be liberated.”

Will: This is a better paradigm for the customers. They are looking for ways to reduce their cost base and mutualize non-differentiating work. TruSet enables them to do that. The traditional vendors have had decades to try to get this right, and they still deliver a product that customers find painful to use. If we can come up with a better way that is both lower-cost and better-quality, I think that’s right.

Also, for all this time, the vendors have been restricting the way in which this data gets used. That increases costs for the customers, but it also reduces innovation. We think we can work with the ecosystem to create a shared, unrestricted data set, out of which will come a lot of innovation. Once this data exists in a machine-readable and trusted form, there’s going to be a ton of value created around that data that makes markets more transparent and helps markets accelerate. It’s going to be a real benefit for financial services, and for the customers of this data.

If you can give customers a better service at a lower price, one that enables innovation better than the current industry can, then you should do it.

What’s the TruSet team excited about for 2018?

Will: Earlier this year, we ran an Alpha with our early version of the product that had actual users working on token data collectively, both publishing and validating data. It was a way for us to understand user behavior on our platform. That was really exciting, but what I’m most excited about is launching an actual platform in the market on which we are creating real data and proving that the data that’s getting published and validated by the community is high-quality. Then we can start sharing that data with the market so they can start using it in how they think about investing and participating in this new token-based financial services economy. I think trusted, accurate data is the key to unlocking this burgeoning crypto economy for institutional investors. It should help the entire ecosystem take off.

In November, TruSet is launching a Beta program that addresses the key learnings from the Alpha and brings TruSet closer to the production launch of our Token Data Platform. The Beta will enable the community to generate a rich and machine-readable token dataset for all of the top tokens by market cap that will serve as an ecosystem-wide foundation for facts about tokens and token projects. We’re excited to watch the community test drive our Beta platform and work together to establish a foundational dataset for Web3.

Tim: I’m excited to get something out to the marketplace, but I’m more excited to build a big community who can get involved with this project, do some work, receive rewards, and support the ecosystem. We’re ready to engage with hundreds and hopefully thousands of people who want to work on building this with us.