We are never short of numbers in this age of big data; in fact, we are overrun. What we are always short of is good questions that transform that wealth of numbers into actionable insights: results that could help us make better decisions. I attended a genetics conference recently (believe it or not, I am not a programmer by profession, but rather a geneticist), and I had the chance to debate with colleagues, and agree with them, about how little yield we are getting from next-gen genome sequencing studies: we are generating thousands of databases of gene expression at the resolution of each individual cell in an organism, yet only 5% or less of the research community can do something meaningful with them. We lack the tools to visualize and analyze that data and make it insightful for the other 95%.

In a similar fashion, blockchains are generating immense amounts of data and statistics every day, but I think most of the time we are not benefitting from actionable insights due to the lack of visualization and data processing tools. This is why I believe that blockchain monitors/explorers have a pivotal role in the cryptosphere: they don’t just provide a handy way to check if transactions are confirmed, they also deliver answers to those who ask good questions. They provide awareness and visibility to blockchain projects; developers can learn whether the network is behaving as they expected; issues needing correction can be detected long before they become problematic; mining pools are held accountable, so miners have the assurance that they won’t be easily cheated; and misbehaving actors can be detected and exposed.

Two years ago, I became fascinated by the idea of and the technology behind Sia and decided to build something helpful for visualizing the Sia blockchain: SiaStats. What started as a simple (and ugly) site displaying a handful of charts evolved into a full network monitor, and then into a provider of APIs and information services as part of the parent project keops. Now, not only do I generate and display information, but I also deliver insights directly to the renters’ desktops through the app Decentralizer.

My goal in this post is to share with you how my projects work, the issues I faced and the solutions I found, as well as my future direction. I hope it will be useful for other developers looking to build an app on top of Sia, and more generally for those curious to learn how the tools they use work under the hood.

The Sia developer toolkit

When building apps for Sia, the Sia API is the primary way of interacting with the Sia network. While the database files of the software can be accessed directly (some of them are even stored as plain JSON objects), the API is more comprehensive, can perform virtually any action the Sia client is capable of, and is extensively documented. For someone building a network monitor, it was fascinating to discover that not only the blockchain information, but also the information about the network of hosts, is available: unlike other self-proclaimed decentralized networks that rely on “bridges” to interact with hosts (so only these centralized bridges are aware of the network stats, and no one else can audit them), in Sia every client communicates directly with every host to collect its settings and usage metrics. Every aspect of the storage network can be monitored and audited by the operator of a Sia node, including available and used storage, pricing info, software version, etc. It’s a dream come true for anyone looking into data analysis.

While these API calls can be performed from any programming language, or from the command line with cURL, a second very handy resource is the set of wrappers, available for many languages, that simplify your code. A wrapper for Node.js is maintained by the Sia developers (and it is available on NPM!), but there are also wrappers for Java, C#, PHP and Python maintained by the community.
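To give a rough idea of what a raw API call looks like without a wrapper, here is a minimal Python sketch. It assumes a local Sia daemon listening on its usual default address (localhost:9980), the `Sia-Agent` user agent the daemon expects, and the API password sent through HTTP basic auth with an empty username. The helper names `build_request` and `call_sia` are purely illustrative and not part of any official wrapper; check the API documentation of your Sia version before relying on the details.

```python
import base64
import json
import urllib.request

SIA_API = "http://localhost:9980"  # assumed default address of a local Sia daemon


def build_request(endpoint: str, api_password: str = "") -> urllib.request.Request:
    """Prepare a GET request for a Sia API endpoint such as /hostdb/active."""
    req = urllib.request.Request(SIA_API + endpoint)
    # The daemon is expected to reject requests whose User-Agent is not Sia-Agent.
    req.add_header("User-Agent", "Sia-Agent")
    if api_password:
        # Basic auth with an empty username and the API password.
        token = base64.b64encode(f":{api_password}".encode()).decode()
        req.add_header("Authorization", f"Basic {token}")
    return req


def call_sia(endpoint: str, api_password: str = "") -> dict:
    """Send the request and decode the JSON answer (requires a running daemon)."""
    request = build_request(endpoint, api_password)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

With a daemon running, something like `call_sia("/hostdb/active")` should return the list of active hosts known to that node.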

These two things were everything a third-party developer would ever need to build their app… until Luke Champine created us. And suddenly, the Sia universe expanded. us is a library that allows developers to interact with the Sia network at a very deep level: contracts can be signed with specific hosts; files can be uploaded to hosts of your choice; you can sign a contract from one machine and allow a second machine to use this contract for uploading (using these contracts as capabilities); a third machine can download those files even if it never interacted with the host before. In the case of a network monitor such as SiaStats, it allows me to benchmark the individual capabilities of each host (more about this later). While us is still a work in progress, so far in my hands it is 100% usable. I am convinced that over the next year we will see an explosion of apps and resources powered by it: host quality assessment, file sharing, lite clients, and more.

SiaStats under the hood

The core of SiaStats is Navigator: a blockchain indexer that saves the block information into an SQL database. A benefit of building your own DB instead of accessing the built-in Sia DB is the ability to enrich the data and build your own data structures. For example, it is not possible to know, just by looking at the blockchain, when a file contract has failed; Sia nodes can determine it without any doubt, because a failed contract is one with no storage proof submitted during the 24-hour window at its end, but the ‘consensus’ database does not indicate anywhere whether it failed or not. By building my customized DB, I can flag contracts as failed or succeeded. As I consider it to be a critical component for many projects, and I think Sia can benefit from more developers deploying explorers, I made Navigator open source, together with its own API server and an optional web frontend. In parallel, another script, Hosts Monitor, collects the information about the network of hosts from the Sia API, geolocation databases, and internal benchmarks (more about this later).
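The contract-flagging idea can be sketched in a few lines. This is not Navigator's actual schema or code: the `contracts` and `storage_proofs` tables, their columns, and the rule that a contract is resolved once the chain height reaches `window_end` are all assumptions made for illustration, using an in-memory SQLite database.

```python
import sqlite3


def flag_contracts(db: sqlite3.Connection, current_height: int) -> None:
    """Mark contracts as 'succeeded' or 'failed' once their outcome is known."""
    # A contract with at least one storage proof on chain has succeeded.
    db.execute(
        "UPDATE contracts SET status = 'succeeded' "
        "WHERE id IN (SELECT contract_id FROM storage_proofs)"
    )
    # Assumption: once the height reaches window_end without a proof, it failed.
    db.execute(
        "UPDATE contracts SET status = 'failed' "
        "WHERE status = 'ongoing' AND window_end <= ?",
        (current_height,),
    )
    db.commit()


def demo() -> dict:
    """Build a tiny hypothetical database and return {contract id: status}."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE contracts (
            id TEXT PRIMARY KEY,
            window_end INTEGER,
            status TEXT DEFAULT 'ongoing'
        );
        CREATE TABLE storage_proofs (contract_id TEXT, height INTEGER);
        INSERT INTO contracts (id, window_end) VALUES ('a', 100), ('b', 100), ('c', 300);
        INSERT INTO storage_proofs VALUES ('a', 95);
    """)
    flag_contracts(db, current_height=200)
    return dict(db.execute("SELECT id, status FROM contracts"))
```

In this toy data set, contract `a` has a proof, `b` ran out of time without one, and `c` has not reached the end of its window yet.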

Initially, my setup consisted of all my scripts connected to a single Sia client on the same OS and physical machine. I quickly realized how badly that worked and scaled up. There are advantages to using a Sia multi-node setup:

Reliability: if one node fails for some reason, the scripts can rely on a backup node. This is especially important when using the explorer module of Sia (a non-default module just used by a few app types, like explorers), as it has not been maintained during the last couple of years, and it is prone to severe crashes.

Robustness: while blockchain data is the same across all the Sia clients in the network, this is not the case of the information from the hosts network. Hosts communicate their settings and used storage using a P2P protocol, so connection timeouts or the network architecture will make the host databases of any two users appear to be slightly different. It’s not uncommon for 5–10 hosts to appear as offline for one user while online for another. Merging and averaging the data from more than two clients, located in different regions, avoids this variance and shows a more accurate picture of the network.

Some hosts just don’t want to be tracked. For example, last year a hosting farm tried to avoid my monitoring by blocking the IP of the SiaStats web server (not knowing this was actually a honeypot I made for detecting cheaters). Moreover, once websites start benchmarking hosts, these hosts can learn to cheat the system by behaving correctly with the IP of the evaluator (thus obtaining a good score) while refusing to serve files to the rest of the renters. Nodes in alternative (and undisclosed) locations can mitigate this exploit.
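The robustness point above, merging the host databases of several nodes, can be sketched as follows. The data layout (a dict per node, keyed by host public key, with `online` and `used_storage` fields) and the majority-vote rule are assumptions chosen for illustration; they are not necessarily how SiaStats combines its data.

```python
from statistics import mean


def merge_host_views(views: list[dict[str, dict]]) -> dict[str, dict]:
    """Combine the host databases reported by several nodes into one view.

    Each view maps a host public key to {"online": bool, "used_storage": int}.
    """
    merged = {}
    all_keys = set().union(*views)  # every host known to at least one node
    for key in sorted(all_keys):
        seen = [view[key] for view in views if key in view]
        online_votes = sum(1 for entry in seen if entry["online"])
        merged[key] = {
            # Online if at least half of the nodes that know the host reach it.
            "online": online_votes * 2 >= len(seen),
            # Average the storage figures to smooth out stale readings.
            "used_storage": mean(entry["used_storage"] for entry in seen),
            "seen_by": len(seen),
        }
    return merged
```

A host that times out for a single node but answers the other two still counts as online, which is the kind of variance the multi-node setup is meant to absorb.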

SiaStats currently works as a multi-node and multi-location setup: a Sia client and us scripts run on each individual machine, and the scripts on the central server connect to them to request information from Sia. One challenge here is that, due to security constraints, the Sia daemon can’t be easily accessed from the outside world. To solve this, I use what I call Routers: API server scripts running on each remote machine. My scripts make a request to the Router, and the Router makes the API call to Sia on their behalf, returning the answer of the Sia daemon to them. They’re just an API middleman. But I also gave them some additional capabilities, like interacting with the us scripts and sending the file contracts I use for benchmarking back and forth between the remote machines and the central server (so I can create a contract on one remote machine and afterwards reuse it across the rest of the machines). The Routers also execute basic housekeeping tasks, like restarting a daemon or the machine.
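A Router of this kind could look roughly like the sketch below. Everything specific here is an assumption for the sake of the example: the shared token in an `X-Router-Token` header, the whitelist of exposed endpoints, and the `call_sia` callable that performs the local API call. The real Routers may authenticate and filter requests quite differently.

```python
import hmac
import json
from http.server import BaseHTTPRequestHandler
from typing import Callable

# Hypothetical whitelist: only these endpoint families are exposed remotely.
ALLOWED_PREFIXES = ("/consensus", "/hostdb", "/gateway")


def route(path: str, token: str, expected_token: str,
          call_sia: Callable[[str], dict]) -> tuple[int, dict]:
    """Decide whether to forward a request to the local daemon; return (status, body)."""
    if not hmac.compare_digest(token.encode(), expected_token.encode()):
        return 401, {"error": "unauthorized"}
    if not path.startswith(ALLOWED_PREFIXES):
        return 403, {"error": "endpoint not exposed by this router"}
    try:
        return 200, call_sia(path)  # relay the daemon's answer untouched
    except OSError as exc:  # daemon down or unreachable
        return 502, {"error": str(exc)}


def make_handler(expected_token: str, call_sia: Callable[[str], dict]):
    """Build an http.server handler class that wraps route()."""

    class RouterHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            status, body = route(
                self.path,
                self.headers.get("X-Router-Token", ""),
                expected_token,
                call_sia,
            )
            payload = json.dumps(body).encode()
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

    return RouterHandler
```

The handler class returned by `make_handler` can be passed to `http.server.HTTPServer` on the remote machine. Keeping the decision logic in the plain function `route` makes it easy to test without opening any socket.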

The next component in the stack is Siastats.js. It collects the raw blockchain data from the SQL database and analyzes it to build the additional databases that are presented on the charts of the SiaStats website. Some examples are the distribution of mining pools, the percentage of succeeded file contracts, and the amount of used storage among all the active hosts.

“Who watches the watchmen?” The last piece of my backend is a script called Custodium: it checks the health and sync status of all the Sia daemons I operate on remote machines by sampling certain API calls. It also detects malfunctioning scripts on the main server. In some cases, Custodium can solve the issues by itself (like ordering the appropriate Router to restart the Sia daemon or unlocking its wallet), and if not, it will send me an email alert so I can take action.
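The decision logic of a watchdog like this can be sketched as a small function that maps a health sample to an action. The sample fields (`height`, `synced`, `wallet_unlocked`), the lag threshold, the node names, and the action names are all hypothetical; this is one plausible arrangement, not Custodium's actual rules.

```python
from typing import Optional


def diagnose(sample: Optional[dict], reference_height: int, max_lag: int = 6) -> str:
    """Map one health sample to an action.

    `sample` is None when the daemon did not answer at all.
    """
    if sample is None:
        return "restart_daemon"  # fixable automatically through the Router
    if not sample["wallet_unlocked"]:
        return "unlock_wallet"  # also fixable automatically
    if not sample["synced"] or reference_height - sample["height"] > max_lag:
        return "email_alert"  # needs a human to look at it
    return "ok"


def check_fleet(samples: dict[str, Optional[dict]]) -> dict[str, str]:
    """Diagnose every node, using the highest reported height as the reference."""
    heights = [s["height"] for s in samples.values() if s is not None]
    reference = max(heights, default=0)
    return {name: diagnose(s, reference) for name, s in samples.items()}
```

Using the highest height seen across the fleet as the reference means a single lagging node stands out without needing any external source of truth.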

The data from Siastats.js and the SQL database is presented to the outside world using a RESTful API server script, and consumed by three different recipients:

The SiaStats website, which acts as a visual frontend of all my APIs;

The Decentralizer desktop app, which receives the geolocation of hosts and a database of the known hosting farms of the network;

Other websites, services and power users that directly use my APIs.

Towards a segmented storage network

Much of my recent work is related to facilitating an idea I think will benefit the whole Sia network: segmenting (or tiering) the storage marketplace. Currently, a host running in a datacenter with top-tier hardware and connectivity does not look any different in the host database of a Sia user from a host running on an ARM board on a residential network. Both would be scored similarly, even though their performance during real use is dramatically different. I do not mean that there are “good” hosts and “bad” hosts; I just think that not every host suits the needs of every renter. A pool of low-spec and cheap hosts might suffice for users that require a cold storage solution, but renters looking for warm and hot storage will benefit from datacenter-grade hosts, even if that means higher costs.

The solution for this is benchmarking each individual host. The Sia software of each user cannot do this, because if thousands of clients worldwide were benchmarking every host simultaneously, the hosts’ connections would become saturated. Sia as a protocol should also not rely on external “trusted” evaluators for assessing host quality, as that goes against the decentralization ethos of Sia. This does not mean, however, that third-party developers like me can’t deploy solutions on top of the network that try to address this problem: if I prove to be biased or dishonest as a host evaluator, you just need to blame me, kick me out of business, and use an alternative service instead.

Phase I of my plan consisted of providing renters with an interface that allows them to manually pick the hosts they want, instead of having to rely on the automatic hosts selection process of the Sia client. Enter Decentralizer. Decentralizer is a desktop companion app for renters that facilitates the use of some Sia API endpoints that, while powerful, are not easy to interact with. For example, Sia allows you to set a filter for restricting hosts selection, but requires you to provide the list of their pubkeys, which can be difficult for average users. Decentralizer presents the user a list of all the active hosts that can be arranged by country, Sia version, or pricing. The user just needs to click on those to include in the filter: the app then composes the API call and executes it. Thanks to this tool, right now any renter can easily create a customized list of hosts based on geolocation, pricing, or any other quality parameter he is aware of.
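Composing that filter call could be sketched like this. The request body follows my understanding of the host filter endpoint of the Sia API (a `filtermode` plus a list of host public keys, sent with a POST to `/hostdb/filtermode`); treat the exact endpoint and field names as something to verify against the API documentation of your Sia version. The `country` field is hypothetical: it stands for geolocation data added on top of what the Sia client itself reports.

```python
import json


def build_filter_payload(hosts: list[dict], countries: set[str]) -> str:
    """Return the JSON body of a whitelist made of hosts in the chosen countries."""
    pubkeys = sorted(
        host["publickeystring"]
        for host in hosts
        if host.get("country") in countries  # hypothetical geolocation field
    )
    return json.dumps({"filtermode": "whitelist", "hosts": pubkeys})
```

The resulting string is what an app would send as the request body, so the user never has to copy public keys by hand.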

A set of contracts restricted to the European Union thanks to the Decentralizer app

But how can a renter be aware of the quality of each host? How can hosts be aware of it too, and set their prices according to the value of the service they provide? Phase II of the plan consists of releasing the Hosts Monitor of SiaStats, which is available now. Thanks to the outstanding work of Luke Champine with us and its related tools, I’ve built a service that benchmarks every host on the Sia network by forming contracts with all of them and attempting to upload and download a small test file. These results are then compared across the network and transformed into numeric scores. Renters will be immediately empowered (using Decentralizer or any similar tool) to select the hosts that best fit their needs.
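One simple way to turn raw benchmark results into network-relative scores is to rank the hosts and spread them over a fixed scale. The sketch below does exactly that on a 0 to 10 scale; it is only an illustration of the general idea, not the scoring formula the Hosts Monitor uses, and the function name is made up for this example.

```python
def rank_scores(values: dict[str, float], higher_is_better: bool = True) -> dict[str, float]:
    """Score each host from 0 to 10 according to its rank within the network."""
    if len(values) < 2:
        return {host: 10.0 for host in values}
    # Worst host first, best host last; ties keep their input order.
    ordered = sorted(values, key=values.get, reverse=not higher_is_better)
    top_rank = len(ordered) - 1
    return {host: round(10 * rank / top_rank, 1) for rank, host in enumerate(ordered)}
```

For download speeds, higher is better; for latency, pass `higher_is_better=False` so that the fastest responders get the top score.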

From the hosts’ perspective, they will be able to learn their strengths and weaknesses, correct potential issues and decide to adjust their pricing according to the results. This last point will be facilitated by a table comparing their pricing with the average pricing of other hosts in their same segment. Shockingly, my data indicates that the current situation is contradictory: the most performant hosts are often cheaper than the mid-performant ones. My hope is that, thanks to this tool, hosts in the future will be more fairly compensated according to the value of their service. On top of that, these benchmarks will reveal hosts suffering from issues like insufficient collateral, locked wallets, or consensus problems. If a host is suffering from an issue that prevents its normal operation, the Hosts Monitor will display an alert. Moreover, hosts have the option to subscribe to email alerts and smartphone notifications to be alerted in less than two hours when a problem arises. I am hoping this effort will help to increase the robustness of the storage network. All these features are free to use.

A glimpse of the upcoming Hosts Monitor

Just with these two tools (Monitor and Decentralizer), any pro renter can start digging around and pick the hosts he needs, but we need ways to make this information useful to the broader renter community. Phase III of the project will consist of delivering these benchmarks to every renter. I am willing to collaborate with storage app developers to provide them with customized professional APIs displaying this wealth of information about the Sia hosts, so they can create the optimal list of hosts tailored for their customers. Also, later this year I will deploy “Decentralizer PLUS”: an add-on for Decentralizer that will present preset lists of hosts like “cheap and functional”, “performant”, “best for frequent downloads”, and so on.

I have many more plans for the future. For example, once I can provide lists of hosts tailored to specific needs, why not directly provide the already formed contracts, including their maintenance? Thanks to us, a contract with a host can be created on a server, but then be used by another device to upload and download files just by sharing a small file (an idea by Luke Champine). I aspire to create the Keops Contracts Marketplace, a solution that allows storage app creators to delegate to Keops/SiaStats the task of creating and maintaining contracts with hosts. No blockchain download, no complex contract logic: the app builder can focus on creating the best app experience on whichever platform they prefer, while SiaStats delivers the ready-to-use contracts through an API, after being paid in SC, BTC, or other cryptos.

SiaStats.info and Keops.cc aspire to be a reference source of data, insights and APIs for the broad community of Sia users. As I mentioned at the start of the post, the key in the world of big data is not having more numbers, but rather the ability to ask the right questions to obtain the correct insights. Do you have any outstanding question about the Sia blockchain or network? Is there any piece of data that would be valuable for your workflow or the app you are building? If you find the question, please do not hesitate to contact me (at keops_cc@outlook.com, or on Discord: hakkane#0489): the answer might be in the blockchain.