A Distributed Search Engine for the Distributed Web

An update on the development of Dweb.page

While search neutrality might be open for discussion, it is pretty clear that Google’s centralized search engine, with a market share above 90% and quarterly earnings above 30 billion dollars, is far from ideal. Monopolies are not only economically inefficient but also increase the chance of censorship and search bias.

When it comes to finding information on the distributed web, a centralized search engine seems counterintuitive, because it goes against the principles underlying the distributed web. That is why we are currently working hard to create the first fully functional, completely distributed search engine for our project Dweb.page.

First draft of our improved search engine

Problem

Despite the downsides of current search engines mentioned earlier, we believe multiple reasons have made the existing model hard to change. At the same time, a distributed and fully transparent search engine for the Dweb comes with its own set of challenges:

Speed: The distributed search engine needs to be at least as fast as current solutions, and the transaction times of many distributed ledgers are a serious obstacle.

Device independence: More and more people use mobile phones today; the distributed search engine needs to run on PCs and mobile phones alike, without any centralized backend.

Indexing: How do we collect, parse, and store data in a distributed way to enable fast and accurate information retrieval, while still ensuring that people cannot create fake search entries?

Availability: How do we ensure that distributed data is still available when requested? This is especially hard since the data can be hosted locally and may therefore only be available in certain time slots.

Monetization and incentives: How do we finance the storage and continuous development of the tool? Without this monetization part figured out, it will be difficult for decentralized solutions to compete with existing centralized ones, for example for human talent or partnerships and integrations.

A potential solution

To ensure high speed and feeless transactions, it was clear from the beginning that distributed ledger technologies limited by either of these two performance issues were not an option. Therefore, we chose the combination of IPFS and IOTA. IPFS fulfills the obvious role of a fast and distributed way to share and host files, whereas IOTA provides the necessary distributed database layer. It is important to note that the database uses only the part of the IOTA technology that is already fully functional and independent of future research work (e.g., regarding the coordinator).

This combination allows us to provide an experience that works on all kinds of devices; we even had a prototype running in Internet Explorer. The unique feature is that we can deliver a fully distributed experience without the installation of any additional software, since all the code runs inside a simple, completely open-source web page, which is itself distributed on IPFS. It also means every single user runs their own search engine, which is the ultimate form of distribution.

Inspired by this distributed interface, we are working on the following concept for a distributed search engine:

The distributed and personalized search engine

We assume two types of users, whom we call Authors and Consumers (though one person can fulfill both roles).

Authors upload content to the distributed web via Dweb.page. If they want their content to be publicly findable by others, metadata signed by the Authors is uploaded to IOTA. This way, anyone can create their own metadata instead of relying on a centralized indexing system. On top of that, the signature system makes it impossible to pretend to be someone else, which today happens, for example, with news stories or bank websites.

When Consumers open Dweb.page for the first time, it starts loading the most recent metadata in the background. Based on this metadata, a search engine running locally provides the user with initial and fully transparent search results. These first searches are automatically used to subscribe to potentially interesting Authors and thereby load additional metadata. This can be seen as a social network for metadata, in which Consumers “follow” Authors. The approach has two advantages: users do not have to load the complete metadata of the entire web, and they can easily block a provider of malicious metadata (e.g., wrongly labeled content). Furthermore, without this subscribe/block model, people could start spamming the search engine.
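The follow/block model above can be sketched in a few lines. The class and entry shapes are assumptions for illustration; the actual Dweb.page code may structure this differently.

```javascript
// Sketch: a Consumer's local metadata store with follow/block semantics.
class MetadataStore {
  constructor() {
    this.followed = new Set(); // Authors the Consumer subscribes to
    this.blocked = new Set();  // Authors flagged for malicious metadata
    this.entries = [];         // metadata entries loaded so far
  }

  follow(authorKey) { this.followed.add(authorKey); }

  block(authorKey) {
    this.blocked.add(authorKey);
    this.followed.delete(authorKey);
  }

  addEntry(entry) {
    // Ignore metadata from blocked Authors entirely.
    if (!this.blocked.has(entry.author)) this.entries.push(entry);
  }

  // Local, fully transparent search: a plain substring match over file
  // names, restricted to followed Authors.
  search(term) {
    const t = term.toLowerCase();
    return this.entries.filter(
      (e) => this.followed.has(e.author) && e.fileName.toLowerCase().includes(t)
    );
  }
}
```

Restricting results to followed Authors is what keeps the local index small, and the block set is the spam defense: metadata from a blocked Author is simply never stored or shown.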

Additionally, everyone who uses the search engine of Dweb.page generates information about the availability of content. This means that if someone tries to download content on the distributed web that is no longer available, this information is passed on to other users. If multiple users tell you that a file is no longer available, it is automatically removed from your search results. If only one user tells you, the file stays listed in your search results, giving you the option to check whether that user is trying to prevent you from accessing certain content by lying about its availability.

The last key, challenging, and often overlooked part of every distributed project is how to monetize it and provide incentives to the storage providers and developers of the distributed web. In a distributed and open-source solution without any centralization, it is possible to circumvent any incentive model, which is why a lot of decentralized projects end up with a centralized layer. Furthermore, donation-based systems do not seem to work well for subscription-like, long-term business models. That is why we are considering a model that benefits all participants while maintaining complete transparency. The following picture illustrates how this potential solution would work:

The business model of a distributed search engine

The search market is well positioned for advertising since, even without giving up any privacy, it is possible to show advertisements based on search terms. This advertising revenue can then be split and used, on the one hand, to provide a certain amount of free storage to Authors and, on the other hand, to support the developers in further improving the tool. If you consider that Google provides 15 GB of free cloud storage and still earns billions quarterly, you get the idea that the above model might result in a completely free web for Authors! It is also important to point out that a large share of the population is not against advertising per se, but against the misuse of their personal data, which this model rules out.

Naturally, this model needs to be set up completely transparently on a distributed ledger. If that is the case, a normal contract between all participants might be sufficient at the beginning, since malicious parties could easily be sued (e.g., if money is misused instead of being invested into the infrastructure). However, this contract should, right from the beginning, also contain the option to change over time, for example based on a voting system. Otherwise, a model like this would be unable to adapt to future developments; storage prices, for example, might become so cheap that it makes sense to use the money for other purposes. This and other aspects of the system, like the quality of the provided storage or advertising, might be difficult to encode in smart contracts. Nevertheless, at a later stage, this setup should be replaced with fully automated smart contracts.

This article presents our current research and does not constitute a finished product. We believe we can only achieve this vision if we are transparent right from the start, and we appreciate any feedback or contribution. Help us achieve this vision: