There's been much focus on MongoDB, Elastic and Redis when it comes to data exposure on the Internet, largely because of their popularity in the developer community. In terms of data volume, however, it turns out that HDFS is the real juggernaut. To give you a better idea, here's a quick comparison between MongoDB and HDFS:

| | MongoDB | HDFS |
|---|---|---|
| Number of Servers | 47,820 | 4,487 |
| Data Exposed | 25 TB | 5,120 TB |

Even though there are more MongoDB databases connected to the Internet without authentication, the amount of data they expose is dwarfed by that of HDFS clusters (25 TB vs 5 PB). Where are all these instances located?
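To make the gap concrete, here is a small sketch of the arithmetic behind the comparison. It uses only the four figures from the table; the dictionary layout, the helper name `avg_gb_per_server`, and the use of binary units (1 PB = 1,024 TB, which is what makes 5,120 TB come out to 5 PB) are my assumptions for illustration.

```python
# Figures taken from the comparison above; everything else is illustrative.
TB_PER_PB = 1024  # assumption: binary units, so 5,120 TB == 5 PB
GB_PER_TB = 1024

exposure = {
    "MongoDB": {"servers": 47_820, "data_tb": 25},
    "HDFS": {"servers": 4_487, "data_tb": 5_120},
}


def avg_gb_per_server(entry):
    """Average amount of exposed data per server, in GB."""
    return entry["data_tb"] * GB_PER_TB / entry["servers"]


# How many times more data HDFS exposes than MongoDB overall.
volume_ratio = exposure["HDFS"]["data_tb"] / exposure["MongoDB"]["data_tb"]

# How many times more MongoDB servers there are than HDFS servers.
server_ratio = exposure["MongoDB"]["servers"] / exposure["HDFS"]["servers"]

# HDFS exposure expressed in petabytes.
hdfs_pb = exposure["HDFS"]["data_tb"] / TB_PER_PB

if __name__ == "__main__":
    for name, entry in exposure.items():
        print(f"{name}: {avg_gb_per_server(entry):,.2f} GB per server on average")
    print(f"HDFS exposes {volume_ratio:.1f}x the data with {server_ratio:.1f}x fewer servers")
    print(f"HDFS total: {hdfs_pb:.0f} PB")
```

The per-server averages are only a rough guide, since the exposed data is unlikely to be spread evenly across servers.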

Most of the HDFS NameNodes are located in the US (1,900) and China (1,426). Nearly all of the HDFS instances are hosted in the cloud, with Amazon leading the charge (1,059) followed by Alibaba (507).

The ransomware attacks on databases that were widely publicized earlier in the year are still happening, and they're impacting both MongoDB and HDFS deployments. For HDFS, Shodan has discovered roughly 207 clusters that have a message warning of the public exposure. A quick glance at search results in Shodan reveals that most of the public MongoDB instances seem to be compromised. I've previously written on the reasons behind these exposures, but note that both products nowadays have extensive documentation on secure deployment.

Technical Details

If you'd like to replicate these findings or perform your own investigations into data exposure, this is how I measured them.