tl;dr: Block ad hoc usage of your system's database (even read-only).

There are two major use cases for a database in your system:

Storing state for the application to function properly.

Looking at the state to gather information or statistics.

I want to argue that these use cases should be fully distinct and that they should be served by different databases.

The first use case (storing state) is absolutely critical to the system’s features and reliability. As such, we want to minimize the surface area of what can touch that database. On the other hand, the second use case (statistics) is mostly asynchronous work that isn’t correlated with the system’s workload.

By mixing them up we risk:

If the database is write-accessible, an operator could accidentally modify production data.

If the database is read-only, an operator could still overload or lock the database with a read query.

If you need to revert to a previous known good state, you’ll also lose your statistics about the incident.

However, we usually want statistics about the state of the system, so if not from the database, where can we get them? There are a couple of strategies:

Use an asynchronous read-only replica to compute statistics. Even if you bring this one down, your overall system is unaffected.

Log events you care about into something else (honeycomb.io, or even a Hive table).

Use the backup to compute statistics. While this has the most delay it also has the advantage of being a continuous proof that the backups are functional.
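To make the last two strategies concrete, here is a minimal sketch in Python. It is an illustration under assumptions, not a recommended setup: sqlite3 stands in for the production database, a JSON-lines file stands in for an external event store such as honeycomb.io or a Hive table, and the `orders` table and all function names are hypothetical. The point is only that the statistics query never runs against the primary connection.

```python
import json
import sqlite3
import tempfile
from pathlib import Path


def make_primary() -> sqlite3.Connection:
    """Create a stand-in for the production database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL)"
    )
    return conn


def place_order(primary: sqlite3.Connection, event_log: Path, status: str) -> int:
    """Write state to the primary, then append an event to a separate sink."""
    with primary:  # commits on success, rolls back on error
        cur = primary.execute("INSERT INTO orders (status) VALUES (?)", (status,))
    order_id = cur.lastrowid
    # The event sink is a plain file here; it is an assumption standing in
    # for whatever external event store you actually use.
    with event_log.open("a", encoding="utf-8") as sink:
        event = {"event": "order_placed", "id": order_id, "status": status}
        sink.write(json.dumps(event) + "\n")
    return order_id


def snapshot(primary: sqlite3.Connection) -> sqlite3.Connection:
    """Copy the primary into a separate database, as a backup job would."""
    copy = sqlite3.connect(":memory:")
    primary.backup(copy)
    return copy


def status_counts(stats_db: sqlite3.Connection) -> dict:
    """Run the statistics query against the copy, never the primary."""
    rows = stats_db.execute("SELECT status, COUNT(*) FROM orders GROUP BY status")
    return dict(rows.fetchall())


def demo():
    """Place a few orders, then compute statistics away from the primary."""
    with tempfile.TemporaryDirectory() as tmp:
        event_log = Path(tmp) / "events.jsonl"
        primary = make_primary()
        for status in ("paid", "pending", "paid"):
            place_order(primary, event_log, status)

        stats_db = snapshot(primary)
        counts = status_counts(stats_db)

        lines = event_log.read_text(encoding="utf-8").splitlines()
        events = [json.loads(line) for line in lines]

        stats_db.close()
        primary.close()
    return counts, len(events)
```

Calling `demo()` returns the per-status counts computed from the copy, plus the number of events recorded in the separate log. In a real system the copy would be a restored backup or a replica on another machine, so a heavy statistics query could not slow down or lock the primary.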

In closing, it is usually possible, using your existing database layout, to fully separate the use cases of storing state and gathering statistics, so don’t risk your production system with something that could be done elsewhere.