Games starting, player count across games and modes



Server performance, such as TPS and player counts



Language, rank and version distribution

Database performance, such as Redis and MongoDB

Systems performance, such as loaded player data, packets across the network, physical server usage



Game-specific metrics, for example in Capture the Wool: queued spectators, friend joins, game length, gold earned/spent and more

Punishment metrics, including punishments by reason, length, rank, etc.



We also moved some of our legacy monitoring systems into this setup, including proxy and login monitoring.

Hello all!

Continuing from our previous Dev Blog on Housing, we'd like to share some insights into our internal work on metrics, which gives the team a clearer view of what is happening across the network.

Over the last few months, we have undertaken the task of improving our metric-based data collection and building a system around it that allows anyone on the team to view data that can assist in decisions for games and the network as a whole.

The first step towards handling metrics in an efficient and stable manner was deciding how to store the data. After exploring a few solutions, we decided to go with InfluxDB. InfluxDB is a time-series database, which makes it well suited to storing large metric datasets while keeping query times fast. Along with this, we decided to use Grafana for displaying the data, so that people on the team without database knowledge can view data for any time period.

After configuring the hardware and software the way we wanted, we could get started on building a system to handle all the metric data we wanted to insert into the database. We decided to create a submodule of our Goliath system, called StatsStorage. This allows any game instance, proxy, or other system on the network to simply send an internal packet, which StatsStorage then handles and inserts into the database as required.

At this point, we could start adding real data into InfluxDB and displaying it within Grafana. We started with some basic metrics such as player count per game. This was a simple metric to start tracking since the MasterControl service already had this data.
This sort of data can be very helpful as it can show us the impact of updates and other changes across the network.

Beyond this, the team kept adding more metrics where possible, including those listed at the start of this post.

At the time of writing, we're handling on average 4,000 data points per second into InfluxDB, and we expect this to increase as we start to track more metrics. Because of this, we decided to do some stress testing to see how far we could push InfluxDB before it showed noticeable slowdowns. We pushed our stress testing to 50,000 data points per second without any effect on other operations, such as reading data back from the database.

This is just some insight into our metrics setup, which has since helped us identify issues and tweak updates and systems, as you may have seen in the recent thread regarding the pregame improvements from Nitroholic.
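To put the throughput figures above in perspective, here is a small Python calculation using only the two rates quoted in this post. The variable names are just for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60

current_rate = 4_000   # average data points per second, as quoted above
stress_rate = 50_000   # highest rate reached during stress testing

# Total points written per day if the average rate holds all day.
points_per_day = current_rate * SECONDS_PER_DAY

# How many times the current average load fits into the tested rate.
headroom = stress_rate / current_rate

print(f"{points_per_day:,} data points per day at the current average rate")
print(f"{headroom:.1f}x headroom between average load and the tested rate")
```

This works out to 345,600,000 data points per day, with the stress test covering 12.5 times the current average rate.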