System monitoring with osquery

Did you know...? LWN.net is a subscriber-supported publication; we rely on subscribers to keep the entire operation going. Please help out by buying a subscription and keeping LWN on the net.

Your operating system generates a lot of run-time data and statistics that are useful for monitoring system security and performance. How you get this information depends on the operating system you're running. It could be a from report in a fancy GUI, or obtained via a specialized API, or simply text values read from the filesystem in the case of Linux and /proc . However, imagine if you could get this data via an SQL query, and obtain the output as a database table or JSON object. This is exactly what osquery lets you do on Linux, macOS, and Windows.

Osquery is an open-source project created by Facebook and hosted at Github, which the company released to the world under a 3-clause BSD license in 2014. It initially supported only Linux and macOS, but a Windows port (with somewhat lesser capabilities) was released in 2016, allowing osquery to have a unified SQL-based query interface for quite different operating systems. A data center running multiple operating systems would be able to query the state of the infrastructure using a single querying interface, simplifying the data collection implementation for DevOps teams. The osquery development team wanted to created a fast, reliable, and easy instrumentation tool that did not require a lot of low-level programming to retrieve information from.

System information is presented in the form of database tables; it is dynamically generated at query time rather than retrieved from storage. Naturally, different operating systems have slightly different information to present to the user. However, the osquery developers have strived for "feature parity" between systems; there is a set of common tables available for every supported platform, along with a set of operating-system-specific tables. Querying the tables is done using a subset of SQL that follows the SQLite syntax.

Using osquery

Packages for osquery are available for CentOS and Ubuntu. I installed the provided package for Ubuntu 16.04 and it required no tweaking to get up and running immediately using the default configuration. There are two ways to use osquery: interactively or as an operating system service. The interactive program, osqueryi , can be used as a command interpreter that is similar to interactive prompt of SQLite. The interactive program is useful for exploring the running system. The available tables can be found in the documentation.

I ran a few sample queries to try out osquery in interactive mode. For example, getting a list of logged-in users, with their tty , login time, process id of the login, and host they are logging in from:

osquery> SELECT * from logged_in_users; +-----------+----------+-------+------------------+------------+------+ | type | user | tty | host | time | pid | +-----------+----------+-------+------------------+------------+------+ | boot_time | reboot | ~ | 4.8.0-52-generic | 1495211035 | 0 | | runlevel | runlevel | ~ | 4.8.0-52-generic | 1495211045 | 53 | | login | LOGIN | tty1 | | 1495211045 | 792 | | user | hussein | pts/8 | 10.0.2.2 | 1495211067 | 1134 | +-----------+----------+-------+------------------+------------+------+

The table lists my username as well as several pseudo-users, such as boot_time and runlevel .

Starting osqueryi with the --json flag gives JSON output:

osquery> SELECT version from os_version; [ {"version":"16.04.2 LTS (Xenial Xerus)"} ]

Querying basic system information is fun, but the true usefulness of this tool shines when pulling data from several different tables to make inferences about the system. For example, if I were a system administrator I might want to check for processes with uid 0 (privileged processes) opening network connections to the outside world. The appearance of such connections is symptomatic of fishy behaviour that might indicate unauthorized system use. To do this, we can do an SQL join of some tables to search for any process with uid 0 with an open socket. To test this, I first needed a root-owned process to create an open socket, so I opened a network connection with sudo using telnet to google.com's port 80. Then I ran this query:

osquery> SELECT DISTINCT processes.uid, process_open_sockets.pid, process_open_sockets.remote_address, process_open_sockets.local_port, process_open_sockets.remote_port FROM process_open_sockets INNER JOIN processes WHERE processes.pid=process_open_sockets.pid AND processes.uid=0 AND process_open_sockets.remote_address <> "" AND process_open_sockets.remote_address <> "0.0.0.0" AND process_open_sockets.remote_address <> "10.0.2.2" AND process_open_sockets.remote_address <> "::"; +-----+------+----------------+------------+-------------+ | uid | pid | remote_address | local_port | remote_port | +-----+------+----------------+------------+-------------+ | 0 | 2641 | 64.233.188.99 | 36600 | 80 | +-----+------+----------------+------------+-------------+

We do an inner join of the processes and process_open_sockets tables, using the uid information from the processes table to filter the results, and disregarding local addresses. As revealed from the table, there is a process with pid 2641 connected to 64.233.188.99 (one of Google's public-facing IPs) on port 80. I needed to run osqueryi using sudo, because otherwise it cannot display information that a regular user does not have the permissions to view. While it is possible to run other SQL commands beside SELECT , commands like UPDATE , INSERT , and DELETE don't do anything on the standard tables, as they are all understandably read-only. It is also possible to pipe queries into osqueryi and get the results back via stdout , which makes it possible to use as part of shell scripts.

Osquery can run as a service called osqueryd , which is a monitoring daemon that allows scheduling system queries or generating queries based on events. Query results are logged, either to the filesystem or via a plugin to a service such as Amazon's AWS. Configuring the daemon involves writing a script with a JSON-formatted list of SQL queries and the intervals at which they should run, much like a cron script. A scheduled query produces a log of the chosen system information at discrete intervals, so it is useful for seeing how the system state changes over time.

Logging, security, and monitoring

Osquery is useful for cross-platform monitoring of system infrastructure. The above example where the detection of root-owned processes that open network connections could be configured as a periodic query using osqueryd ; any suspicious activity would be monitored and logged.

Additionally, a feature of osquery called file integrity monitoring on Linux and macOS systems tracks filesystem changes. This is accomplished via inotify on Linux and on macOS using FSEvents. Any files that a user wants to track need to be specified in the configuration file, and the status of the files can be read in a special table called file_events . Logs can be generated when a file is either accessed or changed. When a file change event happens, the MD5, SHA-1, and SHA-256 hashes of the affected file will be recalculated and logged.

Finally, basic auditing is also available on Linux and on macOS (via a kernel extension); the table process_events records process creation details. A table called socket_events is available on Linux that stores reports from the bind() and connect() system calls, as well.

There are two types of logs generated by osquery: status logs and results logs. Status logs contain the execution information of osquery itself; they are created using the Glog framework. Results logs are the output of queries, and come in two varieties: snapshot and differential. A snapshot logs the output of the entire query; that can often result in large logs as entire tables are written to disk at the specified intervals. A more space-economical format is the differential log, which is a format that shows the user which records were either added, removed, or changed.

By default osquery will log to the filesystem, but additional log aggregation options are available via plugins. Osquery version 1.7.4 and later can log directly to Amazon Web Service's Kinesis Streams and Kinesis Firehose, which are cloud-based data buckets that can capture data streams for storage and analysis.

Custom tables, extensions, and plugins

Custom tables can be added to the osquery source code; those tables can be filled with data from any data source. There are two parts to this: defining the table structures themselves, and then creating custom C++ classes to obtain the data to populate them. The table structure definition is done via defining a schema written in Python that will auto-generate the necessary C/C++ implementation of the table. Then a class is created for populating the data, which can be done via system calls, reading in values from /proc , or be dynamically generated. To get an idea of what this involves, there is example code given in the documentation that creates a custom table that holds the current time derived from the time() system call.

Osquery's functionality can also be built upon with plugins and extensions. Extensions are separate processes built and linked together with the osquery core code; they communicate with the main osquery process (either osqueryd or osqueryi ) via a Thrift API. Extensions are useful for creating plugins for things such as logging and configuration.

Conclusion

Osquery is a unique tool for system monitoring. While it is useful on its own, one can imagine it as a basis for creating more complex monitoring systems with sophisticated malware and anomaly detection, where rules can be easily created by simply crafting an appropriate query, or even dynamic monitoring tools that create conditional monitoring rules based on real-time events. SQL is a powerful and expressive query language, and its use for log analysis gives a flexible tool to system administrators and DevOps teams.