So…we’ve got Syslog data for a bunch of different hosts. Let’s create a new Kafka topic that’s populated in real time with Syslog messages just for our network devices. Which are my network devices? These ones:

BZ2,24a43cde91a0,v4.0.69.10871:
BZ2,dc9fdbec6a10,v4.0.69.10871:
U7PG2,18e829e930a0,v4.0.69.10871:
U7PG2,f09fc2238301,v4.0.69.10871:
USC8,fcecdabfcf6d,v4.0.66.10832
usgmoffattme

There are a couple of ways to select messages with which to populate this new topic. I could hardcode a predicate list of all the hostnames of my network devices.

CREATE STREAM UBNT_SYSLOG AS
  SELECT *
    FROM SYSLOG
   WHERE HOST = 'BZ2,24a43cde91a0,v4.0.69.10871:'
      OR HOST = 'BZ2,dc9fdbec6a10,v4.0.69.10871:'
      OR HOST = 'U7PG2,18e829e930a0,v4.0.69.10871:'
      OR HOST = 'U7PG2,f09fc2238301,v4.0.69.10871:'
      OR HOST = 'USC8,fcecdabfcf6d,v4.0.66.10832'
      OR HOST = 'usgmoffattme'
  EMIT CHANGES;

That’s only so useful whilst that list doesn’t change. I could wildcard it based on the patterns in the naming (U7PG2 is the prefix of one of the access point types, etc.):

CREATE STREAM UBNT_SYSLOG AS
  SELECT *
    FROM SYSLOG
   WHERE HOST LIKE 'BZ2%'
      OR HOST LIKE 'U7PG2%'
      OR HOST LIKE 'USC8%'
      OR HOST = 'usgmoffattme'
  EMIT CHANGES;

This is better because the stream will adapt as new devices are added—but only if they match those patterns. The best way to do it is simply to have a list of network devices in a ksqlDB table (which is backed by a Kafka topic)…

CREATE TABLE UBNT_NETWORK_DEVICES (ROWKEY STRING KEY)
  WITH (KAFKA_TOPIC='network_devices', VALUE_FORMAT='AVRO', PARTITIONS=1);

INSERT INTO UBNT_NETWORK_DEVICES VALUES ('BZ2,24a43cde91a0,v4.0.69.10871:');
INSERT INTO UBNT_NETWORK_DEVICES VALUES ('BZ2,dc9fdbec6a10,v4.0.69.10871:');
INSERT INTO UBNT_NETWORK_DEVICES VALUES ('U7PG2,18e829e930a0,v4.0.69.10871:');
INSERT INTO UBNT_NETWORK_DEVICES VALUES ('U7PG2,f09fc2238301,v4.0.69.10871:');
INSERT INTO UBNT_NETWORK_DEVICES VALUES ('USC8,fcecdabfcf6d,v4.0.66.10832');
INSERT INTO UBNT_NETWORK_DEVICES VALUES ('usgmoffattme');

…and then join to this to create the stream. By using an INNER JOIN I force it to only return messages for which there is a corresponding host in the UBNT_NETWORK_DEVICES table. Now any time I add a new network device I just add it to the table (with an INSERT INTO) and it gets picked up automagically in the join.

CREATE STREAM UBNT_SYSLOG AS
  SELECT S.*
    FROM SYSLOG S
         INNER JOIN UBNT_NETWORK_DEVICES U
         ON S.HOST = U.ROWKEY
  EMIT CHANGES;

Now I have a new Kafka topic called UBNT_SYSLOG:

ksql> SHOW TOPICS;

 Kafka Topic     | Partitions | Partition Replicas
-------------------------------------------------------------------
 …
 network_devices | 1          | 1
 syslog          | 1          | 1
 UBNT_SYSLOG     | 1          | 1
 …
-------------------------------------------------------------------

and when I check the number of messages for each host, I can see that it only contains messages for my UniFi hosts:
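The per-host check can itself be done with a push query against the new stream. A query along these lines (a sketch—the MSG_COUNT alias is just illustrative) groups messages by host and streams the running counts:

SELECT HOST, COUNT(*) AS MSG_COUNT
  FROM UBNT_SYSLOG
  GROUP BY HOST
  EMIT CHANGES;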