You may have heard of ELK (Elasticsearch, Logstash, Kibana). EFK is the same stack with Logstash replaced by Fluentd.

So why replace Logstash with Fluentd?

First of all, Fluentd is now hosted by the Cloud Native Computing Foundation, the same foundation that hosts Kubernetes. I also find it very easy to configure, it has a lot of plugins, and its memory footprint is very low.

We could also replace Fluentd with Fluent Bit, a lighter log forwarder that offers fewer features than Fluentd.

How does it work?

A picture is worth a thousand words, so here is a simple schema.
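containers on each node write their logs to /var/lib/docker/containers
                 |
                 v
  Fluentd (one pod per node, deployed as a DaemonSet)
                 |
                 v
  Elasticsearch (stores and indexes the logs)
                 ^
                 |
  Kibana (queries Elasticsearch when you browse it)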

Basically, each Fluentd container reads /var/lib/docker/containers to collect the logs of every container on the node and sends them to Elasticsearch. Finally, when we access Kibana, it queries Elasticsearch for the logs.

What for?

You may ask yourself: why bother setting up such a stack? I can just run docker logs ... and there I have my logs.

Well, yes, you could. However, if you have several replicas, you will have to go through each one of the containers to find what you are looking for. Here you can search through all the logs at once.

Also, with Kibana you can create dashboards and very nice visualizations.


Another interesting point is log manipulation: with Fluentd you can filter, modify and back up your logs very easily. But we'll look into that in another post.

Enough with the introduction, let’s build this stack!

Elasticsearch

Elasticsearch is a distributed, RESTful search and analytics engine capable of solving a growing number of use cases. As the heart of the Elastic Stack, it centrally stores your data so you can discover the expected and uncover the unexpected.

In the E*K stack, we use Elasticsearch to store and search the logs forwarded by Fluentd. To deploy it into our Kubernetes cluster, we can use pires' GitHub repository: pires/kubernetes-elasticsearch-cluster

You can modify the values in es-master.yaml, es-client.yaml and es-data.yaml to change the number of replicas, the names, etc.

By default, ES_JAVA_OPTS is set to -Xms256m -Xmx256m. This is a very low value, but many users, e.g. Minikube users, were having issues with pods getting killed because hosts were out of memory. One can change this in the deployment descriptors available in this repository.

pires/kubernetes-elasticsearch-cluster README.md
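For example, to give the Elasticsearch pods more heap, you could override the variable in the relevant descriptor (a sketch; pick values that fit your nodes' memory):

env:
- name: ES_JAVA_OPTS
  value: "-Xms512m -Xmx512m"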

It is also recommended to change the storage from emptyDir to a storage solution of your choice, so the Elasticsearch data is persisted.
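A minimal sketch, assuming your cluster has a default StorageClass: create a PersistentVolumeClaim and reference it instead of the emptyDir:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  namespace: logging
  name: es-data
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

and, in es-data.yaml, point the volume at the claim:

volumes:
- name: storage # check the actual volume name declared in the manifest
  persistentVolumeClaim:
    claimName: es-data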

Once you have set up your files, you can run:

$ kubectl -n logging create -f es-discovery-svc.yaml

$ kubectl -n logging create -f es-svc.yaml

$ kubectl -n logging create -f es-master.yaml

$ kubectl -n logging rollout status -f es-master.yaml

$ kubectl -n logging create -f es-client.yaml

$ kubectl -n logging rollout status -f es-client.yaml

$ kubectl -n logging create -f es-data.yaml

$ kubectl -n logging rollout status -f es-data.yaml

Note: I will run the EFK stack in the logging namespace; use kubectl create ns logging to create it.

Check that your Elasticsearch cluster is up and running by following pires' instructions.
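A quick way to verify is to port-forward the client service and query the cluster health (assuming the Service from es-svc.yaml is named elasticsearch); the status should eventually be green:

$ kubectl -n logging get pods

$ kubectl -n logging port-forward svc/elasticsearch 9200:9200 &

$ curl "http://localhost:9200/_cluster/health?pretty"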

If everything is okay, we can now set up Fluentd (or Fluent Bit)!

Fluentd

Fluentd is an open source data collector, which lets you unify the data collection and consumption for a better use and understanding of data.

https://www.fluentd.org/architecture

Fluentd can be run directly on the host, or in a Docker container. Here we will use a DaemonSet to ensure that Fluentd is running on every node. The code is from fluent/fluentd-kubernetes-daemonset.

If you have RBAC enabled on your cluster (and I hope you have), check the ClusterRole, ClusterRoleBinding and ServiceAccount in fluentd-daemonset-elasticsearch-rbac.yaml.

Don't forget to change the namespace to the one used when deploying Elasticsearch; it should be the same.
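For reference, here is roughly what those objects look like once the namespace is switched to logging (a sketch based on the upstream file; double-check it against the repository):

# based on fluentd-daemonset-elasticsearch-rbac.yaml, namespace changed to logging
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluentd
  namespace: logging
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluentd
rules:
- apiGroups: [""]
  resources: ["pods", "namespaces"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: fluentd
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: fluentd
subjects:
- kind: ServiceAccount
  name: fluentd
  namespace: logging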

Then we can apply the DaemonSet:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: logging
  labels:
    component: fluentd-logging
    version: v1
    kubernetes.io/cluster-service: "true"
spec:
  selector:
    matchLabels:
      component: fluentd-logging
  template:
    metadata:
      labels:
        component: fluentd-logging
        version: v1
        kubernetes.io/cluster-service: "true"
    spec:
      serviceAccount: fluentd # if you have RBAC enabled
      serviceAccountName: fluentd # if you have RBAC enabled
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: fluentd
        image: fluent/fluentd-kubernetes-daemonset:elasticsearch
        env:
        - name: FLUENT_ELASTICSEARCH_HOST
          value: "elasticsearch" # the name of the previous es-svc.yaml
        - name: FLUENT_ELASTICSEARCH_PORT
          value: "9200" # the port of the previous es-svc.yaml
        - name: FLUENT_ELASTICSEARCH_SCHEME
          value: "http"
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
      terminationGracePeriodSeconds: 30
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
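Save the manifest (for example as fluentd-daemonset.yaml, the name is up to you) and apply it; you should end up with one fluentd pod per node:

$ kubectl apply -f fluentd-daemonset.yaml

$ kubectl -n logging get daemonset fluentd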

Fluent Bit

We will use the code from fluent/fluent-bit-kubernetes-logging.

As for Fluentd, if you use RBAC, create the ClusterRole, ClusterRoleBinding and ServiceAccount with:

$ kubectl create -f https://raw.githubusercontent.com/fluent/fluent-bit-kubernetes-logging/master/fluent-bit-service-account.yaml

$ kubectl create -f https://raw.githubusercontent.com/fluent/fluent-bit-kubernetes-logging/master/fluent-bit-role.yaml

$ kubectl create -f https://raw.githubusercontent.com/fluent/fluent-bit-kubernetes-logging/master/fluent-bit-role-binding.yaml

We also need to create a ConfigMap for the configuration. You can either modify this one and apply it or use the default one:

$ kubectl create -f https://raw.githubusercontent.com/fluent/fluent-bit-kubernetes-logging/master/output/elasticsearch/fluent-bit-configmap.yaml
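The part that matters here is the Elasticsearch output, which reads the same environment variables we set on the DaemonSet below. In the default ConfigMap it looks roughly like this:

[OUTPUT]
    Name            es
    Match           *
    Host            ${FLUENT_ELASTICSEARCH_HOST}
    Port            ${FLUENT_ELASTICSEARCH_PORT}
    Logstash_Format On
    Retry_Limit     False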

Once the ConfigMap is applied, here is the code for the DaemonSet:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: logging
  labels:
    component: fluent-bit-logging
    version: v1
    kubernetes.io/cluster-service: "true"
spec:
  selector:
    matchLabels:
      component: fluent-bit-logging
  template:
    metadata:
      labels:
        component: fluent-bit-logging
        version: v1
        kubernetes.io/cluster-service: "true"
    spec:
      containers:
      - name: fluent-bit
        image: fluent/fluent-bit:0.12.17
        env:
        - name: FLUENT_ELASTICSEARCH_HOST
          value: "elasticsearch" # the name of the previous es-svc.yaml
        - name: FLUENT_ELASTICSEARCH_PORT
          value: "9200" # the port of the previous es-svc.yaml
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: fluent-bit-config
          mountPath: /fluent-bit/etc/
      terminationGracePeriodSeconds: 10
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: fluent-bit-config
        configMap:
          name: fluent-bit-config # name of the previously created ConfigMap
      serviceAccountName: fluent-bit
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
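As before, save the manifest (for example as fluent-bit-daemonset.yaml) and apply it:

$ kubectl apply -f fluent-bit-daemonset.yaml

$ kubectl -n logging get daemonset fluent-bit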

Kibana

Once you have some logs in Elasticsearch, we can add a tool for exploring and analyzing them: Kibana.

Kibana lets you visualize your Elasticsearch data and navigate the Elastic Stack.

So we will deploy Kibana as a Deployment. We'll use the 6.2 OSS version because it does not come with X-Pack enabled. (If you want X-Pack, you can, of course, adjust the image name.)

apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: logging
  name: kibana
  labels:
    component: kibana
spec:
  replicas: 1
  selector:
    matchLabels:
      component: kibana
  template:
    metadata:
      labels:
        component: kibana
    spec:
      containers:
      - name: kibana
        image: docker.elastic.co/kibana/kibana-oss:6.2.2
        env:
        - name: CLUSTER_NAME
          value: myesdb # name of the Elasticsearch cluster defined in the first part
        resources:
          limits:
            cpu: 1000m
          requests:
            cpu: 100m
        ports:
        - containerPort: 5601
          name: http
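Save and apply the Deployment (the file name is up to you), then wait for the rollout to finish:

$ kubectl apply -f kibana-deployment.yaml

$ kubectl -n logging rollout status deployment/kibana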

When the pod is up and running, we can make it accessible via a Service and an Ingress:

apiVersion: v1
kind: Service
metadata:
  namespace: logging
  name: kibana
  labels:
    component: kibana
spec:
  selector:
    component: kibana
  ports:
  - name: http
    port: 5601
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  namespace: logging
  name: kibana
  annotations:
    kubernetes.io/ingress.class: traefik
    ingress.kubernetes.io/auth-type: "basic"
    ingress.kubernetes.io/auth-secret: kibana-basic-auth
spec:
  rules:
  - host: "<your-url>"
    http:
      paths:
      - path: /
        backend:
          serviceName: kibana
          servicePort: http

To protect Kibana with a username and password, we use Traefik as the Ingress controller; this is what the basic-auth annotations in the Ingress above refer to.

To create the secret:

$ htpasswd -c ./auth <your-user>

$ kubectl -n logging create secret generic kibana-basic-auth --from-file auth

$ rm auth
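With the secret in place, apply the Service and the Ingress (assuming you saved them as kibana-ingress.yaml):

$ kubectl apply -f kibana-ingress.yaml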

When you access Kibana for the first time, you will need to configure your first index pattern.

If you did not change Fluentd's default configuration, the index name follows the pattern logstash-YYYY.MM.DD. You can use the wildcard pattern logstash-* to catch all indices beginning with logstash-.
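If you want to confirm that these indices exist before creating the pattern, you can query Elasticsearch directly, e.g. through the port-forward used earlier:

$ curl "http://localhost:9200/_cat/indices?v"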

And now you have a functional EFK stack!

In the next part, we will dive into the configuration of Fluentd and Kibana.