Monitoring Kubernetes clusters with Prometheus and Grafana

Opcito Technologies
3 min read · Jun 14, 2019


Monitoring Kubernetes clusters differs from monitoring traditional client-server networks because of the master-node architecture. Parameters that need consistent monitoring include pod resources, memory usage, CPU utilization, network bandwidth, and disk pressure. Kubernetes, by itself, does not self-monitor. However, it can be tuned to detect problems in their early stages. Ideally, a cluster should take corrective action as soon as any of these parameters crosses its threshold. Where that is not possible, it should at least generate alerts so that issues arising in the cluster can be handled manually.

Here, we will look at how Prometheus and Grafana can be used for cluster monitoring. While Prometheus comes with strong querying, analytical, and alerting capabilities, Grafana is used to visualize the retrieved information and draw meaningful insights from it. But first, let’s discuss why monitoring matters for K8s.

Why is monitoring so important?
Clusters are used to maintain high reliability and efficient throughput. To work effectively, a cluster is spread across several nodes that communicate with each other. A deployment may consist of a single cluster spanning various nodes or a combination of clusters. This improves overall performance and makes better use of system resources. As reliance on clusters grows, regular monitoring is becoming essential for reliable performance.

Cluster monitoring involves assessing the performance of cluster entities, either as individual nodes or as a collection of nodes. It must confirm that the cluster is functionally stable and working efficiently. A cluster monitoring system should provide basic information about the cluster, such as the communication and interoperability between its nodes. Since cluster nodes are set up over different servers, achieving broad coverage and a consistent view is a big challenge. Real-time visualization of cluster data (for individual nodes or for the cluster as a whole) makes it easier to spot and fix issues that affect application performance.

Prometheus
Originally built at SoundCloud, Prometheus is a popular open-source systems monitoring and alerting toolkit. Prometheus primarily collects metrics through a pull-based HTTP model, and it also supports alerting. It records purely numeric time series efficiently, whether the monitoring is machine-centric or targets a highly dynamic service-oriented architecture. The Prometheus server is the main component that scrapes and stores time series data, while Alertmanager handles alerts.

Some of the interesting features of Prometheus are as follows:

  • Multidimensional data model with time series data identified by the metric name and key/value pairs (labels)
  • PromQL, a flexible query language that makes use of this data model
  • Single server nodes are autonomous, with no reliance on distributed storage
  • Time series collection happens via a pull model over HTTP
  • Pushing time series is supported via an intermediary gateway
  • Targets are discovered via static configuration or service discovery
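
To make the data model and the pull model a little more concrete, here is a minimal sketch of how a single sample looks in the plain-text format that a target returns when Prometheus scrapes it over HTTP. The function names, the metric name, and the labels are our own illustrative assumptions; in practice an official client library would normally produce this output for you.

```python
def escape_label_value(value):
    """Escape backslash, double quote, and newline inside a label value."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def render_sample(name, labels, value):
    """Render one sample as `name{label="value",...} value`.

    `name` is the metric name and `labels` is a dict of key/value pairs;
    together they identify one time series. Labels are sorted only to keep
    the output deterministic in this sketch.
    """
    if not labels:
        return f"{name} {value}"
    pairs = ",".join(
        f'{key}="{escape_label_value(val)}"'
        for key, val in sorted(labels.items())
    )
    return f"{name}{{{pairs}}} {value}"


if __name__ == "__main__":
    # Hypothetical metric and labels, for illustration only.
    print(render_sample("http_requests_total", {"method": "GET", "code": "200"}, 1027))
```

Two samples with the same metric name but different label values (for example `code="200"` and `code="500"`) are stored as two separate time series, which is what makes the model multidimensional.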

Grafana
Grafana is an open-source visualization tool. It works with a variety of data stores, but it is most commonly used with Graphite, InfluxDB, and Prometheus. Grafana lets you query metrics and visualize the results.
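
When Prometheus is the data store, a dashboard panel ultimately boils down to PromQL queries sent to the Prometheus HTTP API. The sketch below builds the URL for an instant query against the `/api/v1/query` endpoint and parses the JSON answer. The helper names, the `localhost:9090` address, and the sample response are assumptions made for illustration, and no network call is made here.

```python
import json
from urllib.parse import urlencode


def instant_query_url(base_url, promql):
    """Build the URL for an instant query against the Prometheus HTTP API."""
    params = urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


def parse_vector(body):
    """Turn an instant-vector response body into (labels, value) tuples."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each sample carries its labels plus a [timestamp, "value"] pair;
    # the value arrives as a string and is converted to float here.
    return [(sample["metric"], float(sample["value"][1])) for sample in data["result"]]


if __name__ == "__main__":
    # Hypothetical server address; `up` is the metric Prometheus records per scrape target.
    print(instant_query_url("http://localhost:9090", "up"))

    # Hand-written sample response, shaped like an instant-vector result.
    sample_body = json.dumps({
        "status": "success",
        "data": {
            "resultType": "vector",
            "result": [
                {"metric": {"__name__": "up", "instance": "node-1:9100"},
                 "value": [1560500000.0, "1"]},
            ],
        },
    })
    for labels, value in parse_vector(sample_body):
        print(labels["instance"], value)
```

To run the query for real, the URL could be fetched with `urllib.request.urlopen` and the response body passed to `parse_vector`.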

We have automated the Grafana setup using an Ansible playbook. To get the data from the Prometheus server, we must configure the Prometheus data source in Grafana. We can create our own dashboards to visualize this data, but we have imported the dashboards that are available on the Grafana website. We have also automated the dashboard import using Ansible. Here is an example of …
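
Separately from the Ansible automation described above, the data source step on its own can be sketched in a few lines. The snippet below prepares (but does not send) a request that registers Prometheus as a data source through Grafana's HTTP API at `/api/datasources`. This is not the playbook referred to in this post; the function names, the URLs, and the API key are placeholders we assume for illustration.

```python
import json
import urllib.request


def prometheus_datasource(name, prometheus_url, is_default=True):
    """Build the JSON payload describing a Prometheus data source."""
    return {
        "name": name,
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",  # Grafana's backend queries Prometheus on the browser's behalf
        "isDefault": is_default,
    }


def build_datasource_request(grafana_url, api_key, payload):
    """Prepare a POST request for Grafana's data source endpoint without sending it."""
    return urllib.request.Request(
        url=f"{grafana_url.rstrip('/')}/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    payload = prometheus_datasource("Prometheus", "http://localhost:9090")
    request = build_datasource_request("http://localhost:3000", "YOUR_API_KEY", payload)
    print(request.get_method(), request.full_url)
    # Sending it would be: urllib.request.urlopen(request)
```

An Ansible task would typically wrap the same HTTP call, which keeps the configuration repeatable across environments.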
