Introduction

Grafana is an open-source platform for monitoring and observability. It allows you to query, visualize, alert on, and understand your metrics no matter where they are stored. In this post, we will see how to monitor a Kubernetes cluster using Grafana and Prometheus.

Prerequisites

  • A Kubernetes cluster
  • Helm
  • The following arguments in your k3s service file to expose additional metrics (a config-file alternative is sketched after this list):
  --kube-controller-manager-arg bind-address=0.0.0.0
  --kube-proxy-arg metrics-bind-address=0.0.0.0
  --kube-scheduler-arg bind-address=0.0.0.0
  --etcd-expose-metrics true
  --kubelet-arg containerd=/run/k3s/containerd/containerd.sock
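
If you run k3s, the same flags can instead go in /etc/rancher/k3s/config.yaml, where keys mirror the CLI flags; a minimal sketch, assuming a standard k3s install:

kube-controller-manager-arg:
  - bind-address=0.0.0.0
kube-proxy-arg:
  - metrics-bind-address=0.0.0.0
kube-scheduler-arg:
  - bind-address=0.0.0.0
etcd-expose-metrics: true
kubelet-arg:
  - containerd=/run/k3s/containerd/containerd.sock

Restart the service afterwards (for example, sudo systemctl restart k3s) so the new arguments take effect.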

Install Grafana & Prometheus

Add the Helm repository

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
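
If you want to double-check that the repository was added correctly, helm search can confirm the chart is available:

helm search repo prometheus-community/kube-prometheus-stack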

Create a namespace for monitoring

kubectl create namespace monitoring
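
A quick check that the namespace exists:

kubectl get namespace monitoring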

Create a secret for Grafana

Create a secret manifest file

  • Create a file named grafana-admin-credential.yml with the following content:
apiVersion: v1
kind: Secret
metadata:
  name: grafana-admin-credentials
  namespace: monitoring
type: Opaque
stringData:
  GF_SECURITY_ADMIN_PASSWORD: "admin" # Change this password
  GF_SECURITY_ADMIN_USER: "admin" # Change this username

Apply the secret

kubectl apply -f grafana-admin-credential.yml
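
You can confirm the secret was created (describe lists the keys without printing their values):

kubectl describe secret grafana-admin-credentials -n monitoring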

Create a values file for Grafana

  • Create a file named values.yaml with the following content:
fullnameOverride: prometheus

defaultRules:
  create: true
  rules:
    alertmanager: true
    etcd: true
    configReloaders: true
    general: true
    k8s: true
    kubeApiserverAvailability: true
    kubeApiserverBurnrate: true
    kubeApiserverHistogram: true
    kubeApiserverSlos: true
    kubelet: true
    kubeProxy: true
    kubePrometheusGeneral: true
    kubePrometheusNodeRecording: true
    kubernetesApps: true
    kubernetesResources: true
    kubernetesStorage: true
    kubernetesSystem: true
    kubeScheduler: true
    kubeStateMetrics: true
    network: true
    node: true
    nodeExporterAlerting: true
    nodeExporterRecording: true
    prometheus: true
    prometheusOperator: true

alertmanager:
  fullnameOverride: alertmanager
  enabled: true
  ingress:
    enabled: false

grafana:
  enabled: true
  fullnameOverride: grafana
  forceDeployDatasources: false
  forceDeployDashboards: false
  defaultDashboardsEnabled: true
  defaultDashboardsTimezone: utc
  serviceMonitor:
    enabled: true
  sidecar:
    dashboards:
      provider:
        allowUiUpdates: true
  admin:
    existingSecret: grafana-admin-credentials
    userKey: GF_SECURITY_ADMIN_USER
    passwordKey: GF_SECURITY_ADMIN_PASSWORD

kubeApiServer:
  enabled: true

kubelet:
  enabled: true
  serviceMonitor:
    metricRelabelings:
      - action: replace
        sourceLabels:
          - node
        targetLabel: instance

kubeControllerManager:
  enabled: true
  endpoints: # IPs of the master nodes
    - 10.10.10.30
    - 10.10.10.31
    - 10.10.10.32

coreDns:
  enabled: true

kubeDns:
  enabled: false

kubeEtcd:
  enabled: true
  endpoints: # IPs of the master nodes
    - 10.10.10.30
    - 10.10.10.31
    - 10.10.10.32

  service:
    enabled: true
    port: 2381
    targetPort: 2381

kubeScheduler:
  enabled: true
  endpoints: # IPs of the master nodes
    - 10.10.10.30
    - 10.10.10.31
    - 10.10.10.32

kubeProxy:
  enabled: true
  endpoints: # IPs of the master nodes
    - 10.10.10.30
    - 10.10.10.31
    - 10.10.10.32

kubeStateMetrics:
  enabled: true

kube-state-metrics:
  fullnameOverride: kube-state-metrics
  selfMonitor:
    enabled: true
  prometheus:
    monitor:
      enabled: true
      relabelings:
        - action: replace
          regex: (.*)
          replacement: $1
          sourceLabels:
            - __meta_kubernetes_pod_node_name
          targetLabel: kubernetes_node

nodeExporter:
  enabled: true
  serviceMonitor:
    relabelings:
      - action: replace
        regex: (.*)
        replacement: $1
        sourceLabels:
          - __meta_kubernetes_pod_node_name
        targetLabel: kubernetes_node

prometheus-node-exporter:
  fullnameOverride: node-exporter
  podLabels:
    jobLabel: node-exporter
  extraArgs:
    - --collector.filesystem.mount-points-exclude=^/(dev|proc|sys|var/lib/docker/.+|var/lib/kubelet/.+)($|/)
    - --collector.filesystem.fs-types-exclude=^(autofs|binfmt_misc|bpf|cgroup2?|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|iso9660|mqueue|nsfs|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|selinuxfs|squashfs|sysfs|tracefs)$
  service:
    portName: http-metrics
  prometheus:
    monitor:
      enabled: true
      relabelings:
        - action: replace
          regex: (.*)
          replacement: $1
          sourceLabels:
            - __meta_kubernetes_pod_node_name
          targetLabel: kubernetes_node
  resources:
    requests:
      memory: 512Mi
      cpu: 250m
    limits:
      memory: 1024Mi

prometheusOperator:
  enabled: true
  prometheusConfigReloader:
    resources:
      requests:
        cpu: 200m
        memory: 50Mi
      limits:
        memory: 100Mi

prometheus:
  enabled: true
  prometheusSpec:
    replicas: 1
    replicaExternalLabelName: "replica"
    ruleSelectorNilUsesHelmValues: false
    serviceMonitorSelectorNilUsesHelmValues: false
    podMonitorSelectorNilUsesHelmValues: false
    probeSelectorNilUsesHelmValues: false
    retention: 6h
    enableAdminAPI: true
    walCompression: true

thanosRuler:
  enabled: false

  • Replace the example master node IPs (10.10.10.30 to 10.10.10.32) with the IPs of your own master nodes.

Install Grafana & Prometheus with Helm

helm install prometheus prometheus-community/kube-prometheus-stack -n monitoring -f values.yaml
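
The release can take a minute or two to start; you can watch the pods until they are all Running:

kubectl get pods -n monitoring -w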

Error loading config

If you get the following error:

Error loading config (--config.file=/etc/prometheus/config_out/prometheus.env.yaml)

You can fix it by opening a shell in the Prometheus pod and editing the generated configuration file.
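
First, find the Prometheus pod and open a shell in its prometheus container (the exact pod name varies per cluster):

kubectl get pods -n monitoring # find the Prometheus pod name
kubectl exec -it -n monitoring <prometheus-pod-name> -c prometheus -- /bin/sh

Then, inside the pod: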

cd /etc/prometheus/config_out
vi prometheus.env.yaml

Then, add the following content:

global:
  scrape_interval: 15s
  evaluation_interval: 15s

The pod should restart and the error should be fixed.
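
To confirm the error is gone, you can tail the Prometheus container logs (using the pod name found above):

kubectl logs -n monitoring <prometheus-pod-name> -c prometheus --tail=20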

Access Grafana

Create an ingress for Grafana

  • Create a file named grafana-ingress.yml with the following content (this assumes Traefik with its CRDs installed, an existing Middleware named default-headers, and an existing TLS secret):
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute 
metadata:
  name: grafana-ingress
  namespace: monitoring
  annotations: 
    kubernetes.io/ingress.class: traefik-external
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`www.gf.your-domain.com`) # Change this to your domain
      kind: Rule
      services:
        - name: grafana
          port: 80
    - match: Host(`gf.your-domain.com`) # Change this to your domain
      kind: Rule
      services:
        - name: grafana
          port: 80
      middlewares:
        - name: default-headers
  tls:
    secretName: tls # Change this to your tls secret

Apply the ingress

kubectl apply -f grafana-ingress.yml
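
Assuming the Traefik CRDs are installed, you can check that the route was created:

kubectl get ingressroute grafana-ingress -n monitoring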

After adding a DNS entry for your domain, you should be able to access Grafana at https://gf.your-domain.com and log in with the credentials stored in the grafana-admin-credentials secret.
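
If you want to try Grafana before the DNS entry or ingress is in place, a port-forward to the Grafana service also works (the service is named grafana because of the fullnameOverride in values.yaml):

kubectl port-forward svc/grafana -n monitoring 3000:80

Grafana is then reachable at http://localhost:3000.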

Conclusion

In this post, we saw how to monitor a Kubernetes cluster using Grafana. We installed Grafana and Prometheus with Helm and created a Traefik ingress to access Grafana. We also saw how to fix the "Error loading config" issue in Prometheus. In a future post, we will see how to create dashboards and alerts in Grafana to monitor the Kubernetes cluster and other resources.