Kube-Prometheus-Stack

The source code and default configuration of the Building Block is available in our GitLab.

Adding the Building Block

Adding the kube-prometheus-stack Building Block to your cluster for the first time requires an extra step because the Prometheus Operator uses custom resource definitions (CRDs). Other Building Blocks create instances of those resources, and the Prometheus Operator discovers them to configure the whole monitoring stack. It is therefore important that the CRDs exist in the cluster before any other Building Block tries to create an instance. To learn more about the Prometheus Operator CRDs, we recommend the upstream documentation.

The easiest and recommended way to achieve this is to apply the CRDs to the cluster before you deploy anything else. This can be done manually or in your CI/CD pipeline:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm pull --untar --untardir /tmp/kube-prometheus-stack --version 13.12.0 prometheus-community/kube-prometheus-stack
kubectl apply -f /tmp/kube-prometheus-stack/kube-prometheus-stack/crds/
# rm -r /tmp/kube-prometheus-stack
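
If you prefer to apply the CRDs from your CI/CD pipeline, a job along the following lines can be added. This is only a sketch: the stage, the image and the way cluster credentials are provided are assumptions and have to be adapted to your own pipeline setup.

apply-crds:
  stage: .pre                      # built-in GitLab stage that runs before all other stages
  image: alpine/k8s:1.27.13        # assumed image that ships both helm and kubectl; pick a tag matching your cluster
  script:
    - helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
    - helm pull --untar --untardir /tmp/kube-prometheus-stack --version 13.12.0 prometheus-community/kube-prometheus-stack
    - kubectl apply -f /tmp/kube-prometheus-stack/kube-prometheus-stack/crds/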

An alternative approach is to ensure that the kube-prometheus-stack Building Block is deployed as the first Building Block in your cluster, e.g. by initially pushing a .gitlab-ci.yml file that only deploys the kube-prometheus-stack Building Block. After the CI/CD pipeline has successfully deployed the Building Block, you can add all other Building Blocks. A third way is to add a separate stage to your pipeline that installs the Building Block first, but this makes your pipeline a bit slower.

All three approaches work, but for simplicity we recommend the first one. Once the Building Block is installed, future updates of the CRDs are handled automatically by the SysEleven Building Block.

Add the directory kube-prometheus-stack to your control repository. Add a .gitlab-ci.yml to the directory with the following content:

include:
  - project: syseleven/building-blocks/helmfiles/kube-prometheus-stack
    file: JobDevelopment.yaml
    ref: 8.0.1
  - project: syseleven/building-blocks/helmfiles/kube-prometheus-stack
    file: JobStaging.yaml
    ref: 8.0.1
  - project: syseleven/building-blocks/helmfiles/kube-prometheus-stack
    file: JobProduction.yaml
    ref: 8.0.1

Remove environments you are not using by removing their include.

Configuration

Required configuration

You have to set grafana.adminPassword. If you don’t, the Grafana admin password changes on each CI run.

Configure it in values-kube-prometheus-stack-${ENVIRONMENT}.yaml where ${ENVIRONMENT} is replaced with the environment to configure.

grafana:
  adminPassword: highly-secure-production-password

Configuring alertmanager

When adding an additional receiver, you need to copy the null receiver into your own cluster configuration as well. Helm does not merge list values, so the required null receiver would otherwise be missing.

alertmanager:
  config:
    receivers:
    - name: "null"  # Add this to your config as well
    - name: myotherreceiver
      webhook_configs:
      - send_resolved: true
        url: https://myurl

With the kube-prometheus-stack Building Block we already deploy an Alertmanager for you. In combination with Prometheus and the default rules, a lot of base metrics are monitored and alerts are created when something goes wrong. However, with the default settings those alerts are only visible in the web interface of the Alertmanager. Most of the time it is desirable to send those alerts to your operations team, your on-call engineer or someone else. To achieve that, you can configure the Alertmanager in values-kube-prometheus-stack.yaml or values-kube-prometheus-stack-$ENVIRONMENT.yaml. There you can use the alertmanager.config settings (e.g. alertmanager.config.receivers) to set all options supported by Alertmanager. As the Building Block uses the upstream kube-prometheus-stack Helm chart, the alertmanager.config section of its values.yaml is a good starting point.
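
As an illustration, the following sketch routes all alerts to a webhook receiver for your on-call tooling. The receiver name and URL are placeholders, and the routing options should be checked against the Alertmanager documentation:

alertmanager:
  config:
    route:
      receiver: on-call                # default receiver for all alerts
      group_by: ["alertname", "namespace"]
      routes:
      - receiver: "null"               # keep routing the Watchdog alert to the null receiver
        match:
          alertname: Watchdog
    receivers:
    - name: "null"                     # required, see the note above
    - name: on-call                    # placeholder receiver name
      webhook_configs:
      - send_resolved: true
        url: https://example.com/alertmanager-webhook   # placeholder URL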

Normally, no alert should be firing, but there is one exception: in the default configuration the Prometheus Operator creates a Watchdog alert which is always firing. This alert can be used to verify that your monitoring works; if it stops firing, either Prometheus or the Alertmanager is not working as expected. You can set up an external alerting provider (or a webhook hosted by you) to notify you if the alert stops firing.
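
A sketch of such a setup, assuming an external dead man's switch service that accepts Alertmanager webhooks (the URL and receiver name are placeholders), routes the Watchdog alert to that service instead of the null receiver:

alertmanager:
  config:
    route:
      routes:
      - receiver: watchdog-external    # placeholder receiver name
        match:
          alertname: Watchdog
        repeat_interval: 5m            # keep notifying the external service frequently
    receivers:
    - name: "null"
    - name: watchdog-external
      webhook_configs:
      - send_resolved: false
        url: https://deadmansswitch.example.com/ping    # placeholder URL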

Forward alertmanager to localhost

kubectl port-forward -n syseleven-managed-kube-prometheus-stack alertmanager-kube-prometheus-stack-alertmanager-0 9093

Send test alerts

You can send test alerts to an Alertmanager instance to test alerting.

kubectl port-forward -n syseleven-managed-kube-prometheus-stack alertmanager-kube-prometheus-stack-alertmanager-0 9093 &

curl -H "Content-Type: application/json" -d '[{"labels":{"alertname":"myalert"}}]' localhost:9093/api/v1/alerts
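
To verify that the test alert arrived, you can query the same API while the port-forward is still running (assuming your Alertmanager version still serves the v1 API used above; jq is only used for readability):

curl -s localhost:9093/api/v1/alerts | jq '.data[] | .labels'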

Adding alert rules

If you want to configure additional alerts, you need to add PrometheusRule resources. The example below generates an alert if the ServiceMonitor with the name your-service-monitor-name has less than one target up.

Deploy those resources together with your application, e.g. in a Helm chart, so that they are bundled and nicely versioned.

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: kube-prometheus-stack-prometheus
    role: alert-rules
  name: your-application-name
spec:
  groups:
  - name: "your-application-name.rules"
    rules:
    - alert: PodDown
      for: 1m
      expr: sum(up{job="your-service-monitor-name"}) < 1 or absent(up{job="your-service-monitor-name"})
      annotations:
        message: The deployment has less than 1 pod running.
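
The rule above assumes that a ServiceMonitor producing the series up{job="your-service-monitor-name"} exists. A minimal sketch of such a ServiceMonitor is shown below; the label selector, port name and release label are placeholders and need to match your Service and the serviceMonitorSelector of your Prometheus. Note that the job label defaults to the name of the scraped Kubernetes Service (or to the label configured via spec.jobLabel), so make sure it matches the expression in the rule.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: your-service-monitor-name
  labels:
    release: kube-prometheus-stack     # assumed label, only needed if your Prometheus selects ServiceMonitors by label
spec:
  selector:
    matchLabels:
      app: your-application-name       # placeholder, must match the labels of your Service
  endpoints:
  - port: metrics                      # placeholder, must match a named port of your Service
    interval: 30s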

Adding Grafana dashboards

If you want to deploy additional Grafana dashboards, our recommendation is to add a ConfigMap or Secret with the label grafana_dashboard=1. The ConfigMap or Secret does not have to be in the same namespace as Grafana, and you can deploy it together with your application or service.

Although you can create dashboards in the Grafana UI, they are not persistent in the default configuration. You can enable persistence, but then you would also have to set the replica count to 1, losing availability, which we strongly recommend against. Alternatively, set up an external database to store dashboards as described in the Grafana documentation.

apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    grafana_dashboard: "1"
  name: new-dashboard-configmap
data:
  new-dashboard.json: |-
    {
      "id": null,
      "uid": "cLV5GDCkz",
      "title": "New dashboard",
      "tags": [],
      "style": "dark",
      "timezone": "browser",
      "editable": true,
      "hideControls": false,
      "graphTooltip": 1,
      "panels": [],
      "time": {
        "from": "now-6h",
        "to": "now"
      },
      "timepicker": {
        "time_options": [],
        "refresh_intervals": []
      },
      "templating": {
        "list": []
      },
      "annotations": {
        "list": []
      },
      "refresh": "5s",
      "schemaVersion": 17,
      "version": 0,
      "links": []
    }

For more possibilities to deploy Grafana dashboards, have a look at the upstream helm chart repo.
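
If Grafana is not (yet) exposed via an ingress (see below), you can check that a dashboard was picked up by port-forwarding the Grafana service to localhost. The service name below is an assumption based on the release name and may differ in your cluster:

kubectl port-forward -n syseleven-managed-kube-prometheus-stack svc/kube-prometheus-stack-grafana 3000:80

Grafana is then reachable at http://localhost:3000.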

Make Grafana available via an ingress

Requirements

  • Ingress Controller, for example with the SysEleven Building Block
  • Optional: Building Blocks cert-manager and external-dns

Configuring Ingress

Add to the matching values file:

grafana:
  ingress:
    enabled: true
    annotations:
      cert-manager.io/cluster-issuer: "letsencrypt-production"
    hosts:
      - grafana.example.com
    tls:
      - secretName: grafana.example.com-tls
        hosts:
          - grafana.example.com
  grafana.ini:
    server:
      root_url: https://grafana.example.com

Adding SysEleven exporter

The SysEleven exporter allows you to export your quota and current usage. To get this working, the following configuration has to be added:

syseleven-kube-prometheus-stack/values-syseleven-exporter.yaml

openstack:
  projectId: 1234567890abcdefghi

All other authentication information is automatically detected. Making the project id auto-detectable is currently tracked at k8s-5339.

Monitoring

The Building Block (in this case kube-prometheus-stack itself) ships with a lot of predefined alertrules and Grafana dashboards. Alertrules and dashboards are synced from kubernetes-mixin. This includes basic monitoring of the local Kubernetes cluster itself (e.g. resource limits/requests, pod crash loops, API errors, ...).

SysEleven may add additional alertrules and dashboards to each building block.

Additional alertrules

  • None

Additional Grafana dashboards

  • Alerts
    • An overview of firing prometheus alerts
  • Cluster Capacity
    • An overview of the capacity of the local kubernetes cluster

Scale prometheus persistent volumes

# Set replicas to 0 and PVC template to new value
kubectl edit prometheuses

# Patch PVC (e.g. 100Gi)
kubectl patch pvc prometheus-kube-prometheus-stack-prometheus-db-prometheus-kube-prometheus-stack-prometheus-0 --namespace syseleven-managed-kube-prometheus-stack -p '{"spec":{"resources":{"requests":{"storage":"100Gi"}}}}' --type=merge
kubectl patch pvc prometheus-kube-prometheus-stack-prometheus-db-prometheus-kube-prometheus-stack-prometheus-1 --namespace syseleven-managed-kube-prometheus-stack -p '{"spec":{"resources":{"requests":{"storage":"100Gi"}}}}' --type=merge

# Verify pending resize
kubectl describe pvc prometheus-kube-prometheus-stack-prometheus-db-prometheus-kube-prometheus-stack-prometheus-0

# Scale replicas back
kubectl edit prometheuses

# Commit the changes to the values*.yaml files.
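
The corresponding change in the values files could look like the following sketch; the exact keys should be verified against the upstream chart's values.yaml for the chart version in use:

prometheus:
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          resources:
            requests:
              storage: 100Gi    # has to match the size you patched the PVCs to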

Scaling setup

This building block consists of multiple components. Each of the components can and must be scaled individually.

Scaling prometheus

  • prometheus-operator
    • Usually should only be run with replicas=1
    • Requests/limits for CPU/memory can be adjusted
  • prometheus-node-exporter
    • Runs as DaemonSet on each node, so no further replica scaling needed
    • Requests/limits for CPU/memory can be adjusted
  • kube-state-metrics
    • Usually should only be run with replicas=1
    • Requests/limits for CPU/memory can be adjusted
  • prometheus
    • Also see Prometheus High Availability for upstream documentation
    • Replicas can be increased, though each replica will be a dedicated prometheus that scrapes everything
    • Requests/limits for CPU/memory can be adjusted (see the values sketch after this list)
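
A values sketch for scaling Prometheus; the replica count and resource figures are examples only, and the keys should be checked against the upstream chart's values.yaml:

prometheus:
  prometheusSpec:
    replicas: 2                  # each replica scrapes all targets independently
    resources:
      requests:
        cpu: 500m
        memory: 2Gi
      limits:
        memory: 4Gi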

Scaling alertmanager

Also see Alertmanager High Availability for upstream documentation.

  • Replicas can be increased to achieve higher availability
  • New replicas will automatically join the alertmanager cluster
  • Requests/limits for CPU/memory can be adjusted (see the values sketch below)
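
A values sketch for scaling the Alertmanager; the figures are examples only, and the keys should be checked against the upstream chart's values.yaml:

alertmanager:
  alertmanagerSpec:
    replicas: 3                  # new replicas automatically join the Alertmanager cluster
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
      limits:
        memory: 256Mi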

Scaling grafana

Also see Set up Grafana for high availability for upstream documentation.

  • Replicas
    • Dashboards - when increasing replicas for grafana it is important to think about where you want the dashboards/user configuration to come from. By default we run with replicas=2 but do not save dashboards locally, so after a respawn the dashboards are gone if they are not automated by supplying them as a ConfigMap.
    • Sessions - when increasing replicas for grafana it is important to think about where you want to store user sessions. We default to configuring sticky sessions, so each user is mapped to a specific replica until that replica becomes unavailable.
  • Requests/limits for CPU/memory can be adjusted (a values sketch follows this list)
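
A values sketch for scaling Grafana; the figures are examples only, and the keys should be checked against the upstream Grafana chart's values.yaml:

grafana:
  replicas: 2                    # remember to provide dashboards as ConfigMaps, see above
  resources:
    requests:
      cpu: 100m
      memory: 256Mi
    limits:
      memory: 512Mi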