Kube-Prometheus-Stack

Overview

The source code and default configuration of the Building Block are available in our repository at code.sysEleven.de. For information on release notes and new features, please see Release notes kube-prometheus-stack.

Kube-Prometheus

Kube-Prometheus is the CNCF monitoring system and time series database. It ships with Alertmanager and Grafana as a web UI component to visualize the collected metrics.

Before you deploy the Kube-Prometheus Building Block, you should consider the following recommendations.

CPU (vCPU)                Memory
6 + 0.5 per node          6320 MiB + 50 MiB per node

No further activities need to be carried out in advance.

Adding the Building Block

Add the directory kube-prometheus-stack to your control repository. Add a .gitlab-ci.yml to the directory with the following content:

include:
  - project: syseleven/building-blocks/helmfiles/kube-prometheus-stack
    file: JobDevelopment.yaml
    ref: 34.0.0
  - project: syseleven/building-blocks/helmfiles/kube-prometheus-stack
    file: JobStaging.yaml
    ref: 34.0.0
  - project: syseleven/building-blocks/helmfiles/kube-prometheus-stack
    file: JobProduction.yaml
    ref: 34.0.0

Remove environments you are not using by removing their include.

Configuration

Required configuration

You have to set grafana.adminPassword. If you don’t, the Grafana admin password changes on each CI run.

Configure it in values-kube-prometheus-stack-${ENVIRONMENT}.yaml, where ${ENVIRONMENT} is replaced with the environment to configure (e.g. values-kube-prometheus-stack-production.yaml).

grafana:
  adminPassword: highly-secure-production-password

Configuring alertmanager

When adding a receiver, you also need to copy the null receiver from the default configuration into your own cluster configuration.
If you customise the configuration of a Building Block through values.yaml files, be aware that Helm cannot merge lists, so you cannot simply extend an existing list. Your customised configuration must therefore contain the complete list, i.e. all existing entries plus the new one.

alertmanager:
  config:
    receivers:
      - name: "null"  # Add this to your config as well
      - name: myotherreceiver
        webhook_configs:
          - send_resolved: true
            url: https://myurl

Keep in mind that you need to configure the Building Blocks based on your demands through the values-*.yaml files!

With the kube-prometheus-stack building block we already deploy an Alertmanager for you. In combination with Prometheus and the default rules, many base metrics are monitored and alerts are raised when something goes wrong.
With the default settings, alerts are only visible in the Alertmanager web interface. Most of the time it is desirable to send those alerts to your operations team, your on-call engineer, or someone else. To achieve that, you can configure the Alertmanager in values-kube-prometheus-stack.yaml or values-kube-prometheus-stack-$ENVIRONMENT.yaml. There you can use the alertmanager.config.receivers setting to set all available options supported by Alertmanager. As the building block uses the upstream kube-prometheus-stack Helm chart, the alertmanager.config section of its values.yaml is a good starting point.
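
As a minimal sketch, assuming a hypothetical receiver name team-ops and a placeholder Slack webhook URL, a complete receivers list plus a matching route could look like this (remember to keep the null receiver, see above):

alertmanager:
  config:
    route:
      receiver: "null"              # default receiver from the default configuration
      routes:
        - receiver: team-ops        # hypothetical receiver for your on-call team
          matchers:
            - severity =~ "warning|critical"
    receivers:
      - name: "null"                # keep the null receiver (see above)
      - name: team-ops
        slack_configs:
          - api_url: https://hooks.slack.com/services/EXAMPLE   # placeholder webhook URL
            channel: "#alerts"
            send_resolved: true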

Normally no alert should be firing, but there is one exception: in the default configuration, the Prometheus operator creates a Watchdog alert which fires permanently. This alert can be used to check that your monitoring is working. If it stops firing, either Prometheus or the Alertmanager is not working as expected.
You can set up an external alerting provider (or a webhook hosted by you) to notify you when the alert stops firing.
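
A minimal sketch of such a "dead man's switch" route, assuming a hypothetical heartbeat endpoint at https://heartbeat.example.com/ping (as above, the complete receivers list including "null" must be present in your values file):

alertmanager:
  config:
    route:
      routes:
        - receiver: heartbeat           # hypothetical receiver name
          matchers:
            - alertname = "Watchdog"
          repeat_interval: 5m           # keep pinging the endpoint while Watchdog fires
    receivers:
      - name: heartbeat
        webhook_configs:
          - url: https://heartbeat.example.com/ping   # placeholder endpoint
            send_resolved: true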

Forward Alertmanager to localhost

kubectl port-forward -n syseleven-kube-prometheus-stack alertmanager-kube-prometheus-stack-alertmanager-0 9093 

Send test alerts

You can send test alerts to an Alertmanager instance to test alerting.

kubectl port-forward -n syseleven-kube-prometheus-stack alertmanager-kube-prometheus-stack-alertmanager-0 9093 &

curl -H "Content-Type: application/json" -d '[{"labels":{"alertname":"myalert"}}]' localhost:9093/api/v1/alerts
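
The test alert should then be visible in the Alertmanager web interface at http://localhost:9093, or, depending on your Alertmanager version, via the same API:

curl -s localhost:9093/api/v1/alerts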

Adding alert rules

If you want to configure additional alerts, you need to add PrometheusRule resources. The example below generates an alert if the ServiceMonitor named your-service-monitor-name has less than one target up.

Deploy those resources together with your application, e.g. in a Helm chart, so that they are bundled and versioned together.

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: kube-prometheus-stack-prometheus
    role: alert-rules
  name: your-application-name
spec:
  groups:
    - name: "your-application-name.rules"
      rules:
        - alert: PodDown
          for: 1m
          expr: sum(up{job="your-service-monitor-name"}) < 1 or absent(up{job="your-service-monitor-name"})
          annotations:
            message: The deployment has less than 1 pod running.
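
The expression above assumes a scrape job named your-service-monitor-name. As a hypothetical sketch, a matching ServiceMonitor could look like the following; note that by default the job label of the scraped targets is derived from the name of the Kubernetes Service, so name the Service accordingly or set spec.jobLabel:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: your-service-monitor-name
spec:
  selector:
    matchLabels:
      app: your-application-name   # assumption: your Service carries this label
  endpoints:
    - port: metrics                # assumption: the named metrics port of your Service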

Adding Grafana dashboards

If you want to deploy additional Grafana dashboards, we recommend adding a ConfigMap or Secret with the label grafana_dashboard=1.
The ConfigMap or Secret does not have to be in the same namespace as Grafana and can be deployed together with your application or service.

Although you can create dashboards in the Grafana user interface, they are not persisted in the default configuration. You can enable persistence, but then you also have to reduce the number of replicas to 1, which costs availability.
We strongly recommend that you either avoid this entirely or set up an external database for storing dashboards, as described in the Grafana documentation.

apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    grafana_dashboard: "1"
  name: new-dashboard-configmap
data:
  new-dashboard.json: |-
    {
      "id": null,
      "uid": "cLV5GDCkz",
      "title": "New dashboard",
      "tags": [],
      "style": "dark",
      "timezone": "browser",
      "editable": true,
      "hideControls": false,
      "graphTooltip": 1,
      "panels": [],
      "time": {
        "from": "now-6h",
        "to": "now"
      },
      "timepicker": {
        "time_options": [],
        "refresh_intervals": []
      },
      "templating": {
        "list": []
      },
      "annotations": {
        "list": []
      },
      "refresh": "5s",
      "schemaVersion": 17,
      "version": 0,
      "links": []
    }
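
The dashboard sidecar shipped with the upstream Grafana chart watches for this label, so deploying the dashboard is a plain apply (or part of your application's Helm chart); new-dashboard-configmap.yaml is a hypothetical file name containing the manifest above:

kubectl apply -f new-dashboard-configmap.yaml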

For more possibilities to deploy Grafana dashboards, have a look at the upstream Helm chart repository.

Make Grafana available via an ingress

Requirements

  • An ingress controller, for example deployed via the SysEleven Building Block
  • Optional: Building Blocks cert-manager and external-dns

Configuring Ingress

Add to the matching values file:

grafana:
  ingress:
    enabled: true
    annotations:
      cert-manager.io/cluster-issuer: "letsencrypt-production"
    hosts:
      - grafana.example.com
    tls:
      - secretName: grafana.example.com-tls
        hosts:
          - grafana.example.com
  grafana.ini:
    server:
      root_url: https://grafana.example.com

Adding SysEleven exporter

The SysEleven exporter allows you to export your quota and the current usage. To get this working, the following configuration has to be added:

syseleven-kube-prometheus-stack/values-syseleven-exporter.yaml

openstack:
  projectId: 1234567890abcdefghi

All other authentication information is detected automatically. Making the project ID auto-detectable as well is currently tracked at k8s-5339.

Monitoring

The building blocks (in this case kube-prometheus-stack itself) come with a set of predefined alert rules and Grafana dashboards. Alert rules and dashboards are synchronized from kubernetes-mixin.
This includes basic monitoring of the local Kubernetes cluster itself (e.g. resource limits/requests, pod crash loops, API errors, ...).
SysEleven may add additional alert rules and dashboards to each building block.

Additional alert rules

As an example, alert rules have been created in the directory kube-prometheus-stack-extension/templates/alerts.
Add your own alert rules to the existing files, or create new files with your alert rules in the same directory.

Each alert rule should include a meaningful description as an annotation.
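
For example, a hypothetical rule (using the same PrometheusRule skeleton as above and an assumed metric http_requests_total) with a meaningful description could look like this:

- alert: HighErrorRate                  # hypothetical alert name
  expr: rate(http_requests_total{code=~"5.."}[5m]) > 1
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: High rate of HTTP 5xx responses
    description: "{{ $labels.instance }} has served more than one 5xx response per second for the last 10 minutes."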

Additional Grafana dashboards

  • Alerts
    • An overview of firing Prometheus alerts
  • Cluster Capacity
    • An overview of the capacity of the local Kubernetes cluster

Scale prometheus persistent volumes

# Set replicas to 0 and the PVC template to the new size
# (online resizing requires a StorageClass with allowVolumeExpansion: true)
kubectl edit prometheuses

# Patch PVC (e.g. 100Gi)
kubectl patch pvc prometheus-kube-prometheus-stack-prometheus-db-prometheus-kube-prometheus-stack-prometheus-0 --namespace syseleven-managed-kube-prometheus-stack -p '{"spec":{"resources":{"requests":{"storage":"100Gi"}}}}' --type=merge
kubectl patch pvc prometheus-kube-prometheus-stack-prometheus-db-prometheus-kube-prometheus-stack-prometheus-1 --namespace syseleven-managed-kube-prometheus-stack -p '{"spec":{"resources":{"requests":{"storage":"100Gi"}}}}' --type=merge

# Verify pending resize
kubectl describe pvc prometheus-kube-prometheus-stack-prometheus-db-prometheus-kube-prometheus-stack-prometheus-0

# Scale replicas back
kubectl edit prometheuses

# Commit the changes to the values*.yaml files.

Scaling setup

This building block consists of multiple components. Each of the components can and must be scaled individually.

Scaling prometheus

  • prometheus-operator
    • Usually should only be run with replicas=1
    • Requests/limits for CPU/memory can be adjusted
  • prometheus-node-exporter
    • Runs as DaemonSet on each node, so no further replica scaling needed
    • Requests/limits for CPU/memory can be adjusted
  • kube-state-metrics
    • Usually should only be run with replicas=1
    • Requests/limits for CPU/memory can be adjusted
  • prometheus
    • Also see Prometheus High Availability for upstream documentation
    • Replicas can be increased, though each replica will be a dedicated prometheus that scrapes everything
    • Requests/limits for CPU/memory can be adjusted
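
As a sketch, replicas and requests/limits for Prometheus can be adjusted in the matching values file through the upstream chart's prometheusSpec (all numbers are placeholders):

prometheus:
  prometheusSpec:
    replicas: 2
    resources:
      requests:
        cpu: "2"
        memory: 6Gi
      limits:
        memory: 8Gi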

Scaling alertmanager

Also see Alertmanager High Availability for upstream documentation.

  • Replicas can be increased to achieve higher availability
  • New replicas will automatically join the alertmanager cluster
  • Requests/limits for CPU/memory can be adjusted
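
Analogously, a sketch for Alertmanager (placeholder values):

alertmanager:
  alertmanagerSpec:
    replicas: 3
    resources:
      requests:
        cpu: 100m
        memory: 256Mi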

Scaling grafana

Also see Set up Grafana for high availability for upstream documentation.

  • Replicas
    • Dashboards - when increasing replicas for Grafana, it is important to think about where the dashboards/user configuration should come from. By default we run with replicas=2 but do not store dashboards locally, so after a pod is replaced its dashboards are gone unless they are provisioned automatically, e.g. as a ConfigMap.
    • Sessions - when increasing replicas for Grafana, it is important to think about where user sessions are stored. We configure sticky sessions by default, so each user is mapped to a specific replica until that replica becomes unavailable.
  • Requests/limits for CPU/memory can be adjusted
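
And a corresponding sketch for Grafana (placeholder values):

grafana:
  replicas: 2
  resources:
    requests:
      cpu: 100m
      memory: 256Mi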

Release-Notes

For more information on release notes and new features, please see Release notes Prometheus-Stack.

Known Issues

For Kubernetes <= 1.24, upgrading the building block from v31 to v32 or above produces errors in the diff stage of the CI pipeline.
Neither Kubernetes >= 1.25 nor a fresh installation of the building block in the respective versions is affected by this issue.
While updating to Kubernetes >= 1.25 is generally advised, an alternative workaround is to remove the CRD prometheuses.monitoring.coreos.com before updating to the new version.
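
A sketch of this workaround; be aware that deleting a CRD also deletes all objects of that kind, in this case the Prometheus resources managed by the building block, which are recreated by the next pipeline run:

kubectl delete crd prometheuses.monitoring.coreos.com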