The source code and default configuration of the Building Block are available in our repository at code.syseleven.de. For information on release notes and new features, please follow Release notes kube-prometheus-stack.
Kube-Prometheus is the CNCF monitoring system and time series database. It comes with an Alertmanager and with Grafana as a web UI component to visualize the collected metrics.
Before you deploy the Kube-Prometheus Building Block, you should consider the following recommendations.
| CPU (vCPU) | Memory |
|---|---|
| 6 CPU + 0.5 per node | 6320 MiB + 50 MiB per node |
No further activities need to be carried out in advance.
Add the directory `kube-prometheus-stack` to your control repository. Add a `.gitlab-ci.yml` to the directory with the following content:
```yaml
include:
  - project: syseleven/building-blocks/helmfiles/kube-prometheus-stack
    file: JobDevelopment.yaml
    ref: 38.0.1
  - project: syseleven/building-blocks/helmfiles/kube-prometheus-stack
    file: JobStaging.yaml
    ref: 38.0.1
  - project: syseleven/building-blocks/helmfiles/kube-prometheus-stack
    file: JobProduction.yaml
    ref: 38.0.1
```
Remove environments you are not using by removing their include.
You have to set `grafana.adminPassword`. If you don't, the Grafana admin password changes on each CI run. Configure it in `values-kube-prometheus-stack-${ENVIRONMENT}.yaml`, where `${ENVIRONMENT}` is replaced with the environment to configure.
```yaml
grafana:
  adminPassword: highly-secure-production-password
```
When adding a receiver, you need to copy the null receiver config into your own cluster configuration as well.
If you customise the configuration of a Building Block with values.yaml files, be aware that you cannot simply extend lists: Helm cannot merge lists. Therefore, the customised configuration must contain the existing list plus the new entry.
```yaml
alertmanager:
  config:
    receivers:
      - name: "null" # Add this to your config as well
      - name: myotherreceiver
        webhook_configs:
          - send_resolved: true
            url: https://myurl
```
Keep in mind that you need to configure the Building Blocks based on your demands through the `values-*.yaml` files!
With the kube-prometheus-stack building block we already deploy an Alertmanager for you. In combination with Prometheus and the default rules, many base metrics are monitored and alerts are created for them when something goes wrong.
In the default settings, alerts are only visible in the web interface of the Alertmanager. Most of the time it is desirable to send those alerts to your operations team, your on-call engineer, or someone else. To achieve that, you can configure the Alertmanager in `values-kube-prometheus-stack.yaml` or `values-kube-prometheus-stack-$ENVIRONMENT.yaml`. There you can use the `alertmanager.config.receivers` setting to set all available options supported by Alertmanager. As the building block uses the upstream kube-prometheus-stack Helm chart, the `alertmanager.config` section of its `values.yaml` is a good starting point.
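As a sketch, a values file with a Slack receiver could look like the following. The receiver name, Slack webhook URL, and channel are placeholders, not part of the building block:

```yaml
alertmanager:
  config:
    receivers:
      - name: "null"             # required, see the note above
      - name: team-slack         # hypothetical receiver name
        slack_configs:
          - api_url: https://hooks.slack.com/services/XXX  # placeholder webhook URL
            channel: "#alerts"
            send_resolved: true
    route:
      receiver: team-slack       # send everything to Slack by default
      routes:
        - receiver: "null"       # keep routing the Watchdog alert to null
          matchers:
            - alertname = "Watchdog"
```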
Normally no alert should be firing, but there is an exception! In the default configuration, the Prometheus Operator creates a `Watchdog` alert which is always firing. This alert can be used to check that your monitoring is working. If it stops firing, either Prometheus or the Alertmanager is not working as expected.
You can set up an external alerting provider (or a webhook hosted by you) to notify you when the alert stops triggering.
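A route that forwards only the Watchdog alert to such an external heartbeat service could be sketched like this (the receiver name and URL are placeholders for your provider's endpoint):

```yaml
alertmanager:
  config:
    route:
      routes:
        - receiver: watchdog-heartbeat   # hypothetical receiver name
          matchers:
            - alertname = "Watchdog"
          repeat_interval: 1m            # keep the heartbeat frequent
    receivers:
      - name: "null"
      - name: watchdog-heartbeat
        webhook_configs:
          - send_resolved: false
            url: https://deadmansswitch.example.com/ping  # placeholder URL
```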
To access the Alertmanager web interface, you can use a port-forward:

```shell
kubectl port-forward -n syseleven-kube-prometheus-stack alertmanager-kube-prometheus-stack-alertmanager-0 9093
```
You can send test alerts to an Alertmanager instance to test alerting.

```shell
kubectl port-forward -n syseleven-kube-prometheus-stack alertmanager-kube-prometheus-stack-alertmanager-0 9093 &
curl -H "Content-Type: application/json" -d '[{"labels":{"alertname":"myalert"}}]' localhost:9093/api/v1/alerts
```
If you want to configure additional alerts, you need to add PrometheusRule resources. The example below will generate an alert if the ServiceMonitor with the name `your-service-monitor-name` has less than one target up.
Deploy those resources together with your application, e.g. in a Helm chart, so that they are bundled together and nicely versioned.
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: kube-prometheus-stack-prometheus
    role: alert-rules
  name: your-application-name
spec:
  groups:
    - name: "your-application-name.rules"
      rules:
        - alert: PodDown
          for: 1m
          expr: sum(up{job="your-service-monitor-name"}) < 1 or absent(up{job="your-service-monitor-name"})
          annotations:
            message: The deployment has less than 1 pod running.
```
If you want to deploy additional Grafana dashboards, we recommend adding a ConfigMap or Secret with the label `grafana_dashboard=1`.
The ConfigMap or Secret does not have to be in the same namespace as Grafana and can be deployed together with your application or service.
Although you can create dashboards in the Grafana user interface, they are not persisted in the default configuration. You can enable persistence, but then you also have to set the number of replicas to 1, which has the disadvantage of losing availability.
We strongly recommend not doing this; instead, set up an external database for storing dashboards as described in the Grafana documentation.
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    grafana_dashboard: "1"
  name: new-dashboard-configmap
data:
  new-dashboard.json: |-
    {
      "id": null,
      "uid": "cLV5GDCkz",
      "title": "New dashboard",
      "tags": [],
      "style": "dark",
      "timezone": "browser",
      "editable": true,
      "hideControls": false,
      "graphTooltip": 1,
      "panels": [],
      "time": {
        "from": "now-6h",
        "to": "now"
      },
      "timepicker": {
        "time_options": [],
        "refresh_intervals": []
      },
      "templating": {
        "list": []
      },
      "annotations": {
        "list": []
      },
      "refresh": "5s",
      "schemaVersion": 17,
      "version": 0,
      "links": []
    }
```
For more possibilities to deploy Grafana dashboards, have a look at the upstream helm chart repo.
Add to the matching values file:
```yaml
grafana:
  ingress:
    enabled: true
    annotations:
      cert-manager.io/cluster-issuer: "letsencrypt-production"
    hosts:
      - grafana.example.com
    tls:
      - secretName: grafana.example.com-tls
        hosts:
          - grafana.example.com
  grafana.ini:
    server:
      root_url: https://grafana.example.com
```
The SysEleven exporter allows you to export your quota and current usage.
In order to enable the SysEleven exporter chart in the building block, add the following environment variable to the CI definition in the control repository:
`.gitlab-ci.yml`

```yaml
variables:
  RELEASE_SYSELEVEN_EXPORTER_ENABLED: "true"
```
More details on how to configure environment variables can be found here.
All other authentication information is automatically detected. Making the project id auto-detectable is currently tracked at k8s-5339.
The building blocks (in this case kube-prometheus-stack itself) come with a set of predefined alert rules and Grafana dashboards. Alert rules and dashboards are synchronized from kubernetes-mixin.
This includes basic monitoring of the local Kubernetes cluster itself (e.g. resource limits/requests, pod crash loops, API errors, ...).
SysEleven may add additional alert rules and dashboards to each building block.
As an example, alert rules have been created in the directory `kube-prometheus-stack-extension/templates/alerts`.
Add your own alert rules to the existing files or create your own files in the directory with your alert rules.
Each alert rule should include a meaningful description as an annotation.
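For instance, a rule entry with a meaningful description could look like the following sketch (the alert name, expression, and job label are illustrative, not part of the building block):

```yaml
groups:
  - name: "your-application-name.rules"
    rules:
      - alert: HighErrorRate            # hypothetical alert name
        for: 5m
        expr: rate(http_requests_total{job="your-service-monitor-name",code=~"5.."}[5m]) > 1
        labels:
          severity: warning
        annotations:
          summary: High HTTP 5xx rate
          description: The service behind your-service-monitor-name has returned more than one 5xx response per second for 5 minutes.
```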
```shell
# Set replicas to 0 and the PVC template to the new value
kubectl edit prometheuses

# Patch the PVCs (e.g. 100Gi)
kubectl patch pvc prometheus-kube-prometheus-stack-prometheus-db-prometheus-kube-prometheus-stack-prometheus-0 --namespace syseleven-managed-kube-prometheus-stack -p '{"spec":{"resources":{"requests":{"storage":"100Gi"}}}}' --type=merge
kubectl patch pvc prometheus-kube-prometheus-stack-prometheus-db-prometheus-kube-prometheus-stack-prometheus-1 --namespace syseleven-managed-kube-prometheus-stack -p '{"spec":{"resources":{"requests":{"storage":"100Gi"}}}}' --type=merge

# Verify the pending resize
kubectl describe pvc prometheus-kube-prometheus-stack-prometheus-db-prometheus-kube-prometheus-stack-prometheus-0

# Scale the replicas back
kubectl edit prometheuses

# Commit the changes to the values*.yaml files.
```
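In the values file, the persistent volume size is configured through the chart's storage specification. A sketch, assuming the upstream kube-prometheus-stack chart layout:

```yaml
prometheus:
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          resources:
            requests:
              storage: 100Gi  # match the size used in the kubectl patch commands
```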
This building block consists of multiple components. Each of the components can and must be scaled individually.
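As a sketch (the replica counts are examples; the keys follow the upstream kube-prometheus-stack chart), the components can be scaled independently in the values file:

```yaml
alertmanager:
  alertmanagerSpec:
    replicas: 3   # Alertmanager cluster of three peers
prometheus:
  prometheusSpec:
    replicas: 2   # two Prometheus replicas scraping the same targets
grafana:
  replicas: 2    # requires an external database, see below
```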
Also see Alertmanager High Availability for upstream documentation.
Also see Set up Grafana for high availability for upstream documentation.
For more information on release notes and new features, please follow Release notes Prometheus-Stack.
For Kubernetes <= 1.24, when upgrading the building block from v31 to v32 and above, there will be errors in the `diff` stage of the CI pipeline.
However, neither Kubernetes >= 1.25 nor a fresh installation of the building block in respective versions is affected by this issue.
While updating to Kubernetes >= 1.25 is generally advised, an alternative workaround could be to remove the CRD `prometheuses.monitoring.coreos.com` before updating to the new version.
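That workaround boils down to a single command. Note that deleting the CRD also deletes all Prometheus custom resources based on it, so only do this if the building block pipeline will recreate them afterwards:

```
kubectl delete crd prometheuses.monitoring.coreos.com
```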