Monitoring
Monitoring packages requires a way to scrape metrics, send them to data storage, and analyze the results. Big Bang uses Prometheus and Grafana as its monitoring stack. Most packages offer built-in Prometheus metrics scraping or an add-on that will scrape the metrics. This document shows you how to integrate metrics scraping with Big Bang.
Prerequisites
Before integrating with Prometheus, you must identify the following:
- Does the application support exporting metrics for Prometheus? If not, you will need to find a Prometheus exporter to provide this service.
- Does the upstream Helm chart for the application (or exporter) support Prometheus natively? If not, we’ll have to create our own monitoring resources. Searching the Helm chart for monitoring.coreos.com will usually find any resources that support Prometheus.
- What path and port are used to scrape metrics on the application or exporter?
- What services and/or pods are deployed that should be monitored?
- Is there a pre-existing Grafana dashboard that can be leveraged? If not, we will need to create one.
Integration
Placeholder values
The package requires placeholder values for whether the monitoring stack (e.g. Prometheus / Grafana) is enabled and what label to use for dashboards. In chart/values.yaml, add placeholders for these:
serviceMonitor:
  enabled: false
  ## Added by Big Bang
  dashboards:
    # Namespace to put .json ConfigMap so Grafana sidecar will find it
    namespace: ""
    # Label to apply to dashboard so Grafana sidecar will find it
    label: grafana_dashboard
In this case, we put the values under serviceMonitor: since it already exists in the upstream Helm chart. Otherwise, we would create monitoring: for the values.
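For a chart that has no upstream serviceMonitor key, the same placeholders could sit under a new monitoring key instead. The fragment below is only a sketch of that alternative for chart/values.yaml: it reuses the dashboard settings described in this section, and the exact layout is an assumption rather than a required structure.

```yaml
# Sketch only: placeholders for a chart without an upstream serviceMonitor key
monitoring:
  # Set by Big Bang when the monitoring stack is deployed
  enabled: false
  dashboards:
    # Namespace to put .json ConfigMap so Grafana sidecar will find it
    namespace: ""
    # Label to apply to dashboard so Grafana sidecar will find it
    label: grafana_dashboard
```

If you choose this key, the conditionals in your templates would reference .Values.monitoring.enabled instead of .Values.serviceMonitor.enabled.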
Pass down values
Big Bang needs to set the placeholders above to the appropriate values. In addition, upstream charts may already have values related to monitoring that need to be set.
In bigbang/templates/podinfo/values.yaml, add the following to pass down the values from Big Bang to PodInfo.
serviceMonitor:
  enabled: {{ .Values.monitoring.enabled }}
  dashboards:
    namespace: monitoring
    label: {{ dig "values" "grafana" "sidecar" "dashboards" "label" "grafana_dashboard" .Values.monitoring }}
Dependency
If we plan to scrape metrics from the application with the monitoring stack, we need to make sure the monitoring stack is deployed first so that CRDs are in place before we deploy our resources. To do this, we add a dependsOn section in the bigbang/templates/podinfo/helmrelease.yaml file like this:
spec:
  {{- if or .Values.istio.enabled .Values.monitoring.enabled }}
  dependsOn:
    {{- if .Values.istio.enabled }}
    - name: istio
      namespace: {{ .Release.Namespace }}
    {{- end }}
    {{- if .Values.monitoring.enabled }}
    - name: monitoring
      namespace: {{ .Release.Namespace }}
    {{- end }}
  {{- end }}
We previously had a dependency on Istio, which we leave in place in this example.
Service Monitor
If the upstream Helm chart provides you with a ServiceMonitor and Service for scraping metrics, verify that there is a conditional around each one to only deploy them if monitoring is enabled (e.g. {{- if .Values.serviceMonitor.enabled }} or {{- if .Values.monitoring.enabled }}).
If the upstream chart does not provide a ServiceMonitor and Service for scraping metrics, you will need to create one yourself using the Prometheus instructions for running an exporter.
Any new resources should be placed in the chart/templates/bigbang folder.
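As a rough illustration of a hand-written ServiceMonitor, the sketch below could live in a file such as chart/templates/bigbang/servicemonitor.yaml (a hypothetical file name). The resource name, the app.kubernetes.io/name label, the port name http, the /metrics path, and the 30s interval are all assumptions; replace them with the values you identified in the prerequisites.

```yaml
{{- if .Values.serviceMonitor.enabled }}
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: podinfo
  namespace: {{ .Release.Namespace }}
spec:
  # Select the Service that exposes the metrics port (label is an assumption)
  selector:
    matchLabels:
      app.kubernetes.io/name: podinfo
  namespaceSelector:
    matchNames:
      - {{ .Release.Namespace }}
  endpoints:
    # "port" refers to the name of the Service port, not its number (assumed name)
    - port: http
      # Assumed metrics path; check the application or exporter documentation
      path: /metrics
      interval: 30s
{{- end }}
```

The conditional on the first line keeps the resource from being rendered when monitoring is disabled, so the chart still installs on clusters where the ServiceMonitor CRD is absent.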
RBAC
If the application is using Role Based Access Control (RBAC), you may need to create rules for Prometheus to access the metrics. Check the upstream Helm chart to make sure this is already done for you, or implement a new ClusterRole and ClusterRoleBinding into the chart following the Prometheus RBAC documentation.
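The following is a minimal sketch of such a pair of resources, assuming the application protects a /metrics endpoint with Kubernetes RBAC. The resource names and the Prometheus service account name (prometheus) are hypothetical; look up the actual service account used by your monitoring deployment before adapting this.

```yaml
{{- if .Values.serviceMonitor.enabled }}
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: podinfo-metrics-reader   # hypothetical name
rules:
  # Allow read-only access to the metrics endpoint
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: podinfo-metrics-reader   # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: podinfo-metrics-reader
subjects:
  - kind: ServiceAccount
    name: prometheus             # hypothetical; use your Prometheus service account
    namespace: monitoring
{{- end }}
```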
Alerts
Alerting rules allow you to define alert conditions based on Prometheus expression language expressions and to send notifications about firing alerts to an external service. By creating a PrometheusRule, you can configure these conditions for your application.
You will need to decide what aspects of the application should be monitored and alerted on to detect potential failures in the service it provides. Some examples include:
- Low disk space on a persistent volume
- Loss of connectivity to external resources
- Metrics cannot be scraped
- Operator down
- Pods in CrashLoopBackOff state
- Pods restarting too often
- Latency too high
- Web application returns 4xx or 5xx too often
- No log messages for too long
- Pod memory too close to limit
All of these rules must be based on PromQL queries using the application’s metrics.
Once you have identified what you want to monitor, create Prometheus Alerting Rules and add them to a PrometheusRule resource. The rule should reside in the chart/templates/bigbang folder and only be deployed if monitoring is enabled.
Some examples of rules can be found in the Big Bang monitoring chart.
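To make the shape of a PrometheusRule concrete, here is a hedged sketch covering two of the conditions listed above (metrics cannot be scraped, pods restarting too often). The alert names, the job="podinfo" and namespace="podinfo" label values, the thresholds, and the durations are assumptions chosen for illustration. The second rule also assumes kube-state-metrics is being scraped, since it supplies kube_pod_container_status_restarts_total.

```yaml
{{- if .Values.serviceMonitor.enabled }}
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: podinfo-alerts   # hypothetical name
  namespace: {{ .Release.Namespace }}
spec:
  groups:
    - name: podinfo.rules
      rules:
        # Fires when the scrape target has been down for 5 minutes
        - alert: PodinfoMetricsDown
          expr: 'up{job="podinfo"} == 0'
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: Podinfo metrics cannot be scraped
        # Fires when containers restarted more than 3 times in the last hour
        - alert: PodinfoRestartingTooOften
          expr: 'increase(kube_pod_container_status_restarts_total{namespace="podinfo"}[1h]) > 3'
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: Podinfo pods are restarting too often
{{- end }}
```

If you add annotations that use Prometheus templating such as $labels, remember that Helm also interprets double curly braces, so those expressions need to be escaped in the chart template.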
Dashboards
Dashboards are important for administrators to understand what is happening in your package and when action needs to be taken.
- Create a dashboard

  Some packages or maintainers provide Grafana dashboards upstream; otherwise, you can search Grafana’s Dashboard Repository for a relevant dashboard. If a ready-made Grafana dashboard for your package is provided upstream, you should use Kpt to sync it into the monitoring package (for example, flux provides the JSON dashboards in their upstream repo):

  # There isn't a dashboard for podinfo, so we use flux as an example here
  kpt pkg get https://github.com/fluxcd/flux2.git//manifests/monitoring/grafana/dashboards@v0.9.1 chart/dashboards/

  If you need to create your own dashboard, open Grafana and use Create > Dashboard. Add a panel and set up the query to pull custom data from your package or general data about your pods (e.g. container_processes). After you have saved your dashboard in Grafana, use Share (icon) > Export to save the dashboard to a .json file in chart/dashboards. You can leave the Export for sharing externally slider off.

- We will store dashboards in a ConfigMap for Grafana’s sidecar to parse. Create a ConfigMapList in chart/templates/bigbang/dashboards.yaml to store all of the dashboards:

  {{- $pkg := "podinfo" }}
  {{- $files := .Files.Glob "dashboards/*.json" }}
  {{- if and .Values.serviceMonitor.enabled $files }}
  apiVersion: v1
  kind: ConfigMapList
  items:
  {{- range $path, $fileContents := $files }}
  {{- $dashboardName := regexReplaceAll "(^.*/)(.*)\\.json$" $path "${2}" }}
  - apiVersion: v1
    kind: ConfigMap
    metadata:
      name: {{ printf "%s-%s" $pkg $dashboardName | trunc 63 | trimSuffix "-" }}
      namespace: {{ default $.Release.Namespace $.Values.serviceMonitor.dashboards.namespace }}
      labels:
        {{- if $.Values.serviceMonitor.dashboards.label }}
        {{ $.Values.serviceMonitor.dashboards.label }}: "1"
        {{- end }}
        app: {{ $pkg }}-grafana
        {{- include (printf "%s.labels" $pkg) $ | nindent 6 }}
    data:
      {{ $dashboardName }}.json: {{ $.Files.Get $path | toJson }}
  {{- end }}
  {{- end }}

  Podinfo’s Helm chart already had a key for monitoring named serviceMonitor. You may need to use a different key or create one named monitoring.

- Commit your dashboard files:

  git add -A
  git commit -m "feat: Grafana dashboards"
  git push

- If your package is being integrated as a supported application in Big Bang, you can add your dashboards to the core monitoring package.

  Create a new folder within chart/dashboards/APP_NAME and sync your JSON files for your dashboard(s) there, whether using Kpt from a GitHub repo or individual files from Grafana’s Dashboard Repository.

  Commit your dashboard files:

  git add -A
  git commit -m "feat: Adding APP_NAME Grafana Dashboards"
  git push

  Any JSON dashboards in the chart/dashboards folder automatically get created and imported into the monitoring stack via the Operator.
Validation
Setup
Monitoring must be enabled in our Big Bang deployment and our application. We do this by setting monitoring.enabled: true in bigbang/values.yaml. Then, deploy Big Bang and your application to your cluster.
# This assumes you have the Big Bang repository cloned in ~/bigbang
helm upgrade -i -n bigbang --create-namespace -f ~/bigbang/chart/values.yaml -f bigbang/values.yaml bigbang ~/bigbang/chart
# Deploy your application on top of Big Bang using the same values
helm upgrade -i -n bigbang --create-namespace -f ~/bigbang/chart/values.yaml -f bigbang/values.yaml bigbang-podinfo bigbang
Don’t forget to also include your Big Bang values for TLS certificates and Iron Bank pull credentials.
# Wait for the cluster to deploy
watch kubectl get gitrepo,hr,po -A
# Test ingress to monitoring stack
curl -L https://prometheus.bigbang.dev
curl -L https://grafana.bigbang.dev
If your application also has an ingress, test it (e.g. https://podinfo.bigbang.dev).
Target
Open https://prometheus.bigbang.dev and navigate to Status > Targets. The State should show UP if metrics are being scraped for your package.
There should be one Endpoint for every replica pod of your package.
Alert Rules
In Prometheus, navigate to Alerts. Verify that the PrometheusRule alerting rules show up here and are green.
Dashboards
Open https://grafana.bigbang.dev and navigate to Dashboards > Manage. Make sure your dashboards are listed. Select each one and verify that it is working correctly.