Monitoring📜

Overview📜

Monitoring in Bigbang is deployed using the upstream chart kube-prometheus-stack

Installs the kube-prometheus stack, a collection of Kubernetes manifests, Grafana dashboards, and Prometheus rules combined with documentation and scripts to provide easy to operate end-to-end Kubernetes cluster monitoring with Prometheus using the Prometheus Operator.

graph LR
  subgraph "Monitoring"
    alertmanagerpods("AlertManager Pod(s)") --> monitoringpods("Monitoring Pod(s)")
    alertmanagerservice{{AlertManager Service}} --> alertmanagerpods("AlertManager Pod(s)")
    alertmanagersvcmonitor("Service Monitor") --"Metrics Port"--> alertmanagerservice
    Prometheus --> alertmanagersvcmonitor("Service Monitor")
    grafanapods("Grafana Pod(s)") --> monitoringpods("Monitoring Pod(s)")
    grafanaservice{{Grafana Service}} --> grafanapods("Grafana Pod(s)")
    grafanasvcmonitor("Service Monitor") --"Metrics Port"--> grafanaservice
    Prometheus --> grafanasvcmonitor("Service Monitor")
    nodeexporterpods("Node-Exporter Pod(s)") --> monitoringpods("Monitoring Pod(s)")
    nodeexporterservice{{Node-Exporter Service}} -->  nodeexporterpods("Node-Exporter Pod(s)")
    nodeexportersvcmonitor("Service Monitor") --"Metrics Port"-->  nodeexporterservice
    Prometheus --> nodeexportersvcmonitor("Service Monitor")
    kubestatemetricspods("Kube-State-Metrics Pod(s)") --> monitoringpods("Monitoring Pod(s)")
    kubestatemetricsservice{{Kube-State-Metrics Service}} -->  kubestatemetricspods("Kube-State-Metrics Pod(s)")
    kubestatemetricssvcmonitor("Service Monitor") --"Metrics Port"-->  kubestatemetricsservice
    Prometheus --> kubestatemetricssvcmonitor("Service Monitor")
    Prometheus --> prometheussvcmonitor("Service Monitor")
    prometheussvcmonitor("Service Monitor") --"Metrics Port"--> prmetheussservice{{Prometheus Service}}
    prmetheussservice{{Prometheus Service}} --> Prometheus
    PromOperator ---|Manages/Creates| Prometheus
    VirtualServices --"App Port"--> alertmanagerservice
    VirtualServices --"App Port"--> grafanaservice
    VirtualServices --"App Port"--> Prometheus
  end
  subgraph "Logging"
    monitoringpods("Monitoring Pod(s)") ---|Logs|fluent(Fluentbit) --> logging-ek-es-http
    logging-ek-es-http{{Elastic Service<br />logging-ek-es-http}} --> elastic[(Elastic Storage)]
  end
  subgraph "Istio-system (Ingress)"
    ig(Ingress Gateway, Gateway) --> VirtualServices
  end

Big Bang Touchpoints📜

UI📜

Alertmanager, Prometheus and Grafana within the monitoring Package have UIs that are accessible and configurable. By default they are externally available behind an Istio installation.

Storage📜

Alertmanager📜

Persistent storage values for Alert Manager can be set/modified in the Big Bang chart:

monitoring:
  values:
    alertmanager:
      alertmanagerSpec:
        storage:
          volumeClaimTemplate:
            spec:
                storageClassName: 
                accessModes: ["ReadWriteOnce"]
                resources:
                  requests:
                    storage: 50Gi
              selector: {}

Prometheus-Operator📜

Persistent storage values for Prometheus-Operator can be set/modified in the Big Bang chart:

monitoring:
  values:
    prometheus:
      prometheusSpec:
        storageSpec:
          volumeClaimTemplate:
            spec:
                storageClassName: 
                accessModes: ["ReadWriteOnce"]
                resources:
                  requests:
                    storage: 50Gi
              selector: {}

Grafana📜

Persistent storage values for Grafana can be set/modified in the Big Bang chart:

monitoring:
  values:
    grafana:
      persistence:
        type: pvc
        enabled: false
        # storageClassName: default
        accessModes:
          - ReadWriteOnce
        size: 10Gi
        # annotations: {}
        finalizers:
          - kubernetes.io/pvc-protection
        # selectorLabels: {}
        # subPath: ""
        # existingClaim:

Logging📜

Within the kube-prometheus-stack chart, you can customize both the LogFormat and LogLevel for the following components: Note: within Big Bang, logs are captured by fluentbit and shipped to elastic by default.

Prometheus-Operator📜

LogFormat and LogLevel can be set for Prometheus-Operator via the following values in the Big Bang chart:

monitoring:
  values:
    prometheusOperator:
      logFormat: logfmt
      logLevel: info

Prometheus📜

LogFormat and LogLevel can be set for Prometheus via the following values in the Big Bang chart:

monitoring:
  values:
    prometheus:
      prometheusSpec:
         logFormat: logfmt
         logLevel: info

Alertmanager📜

LogFormat and LogLevel can be set for Alertmanager via the following values in the Big Bang chart:

monitoring:
  values:
    alertmanager:
      alertmanagerSpec:
        logFormat: logfmt
        logLevel: info

Grafana📜

LogLevel can be set for Grafana via the following values in the Big Bang chart:

monitoring:
  values:
    grafana:
      grafana.ini:
        log:
          mode: console

Single Sign on (SSO)📜

SSO can be configured for monitoring through Authservice, more info is included in the following documentation: Monitoring SSO Integration

Monitoring📜

Monitoring deployment has serviceMonitors enabled for

core-dns
kube-api-server
kube-controller-manager
kube-dns
kube-etcd
kube-proxy
kube-scheduler
kube-state-metrics
kubelet
node-exporter
alert manager
grafana
prometheus
prometheus-operator
node-exporter

Note: Other packages are responsible for deploying Service Monitors for their components as needed.

HA📜

Support for Prometheus and other apps within the package are being researched and section will be updated:

Alertmanager📜

High Availability can be accomplished by increasing the number of replicas for the deployment of Alertmanager;

monitoring:
  values:
    alertmanager:
      alertmanagerSpec:
        replicas: 3

Prometheus📜

High Availability can be accomplished by increasing the number of replicas for the deployment of Prometheus. Thanos must also be installed in the same namespace as the monitoring package in order for data to replicate across pods. An example of a Thanos object storage config using minIO is located here. Thanos also supports cloud object storage endpoints.

monitoring:
  values:
    thanosRuler:
      enabled: true
    prometheusOperator:
      clusterDomain: "cluster.local"
    prometheus:
      thanosService:
        enabled: true
      thanosServiceMonitor:
        enabled: true
      prometheusSpec:
        replicas: 3
        thanos:
          baseImage: registry1.dso.mil/ironbank/opensource/thanos/thanos
          version: v0.29.0
          objectStorageConfig:
            key: objstore.yml
            name: thanos-objstore-secret

Grafana📜

High Availability can be accomplished by increasing the number of replicas for the deployment of Grafana and configuring an external database connection (postgresql/mysql) so users and dashboard information can be centrally located for the replicas to have a source of truth. See Grafana’s upstream documentation

monitoring:
  values:
    grafana:
      replicas: 3
      grafana.ini:
        ...
    database:
      type: [postgres|mysql]
      host: external-db:5432
      name: grafana
      user: ""
      password: ""

Dependency Packages📜

When deploying BigBang, monitoring depends on gatekeeper/kyverno and istio being installed prior.

  {{- if or .Values.gatekeeper.enabled .Values.istiod.enabled .Values.kyvernoPolicies.enabled }}
  dependsOn:
  {{- if .Values.istiod.enabled }}
    - name: istio
      namespace: {{ .Release.Namespace }}
  {{- end }}
  {{- if .Values.gatekeeper.enabled }}
    - name: gatekeeper
      namespace: {{ .Release.Namespace }}
  {{- end }}
  {{- if .Values.kyvernoPolicies.enabled }}
    - name: kyverno-policies
      namespace: {{ .Release.Namespace }}
  {{- end }}
  {{- end }}