
To upgrade the Loki Package

Check the upstream changelog and the helm chart upgrade notes.

Upgrading

Find the upstream chart version whose loki image matches the latest version in IronBank that Renovate has identified. The upstream chart lives here: https://github.com/grafana/loki/tree/helm-loki-3.2.0/production/helm/loki

Run a KPT update against the main chart folder:

kpt pkg update chart@helm-loki-${chart.version} --strategy alpha-git-patch

Restore all BigBang added templates and tests:

git checkout chart/templates/bigbang/
git checkout chart/deps/loki
git checkout chart/deps/minio
git checkout chart/tests/
git checkout chart/dashboards
git checkout chart/templates/tests
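
The restore step above can be wrapped in a small loop so that the list of Big Bang paths lives in one place. This is only a sketch: the function name `restore_bigbang_paths` and the `GIT` override variable are assumptions added for illustration and are not part of the package. Setting `GIT=echo` prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: restore the Big Bang-owned paths after a kpt update.
# GIT is a hypothetical override; GIT=echo gives a dry run.
restore_bigbang_paths() {
  for path in \
    chart/templates/bigbang/ \
    chart/deps/loki \
    chart/deps/minio \
    chart/tests/ \
    chart/dashboards \
    chart/templates/tests
  do
    # Same command as in the list above, one path at a time.
    ${GIT:-git} checkout "$path" || return 1
  done
}

# Dry run: print what would be executed without touching the work tree.
# GIT=echo restore_bigbang_paths
```

Stopping at the first failing path makes a mistyped or missing directory visible immediately rather than after the remaining checkouts have run.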

Update dependencies

Typically, --strategy=force-delete-replace is useful to heavy-handedly bring in dependency changes, which may then need to be reviewed. In the command below, LATEST_BB_PACKAGE_TAG_VERSION is the latest Big Bang minio package tag.

cd chart/deps
kpt pkg update minio@${LATEST_BB_PACKAGE_TAG_VERSION} --strategy=force-delete-replace
cd ../../

Update dependencies in Chart.yaml

Ensure the minio version in chart/Chart.yaml matches the latest tag version.
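
One way to check this mechanically is to compare the minio version declared in chart/Chart.yaml with the version of the vendored chart under chart/deps/minio. The sketch below assumes plain, unquoted `version:` lines and that each dependency entry starts with `- name:`. The helper names (`dep_version`, `top_version`, `check_minio_version`) are hypothetical.

```shell
#!/bin/sh
# Sketch: compare the declared minio dependency version with the vendored chart.
# Assumes unquoted "version: X" lines and entries that begin with "- name:".

# Print the version of the dependency named $2 in the Chart.yaml at $1.
dep_version() {
  awk -v dep="$2" '
    $1 == "-" && $2 == "name:" { current = $3 }
    $1 == "version:" && current == dep { print $2; exit }
  ' "$1"
}

# Print the top-level chart version of the Chart.yaml at $1.
top_version() {
  awk '/^version:/ { print $2; exit }' "$1"
}

# Succeed only if both versions are present and identical. $1 = chart dir.
check_minio_version() {
  declared=$(dep_version "$1/Chart.yaml" minio-instance)
  vendored=$(top_version "$1/deps/minio/Chart.yaml")
  [ -n "$declared" ] && [ "$declared" = "$vendored" ]
}

# Usage: check_minio_version ./chart && echo "minio versions match"
```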

Update binaries

If needed, log in to registry1 before pulling assets, and log out again once the pull has finished:

helm registry login https://registry1.dso.mil -u ${registry1.username}
helm registry logout https://registry1.dso.mil

Pull assets and commit the binaries as well as the Chart.lock file that was generated.

export HELM_EXPERIMENTAL_OCI=1
helm dependency update ./chart
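
Before committing, it can help to confirm that the update actually produced the files mentioned above. The sketch below checks for Chart.lock and for at least one packaged dependency archive under the chart's charts/ directory, which is where `helm dependency update` writes them. The function name `check_dep_artifacts` is hypothetical.

```shell
#!/bin/sh
# Sketch: verify that `helm dependency update` left the expected artifacts.
# $1 = chart directory (for example ./chart).
check_dep_artifacts() {
  [ -f "$1/Chart.lock" ] || { echo "missing $1/Chart.lock" >&2; return 1; }
  # Re-use the positional parameters to hold the glob result.
  set -- "$1"/charts/*.tgz
  # If the glob matched nothing, $1 is the literal pattern and -f fails.
  [ -f "$1" ] || { echo "no .tgz archives under charts/" >&2; return 1; }
}

# Usage: check_dep_artifacts ./chart && git add chart/Chart.lock chart/charts
```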

Update main chart

chart/Chart.yaml

  • Update the loki version and appVersion
  • Ensure Big Bang version suffix is appended to chart version
  • Ensure minio, gluon, and loki dependencies are present and up to date
    version: $VERSION-bb.0
    dependencies:
      - name: minio-instance
        alias: minio
        version: $MINIO_VERSION
        repository: file://./deps/minio
        condition: minio.enabled
      - name: grafana-agent-operator
        alias: grafana-agent-operator
        version: $GRAFANA_VERSION
        repository: https://grafana.github.io/helm-charts
        condition: monitoring.selfMonitoring.grafanaAgent.installOperator
      - name: gluon
        version: $GLUON_VERSION
        repository: "oci://registry.dso.mil/platform-one/big-bang/apps/library-charts/gluon"
    annotations:
      bigbang.dev/applicationVersions: |
        - Loki: $LOKI_APP_VERSION
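
The Big Bang suffix requirement can be checked with a short pattern match. This sketch assumes the chart version has the shape MAJOR.MINOR.PATCH-bb.N, as in the `$VERSION-bb.0` placeholder above. The function name `has_bb_suffix` is hypothetical.

```shell
#!/bin/sh
# Sketch: succeed only if the version string ends in a Big Bang suffix.
# Assumes the shape MAJOR.MINOR.PATCH-bb.N.
has_bb_suffix() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+-bb\.[0-9]+$'
}

# Usage, reading the top-level version from chart/Chart.yaml:
# has_bb_suffix "$(awk '/^version:/ { print $2; exit }' chart/Chart.yaml)" \
#   || echo "chart version is missing the -bb.N suffix"
```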
    

chart/values.yaml

  • Verify that Renovate has correctly updated the image tags for the new version.

chart/tests/*

  • Add Cypress testing configuration and/or tests if necessary.

Modifications made to upstream

This is a high-level list of modifications that Big Bang has made to the upstream helm chart. You can use this as a cross-check to make sure that no modifications were lost during the upgrade process.

chart/values.yaml

  • line 14, Ensure nameOverride is set to logging-loki

    nameOverride: logging-loki

  • line 17, Ensure fullnameOverride is set to logging-loki

    fullnameOverride: logging-loki

  • line 21, Ensure the private-registry imagePullSecrets entry is present:

    imagePullSecrets:
      - name: private-registry
    

  • line 23, update the kubectl image to pull from registry1

      kubectlImage:
        # -- The Docker registry
        registry: registry1.dso.mil/ironbank
        # -- Docker image repository
        repository: opensource/kubernetes/kubectl
        # -- Overrides the image tag whose default is the chart's appVersion
        tag: v1.25.2
    

  • line 40, Ensure loki.image section points to registry1 image and correct tag

  image:
    # -- The Docker registry
    registry: registry1.dso.mil
    # -- Docker image repository
    repository: ironbank/opensource/grafana/loki
    # -- Overrides the image tag whose default is the chart's appVersion
    tag: X.X.X

  • line 136, Ensure the ingester config is present

        ingester:
          chunk_target_size: 196608
          flush_check_period: 5s
          flush_op_timeout: 100m
          lifecycler:
            ring:
              kvstore:
                store: memberlist
              replication_factor: 1
    

  • line 209, Ensure by default auth is disabled

      auth_enabled: false
    

  • line 231, Ensure storage.bucketNames points to loki, loki & loki-admin

      storage:
        bucketNames:
          chunks: loki
          ruler: loki
          admin: loki-admin
    

  • line 283, Ensure storage_config.boltdb_shipper configuration is present

      storage_config:
        boltdb_shipper:
          active_index_directory: /var/loki/boltdb-shipper-active
          cache_location: /var/loki/boltdb-shipper-cache
          cache_ttl: 24h
          shared_store: s3
    

  • line 343, Ensure enterprise.image is pointed to registry1 image

      image:
        # -- The Docker registry
        registry: registry1.dso.mil
        # -- Docker image repository
        repository: ironbank/grafana/grafana-enterprise-logs
        # -- Overrides the image tag whose default is the chart's appVersion
        tag: vX.X.X
    

  • line 394, Ensure provisioner.enabled is set to false

      provisioner:
        # -- Whether the job should be part of the deployment
        enabled: false
    

  • line 481, Ensure all monitoring sub-components are set to enabled: false, including the added monitoring.enabled value

    monitoring:
      # -- Enable BigBang integration of Monitoring components
      enabled: false
    

  • line 572 ensure monitoring.selfMonitoring.grafanaAgent.installOperator is set to false

  • line 601, Ensure lokiCanary.enabled is set to false

        lokiCanary:
          enabled: false
    

  • line 664, write pod resources set

      resources:
        limits:
          cpu: 300m
          memory: 2Gi
        requests:
          cpu: 300m
          memory: 2Gi
    

  • line 701, ensure at the bottom of the write: block, there is a podDisruptionBudget section

      ## -- Application controller Pod Disruption Budget Configuration
      ## Ref: https://kubernetes.io/docs/tasks/run-application/configure-pdb/
      podDisruptionBudget:
        # -- Number of pods that are available after eviction as number or percentage (eg.: 50%)
        # @default -- `""` (defaults to 0 if not specified)
        minAvailable: ""
        # -- Number of pods that are unavailable after eviction as number or percentage (eg.: 50%).
        ## Has higher precedence over `controller.pdb.minAvailable`
        maxUnavailable: "1"
    

  • line 805, legacyReadTarget set to true to give users time to migrate 2/7/23

      legacyReadTarget: true
    

  • line 819, read pod resources set

      resources:
        limits:
          cpu: 300m
          memory: 2Gi
        requests:
          cpu: 300m
          memory: 2Gi
    

  • line 854, ensure at the bottom of the read: block, there is a podDisruptionBudget section

      ## -- Application controller Pod Disruption Budget Configuration
      ## Ref: https://kubernetes.io/docs/tasks/run-application/configure-pdb/
      podDisruptionBudget:
        # -- Number of pods that are available after eviction as number or percentage (eg.: 50%)
        # @default -- `""` (defaults to 0 if not specified)
        minAvailable: ""
        # -- Number of pods that are unavailable after eviction as number or percentage (eg.: 50%).
        ## Has higher precedence over `controller.pdb.minAvailable`
        maxUnavailable: "1"
    

  • line 931, ensure at the bottom of the backend: block, there is a podDisruptionBudget section

      ## -- Application controller Pod Disruption Budget Configuration
      ## Ref: https://kubernetes.io/docs/tasks/run-application/configure-pdb/
      podDisruptionBudget:
        # -- Number of pods that are available after eviction as number or percentage (eg.: 50%)
        # @default -- `""` (defaults to 0 if not specified)
        minAvailable: ""
        # -- Number of pods that are unavailable after eviction as number or percentage (eg.: 50%).
        ## Has higher precedence over `controller.pdb.minAvailable`
        maxUnavailable: "1"
    

  • line 944, Ensure singleBinary.replicas is set to 1

    singleBinary:
      # -- Number of replicas for the single binary
      replicas: 1
    

  • line 986, set resource requests and limits for singleBinary

      resources:
        limits:
          cpu: 100m
          memory: 256Mi
        requests:
          cpu: 100m
          memory: 256Mi
    

  • line 1071 gateway.enabled set to false by default

  • line 1091, Ensure gateway.image is pointed to registry1 equivalent

      image:
        # -- The Docker registry for the gateway image
        registry: registry1.dso.mil
        # -- The gateway image repository
        repository: ironbank/opensource/nginx/nginx
        # -- The gateway image tag
        tag: X.X.X
    

  • line 1236, ensure at the bottom of the gateway: block, there is a podDisruptionBudget section

      ## -- Application controller Pod Disruption Budget Configuration
      ## Ref: https://kubernetes.io/docs/tasks/run-application/configure-pdb/
      podDisruptionBudget:
        # -- Number of pods that are available after eviction as number or percentage (eg.: 50%)
        # @default -- `""` (defaults to 0 if not specified)
        minAvailable: ""
        # -- Number of pods that are unavailable after eviction as number or percentage (eg.: 50%).
        ## Has higher precedence over `controller.pdb.minAvailable`
        maxUnavailable: "1"
    

  • line 1286 remove minio block added by upstream

      replicas: 1
      # Minio requires 2 to 16 drives for erasure code (drivesPerNode * replicas)
      # https://docs.min.io/docs/minio-erasure-code-quickstart-guide
      # Since we only have 1 replica, that means 2 drives must be used.
      drivesPerNode: 2
      rootUser: enterprise-logs
      rootPassword: supersecret
      buckets:
        - name: chunks
          policy: none
          purge: false
        - name: ruler
          policy: none
          purge: false
        - name: admin
          policy: none
          purge: false
      persistence:
        size: 5Gi
      resources:
        requests:
          cpu: 100m
          memory: 128Mi
    

  • line 1287 or EOF, move the extraObjects configmap block up under loki, above minio.

    # Create extra manifests via values. Would be passed through `tpl` for templating
    extraObjects: []
    

  • line 1311, ensure the following BB values are all set under minio key:

    minio:
      # -- Enable minio instance support, must have minio-operator installed
      enabled: false
      # Override the minio service name for easier connection setup
      service:
        nameOverride: "minio.logging.svc.cluster.local"
      # -- Minio root credentials
      secrets:
        name: "loki-objstore-creds"
        accessKey: "minio"
        secretKey: "minio123" # default key, change this!
      tenant:
        # -- Buckets to be provisioned to for tenant
        buckets:
          - name: loki
          - name: loki-admin
        # -- Users to be provisioned for the tenant
        users:
          - name: minio-user
        # -- User credentials to create for above user. Otherwise password is randomly generated.
        # This auth is not required to be set or reclaimed for minio use with Loki
        defaultUserCredentials:
          username: "minio-user"
          password: ""
        ## Specification for MinIO Pool(s) in this Tenant.
        pools:
          - servers: 1
            volumesPerServer: 4
            size: 750Mi
            securityContext:
              runAsUser: 1001
              runAsGroup: 1001
              fsGroup: 1001
        metrics:
          enabled: false
          port: 9000
          memory: 128M
    

  • End of file add/verify the following blocks:

    domain: bigbang.dev
    
    istio:
      enabled: false
      mtls:
        # STRICT = Allow only mutual TLS traffic
        # PERMISSIVE = Allow both plain text and mutual TLS traffic
        mode: STRICT
    
    networkPolicies:
      enabled: false
      # -- Control Plane CIDR to allow init job communication to the Kubernetes API.  
      # Use `kubectl get endpoints kubernetes` to get the CIDR range needed for your cluster
      controlPlaneCidr: 0.0.0.0/0
    
    bbtests:
      enabled: false
      cypress:
        artifacts: true
        envs:
          cypress_check_datasource: 'false'
          cypress_grafana_url: 'http://monitoring-grafana.monitoring.svc.cluster.local'
      scripts:
        image: registry1.dso.mil/ironbank/big-bang/base:2.0.0
        envs:
          LOKI_URL: 'http://{{ .Values.fullnameOverride }}.{{ .Release.Namespace }}.svc:3100'
          LOKI_VERSION: '{{ .Values.loki.image.tag }}'
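
A few of the values in this list lend themselves to a quick automated cross-check after an upgrade. The sketch below greps values.yaml for a handful of them. It covers only a subset, assumes nameOverride, fullnameOverride, and domain are top-level keys, and cannot tell which block an indented key belongs to, so it complements the manual review rather than replacing it. The function name `check_bb_values` is hypothetical.

```shell
#!/bin/sh
# Sketch: spot-check a subset of the Big Bang values listed above.
# $1 = path to values.yaml. Patterns are extended regular expressions.
check_bb_values() {
  status=0
  for pattern in \
    '^nameOverride: logging-loki' \
    '^fullnameOverride: logging-loki' \
    '^ *- name: private-registry' \
    '^ *auth_enabled: false' \
    '^domain: bigbang.dev'
  do
    if ! grep -Eq -- "$pattern" "$1"; then
      # Report every missing setting instead of stopping at the first one.
      echo "missing: $pattern" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage: check_bb_values chart/values.yaml
```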
    

chart/templates/tokengen/job-tokengen.yaml - At the very top of the file, directly under the opening conditionals, add the following NetworkPolicy resources:

{{- if .Values.networkPolicies.enabled }}
{{- if .Values.minio.enabled }}
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: tokengen-ingress-minio
  namespace: {{ .Release.Namespace }}
  annotations:
    "helm.sh/hook": post-install
    "helm.sh/hook-weight": "-10"
    "helm.sh/hook-delete-policy": hook-succeeded,hook-failed,before-hook-creation
spec:
  podSelector:
    matchLabels:
      app: minio
      app.kubernetes.io/instance: {{ .Release.Name }}
  ingress:
    - from:
      - podSelector:
          matchLabels:
            {{- include "enterprise-logs.tokengenLabels" . | nindent 14 }}
            {{- with .Values.enterprise.tokengen.labels }}
            {{- toYaml . | nindent 14 }}
            {{- end }}
      ports:
        - port: 9000
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: tokengen-egress-minio
  namespace: {{ .Release.Namespace }}
  annotations:
    "helm.sh/hook": post-install
    "helm.sh/hook-weight": "-10"
    "helm.sh/hook-delete-policy": hook-succeeded,hook-failed,before-hook-creation
spec:
  podSelector:
    matchLabels:
      app: minio
      app.kubernetes.io/instance: {{ .Release.Name }}
  egress:
    - to:
      - podSelector:
          matchLabels:
            {{- include "enterprise-logs.tokengenLabels" . | nindent 14 }}
            {{- with .Values.enterprise.tokengen.labels }}
            {{- toYaml . | nindent 14 }}
            {{- end }}
      ports:
        - port: 9000
{{- end }}
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-egress-tokengen-job
  namespace: {{ .Release.Namespace }}
  annotations:
    "helm.sh/hook": post-install
    "helm.sh/hook-weight": "-10"
    "helm.sh/hook-delete-policy": hook-succeeded,hook-failed,before-hook-creation
spec:
  egress:
  - to:
    - ipBlock:
        cidr: {{ .Values.networkPolicies.controlPlaneCidr }}
        {{- if eq .Values.networkPolicies.controlPlaneCidr "0.0.0.0/0" }}
        # ONLY Block requests to AWS metadata IP
        except:
        - 169.254.169.254/32
        {{- end }}
  podSelector:
    matchLabels:
      {{- include "enterprise-logs.tokengenLabels" . | nindent 6 }}
      {{- with .Values.enterprise.tokengen.labels }}
      {{- toYaml . | nindent 6 }}
      {{- end }}
  policyTypes:
  - Egress
{{- end }}
---

chart/templates/_helpers.tpl - On line 13, remove the ternary function from the $default variable definition and ensure it looks exactly like:

{{- $default := "loki" }}

  • line 181, ensure the minio block looks like the following:
    {{- if .Values.minio.enabled -}}
    s3:
      endpoint: {{ $.Values.minio.service.nameOverride }}
      bucketnames: {{ $.Values.loki.storage.bucketNames.chunks }}
      secret_access_key: {{ $.Values.minio.secrets.secretKey }}
      access_key_id: {{ $.Values.minio.secrets.accessKey }}
      s3forcepathstyle: true
      insecure: true
    

chart/templates/backend/poddisruptionbudget-backend.yaml - Ensure that there is no hard-coded spec for the PDB template

  {{- with .Values.backend.podDisruptionBudget.maxUnavailable }}
  maxUnavailable: {{ . }}
  {{- else }}
  minAvailable: {{ .Values.backend.podDisruptionBudget.minAvailable | default 0 }}
  {{- end }}

chart/templates/gateway/poddisruptionbudget-gateway.yaml - Ensure that there is no hard-coded spec for the PDB template

  {{- with .Values.gateway.podDisruptionBudget.maxUnavailable }}
  maxUnavailable: {{ . }}
  {{- else }}
  minAvailable: {{ .Values.gateway.podDisruptionBudget.minAvailable | default 0 }}
  {{- end }}

chart/templates/read/poddisruptionbudget-read.yaml - Ensure that there is no hard-coded spec for the PDB template

  {{- with .Values.read.podDisruptionBudget.maxUnavailable }}
  maxUnavailable: {{ . }}
  {{- else }}
  minAvailable: {{ .Values.read.podDisruptionBudget.minAvailable | default 0 }}
  {{- end }}

chart/templates/write/poddisruptionbudget-write.yaml - Ensure that there is no hard-coded spec for the PDB template

  {{- with .Values.write.podDisruptionBudget.maxUnavailable }}
  maxUnavailable: {{ . }}
  {{- else }}
  minAvailable: {{ .Values.write.podDisruptionBudget.minAvailable | default 0 }}
  {{- end }}
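
The four PDB templates above follow the same pattern, so a loop can confirm that each one still reads maxUnavailable from its own values key. This is a rough sketch: it only detects whether the values reference is present, not whether some other hard-coded spec was also added. The function name `check_pdb_templates` is hypothetical.

```shell
#!/bin/sh
# Sketch: confirm each PDB template reads maxUnavailable from values.
# $1 = templates directory (for example chart/templates).
check_pdb_templates() {
  status=0
  for component in backend gateway read write; do
    file="$1/$component/poddisruptionbudget-$component.yaml"
    # A missing file also counts as a failure (grep returns non-zero).
    if ! grep -q "\.Values\.$component\.podDisruptionBudget\.maxUnavailable" "$file"; then
      echo "$file: maxUnavailable is not read from values" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage: check_pdb_templates chart/templates
```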

chart/src/dashboards/ - cd into this directory and run the following command to update the logic so the Release name is captured:

sed -i 's/(loki|enterprise-logs)/logging-loki/g' *.json

  • Modify the loki-logs.json dashboard to maintain the expr for log querying (lines 775 and 840):
  • line 775: "expr": "sum(rate({namespace=\"$namespace\", pod=~\"$deployment.*\", pod=~\"$pod\", container=~\"$container\" } |logfmt|= \"$filter\" [5m])) by (level)",
  • line 840: "expr": "{namespace=\"$namespace\", pod=~\"$deployment.*\", pod=~\"$pod\", container=~\"$container\"} | logfmt | level=\"$level\" |= \"$filter\"",
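
A note on the sed expression used for the dashboards: without -E, sed uses basic regular expressions, in which `(`, `|`, and `)` are ordinary characters. The command therefore replaces the literal text `(loki|enterprise-logs)` and leaves a bare `loki` untouched. The sketch below shows this on a hypothetical sample selector. The function name `rewrite_release_name` and the sample line are illustrative and not taken from the dashboards.

```shell
#!/bin/sh
# Sketch: the same substitution as above, applied to stdin instead of files.
# In basic regular expressions the parentheses and the pipe are literals.
rewrite_release_name() {
  sed 's/(loki|enterprise-logs)/logging-loki/g'
}

# Hypothetical sample line shaped like a dashboard selector:
# printf '%s\n' 'job=~"($namespace)/(loki|enterprise-logs)-write"' | rewrite_release_name
# prints: job=~"($namespace)/logging-loki-write"
```

Running the substitution on a sample line first is a cheap way to confirm the pattern before editing the JSON files in place.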

Testing new Loki Version

Deploy Loki Scalable as a part of BigBang

helm upgrade \
  --install bigbang ./bigbang/chart \
  --create-namespace \
  --namespace bigbang \
  --values ./bigbang/chart/values.yaml \
  --values ./bigbang/chart/ingress-certs.yaml \
  --values ./overrides/loki.yaml \
  --set gatekeeper.enabled=false \
  --set clusterAuditor.enabled=false \
  --set twistlock.enabled=false \
  --set loki.enabled=true \
  --set promtail.enabled=true \
  --set logging.enabled=false \
  --set eckoperator.enabled=false \
  --set fluentbit.enabled=true \
  --set jaeger.enabled=false \
  --set tempo.enabled=true \
  --set addons.minioOperator.enabled=true

overrides/loki.yaml

loki:
  git:
    tag: ""
    branch: "my-branch-name-goes-here"
  enabled: true
  strategy: scalable

  • Visit https://grafana.bigbang.dev and log in.
  • Navigate to Configuration -> Data Sources -> Loki and then click Save & Test to ensure Data Source changes can be saved successfully.
  • Search dashboards for Loki Dashboard Quick Search and confirm log data is being populated with no error messages.

Deploy Loki Monolith as a part of BigBang

helm upgrade \
  --install bigbang ./bigbang/chart \
  --create-namespace \
  --namespace bigbang \
  --values ./bigbang/chart/values.yaml \
  --values ./bigbang/chart/ingress-certs.yaml \
  --values ./overrides/loki.yaml \
  --set gatekeeper.enabled=false \
  --set clusterAuditor.enabled=false \
  --set twistlock.enabled=false \
  --set loki.enabled=true \
  --set promtail.enabled=true \
  --set logging.enabled=false \
  --set eckoperator.enabled=false \
  --set fluentbit.enabled=true \
  --set jaeger.enabled=false \
  --set tempo.enabled=true

overrides/loki.yaml

loki:
  git:
    tag: ""
    branch: "my-branch-name-goes-here"
  enabled: true

  • Visit https://grafana.bigbang.dev and log in.
  • Navigate to Configuration -> Data Sources -> Loki and then click Save & Test to ensure Data Source changes can be saved successfully.
  • Search dashboards for Loki Dashboard Quick Search and confirm log data is being populated with no error messages.
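
The two overrides/loki.yaml files used in these test deployments differ only in the strategy line, so they can be generated from the branch name. This is a minimal sketch: the function name `write_loki_override` is hypothetical, and the output simply mirrors the two override files shown in this section.

```shell
#!/bin/sh
# Sketch: print an overrides/loki.yaml for a test deployment.
# $1 = branch name, $2 = optional strategy (for example: scalable).
write_loki_override() {
  printf 'loki:\n  git:\n    tag: ""\n    branch: "%s"\n  enabled: true\n' "$1"
  # The monolith override has no strategy line, so only add it on request.
  if [ -n "${2:-}" ]; then
    printf '  strategy: %s\n' "$2"
  fi
}

# Scalable:  write_loki_override my-branch-name-goes-here scalable > overrides/loki.yaml
# Monolith:  write_loki_override my-branch-name-goes-here > overrides/loki.yaml
```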


Last update: 2023-07-06 by Michael Martin