To upgrade the Loki Package📜
Check the upstream changelog and the helm chart upgrade notes.
Upgrading📜
Find the latest version of the loki image that matches the latest version in IronBank that Renovate has identified from here: https://github.com/grafana/loki/tree/helm-loki-3.2.0/production/helm/loki
Run a KPT update against the main chart folder:
# To find the chart version for the command below:
# - Browse to the [upstream](https://github.com/grafana/loki/tree/main/production/helm/loki).
# - Click on the drop-down menu on the upper left, then on Tags.
# - Scroll through the tags until you get to the Helm chart version tags (e.g. helm-loki-5.9.2, helm-loki-5.9.1, etc.).
# - Starting with the most recent Helm chart version tag, open the Chart.yaml for the tag. If the appVersion value corresponds to the
# version of Loki that Renovate detected for an upgrade, this is the correct version. So, for example, if you will be updating to chart
# version helm-loki-5.9.2, your kpt command would be:
#
# kpt pkg update chart@helm-loki-5.9.2 --strategy alpha-git-patch
kpt pkg update chart@helm-loki-${chart.version} --strategy alpha-git-patch
# Note to reviewer: I removed the 'git checkout' commands here that referenced the nonexistent folder chart/deps. Not sure if the rest of these are needed.
git checkout chart/templates/bigbang/
git checkout chart/tests/
git checkout chart/dashboards
git checkout chart/templates/tests
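The appVersion comparison described in the comments above can be scripted once you have a local copy of the candidate tag's Chart.yaml. This is a minimal sketch, not part of the package tooling: the function names chart_app_version and app_version_matches are hypothetical, and it assumes appVersion is a top-level key written on a single line.

```shell
#!/bin/sh
# Hypothetical helpers; not part of kpt, Helm, or the Big Bang tooling.

# Print the appVersion declared in a Chart.yaml, with any quotes removed.
chart_app_version() {
  # $1: path to a Chart.yaml file
  awk '/^appVersion:/ { print $2; exit }' "$1" | tr -d "\"'"
}

# Succeed only when the chart's appVersion equals the expected Loki version.
app_version_matches() {
  # $1: path to Chart.yaml
  # $2: Loki version that Renovate detected (for example 2.8.3)
  [ "$(chart_app_version "$1")" = "$2" ]
}
```

For example, `app_version_matches /tmp/Chart.yaml 2.8.3 && echo "use this chart tag"` would print the message only when the downloaded file declares appVersion 2.8.3.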
Update dependencies in Chart.yaml📜
Ensure that the minio version in chart/Chart.yaml matches the latest tag version of minio available in the Big Bang minio package Chart.yaml
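The minio version comparison can be done by eye, but a small script makes it harder to miss. The sketch below is an assumption about how you might do it with awk: dep_version and chart_version are hypothetical helper names, and the parsing assumes each dependency entry starts with "- name:" and that version values are unquoted.

```shell
#!/bin/sh
# Hypothetical helpers; a plain-text parse, not a full YAML parser.

# Print the version of the named dependency listed in a Chart.yaml.
dep_version() {
  # $1: path to Chart.yaml
  # $2: dependency name (for example minio-instance)
  awk -v dep="$2" '
    $1 == "-" && $2 == "name:" { in_dep = ($3 == dep) }  # a new dependency entry starts
    in_dep && $1 == "version:" { print $2; exit }        # version of the wanted entry
  ' "$1"
}

# Print the top-level chart version of a Chart.yaml.
chart_version() {
  # $1: path to Chart.yaml
  awk '/^version:/ { print $2; exit }' "$1"
}
```

With a local checkout of the Big Bang minio package (the path is yours to supply), the two values match when `[ "$(dep_version chart/Chart.yaml minio-instance)" = "$(chart_version path/to/minio/chart/Chart.yaml)" ]` succeeds.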
Update binaries📜
If needed, log into registry1.
# Note, if you are using Ubuntu on WSL and get an error about storing credentials or about how `The name org.freedesktop.secrets was not
# provided by any .service files` when you run the command below, install the libsecret-1-dev and gnome-keyring packages. After doing this,
# you'll be prompted to set a keyring password the first time you run this command.
#
helm registry login https://registry1.dso.mil -u ${registry1.username}
# Note: You may need to resolve merge conflicts in chart/values.yaml before these commands work. Refer to the "Modifications made to upstream"
# section below for hints on how to resolve them. Also, you need to be logged in to registry1 through docker.
export HELM_EXPERIMENTAL_OCI=1
helm dependency update ./chart
helm registry logout https://registry1.dso.mil
Update main chart📜
chart/Chart.yaml
- Update the loki version and appVersion
- Ensure Big Bang version suffix is appended to chart version
- Ensure minio and gluon dependencies are present and up to date
version: $VERSION-bb.0
dependencies:
  - name: minio-instance
    alias: minio
    version: $MINIO_VERSION
    repository: file://./deps/minio
    condition: minio.enabled
  - name: grafana-agent-operator
    alias: grafana-agent-operator
    version: $GRAFANA_VERSION
    repository: https://grafana.github.io/helm-charts
    condition: monitoring.selfMonitoring.grafanaAgent.installOperator
  - name: gluon
    version: $GLUON_VERSION
    repository: "oci://registry.dso.mil/platform-one/big-bang/apps/library-charts/gluon"
annotations:
  bigbang.dev/applicationVersions: |
    - Loki: $LOKI_APP_VERSION
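A quick way to confirm that the Big Bang suffix was not lost during the update is to test the version string against the "-bb.N" pattern. This is only a sketch: has_bb_suffix is a hypothetical helper name, and it assumes the suffix always has the form -bb. followed by digits, as in the $VERSION-bb.0 example.

```shell
#!/bin/sh
# Hypothetical helper; assumes the suffix format "-bb.<number>".

# Succeed when a chart version ends with the Big Bang suffix.
has_bb_suffix() {
  # $1: chart version string, for example 5.9.2-bb.0
  printf '%s\n' "$1" | grep -Eq -- '-bb\.[0-9]+$'
}
```

For example, `has_bb_suffix 5.9.2-bb.0` succeeds, while `has_bb_suffix 5.9.2` fails.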
chart/values.yaml
- Verify that Renovate updated the loki: section with the correct value for tag. For example, if Renovate wants to update Loki to version 2.8.3, you should see:
loki:
  # Configures the readiness probe for all of the Loki pods
  readinessProbe:
    httpGet:
      path: /ready
      port: http-metrics
    initialDelaySeconds: 30
    timeoutSeconds: 1
  image:
    # -- The Docker registry
    registry: registry1.dso.mil
    # -- Docker image repository
    repository: ironbank/opensource/grafana/loki
    # -- Overrides the image tag whose default is the chart's appVersion
    tag: 2.8.3
chart/tests/*
- Verify that cypress testing configuration and tests are present here. You should see contents similar to this in chart/tests/cypress/:
drwxr-xr-x 2 ubuntu ubuntu 4096 Aug 1 12:24 ./
drwxr-xr-x 4 ubuntu ubuntu 4096 Aug 1 12:24 ../
-rw-r--r-- 1 ubuntu ubuntu 86 Aug 1 12:24 cypress.json
-rw-r--r-- 1 ubuntu ubuntu 1494 Aug 1 12:24 loki-health.spec.js
And this in chart/tests/scripts/:
drwxr-xr-x 2 ubuntu ubuntu 4096 Aug 1 12:24 ./
drwxr-xr-x 4 ubuntu ubuntu 4096 Aug 1 12:24 ../
-rw-r--r-- 1 ubuntu ubuntu 2192 Aug 1 12:24 test.sh
If you are unsure or if these directories do not exist or are empty, check with the code owners.
Modifications made to upstream📜
This is a high-level list of modifications that Big Bang has made to the upstream helm chart. You can use this as a cross-check to make sure that no modifications were lost during the upgrade process.
chart/values.yaml
- Ensure nameOverride is set to logging-loki
nameOverride: logging-loki
- Ensure fullnameOverride is set to logging-loki
fullnameOverride: logging-loki
- Ensure the private-registry IPS (image pull secret) is present:
imagePullSecrets:
  - name: private-registry
- Ensure automountServiceAccountToken is set to `false` for the service account.
serviceAccount:
  # -- Set this toggle to false to opt out of automounting API credentials for the service account
  automountServiceAccountToken: false
- Ensure kubectlImage is pointed to a registry1 image:
kubectlImage:
  # -- The Docker registry
  registry: registry1.dso.mil/ironbank
  # -- Docker image repository
  repository: opensource/kubernetes/kubectl
  # -- Overrides the image tag whose default is the chart's appVersion
  tag: v1.27.4
- Verify that the loki.image section points to a registry1 image and has the correct tag. For example, for Loki 2.8.3:
image:
  # -- The Docker registry
  registry: registry1.dso.mil
  # -- Docker image repository
  repository: ironbank/opensource/grafana/loki
  # -- Overrides the image tag whose default is the chart's appVersion
  tag: 2.8.3
- Ensure that this block is present somewhere in the loki: section:
ingester:
  chunk_target_size: 196608
  flush_check_period: 5s
  flush_op_timeout: 100m
  lifecycler:
    ring:
      kvstore:
        store: memberlist
      replication_factor: 1
- Ensure that auth is disabled by default in loki.auth_enabled
auth_enabled: false
- Ensure that loki.storage.bucketNames points to loki, loki & loki-admin
storage:
  bucketNames:
    chunks: loki
    ruler: loki
    admin: loki-admin
- Ensure loki.storage_config.boltdb_shipper configuration is present
storage_config:
  boltdb_shipper:
    active_index_directory: /var/loki/boltdb-shipper-active
    cache_location: /var/loki/boltdb-shipper-cache
    cache_ttl: 24h
    shared_store: s3
- Ensure enterprise.image is pointed to registry1 image
image:
  # -- The Docker registry
  registry: registry1.dso.mil
  # -- Docker image repository
  repository: ironbank/grafana/grafana-enterprise-logs
  # -- Overrides the image tag whose default is the chart's appVersion
  tag: vX.X.X
- Ensure enterprise.provisioner.enabled is set to false
provisioner:
  # -- Whether the job should be part of the deployment
  enabled: false
- Ensure all monitoring: sub-components are set to enabled: false, including the added monitoring.enabled value
monitoring:
  # -- Enable BigBang integration of Monitoring components
  enabled: false
Note that as of August 16, 2023, this is a little over 150 lines of code.📜
- Ensure monitoring.selfMonitoring.grafanaAgent.installOperator is set to false
- Ensure monitoring.lokiCanary.enabled is set to false
lokiCanary:
  enabled: false
- Verify that write.resources are set:
resources:
  limits:
    cpu: 300m
    memory: 2Gi
  requests:
    cpu: 300m
    memory: 2Gi
- Ensure that at the bottom of the write: block, there is a podDisruptionBudget: section
## -- Application controller Pod Disruption Budget Configuration
## Ref: https://kubernetes.io/docs/tasks/run-application/configure-pdb/
podDisruptionBudget:
  # -- Number of pods that are available after eviction as number or percentage (eg.: 50%)
  # @default -- `""` (defaults to 0 if not specified)
  minAvailable: ""
  # -- Number of pods that are unavailable after eviction as number or percentage (eg.: 50%).
  ## Has higher precedence over `controller.pdb.minAvailable`
  maxUnavailable: "1"
- Make sure read.resources are set to:
resources:
  limits:
    cpu: 300m
    memory: 2Gi
  requests:
    cpu: 300m
    memory: 2Gi
- Ensure that at the bottom of the read: block, there is a podDisruptionBudget section
## -- Application controller Pod Disruption Budget Configuration
## Ref: https://kubernetes.io/docs/tasks/run-application/configure-pdb/
podDisruptionBudget:
  # -- Number of pods that are available after eviction as number or percentage (eg.: 50%)
  # @default -- `""` (defaults to 0 if not specified)
  minAvailable: ""
  # -- Number of pods that are unavailable after eviction as number or percentage (eg.: 50%).
  ## Has higher precedence over `controller.pdb.minAvailable`
  maxUnavailable: "1"
- Ensure that at the bottom of the backend: block, there is a podDisruptionBudget section
## -- Application controller Pod Disruption Budget Configuration
## Ref: https://kubernetes.io/docs/tasks/run-application/configure-pdb/
podDisruptionBudget:
  # -- Number of pods that are available after eviction as number or percentage (eg.: 50%)
  # @default -- `""` (defaults to 0 if not specified)
  minAvailable: ""
  # -- Number of pods that are unavailable after eviction as number or percentage (eg.: 50%).
  ## Has higher precedence over `controller.pdb.minAvailable`
  maxUnavailable: "1"
- Verify that singleBinary.replicas is set to 1
singleBinary:
  # -- Number of replicas for the single binary
  replicas: 1
- Verify that singleBinary.resources is set to:
resources:
  limits:
    cpu: 100m
    memory: 256Mi
  requests:
    cpu: 100m
    memory: 256Mi
- Make sure gateway.enabled is set to false.
- Ensure gateway.image is pointed to registry1 equivalent
image:
  # -- The Docker registry for the gateway image
  registry: registry1.dso.mil
  # -- The gateway image repository
  repository: ironbank/opensource/nginx/nginx
  # -- The gateway image tag
  tag: X.X.X
- Ensure that at the bottom of the gateway: block, there is a podDisruptionBudget section
## -- Application controller Pod Disruption Budget Configuration
## Ref: https://kubernetes.io/docs/tasks/run-application/configure-pdb/
podDisruptionBudget:
  # -- Number of pods that are available after eviction as number or percentage (eg.: 50%)
  # @default -- `""` (defaults to 0 if not specified)
  minAvailable: ""
  # -- Number of pods that are unavailable after eviction as number or percentage (eg.: 50%).
  ## Has higher precedence over `controller.pdb.minAvailable`
  maxUnavailable: "1"
*** Important ***📜
Before following the step below, note that if there is only one minio: block, you shouldn't remove it.
- Remove minio block added by upstream
- Move the extraObjects: configmap block up under loki:, so that it is between loki: and enterprise:.
- Ensure the following BB values are all set under minio key:
minio:
  # -- Enable minio instance support, must have minio-operator installed
  enabled: false
  # Override the minio service name for easier connection setup
  service:
    nameOverride: "minio.logging.svc.cluster.local"
  # -- Minio root credentials
  secrets:
    name: "loki-objstore-creds"
    accessKey: "minio"
    secretKey: "minio123" # default key, change this!
  tenant:
    # -- Buckets to be provisioned to for tenant
    buckets:
      - name: loki
      - name: loki-admin
    # -- Users to to be provisioned to for tenant
    users:
      - name: minio-user
    # -- User credentials to create for above user. Otherwise password is randomly generated.
    # This auth is not required to be set or reclaimed for minio use with Loki
    defaultUserCredentials:
      username: "minio-user"
      password: ""
    ## Specification for MinIO Pool(s) in this Tenant.
    pools:
      - servers: 1
        volumesPerServer: 4
        size: 750Mi
        securityContext:
          runAsUser: 1001
          runAsGroup: 1001
          fsGroup: 1001
    metrics:
      enabled: false
      port: 9000
      memory: 128M
- At the end of the file, add/verify the following blocks:
domain: bigbang.dev
istio:
  enabled: false
  mtls:
    # STRICT = Allow only mutual TLS traffic
    # PERMISSIVE = Allow both plain text and mutual TLS traffic
    mode: STRICT
networkPolicies:
  enabled: false
  # -- Control Plane CIDR to allow init job communication to the Kubernetes API.
  # Use `kubectl get endpoints kubernetes` to get the CIDR range needed for your cluster
  controlPlaneCidr: 0.0.0.0/0
bbtests:
  enabled: false
  cypress:
    artifacts: true
    envs:
      cypress_check_datasource: 'false'
      cypress_grafana_url: 'http://monitoring-grafana.monitoring.svc.cluster.local'
  scripts:
    image: registry1.dso.mil/ironbank/big-bang/base:2.0.0
    envs:
      LOKI_URL: 'http://{{ .Values.fullnameOverride }}.{{ .Release.Namespace }}.svc:3100'
      LOKI_VERSION: '{{ .Values.loki.image.tag }}'
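Part of the chart/values.yaml cross-check above can be automated with a plain-text scan. The following is a rough sketch under stated assumptions: missing_bb_values is a hypothetical helper name, it covers only four of the settings listed above, and because it matches whole trimmed lines it cannot confirm that a key sits in the right YAML block.

```shell
#!/bin/sh
# Hypothetical helper; a text match only, not a YAML-aware check.

# Print each expected Big Bang setting that is missing from a values file.
# Returns 0 when all are present, 1 otherwise.
missing_bb_values() {
  # $1: path to values.yaml
  status=0
  for expected in \
    'nameOverride: logging-loki' \
    'fullnameOverride: logging-loki' \
    '- name: private-registry' \
    'automountServiceAccountToken: false'
  do
    # Trim each line and compare it whole, so that nameOverride does not
    # match inside fullnameOverride.
    if ! sed 's/^[[:space:]]*//; s/[[:space:]]*$//' "$1" | grep -Fxq -- "$expected"; then
      echo "missing: $expected"
      status=1
    fi
  done
  return "$status"
}
```

Running `missing_bb_values chart/values.yaml` after resolving merge conflicts prints nothing when all four lines are found. Lines with trailing comments would be reported as missing, so treat the output as a prompt to look, not as a verdict.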
chart/templates/tokengen/job-tokengen.yaml
- At the top of the file, at the start of the templates under the conditionals at the very top, add the following NetworkPolicy resources:
{{- if .Values.networkPolicies.enabled }}
{{- if .Values.minio.enabled }}
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: tokengen-ingress-minio
  namespace: {{ .Release.Namespace }}
  annotations:
    "helm.sh/hook": post-install
    "helm.sh/hook-weight": "-10"
    "helm.sh/hook-delete-policy": hook-succeeded,hook-failed,before-hook-creation
spec:
  podSelector:
    matchLabels:
      app: minio
      app.kubernetes.io/instance: {{ .Release.Name }}
  ingress:
    - from:
      - podSelector:
          matchLabels:
            {{- include "enterprise-logs.tokengenLabels" . | nindent 14 }}
            {{- with .Values.enterprise.tokengen.labels }}
            {{- toYaml . | nindent 14 }}
            {{- end }}
      ports:
        - port: 9000
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: tokengen-egress-minio
  namespace: {{ .Release.Namespace }}
  annotations:
    "helm.sh/hook": post-install
    "helm.sh/hook-weight": "-10"
    "helm.sh/hook-delete-policy": hook-succeeded,hook-failed,before-hook-creation
spec:
  podSelector:
    matchLabels:
      app: minio
      app.kubernetes.io/instance: {{ .Release.Name }}
  egress:
    - to:
      - podSelector:
          matchLabels:
            {{- include "enterprise-logs.tokengenLabels" . | nindent 14 }}
            {{- with .Values.enterprise.tokengen.labels }}
            {{- toYaml . | nindent 14 }}
            {{- end }}
      ports:
        - port: 9000
{{- end }}
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-egress-tokengen-job
  namespace: {{ .Release.Namespace }}
  annotations:
    "helm.sh/hook": post-install
    "helm.sh/hook-weight": "-10"
    "helm.sh/hook-delete-policy": hook-succeeded,hook-failed,before-hook-creation
spec:
  egress:
  - to:
    - ipBlock:
        cidr: {{ .Values.networkPolicies.controlPlaneCidr }}
        {{- if eq .Values.networkPolicies.controlPlaneCidr "0.0.0.0/0" }}
        # ONLY Block requests to AWS metadata IP
        except:
        - 169.254.169.254/32
        {{- end }}
  podSelector:
    matchLabels:
      {{- include "enterprise-logs.tokengenLabels" . | nindent 6 }}
      {{- with .Values.enterprise.tokengen.labels }}
      {{- toYaml . | nindent 6 }}
      {{- end }}
  policyTypes:
  - Egress
{{- end }}
---
chart/templates/_helpers.tpl
- On line 13, for the $default variable, remove the ternary function and ensure the definition looks just like:
{{- $default := "loki" }}
- Ensure the following block for minio looks like:
{{- if .Values.minio.enabled -}}
s3:
  endpoint: {{ $.Values.minio.service.nameOverride }}
  bucketnames: {{ $.Values.loki.storage.bucketNames.chunks }}
  secret_access_key: {{ $.Values.minio.secrets.secretKey }}
  access_key_id: {{ $.Values.minio.secrets.accessKey }}
  s3forcepathstyle: true
  insecure: true
chart/templates/backend/poddisruptionbudget-backend.yaml
- Ensure that there is no hard-coded spec for the PDB template
{{- with .Values.backend.podDisruptionBudget.maxUnavailable }}
maxUnavailable: {{ . }}
{{- else }}
minAvailable: {{ .Values.backend.podDisruptionBudget.minAvailable | default 0 }}
{{- end }}
chart/templates/gateway/poddisruptionbudget-gateway.yaml
- Ensure that there is no hard-coded spec for the PDB template
{{- with .Values.gateway.podDisruptionBudget.maxUnavailable }}
maxUnavailable: {{ . }}
{{- else }}
minAvailable: {{ .Values.gateway.podDisruptionBudget.minAvailable | default 0 }}
{{- end }}
chart/templates/read/poddisruptionbudget-read.yaml
- Ensure that there is no hard-coded spec for the PDB template
{{- with .Values.read.podDisruptionBudget.maxUnavailable }}
maxUnavailable: {{ . }}
{{- else }}
minAvailable: {{ .Values.read.podDisruptionBudget.minAvailable | default 0 }}
{{- end }}
chart/templates/write/poddisruptionbudget-write.yaml
- Ensure that there is no hard-coded spec for the PDB template
{{- with .Values.write.podDisruptionBudget.maxUnavailable }}
maxUnavailable: {{ . }}
{{- else }}
minAvailable: {{ .Values.write.podDisruptionBudget.minAvailable | default 0 }}
{{- end }}
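To spot a PDB template that has reverted to a hard-coded spec, you can scan for maxUnavailable or minAvailable lines whose value is a literal instead of a Helm expression. This is a sketch only: pdb_has_hardcoded_spec is a hypothetical helper name, and it assumes a templated value always begins with "{{" on the same line as the key, as in the four snippets above.

```shell
#!/bin/sh
# Hypothetical helper; relies on templated values starting with "{{".

# Succeed when a PDB template sets maxUnavailable or minAvailable to a
# literal value (anything that does not start with "{").
pdb_has_hardcoded_spec() {
  # $1: path to a poddisruptionbudget-*.yaml template
  grep -Eq '^[[:space:]]*(maxUnavailable|minAvailable):[[:space:]]*[^{[:space:]]' "$1"
}
```

A loop such as `for f in chart/templates/*/poddisruptionbudget-*.yaml; do pdb_has_hardcoded_spec "$f" && echo "check $f"; done` would then list the files that need a second look.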
chart/src/dashboards/
- cd into this directory and run the following command to update the logic so the Release name is captured:
sed -i 's/(loki|enterprise-logs)/logging-loki/g' *.json
- Update the loki-logs.json dashboard to maintain the expr for log querying (lines 775 and 840):
- 775: "expr": "sum(rate({namespace=\"$namespace\", pod=~\"$deployment.*\", pod=~\"$pod\", container=~\"$container\" } |logfmt|= \"$filter\" [5m])) by (level)",
- 840: "expr": "{namespace=\"$namespace\", pod=~\"$deployment.*\", pod=~\"$pod\", container=~\"$container\"} | logfmt | level=\"$level\" |= \"$filter\"",
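It may help to see what the sed expression in this section actually matches. In sed's default basic regular expressions, unescaped parentheses and the pipe are ordinary characters, so the pattern matches the literal text (loki|enterprise-logs) and leaves a bare "loki" alone. The sketch below demonstrates this on a made-up sample string; rewrite_release_name is a hypothetical wrapper name, and it reads from standard input so that it does not modify any files.

```shell
#!/bin/sh
# Hypothetical wrapper around the same sed expression, reading stdin.

# Rewrite the literal text "(loki|enterprise-logs)" to "logging-loki".
rewrite_release_name() {
  sed 's/(loki|enterprise-logs)/logging-loki/g'
}

# Made-up sample line, not taken from a real dashboard.
printf '%s\n' 'job=~"(loki|enterprise-logs)/read"' | rewrite_release_name
# prints: job=~"logging-loki/read"
```

Piping a dashboard through this function and diffing the result against the original is a low-risk way to preview the change before running the in-place command.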
automountServiceAccountToken📜
The mutating Kyverno policy named update-automountserviceaccounttokens is leveraged to harden all ServiceAccounts in this package with automountServiceAccountToken: false.
This policy revokes access to the K8s API for Pods utilizing said ServiceAccounts. If a Pod truly requires access to the K8s API (for app functionality), the Pod is added to the pods: array of the same mutating policy. This grants the Pod access to the API, and creates a Kyverno PolicyException to prevent an alert.
Testing new Loki Version📜
Deploy Loki Scalable as a part of BigBang📜
helm upgrade \
--install bigbang ./bigbang/chart \
--create-namespace \
--namespace bigbang \
--values ./bigbang/chart/values.yaml \
--values ./bigbang/chart/ingress-certs.yaml \
--values ./overrides/loki.yaml \
--set gatekeeper.enabled=false \
--set clusterAuditor.enabled=false \
--set twistlock.enabled=false \
--set loki.enabled=true \
--set promtail.enabled=true \
--set logging.enabled=false \
--set eckoperator.enabled=false \
--set fluentbit.enabled=true \
--set jaeger.enabled=false \
--set tempo.enabled=true \
--set addons.minioOperator.enabled=true
overrides/loki.yaml
loki:
  git:
    tag: ""
    branch: "my-branch-name-goes-here"
  enabled: true
  strategy: scalable
- Visit https://grafana.bigbang.dev and log in
- Navigate to configuration -> Data Sources -> Loki and then click Save & Test to ensure Data Source changes can be saved successfully.
- Search dashboards for Loki Dashboard Quick Search and confirm log data is being populated/no error messages.
Deploy Loki Monolith as a part of BigBang📜
Loki Monolith is tested during the “package tests” stage of loki pipelines.
helm upgrade \
--install bigbang ./bigbang/chart \
--create-namespace \
--namespace bigbang \
--values ./bigbang/chart/values.yaml \
--values ./bigbang/chart/ingress-certs.yaml \
--values ./overrides/loki.yaml \
--set gatekeeper.enabled=false \
--set clusterAuditor.enabled=false \
--set twistlock.enabled=false \
--set loki.enabled=true \
--set promtail.enabled=true \
--set jaeger.enabled=false \
--set tempo.enabled=true
overrides/loki.yaml
loki:
  git:
    tag: ""
    branch: "my-branch-name-goes-here"
  enabled: true
- Visit https://grafana.bigbang.dev and log in
- Navigate to configuration -> Data Sources -> Loki and then click Save & Test to ensure Data Source changes can be saved successfully.
- Search dashboards for Loki Dashboard Quick Search and confirm log data is being populated/no error messages.