To upgrade the Loki Package📜
-
Navigate to the upstream chart repo and folder and find the tag that corresponds with the new chart version for this update.
- Check the upstream changelog for upgrade notices.
-
Checkout the
renovate/ironbank
branch -
From the root of the repo run
kpt pkg update chart@<tag> --strategy alpha-git-patch
, where tag is found in step 1 (loki ref:helm-loki-<tag>
)- Run a KPT update against the main chart folder:
# To find the chart version for the commmand below: # - Browse to the [upstream](https://github.com/grafana/loki/tree/main/production/helm/loki). # - Click on the drop-down menu on the upper left, then on Tags. # - Scroll/Search through the tags until you get to the Helm chart version tags (e.g. helm-loki-5.9.2, helm-loki-5.9.1, etc.). # - Starting with the most recent Helm chart version tag, open the Chart.yaml for the tag. If the appVersion value corresponds to the # version of Loki that Renovate detected for an upgrade, this is the correct version. So, for example, if you will be updating to chart # version helm-loki-5.9.2, your kpt command would be: # # kpt pkg update chart@helm-loki-5.9.2 --strategy alpha-git-patch kpt pkg update chart@helm-loki-<tag> --strategy alpha-git-patch
- Restore all BigBang added templates and tests:
git checkout chart/templates/bigbang/ git checkout chart/tests/ git checkout chart/dashboards git checkout chart/templates/tests
- Follow the
Update main chart
section of this document for a list of changes per file to be aware of, for how Big Bang differs from upstream.
- Run a KPT update against the main chart folder:
-
Modify the version in
Chart.yaml
and append-bb.0
to the chart version from upstream. -
Update dependencies and binaries using
helm dependency update ./chart
-
Ensure that the minio version in chart/Chart.yaml matches the latest tag version of minio available in the Big Bang minio package Chart.yaml
-
If needed, log into registry1.
# Note, if you are using Ubuntu on WSL and get an error about storing credentials or about how `The name org.freedesktop.secrets was not # provided by any .service files` when you run the command below, install the libsecret-1-dev and gnome-keyring packages. After doing this, # you'll be prompted to set a keyring password the first time you run this command. # helm registry login https://registry1.dso.mil -u ${registry1.username}
- Pull assets and commit the binaries as well as the Chart.lock file that was generated.
Then log out.
# Note: You may need to resolve merge conflicts in chart/values.yaml before these commands work. Refer to the "Modifications made to upstream" # section below for hinsts on how to resolve them. Also, you need to be logged in to registry1 thorough docker. export HELM_EXPERIMENTAL_OCI=1 helm dependency update ./chart
helm registry logout https://registry1.dso.mil
-
-
Update
CHANGELOG.md
adding an entry for the new version and noting all changes in a list (at minimum should include- Updated <chart or dependency> to x.x.x
). -
Generate the
README.md
updates by following the guide in gluon. -
Push up your changes, add upgrade notices if applicable, validate that CI passes.
-
If there are any failures, follow the information in the pipeline to make the necessary updates.
-
Add the
debug
label to the MR for more detailed information. -
Reach out to the CODEOWNERS if needed.
-
-
Follow the
Testing new Loki Version
section of this document for manual testing.
Update main chart📜
chart/Chart.yaml
- update loki
version
andappVersion
- Ensure Big Bang version suffix is appended to chart version
- Ensure minio and gluon dependencies are present and up to date
version: $VERSION-bb.0 dependencies: - name: minio-instance alias: minio version: $MINIO_VERSION repository: file://./deps/minio condition: minio.enabled - name: grafana-agent-operator alias: grafana-agent-operator version: $GRAFANA_VERSION repository: https://grafana.github.io/helm-charts condition: monitoring.selfMonitoring.grafanaAgent.installOperator - name: gluon version: $GLUON_VERSION repository: "oci://registry.dso.mil/platform-one/big-bang/apps/library-charts/gluon" annotations: bigbang.dev/applicationVersions: | - Loki: $LOKI_APP_VERSION
chart/values.yaml
- Verify that Renovate updated the loki: section with the correct value for
tag
. For example, if Renovate wants to update Loki to version 2.8.3, you should see:loki: # Configures the readiness probe for all of the Loki pods readinessProbe: httpGet: path: /ready port: http-metrics initialDelaySeconds: 30 timeoutSeconds: 1 image: # -- The Docker registry registry: registry1.dso.mil # -- Docker image repository repository: ironbank/opensource/grafana/loki # -- Overrides the image tag whose default is the chart's appVersion tag: 2.8.3
chart/tests/*
- Verify that cypress testing configuration and tests are present here. You should see contents similar to this in chart/tests/cypress/:
And this in chart/tests/scripts/:
drwxr-xr-x 2 ubuntu ubuntu 4096 Aug 1 12:24 ./ drwxr-xr-x 4 ubuntu ubuntu 4096 Aug 1 12:24 ../ -rw-r--r-- 1 ubuntu ubuntu 86 Aug 1 12:24 cypress.json -rw-r--r-- 1 ubuntu ubuntu 1494 Aug 1 12:24 loki-health.spec.js
If you are unsure or if these directories do not exist or are empty, check with the code owners.drwxr-xr-x 2 ubuntu ubuntu 4096 Aug 1 12:24 ./ drwxr-xr-x 4 ubuntu ubuntu 4096 Aug 1 12:24 ../ -rw-r--r-- 1 ubuntu ubuntu 2192 Aug 1 12:24 test.sh
Modifications made to upstream📜
This is a high-level list of modifications that Big Bang has made to the upstream helm chart. You can use this as as cross-check to make sure that no modifications were lost during the upgrade process.
chart/values.yaml
- Ensure nameOverride is set to logging-loki
nameOverride: logging-loki
logging-loki
fullnameOverride: logging-loki
private-registry
IPS is present:
imagePullSecrets:
- name: private-registry
automountServiceAccountToken
is set to `false`` for the service account.
serviceAccount:
# -- Set this toggle to false to opt out of automounting API credentials for the service account
automountServiceAccountToken: false
kubectlImage:
# -- The Docker registry
registry: registry1.dso.mil/ironbank
# -- Docker image repository
repository: opensource/kubernetes/kubectl
# -- Overrides the image tag whose default is the chart's appVersion
tag: v1.27.4
Verify that the loki.image
section points to a registry1 image and has the correct tag. For example, for Loki 2.8.3:
image:
# -- The Docker registry
registry: registry1.dso.mil
# -- Docker image repository
repository: ironbank/opensource/grafana/loki
# -- Overrides the image tag whose default is the chart's appVersion
tag: 2.8.3
-
Ensure that this block is present somewher in the
loki:
section:ingester: chunk_target_size: 196608 flush_check_period: 5s flush_op_timeout: 100m lifecycler: ring: kvstore: store: memberlist replication_factor: 1
-
Ensure by default auth is disabled in
loki.auth_enabled
auth_enabled: false
-
Ensure that
loki.storage.bucketNames
points toloki
,loki
&loki-admin
storage: bucketNames: chunks: loki ruler: loki admin: loki-admin
-
Ensure
loki.storage_config.boltdb_shipper
configuration is presentstorage_config: boltdb_shipper: active_index_directory: /var/loki/boltdb-shipper-active cache_location: /var/loki/boltdb-shipper-cache cache_ttl: 24h
-
Ensure
enterprise.image
is pointed to registry1 imageimage: # -- The Docker registry registry: registry1.dso.mil # -- Docker image repository repository: ironbank/grafana/grafana-enterprise-logs # -- Overrides the image tag whose default is the chart's appVersion tag: vX.X.X
-
Ensure
enterprise.provisioner.enabled
is set tofalse
provisioner: # -- Whether the job should be part of the deployment enabled: false
-
Ensure all
monitoring:
sub-components are set toenabled: false
Including the addedmonitoring.enabled
valuemonitoring: # -- Enable BigBang integration of Monitoring components enabled: false
Note that as of August 16, 2023, this is a little over 150 lines of code.📜
-
Ensure
monitoring.selfMonitoring.grafanaAgent.installOperator
is set tofalse
-
Ensure
monitoring.lokiCanary.enabled
is set tofalse
lokiCanary: enabled: false
-
Verify that
write.resources
are set:resources: limits: cpu: 300m memory: 2Gi requests: cpu: 300m memory: 2Gi
-
Ensure that at the bottom of the
write:
block, there is apodDisruptionBudget:
section## -- Application controller Pod Disruption Budget Configuration ## Ref: https://kubernetes.io/docs/tasks/run-application/configure-pdb/ podDisruptionBudget: # -- Number of pods that are available after eviction as number or percentage (eg.: 50%) # @default -- `""` (defaults to 0 if not specified) minAvailable: "" # -- Number of pods that are unavailable after eviction as number or percentage (eg.: 50%). ## Has higher precedence over `controller.pdb.minAvailable` maxUnavailable: "1"
- Make sure
read.resources
are set to:resources: limits: cpu: 300m memory: 2Gi requests: cpu: 300m memory: 2Gi
-
Ensure that at the bottom of the
read:
block, there is apodDisruptionBudget
section## -- Application controller Pod Disruption Budget Configuration ## Ref: https://kubernetes.io/docs/tasks/run-application/configure-pdb/ podDisruptionBudget: # -- Number of pods that are available after eviction as number or percentage (eg.: 50%) # @default -- `""` (defaults to 0 if not specified) minAvailable: "" # -- Number of pods that are unavailable after eviction as number or percentage (eg.: 50%). ## Has higher precedence over `controller.pdb.minAvailable` maxUnavailable: "1"
-
Ensure that at the bottom of the
backend:
block, there is apodDisruptionBudget
section## -- Application controller Pod Disruption Budget Configuration ## Ref: https://kubernetes.io/docs/tasks/run-application/configure-pdb/ podDisruptionBudget: # -- Number of pods that are available after eviction as number or percentage (eg.: 50%) # @default -- `""` (defaults to 0 if not specified) minAvailable: "" # -- Number of pods that are unavailable after eviction as number or percentage (eg.: 50%). ## Has higher precedence over `controller.pdb.minAvailable` maxUnavailable: "1"
-
Verify that
singleBinary.replicas
is set to1
singleBinary: # -- Number of replicas for the single binary replicas: 1
-
Verify that
singleBinary.resources
is set to:resources: limits: cpu: 100m memory: 256Mi requests: cpu: 100m memory: 256Mi
-
Make sure
gateway.enabled
is set tofalse
. -
Ensure
gateway.image
is pointed to registry1 equivalentimage: # -- The Docker registry for the gateway image registry: registry1.dso.mil # -- The gateway image repository repository: ironbank/opensource/nginx/nginx # -- The gateway image tag tag: X.X.X
-
Ensure that at the bottom of the
gateway:
block, there is apodDisruptionBudget
section## -- Application controller Pod Disruption Budget Configuration ## Ref: https://kubernetes.io/docs/tasks/run-application/configure-pdb/ podDisruptionBudget: # -- Number of pods that are available after eviction as number or percentage (eg.: 50%) # @default -- `""` (defaults to 0 if not specified) minAvailable: "" # -- Number of pods that are unavailable after eviction as number or percentage (eg.: 50%). ## Has higher precedence over `controller.pdb.minAvailable` maxUnavailable: "1"
-
Ensure that
singleBinary.persistence.size
is set to12Gi
-
Ensure that
singleBinary.persistence.enableStatefulAutoDeletePVC
is set tofalse
. -
Ensure that
memcachedExporter.enabled
is set tofalse
. -
Ensure that
chunksCache.enabled
is set tofalse
. -
Esnure that
resultsCache.enabled
is set tofalse
. -
The default value for
loki.schemaConfig
has changed. To keep the prior value, ensure that the following block is present:
schemaConfig:
configs:
- from: 2022-01-11
store: boltdb-shipper
object_store: "{{ .Values.loki.storage.type }}"
schema: v12
index:
prefix: loki_index_
period: 24h
- from: 2023-08-01
store: tsdb
object_store: "{{ .Values.loki.storage.type }}"
schema: v12
index:
prefix: loki_tsdb_
period: 24h
- from: 2024-04-01
store: tsdb
object_store: "{{ .Values.loki.storage.type }}"
schema: v13
index:
prefix: loki_tsdb_
period: 24h
- Ensure that the value for
memcached.image.repository
andmemcached.image.tag
are set to valid values from registry1.
memcached:
image:
# -- Memcached Docker image repository
repository: registry1.dso.mil/ironbank/opensource/memcached/memcached
# -- Memcached Docker image tag
tag: 1.6.23
- Ensured that
memcached.containerSecurityContext
includes the following:
fsGroup: 10001
runAsGroup: 10001
runAsNonRoot: true
runAsUser: 10001
** Important **📜
Before following the step below, note that if there is only one minio: block, you shouldn’t remove it. - Remove minio block added by upstream
-
Move the
extraObjects:
configmap block up underloki:
, so that it is bettweenloki:
andenterprise:
. -
Ensure the following BB values are all set under minio key:
minio: # -- Enable minio instance support, must have minio-operator installed enabled: false # Allow the address used by Loki to refer to Minio to be overridden address: "minio.logging.svc.cluster.local" # -- Minio root credentials secrets: name: "loki-objstore-creds" accessKey: "minio" secretKey: "minio123" # default key, change this! tenant: # -- Buckets to be provisioned to for tenant buckets: - name: loki - name: loki-admin # -- Users to to be provisioned to for tenant users: - name: minio-user # -- User credentials to create for above user. Otherwise password is randomly generated. # This auth is not required to be set or reclaimed for minio use with Loki defaultUserCredentials: username: "minio-user" password: "" ## Specification for MinIO Pool(s) in this Tenant. pools: - servers: 1 volumesPerServer: 4 size: 750Mi securityContext: runAsUser: 1001 runAsGroup: 1001 fsGroup: 1001 metrics: enabled: false port: 9000 memory: 128M
-
End of file add/verify the following blocks:
domain: bigbang.dev istio: enabled: false mtls: # STRICT = Allow only mutual TLS traffic # PERMISSIVE = Allow both plain text and mutual TLS traffic mode: STRICT networkPolicies: enabled: false # -- Control Plane CIDR to allow init job communication to the Kubernetes API. # Use `kubectl get endpoints kubernetes` to get the CIDR range needed for your cluster controlPlaneCidr: 0.0.0.0/0 bbtests: enabled: false cypress: artifacts: true envs: cypress_check_datasource: 'false' cypress_grafana_url: 'http://monitoring-grafana.monitoring.svc.cluster.local' scripts: image: registry1.dso.mil/ironbank/big-bang/base:2.0.0 envs: LOKI_URL: 'http://{{ .Values.fullnameOverride }}.{{ .Release.Namespace }}.svc:3100' LOKI_VERSION: '{{ .Values.loki.image.tag }}'
chart/templates/tokengen/job-tokengen.yaml
- At the top of the file, at the start of the templates under the conditionals at the very top, add the following NetworkPolicy resources:
{{- if .Values.networkPolicies.enabled }}
{{- if .Values.minio.enabled }}
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: tokengen-ingress-minio
namespace: {{ .Release.Namespace }}
annotations:
"helm.sh/hook": post-install
"helm.sh/hook-weight": "-10"
"helm.sh/hook-delete-policy": hook-succeeded,hook-failed,before-hook-creation
spec:
podSelector:
matchLabels:
app: minio
app.kubernetes.io/instance: {{ .Release.Name }}
ingress:
- from:
- podSelector:
matchLabels:
{{- include "enterprise-logs.tokengenLabels" . | nindent 14 }}
{{- with .Values.enterprise.tokengen.labels }}
{{- toYaml . | nindent 14 }}
{{- end }}
ports:
- port: 9000
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: tokengen-egress-minio
namespace: {{ .Release.Namespace }}
annotations:
"helm.sh/hook": post-install
"helm.sh/hook-weight": "-10"
"helm.sh/hook-delete-policy": hook-succeeded,hook-failed,before-hook-creation
spec:
podSelector:
matchLabels:
app: minio
app.kubernetes.io/instance: {{ .Release.Name }}
egress:
- to:
- podSelector:
matchLabels:
{{- include "enterprise-logs.tokengenLabels" . | nindent 14 }}
{{- with .Values.enterprise.tokengen.labels }}
{{- toYaml . | nindent 14 }}
{{- end }}
ports:
- port: 9000
{{- end }}
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: api-egress-tokengen-job
namespace: {{ .Release.Namespace }}
annotations:
"helm.sh/hook": post-install
"helm.sh/hook-weight": "-10"
"helm.sh/hook-delete-policy": hook-succeeded,hook-failed,before-hook-creation
spec:
egress:
- to:
- ipBlock:
cidr: {{ .Values.networkPolicies.controlPlaneCidr }}
{{- if eq .Values.networkPolicies.controlPlaneCidr "0.0.0.0/0" }}
# ONLY Block requests to AWS metadata IP
except:
- 169.254.169.254/32
{{- end }}
podSelector:
matchLabels:
{{- include "enterprise-logs.tokengenLabels" . | nindent 6 }}
{{- with .Values.enterprise.tokengen.labels }}
{{- toYaml . | nindent 6 }}
{{- end }}
policyTypes:
- Egress
{{- end }}
---
chart/templates/_helpers.tpl
- On line 13 for the $default
function, remove the ternary
function and ensure the definition looks just like:
{{- $default := "loki" }
- Ensure the following block for minio looks like:
{{- if .Values.minio.enabled -}} s3: endpoint: {{ include "loki.minio" $ }} bucketnames: {{ $.Values.loki.storage.bucketNames.chunks }} secret_access_key: {{ $.Values.minio.secrets.secretKey }} access_key_id: {{ $.Values.minio.secrets.accessKey }} s3forcepathstyle: true insecure: true
chart/templates/backend/poddisruptionbudget-backend.yaml
- Ensure that there is not hard-coded spec for the PDB template
{{- with .Values.backend.podDisruptionBudget.maxUnavailable }}
maxUnavailable: {{ . }}
{{- else }}
minAvailable: {{ .Values.backend.podDisruptionBudget.minAvailable | default 0 }}
{{- end }}
chart/templates/gateway/poddisruptionbudget-gateway.yaml
- Ensure that there is not hard-coded spec for the PDB template
{{- with .Values.gateway.podDisruptionBudget.maxUnavailable }}
maxUnavailable: {{ . }}
{{- else }}
minAvailable: {{ .Values.gateway.podDisruptionBudget.minAvailable | default 0 }}
{{- end }}
chart/templates/read/poddisruptionbudget-read.yaml
- Ensure that there is not hard-coded spec for the PDB template
{{- with .Values.read.podDisruptionBudget.maxUnavailable }}
maxUnavailable: {{ . }}
{{- else }}
minAvailable: {{ .Values.read.podDisruptionBudget.minAvailable | default 0 }}
{{- end }}
chart/templates/write/poddisruptionbudget-write.yaml
- Ensure that there is no hard-coded spec for the PDB template
{{- with .Values.write.podDisruptionBudget.maxUnavailable }}
maxUnavailable: {{ . }}
{{- else }}
minAvailable: {{ .Values.write.podDisruptionBudget.minAvailable | default 0 }}
{{- end }}
chart/src/dashboards/
- cd into this directory and run the following command to update the logic so the Release name is captured:
sed -i 's/(loki|enterprise-logs)/logging-loki/g' \*.json
loki-logs.json
dashboard to maintain the expr
for log querying (lines 775 and 840):
- 775: "expr": "sum(rate({namespace=\"$namespace\", pod=~\"$deployment.*\", pod=~\"$pod\", container=~\"$container\" } |logfmt|= \"$filter\" [5m])) by (level)",
- 840: "expr": "{namespace=\"$namespace\", pod=~\"$deployment.*\", pod=~\"$pod\", container=~\"$container\"} | logfmt | level=\"$level\" |= \"$filter\"",
automountServiceAccountToken📜
The mutating Kyverno policy named update-automountserviceaccounttokens is leveraged to harden all ServiceAccounts in this package with automountServiceAccountToken: false
.
This policy revokes access to the K8s API for Pods utilizing said ServiceAccounts. If a Pod truly requires access to the K8s API (for app functionality), the Pod is added to the pods:
array of the same mutating policy. This grants the Pod access to the API, and creates a Kyverno PolicyException to prevent an alert.
Testing new Loki Version📜
NOTE: For these testing steps it is good to do them on both a clean install and an upgrade. For clean install, point Loki to your branch. For an upgrade do an install with Loki pointing to the latest tag, then perform a helm upgrade with Loki pointing to your branch.
Deploy Loki Scalable as a part of BigBang📜
You will want to install with:
- Loki, Promtail, Fluentbit, Tempo, Monitoring, MinioOperator and Istio packages enabled
overrides/loki.yaml
clusterAuditor:
enabled: false
gatekeeper:
enabled: false
istioOperator:
enabled: true
istio:
enabled: true
monitoring:
enabled: true
loki:
enabled: true
git:
tag: ""
branch: "renovate/ironbank"
strategy: scalable
promtail:
enabled: true
tempo:
enabled: true
jaeger:
enabled: false
eckOperator:
enabled: false
fluentbit:
enabled: true
twistlock:
enabled: false
addons:
minioOperator:
enabled: true
- Visit
https://grafana.bigbang.dev
and login with default credentials - Navigate to
Connections -> Data Sources -> Loki
- Click
Save & Test
to ensure Data Source changes can be saved successfully. - Search dashboards for
Loki Dashboard Quick Search
and confirm log data is being populated/no error messages.
Deploy Loki Monolith as a part of BigBang📜
Loki Monolith is tested during the “package tests” stage of loki pipelines.
You will want to install with:
- Loki, Promtail, Tempo, Monitoring and Istio packages enabled
overrides/loki.yaml
clusterAuditor:
enabled: false
gatekeeper:
enabled: false
istioOperator:
enabled: true
istio:
enabled: true
monitoring:
enabled: true
loki:
enabled: true
git:
tag: ""
branch: "renovate/ironbank"
promtail:
enabled: true
tempo:
enabled: true
jaeger:
enabled: false
twistlock:
enabled: false
https://grafana.bigbang.dev
and login with default credentials
- Navigate to Connections -> Data Sources -> Loki
- Click Save & Test
to ensure Data Source changes can be saved successfully.
- Search dashboards for Loki Dashboard Quick Search
and confirm log data is being populated/no error messages.
When in doubt with any testing or upgrade steps, reach out to the CODEOWNERS for assistance.