To upgrade the Loki Package
Check the upstream changelog and the helm chart upgrade notes.
Upgrading
Find the latest version of the loki image that matches the latest version in IronBank that Renovate has identified from here: https://github.com/grafana/loki/tree/helm-loki-3.2.0/production/helm/loki
Run a KPT update against the main chart folder:
kpt pkg update chart@helm-loki-${chart.version} --strategy alpha-git-patch
Restore all BigBang added templates and tests:
git checkout chart/templates/bigbang/
git checkout chart/deps/loki
git checkout chart/deps/minio
git checkout chart/tests/
git checkout chart/dashboards
git checkout chart/templates/tests
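The update and restore steps above can be scripted. The following is a minimal sketch, not part of the package tooling: `CHART_VERSION`, `kpt_ref`, and `run` are hypothetical names, and `3.2.0` is only an example version. By default the script prints the commands; set `RUN=1` to execute them.

```shell
#!/bin/sh
# Sketch only: CHART_VERSION, kpt_ref, and run are illustrative names.
CHART_VERSION="${CHART_VERSION:-3.2.0}"   # assumed example version

# Build the kpt package reference used by the update step.
kpt_ref() {
  printf 'chart@helm-loki-%s' "$1"
}

# Print each command; execute it only when RUN=1.
run() {
  printf '+ %s\n' "$*"
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  fi
}

run kpt pkg update "$(kpt_ref "$CHART_VERSION")" --strategy alpha-git-patch

# Restore the Big Bang owned paths listed above.
for path in \
  chart/templates/bigbang/ \
  chart/deps/loki \
  chart/deps/minio \
  chart/tests/ \
  chart/dashboards \
  chart/templates/tests
do
  run git checkout "$path"
done
```

Review `git status` after the restore to confirm that only upstream changes remain staged for review.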
Update dependencies
Typically, the --strategy=force-delete-replace option is useful to "heavy-handedly" bring in dependency changes, which may then need to be reviewed.
LATEST_BB_PACKAGE_TAG_VERSION
cd chart/deps
kpt pkg update minio@${LATEST_BB_PACKAGE_TAG_VERSION} --strategy=force-delete-replace
cd ../../
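One way to pick the value for `LATEST_BB_PACKAGE_TAG_VERSION` is to version-sort the tags of the minio package repository. This is a sketch under assumptions: `latest_tag` is a hypothetical helper, and it relies on `sort -V`, which GNU coreutils and BusyBox provide but not every `sort` implementation does.

```shell
#!/bin/sh
# Sketch only: latest_tag is an illustrative helper, not a kpt or git feature.
# Reads one tag per line on stdin and prints the highest by version order.
latest_tag() {
  sort -V | tail -n 1
}

# Example use inside a local clone of the minio package repository:
#   LATEST_BB_PACKAGE_TAG_VERSION="$(git tag --list | latest_tag)"
```

Version sorting matters here because a plain lexical sort would rank a `-bb.10` tag below `-bb.2`.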
Update dependencies in Chart.yaml
Ensure the minio version in chart/Chart.yaml matches the latest tag version.
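The version check can be automated with a small helper. This is a rough sketch: `dep_version` is a hypothetical function, and it assumes each dependency entry starts with `- name:` and that its `version:` line comes before the next entry. Quoted version values would be printed with their quotes.

```shell
#!/bin/sh
# Sketch only: dep_version is an illustrative helper.
# Prints the version of the dependency named $2 in the Chart.yaml at $1.
dep_version() {
  awk -v dep="$2" '
    $1 == "-" && $2 == "name:" { current = $3 }
    $1 == "version:" && current == dep { print $2; exit }
  ' "$1"
}

# Example use from the package root:
#   [ "$(dep_version chart/Chart.yaml minio-instance)" = "$LATEST_BB_PACKAGE_TAG_VERSION" ] \
#     || echo "minio version mismatch"
```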
Update binaries
If needed, log into registry1
helm registry login https://registry1.dso.mil -u ${registry1.username}
helm registry logout https://registry1.dso.mil
Pull assets and commit the binaries as well as the Chart.lock file that was generated.
export HELM_EXPERIMENTAL_OCI=1
helm dependency update ./chart
Update main chart
chart/Chart.yaml
- Update the loki version and appVersion
- Ensure the Big Bang version suffix is appended to the chart version
- Ensure minio, gluon, and loki dependencies are present and up to date
version: $VERSION-bb.0
dependencies:
  - name: minio-instance
    alias: minio
    version: $MINIO_VERSION
    repository: file://./deps/minio
    condition: minio.enabled
  - name: grafana-agent-operator
    alias: grafana-agent-operator
    version: $GRAFANA_VERSION
    repository: https://grafana.github.io/helm-charts
    condition: monitoring.selfMonitoring.grafanaAgent.installOperator
  - name: gluon
    version: $GLUON_VERSION
    repository: "oci://registry.dso.mil/platform-one/big-bang/apps/library-charts/gluon"
annotations:
  bigbang.dev/applicationVersions: |
    - Loki: $LOKI_APP_VERSION
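The Big Bang suffix requirement can be checked mechanically. The helper below is a sketch: `has_bb_suffix` is a hypothetical name, and the accepted shape (`<upstream version>-bb.<number>`) is inferred from the `$VERSION-bb.0` example above.

```shell
#!/bin/sh
# Sketch only: has_bb_suffix is an illustrative helper.
# Succeeds when $1 ends with "-bb.<number>", as in "$VERSION-bb.0".
has_bb_suffix() {
  printf '%s\n' "$1" | grep -Eq '^[^[:space:]]+-bb\.[0-9]+$'
}

# Example use:
#   has_bb_suffix "5.8.0-bb.0" && echo "suffix ok"
```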
chart/values.yaml
- Verify Renovate correctly set the tag for the new version.
chart/tests/*
- Add cypress testing configuration and/or tests if necessary.
Modifications made to upstream
This is a high-level list of modifications that Big Bang has made to the upstream helm chart. You can use this as a cross-check to make sure that no modifications were lost during the upgrade process.
chart/values.yaml
- line 14, Ensure nameOverride is set to logging-loki
nameOverride: logging-loki
- line 17, Ensure fullnameOverride is set to logging-loki
fullnameOverride: logging-loki
- line 21, Ensure the private-registry IPS is present:
imagePullSecrets:
  - name: private-registry
- line 23, update the kubectl image to pull from registry1
kubectlImage:
  # -- The Docker registry
  registry: registry1.dso.mil/ironbank
  # -- Docker image repository
  repository: opensource/kubernetes/kubectl
  # -- Overrides the image tag whose default is the chart's appVersion
  tag: v1.25.2
- line 40, Ensure the loki.image section points to the registry1 image and the correct tag
image:
  # -- The Docker registry
  registry: registry1.dso.mil
  # -- Docker image repository
  repository: ironbank/opensource/grafana/loki
  # -- Overrides the image tag whose default is the chart's appVersion
  tag: X.X.X
- line 136, Ensure the following config is present
ingester:
  chunk_target_size: 196608
  flush_check_period: 5s
  flush_op_timeout: 100m
  lifecycler:
    ring:
      kvstore:
        store: memberlist
      replication_factor: 1
- line 209, Ensure by default auth is disabled
auth_enabled: false
- line 231, Ensure storage.bucketNames points to loki, loki & loki-admin
storage:
  bucketNames:
    chunks: loki
    ruler: loki
    admin: loki-admin
- line 283, Ensure the storage_config.boltdb_shipper configuration is present
storage_config:
  boltdb_shipper:
    active_index_directory: /var/loki/boltdb-shipper-active
    cache_location: /var/loki/boltdb-shipper-cache
    cache_ttl: 24h
    shared_store: s3
- line 343, Ensure enterprise.image is pointed to the registry1 image
image:
  # -- The Docker registry
  registry: registry1.dso.mil
  # -- Docker image repository
  repository: ironbank/grafana/grafana-enterprise-logs
  # -- Overrides the image tag whose default is the chart's appVersion
  tag: vX.X.X
- line 394, Ensure provisioner.enabled is set to false
provisioner:
  # -- Whether the job should be part of the deployment
  enabled: false
- line 481, Ensure all monitoring sub-components are set to enabled: false, including the added monitoring.enabled value
monitoring:
  # -- Enable BigBang integration of Monitoring components
  enabled: false
- line 572, Ensure monitoring.selfMonitoring.grafanaAgent.installOperator is set to false
- line 601, Ensure lokiCanary.enabled is set to false
lokiCanary:
  enabled: false
- line 664, write pod resources set
resources:
  limits:
    cpu: 300m
    memory: 2Gi
  requests:
    cpu: 300m
    memory: 2Gi
- line 701, ensure at the bottom of the write: block, there is a podDisruptionBudget section
## -- Application controller Pod Disruption Budget Configuration
## Ref: https://kubernetes.io/docs/tasks/run-application/configure-pdb/
podDisruptionBudget:
  # -- Number of pods that are available after eviction as number or percentage (eg.: 50%)
  # @default -- `""` (defaults to 0 if not specified)
  minAvailable: ""
  # -- Number of pods that are unavailable after eviction as number or percentage (eg.: 50%).
  ## Has higher precedence over `controller.pdb.minAvailable`
  maxUnavailable: "1"
- line 805, legacyReadTarget set to true to give users time to migrate 2/7/23
legacyReadTarget: true
- line 819, read pod resources set
resources:
  limits:
    cpu: 300m
    memory: 2Gi
  requests:
    cpu: 300m
    memory: 2Gi
- line 854, ensure at the bottom of the read: block, there is a podDisruptionBudget section
## -- Application controller Pod Disruption Budget Configuration
## Ref: https://kubernetes.io/docs/tasks/run-application/configure-pdb/
podDisruptionBudget:
  # -- Number of pods that are available after eviction as number or percentage (eg.: 50%)
  # @default -- `""` (defaults to 0 if not specified)
  minAvailable: ""
  # -- Number of pods that are unavailable after eviction as number or percentage (eg.: 50%).
  ## Has higher precedence over `controller.pdb.minAvailable`
  maxUnavailable: "1"
- line 931, ensure at the bottom of the backend: block, there is a podDisruptionBudget section
## -- Application controller Pod Disruption Budget Configuration
## Ref: https://kubernetes.io/docs/tasks/run-application/configure-pdb/
podDisruptionBudget:
  # -- Number of pods that are available after eviction as number or percentage (eg.: 50%)
  # @default -- `""` (defaults to 0 if not specified)
  minAvailable: ""
  # -- Number of pods that are unavailable after eviction as number or percentage (eg.: 50%).
  ## Has higher precedence over `controller.pdb.minAvailable`
  maxUnavailable: "1"
- line 944, Ensure singleBinary.replicas is set to 1
singleBinary:
  # -- Number of replicas for the single binary
  replicas: 1
- line 986, set resource requests and limits for singleBinary
resources:
  limits:
    cpu: 100m
    memory: 256Mi
  requests:
    cpu: 100m
    memory: 256Mi
- line 1071, Ensure gateway.enabled is set to false by default
- line 1091, Ensure gateway.image is pointed to the registry1 equivalent
image:
  # -- The Docker registry for the gateway image
  registry: registry1.dso.mil
  # -- The gateway image repository
  repository: ironbank/opensource/nginx/nginx
  # -- The gateway image tag
  tag: X.X.X
- line 1236, ensure at the bottom of the gateway: block, there is a podDisruptionBudget section
## -- Application controller Pod Disruption Budget Configuration
## Ref: https://kubernetes.io/docs/tasks/run-application/configure-pdb/
podDisruptionBudget:
  # -- Number of pods that are available after eviction as number or percentage (eg.: 50%)
  # @default -- `""` (defaults to 0 if not specified)
  minAvailable: ""
  # -- Number of pods that are unavailable after eviction as number or percentage (eg.: 50%).
  ## Has higher precedence over `controller.pdb.minAvailable`
  maxUnavailable: "1"
- line 1286 remove minio block added by upstream
replicas: 1
# Minio requires 2 to 16 drives for erasure code (drivesPerNode * replicas)
# https://docs.min.io/docs/minio-erasure-code-quickstart-guide
# Since we only have 1 replica, that means 2 drives must be used.
drivesPerNode: 2
rootUser: enterprise-logs
rootPassword: supersecret
buckets:
  - name: chunks
    policy: none
    purge: false
  - name: ruler
    policy: none
    purge: false
  - name: admin
    policy: none
    purge: false
persistence:
  size: 5Gi
resources:
  requests:
    cpu: 100m
    memory: 128Mi
- line 1287 or EOF. Move extraObjects configmap block up under loki. Above Minio.
# Create extra manifests via values. Would be passed through `tpl` for templating
extraObjects: []
- line 1311, ensure the following BB values are all set under minio key:
minio:
  # -- Enable minio instance support, must have minio-operator installed
  enabled: false
  # Override the minio service name for easier connection setup
  service:
    nameOverride: "minio.logging.svc.cluster.local"
  # -- Minio root credentials
  secrets:
    name: "loki-objstore-creds"
    accessKey: "minio"
    secretKey: "minio123" # default key, change this!
  tenant:
    # -- Buckets to be provisioned to for tenant
    buckets:
      - name: loki
      - name: loki-admin
    # -- Users to to be provisioned to for tenant
    users:
      - name: minio-user
    # -- User credentials to create for above user. Otherwise password is randomly generated.
    # This auth is not required to be set or reclaimed for minio use with Loki
    defaultUserCredentials:
      username: "minio-user"
      password: ""
    ## Specification for MinIO Pool(s) in this Tenant.
    pools:
      - servers: 1
        volumesPerServer: 4
        size: 750Mi
        securityContext:
          runAsUser: 1001
          runAsGroup: 1001
          fsGroup: 1001
    metrics:
      enabled: false
      port: 9000
      memory: 128M
- End of file add/verify the following blocks:
domain: bigbang.dev
istio:
  enabled: false
  mtls:
    # STRICT = Allow only mutual TLS traffic
    # PERMISSIVE = Allow both plain text and mutual TLS traffic
    mode: STRICT
networkPolicies:
  enabled: false
  # -- Control Plane CIDR to allow init job communication to the Kubernetes API.
  # Use `kubectl get endpoints kubernetes` to get the CIDR range needed for your cluster
  controlPlaneCidr: 0.0.0.0/0
bbtests:
  enabled: false
  cypress:
    artifacts: true
    envs:
      cypress_check_datasource: 'false'
      cypress_grafana_url: 'http://monitoring-grafana.monitoring.svc.cluster.local'
  scripts:
    image: registry1.dso.mil/ironbank/big-bang/base:2.0.0
    envs:
      LOKI_URL: 'http://{{ .Values.fullnameOverride }}.{{ .Release.Namespace }}.svc:3100'
      LOKI_VERSION: '{{ .Values.loki.image.tag }}'
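Part of the chart/values.yaml cross-check can be automated by looking for a few of the single-line settings listed above. This is a minimal sketch, not an exhaustive audit: `has_setting` and `check_values` are hypothetical helpers, the four settings are a sample taken from this list, and a match only proves the line exists somewhere in the file, not that it sits under the right key.

```shell
#!/bin/sh
# Sketch only: has_setting and check_values are illustrative helpers.
# Succeeds when the file at $1 contains a line that, ignoring surrounding
# whitespace, is exactly the text in $2.
has_setting() {
  awk -v want="$2" '
    { line = $0; sub(/^[[:space:]]+/, "", line); sub(/[[:space:]]+$/, "", line) }
    line == want { found = 1; exit }
    END { exit found ? 0 : 1 }
  ' "$1"
}

# Reports every sampled Big Bang setting missing from the file at $1.
check_values() {
  status=0
  for want in \
    'nameOverride: logging-loki' \
    'fullnameOverride: logging-loki' \
    'auth_enabled: false' \
    'legacyReadTarget: true'
  do
    if ! has_setting "$1" "$want"; then
      echo "missing: $want"
      status=1
    fi
  done
  return "$status"
}

# Example use from the package root:
#   check_values chart/values.yaml
```

Settings that span several lines, such as the podDisruptionBudget sections, still need a manual review or a diff against the previous release.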
chart/templates/tokengen/job-tokengen.yaml
- At the top of the file, directly under the opening conditionals, add the following NetworkPolicy resources:
{{- if .Values.networkPolicies.enabled }}
{{- if .Values.minio.enabled }}
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: tokengen-ingress-minio
  namespace: {{ .Release.Namespace }}
  annotations:
    "helm.sh/hook": post-install
    "helm.sh/hook-weight": "-10"
    "helm.sh/hook-delete-policy": hook-succeeded,hook-failed,before-hook-creation
spec:
  podSelector:
    matchLabels:
      app: minio
      app.kubernetes.io/instance: {{ .Release.Name }}
  ingress:
    - from:
        - podSelector:
            matchLabels:
              {{- include "enterprise-logs.tokengenLabels" . | nindent 14 }}
              {{- with .Values.enterprise.tokengen.labels }}
              {{- toYaml . | nindent 14 }}
              {{- end }}
      ports:
        - port: 9000
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: tokengen-egress-minio
  namespace: {{ .Release.Namespace }}
  annotations:
    "helm.sh/hook": post-install
    "helm.sh/hook-weight": "-10"
    "helm.sh/hook-delete-policy": hook-succeeded,hook-failed,before-hook-creation
spec:
  podSelector:
    matchLabels:
      app: minio
      app.kubernetes.io/instance: {{ .Release.Name }}
  egress:
    - to:
        - podSelector:
            matchLabels:
              {{- include "enterprise-logs.tokengenLabels" . | nindent 14 }}
              {{- with .Values.enterprise.tokengen.labels }}
              {{- toYaml . | nindent 14 }}
              {{- end }}
      ports:
        - port: 9000
{{- end }}
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-egress-tokengen-job
  namespace: {{ .Release.Namespace }}
  annotations:
    "helm.sh/hook": post-install
    "helm.sh/hook-weight": "-10"
    "helm.sh/hook-delete-policy": hook-succeeded,hook-failed,before-hook-creation
spec:
  egress:
    - to:
        - ipBlock:
            cidr: {{ .Values.networkPolicies.controlPlaneCidr }}
            {{- if eq .Values.networkPolicies.controlPlaneCidr "0.0.0.0/0" }}
            # ONLY Block requests to AWS metadata IP
            except:
              - 169.254.169.254/32
            {{- end }}
  podSelector:
    matchLabels:
      {{- include "enterprise-logs.tokengenLabels" . | nindent 6 }}
      {{- with .Values.enterprise.tokengen.labels }}
      {{- toYaml . | nindent 6 }}
      {{- end }}
  policyTypes:
    - Egress
{{- end }}
---
chart/templates/_helpers.tpl
- On line 13, in the $default definition, remove the ternary function and ensure the definition looks just like:
{{- $default := "loki" }}
- line 181 ensure the following block for minio looks like:
{{- if .Values.minio.enabled -}}
s3:
  endpoint: {{ $.Values.minio.service.nameOverride }}
  bucketnames: {{ $.Values.loki.storage.bucketNames.chunks }}
  secret_access_key: {{ $.Values.minio.secrets.secretKey }}
  access_key_id: {{ $.Values.minio.secrets.accessKey }}
  s3forcepathstyle: true
  insecure: true
chart/templates/backend/poddisruptionbudget-backend.yaml
- Ensure that there is no hard-coded spec for the PDB template
{{- with .Values.backend.podDisruptionBudget.maxUnavailable }}
maxUnavailable: {{ . }}
{{- else }}
minAvailable: {{ .Values.backend.podDisruptionBudget.minAvailable | default 0 }}
{{- end }}
chart/templates/gateway/poddisruptionbudget-gateway.yaml
- Ensure that there is no hard-coded spec for the PDB template
{{- with .Values.gateway.podDisruptionBudget.maxUnavailable }}
maxUnavailable: {{ . }}
{{- else }}
minAvailable: {{ .Values.gateway.podDisruptionBudget.minAvailable | default 0 }}
{{- end }}
chart/templates/read/poddisruptionbudget-read.yaml
- Ensure that there is no hard-coded spec for the PDB template
{{- with .Values.read.podDisruptionBudget.maxUnavailable }}
maxUnavailable: {{ . }}
{{- else }}
minAvailable: {{ .Values.read.podDisruptionBudget.minAvailable | default 0 }}
{{- end }}
chart/templates/write/poddisruptionbudget-write.yaml
- Ensure that there is no hard-coded spec for the PDB template
{{- with .Values.write.podDisruptionBudget.maxUnavailable }}
maxUnavailable: {{ . }}
{{- else }}
minAvailable: {{ .Values.write.podDisruptionBudget.minAvailable | default 0 }}
{{- end }}
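The four PDB templates above share the same precedence rule: a non-empty maxUnavailable wins, otherwise minAvailable is used and falls back to 0. The sketch below mimics that rule in plain shell purely as an illustration. `pdb_spec` is a hypothetical function, it is not how Helm evaluates templates, and it only models empty versus non-empty string values.

```shell
#!/bin/sh
# Sketch only: pdb_spec imitates the with/else/default logic of the PDB
# templates in shell; Helm's own truthiness rules are broader than this.
# $1 = maxUnavailable, $2 = minAvailable (either may be empty).
pdb_spec() {
  if [ -n "$1" ]; then
    printf 'maxUnavailable: %s\n' "$1"
  else
    printf 'minAvailable: %s\n' "${2:-0}"
  fi
}

# Example: the Big Bang defaults (maxUnavailable "1", minAvailable "").
#   pdb_spec "1" ""
```

To see the real rendering, `helm template` with `--show-only` on one of the PDB template paths prints the manifest Helm would generate for a given set of values.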
chart/src/dashboards/
- cd into this directory and run the following command to update the logic so the Release name is captured:
sed -i 's/(loki|enterprise-logs)/logging-loki/g' *.json
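It helps to know what the sed expression matches. In sed's default (basic) regular expression syntax, `(`, `|`, and `)` are literal characters, so the pattern replaces the literal text `(loki|enterprise-logs)` found in the dashboard queries and leaves a bare `loki` untouched. The sketch below shows this on a sample line; `rewrite_release_name` is a hypothetical wrapper that writes to stdout instead of editing in place, and the sample query is made up for illustration.

```shell
#!/bin/sh
# Sketch only: rewrite_release_name is an illustrative wrapper around the
# same sed expression, without -i, so the input file is left unchanged.
rewrite_release_name() {
  sed 's/(loki|enterprise-logs)/logging-loki/g' "$1"
}

# Example use on one dashboard to preview the change before editing in place:
#   rewrite_release_name loki-reads.json | diff loki-reads.json -
```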
Testing new Loki Version
Deploy Loki Scalable as a part of BigBang
helm upgrade \
--install bigbang ./bigbang/chart \
--create-namespace \
--namespace bigbang \
--values ./bigbang/chart/values.yaml \
--values ./bigbang/chart/ingress-certs.yaml \
--values ./overrides/loki.yaml \
--set gatekeeper.enabled=false \
--set clusterAuditor.enabled=false \
--set twistlock.enabled=false \
--set loki.enabled=true \
--set promtail.enabled=true \
--set logging.enabled=false \
--set eckoperator.enabled=false \
--set fluentbit.enabled=true \
--set jaeger.enabled=false \
--set tempo.enabled=true \
--set addons.minioOperator.enabled=true
overrides/loki.yaml
loki:
  git:
    tag: ""
    branch: "my-branch-name-goes-here"
  enabled: true
  strategy: scalable
- Visit https://grafana.bigbang.dev and log in
- Navigate to configuration -> Data Sources -> Loki and then click Save & Test to ensure Data Source changes can be saved successfully.
- Search dashboards for Loki Dashboard Quick Search and confirm log data is being populated/no error messages.
Deploy Loki Monolith as a part of BigBang
helm upgrade \
--install bigbang ./bigbang/chart \
--create-namespace \
--namespace bigbang \
--values ./bigbang/chart/values.yaml \
--values ./bigbang/chart/ingress-certs.yaml \
--values ./overrides/loki.yaml \
--set gatekeeper.enabled=false \
--set clusterAuditor.enabled=false \
--set twistlock.enabled=false \
--set loki.enabled=true \
--set promtail.enabled=true \
--set logging.enabled=false \
--set eckoperator.enabled=false \
--set fluentbit.enabled=true \
--set jaeger.enabled=false \
--set tempo.enabled=true
overrides/loki.yaml
loki:
  git:
    tag: ""
    branch: "my-branch-name-goes-here"
  enabled: true
- Visit https://grafana.bigbang.dev and log in
- Navigate to configuration -> Data Sources -> Loki and then click Save & Test to ensure Data Source changes can be saved successfully.
- Search dashboards for Loki Dashboard Quick Search and confirm log data is being populated/no error messages.
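Alongside the Grafana checks, Loki's own readiness endpoint can serve as a quick smoke test for either deployment. The following is a sketch under assumptions: `loki_url` and `loki_ready` are hypothetical helpers, the URL shape mirrors the LOKI_URL value in the bbtests block, the `logging` namespace is inferred from the minio service name in this guide, and the request must originate from somewhere that can reach the service (for example a pod in the cluster).

```shell
#!/bin/sh
# Sketch only: loki_url and loki_ready are illustrative helpers.
# $1 = service name (fullnameOverride), $2 = namespace.
loki_url() {
  printf 'http://%s.%s.svc:3100' "$1" "$2"
}

# Succeeds when Loki's /ready endpoint answers successfully.
# Requires curl and network access to the service.
loki_ready() {
  curl -fsS "$(loki_url "$1" "$2")/ready" >/dev/null
}

# Example use (namespace is an assumption):
#   loki_ready logging-loki logging && echo "Loki is ready"
```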