Loki Development and Maintenance Guide
To upgrade the Loki Package
- Navigate to the upstream chart repo and folder and find the tag that corresponds with the new chart version for this update.
  - Check the upstream changelog for upgrade notices.
- Checkout the `renovate/ironbank` branch.
- From the root of the repo run `kpt pkg update chart@<tag> --strategy alpha-git-patch`, where tag is the one found in the first step (loki ref: `helm-loki-<tag>`).
  - Run a KPT update against the main chart folder:
# To find the chart version for the command below:
# - Browse to the [upstream](https://github.com/grafana/loki/tree/main/production/helm/loki).
# - Click on the drop-down menu on the upper left, then on Tags.
# - Scroll/Search through the tags until you get to the Helm chart version tags (e.g. helm-loki-5.9.2, helm-loki-5.9.1, etc.).
# - Starting with the most recent Helm chart version tag, open the Chart.yaml for the tag. If the appVersion value corresponds to the
#   version of Loki that Renovate detected for an upgrade, this is the correct version. So, for example, if you will be updating to chart
#   version helm-loki-5.9.2, your kpt command would be:
#
#     kpt pkg update chart@helm-loki-5.9.2 --strategy alpha-git-patch
kpt pkg update chart@helm-loki-<tag> --strategy alpha-git-patch
- Restore all BigBang added templates and tests:
git checkout chart/templates/bigbang/
git checkout chart/tests/
git checkout chart/dashboards
git checkout chart/templates/tests
- Follow the Update main chart section of this document for a list of changes per file to be aware of, for how Big Bang differs from upstream.
- Modify the version in `Chart.yaml` and append `-bb.0` to the chart version from upstream.
- Update dependencies and binaries using `helm dependency update ./chart`.
  - Ensure that the minio version in chart/Chart.yaml matches the latest tag version of minio available in the Big Bang minio package Chart.yaml.
  - If needed, log into registry1.
# Note, if you are using Ubuntu on WSL and get an error about storing credentials or about how `The name org.freedesktop.secrets was not
# provided by any .service files` when you run the command below, install the libsecret-1-dev and gnome-keyring packages. After doing this,
# you'll be prompted to set a keyring password the first time you run this command.
#
helm registry login https://registry1.dso.mil -u ${registry1.username}
- Pull assets and commit the binaries as well as the Chart.lock file that was generated.
# Note: You may need to resolve merge conflicts in chart/values.yaml before these commands work. Refer to the "Modifications made to upstream"
# section below for hints on how to resolve them. Also, you need to be logged in to registry1 through docker.
export HELM_EXPERIMENTAL_OCI=1
helm dependency update ./chart
Then log out.
helm registry logout https://registry1.dso.mil
- Update `CHANGELOG.md`, adding an entry for the new version and noting all changes in a list (at minimum it should include `- Updated <chart or dependency> to x.x.x`).
- Generate the `README.md` updates by following the guide in gluon.
- Push up your changes, add upgrade notices if applicable, and validate that CI passes.
  - If there are any failures, follow the information in the pipeline to make the necessary updates.
  - Add the `debug` label to the MR for more detailed information.
  - Reach out to the CODEOWNERS if needed.
- As part of your MR that modifies Big Bang packages, you should modify the Big Bang `bigbang/tests/test-values.yaml` against your branch for the CI/CD MR testing by enabling your packages.
  - To do this, at a minimum, you will need to follow the instructions at bigbang/docs/developer/test-package-against-bb.md with changes for Loki enabled (the below is a reference; actual changes could be more depending on what changes were made to Loki in the package MR).
test-values.yaml
```yaml
loki:
  enabled: true
  git:
    tag: null
    branch: <my-package-branch-that-needs-testing>
  values:
    istio:
      hardened:
        enabled: true
  ### Additional components of Loki should be changed to reflect testing changes introduced in the package MR
```
- Follow the `Testing new Loki Version` section of this document for manual testing.
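The tag lookup described in the first step can also be done from a terminal instead of the GitHub tag drop-down. The following is a minimal sketch rather than part of the official procedure: the helper name `helm_loki_tags` is hypothetical, and it assumes GNU `sort` (for `-V`) and network access to the upstream repository.

```shell
# Hypothetical helper: reduce `git ls-remote --tags` output to the
# helm-loki-* chart tags, newest version first.
helm_loki_tags() {
  # Keep only refs that end in a helm-loki-<version> tag name; the trailing
  # `$` skips the peeled `^{}` entries that annotated tags produce.
  sed -n 's#.*refs/tags/\(helm-loki-[0-9][0-9.]*\)$#\1#p' | sort -rV
}

# Example (requires network access):
#   git ls-remote --tags https://github.com/grafana/loki | helm_loki_tags | head -n 5
# The chosen tag is then passed to kpt as in the step above:
#   kpt pkg update chart@helm-loki-<tag> --strategy alpha-git-patch
```

You still need to open the Chart.yaml for the chosen tag and confirm that its appVersion matches the Loki version Renovate detected.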
Update main chart
chart/Chart.yaml
- Update the loki `version` and `appVersion`.
- Ensure the Big Bang version suffix is appended to the chart version.
- Ensure the minio and gluon dependencies are present and up to date.
version: $VERSION-bb.0
dependencies:
  - name: minio-instance
    alias: minio
    version: $MINIO_VERSION
    repository: file://./deps/minio
    condition: minio.enabled
  - name: grafana-agent-operator
    alias: grafana-agent-operator
    version: $GRAFANA_VERSION
    repository: https://grafana.github.io/helm-charts
    condition: monitoring.selfMonitoring.grafanaAgent.installOperator
  - name: gluon
    version: $GLUON_VERSION
    repository: "oci://registry.dso.mil/platform-one/big-bang/apps/library-charts/gluon"
annotations:
  bigbang.dev/applicationVersions: |
    - Loki: $LOKI_APP_VERSION
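A quick way to confirm the Big Bang suffix is in place is to read the top-level `version:` from Chart.yaml. This is a minimal sketch, assuming the version sits on an unindented `version:` line as in the snippet above; the function name `check_bb_version` is hypothetical.

```shell
# Hypothetical helper: print the chart version from a Chart.yaml and fail
# unless it carries a -bb.<n> suffix.
check_bb_version() {
  chart_file="${1:-chart/Chart.yaml}"
  # Only the unindented `version:` key is the chart version; dependency
  # versions are indented and therefore ignored by the `^` anchor.
  version="$(sed -n 's/^version:[[:space:]]*//p' "$chart_file" | head -n 1)"
  case "$version" in
    *-bb.[0-9]*) echo "$version" ;;
    *) echo "missing -bb.N suffix: $version" >&2; return 1 ;;
  esac
}

# Example, run from the repo root:
#   check_bb_version chart/Chart.yaml
```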
chart/values.yaml
- Verify that Renovate updated the `loki:` section with the correct value for `tag`. For example, if Renovate wants to update Loki to version 2.8.3, you should see:
loki:
  # Configures the readiness probe for all of the Loki pods
  readinessProbe:
    httpGet:
      path: /ready
      port: http-metrics
    initialDelaySeconds: 30
    timeoutSeconds: 1
  image:
    # -- The Docker registry
    registry: registry1.dso.mil
    # -- Docker image repository
    repository: ironbank/opensource/grafana/loki
    # -- Overrides the image tag whose default is the chart's appVersion
    tag: 2.8.3
chart/tests/*
- Verify that cypress testing configuration and tests are present here. You should see contents similar to this in chart/tests/cypress/:
drwxr-xr-x 2 ubuntu ubuntu 4096 Aug 1 12:24 ./
drwxr-xr-x 4 ubuntu ubuntu 4096 Aug 1 12:24 ../
-rw-r--r-- 1 ubuntu ubuntu 86 Aug 1 12:24 cypress.json
-rw-r--r-- 1 ubuntu ubuntu 1494 Aug 1 12:24 loki-health.spec.js
And this in chart/tests/scripts/:
drwxr-xr-x 2 ubuntu ubuntu 4096 Aug 1 12:24 ./
drwxr-xr-x 4 ubuntu ubuntu 4096 Aug 1 12:24 ../
-rw-r--r-- 1 ubuntu ubuntu 2192 Aug 1 12:24 test.sh
If you are unsure or if these directories do not exist or are empty, check with the code owners.
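The directory listings above can be checked mechanically. The sketch below is illustrative only: `missing_bb_tests` is a hypothetical name, and the file list is taken from the listings in this section, so it may need adjusting if the test layout changes.

```shell
# Hypothetical helper: print each expected Big Bang test file that is
# missing or empty under a chart directory (default: ./chart).
missing_bb_tests() {
  chart_dir="${1:-chart}"
  for f in tests/cypress/cypress.json \
           tests/cypress/loki-health.spec.js \
           tests/scripts/test.sh; do
    # -s is true only for files that exist and are non-empty.
    [ -s "$chart_dir/$f" ] || echo "$f"
  done
}

# Example, run from the repo root; no output means all files are present:
#   missing_bb_tests chart
```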
Modifications made to upstream
This is a high-level list of modifications that Big Bang has made to the upstream helm chart. You can use this as a cross-check to make sure that no modifications were lost during the upgrade process.
chart/values.yaml
- Ensure `nameOverride` is set to `logging-loki`:
nameOverride: logging-loki
- Ensure `fullnameOverride` is set to `logging-loki`:
fullnameOverride: logging-loki
- Ensure the `private-registry` IPS is present:
imagePullSecrets:
  - name: private-registry
- Ensure `deploymentMode` is set to `SingleBinary`:
deploymentMode: SingleBinary
- Ensure the loki image is properly set:
loki:
  image:
    # -- The Docker registry
    registry: registry1.dso.mil
    # -- Docker image repository
    repository: ironbank/opensource/grafana/loki
    # -- Overrides the image tag whose default is the chart's appVersion
    tag: vX.X.X
- Ensure `loki.auth_enabled` is set to `false`:
auth_enabled: false
- Ensure `loki.commonConfig.replication_factor` is set to `1`:
commonConfig:
  replication_factor: 1
- Ensure `loki.storage.bucketNames` is set:
storage:
  bucketNames:
    chunks: loki
    ruler: loki
    admin: loki-admin
    deletion: loki-deletion
- Ensure the following is present for `loki.schemaConfig`:
schemaConfig:
  configs:
    - from: 2022-01-11
      store: boltdb-shipper
      object_store: "{{ .Values.loki.storage.type }}"
      schema: v12
      index:
        prefix: loki_index_
        period: 24h
    - from: 2023-08-01
      store: tsdb
      object_store: "{{ .Values.loki.storage.type }}"
      schema: v12
      index:
        prefix: loki_tsdb_
        period: 24h
    - from: 2024-04-01
      store: tsdb
      object_store: "{{ .Values.loki.storage.type }}"
      schema: v13
      index:
        prefix: loki_tsdb_
        period: 24h
- Ensure the 3 lines below are present within `loki.storage_config.boltdb_shipper`:
storage_config:
  boltdb_shipper:
    active_index_directory: /var/loki/boltdb-shipper-active
    cache_location: /var/loki/boltdb-shipper-cache
    cache_ttl: 24h
- Ensure the 3 lines below are present within `loki.storage_config.tsdb_shipper`:
storage_config:
  tsdb_shipper:
    active_index_directory: /var/loki/tsdb-shipper-active
    cache_location: /var/loki/tsdb-shipper-cache
    cache_ttl: 24h
- Ensure `loki.analytics.reporting_enabled` is set to `false`:
analytics:
  reporting_enabled: false
- Ensure the `loki.ingester` configuration is set to:
ingester:
  chunk_target_size: 196608
  flush_check_period: 5s
  flush_op_timeout: 100m
  autoforget_unhealthy: true
  lifecycler:
    ring:
      kvstore:
        store: memberlist
      replication_factor: 1
- Ensure `enterprise.image` is set to the registry1 image:
image:
  # -- The Docker registry
  registry: registry1.dso.mil
  # -- Docker image repository
  repository: ironbank/grafana/grafana-enterprise-logs
  # -- Overrides the image tag whose default is the chart's appVersion
  tag: vX.X.X
- Ensure `enterprise.tokengen.annotations` includes:
annotations:
  sidecar.istio.io/inject: "false"
- Ensure `enterprise.provisioner.enabled` is set to `false`:
provisioner:
  # -- Whether the job should be part of the deployment
  enabled: false
- Ensure `kubectlImage` is set to the registry1 image:
kubectlImage:
  # -- The Docker registry
  registry: registry1.dso.mil
  # -- Docker image repository
  repository: ironbank/opensource/kubernetes/kubectl
  # -- Overrides the image tag whose default is the chart's appVersion
  tag: vX.X.X
- Ensure `test.enabled` is set to `false` and that `test.prometheusAddress` is set to `"http://prometheus:9090"`:
test:
  enabled: false
  prometheusAddress: "http://prometheus:9090"
- Ensure `lokiCanary.enabled` is set to `false`:
lokiCanary:
  enabled: false
- Ensure `serviceAccount.automountServiceAccountToken` is set to `false`:
serviceAccount:
  # -- Set this toggle to false to opt out of automounting API credentials for the service account
  automountServiceAccountToken: false
- Ensure `gateway.enabled` is set to `false`:
gateway:
  enabled: false
- Ensure `gateway.image` is set to the registry1 image:
image:
  # -- The Docker registry for the gateway image
  registry: registry1.dso.mil
  # -- The gateway image repository
  repository: ironbank/opensource/nginx/nginx
  # -- The gateway image tag
  tag: vX.X.X
- Ensure that at the bottom of the `gateway:` block, there is a `podDisruptionBudget` section:
## -- Application controller Pod Disruption Budget Configuration
## Ref: https://kubernetes.io/docs/tasks/run-application/configure-pdb/
podDisruptionBudget:
  # -- Number of pods that are available after eviction as number or percentage (eg.: 50%)
  # @default -- `""` (defaults to 0 if not specified)
  minAvailable: ""
  # -- Number of pods that are unavailable after eviction as number or percentage (eg.: 50%).
  ## Has higher precedence over `controller.pdb.minAvailable`
  maxUnavailable: "1"
- Ensure `singleBinary.replicas` is set to `1`:
singleBinary:
  # -- Number of replicas for the single binary
  replicas: 1
- Verify that `singleBinary.resources` is set to:
resources:
  limits:
    cpu: 100m
    memory: 256Mi
  requests:
    cpu: 100m
    memory: 256Mi
- Ensure that `singleBinary.persistence.enableStatefulAutoDeletePVC` is set to `false`.
- Ensure that `singleBinary.persistence.size` is set to `12Gi`.
- Ensure that `write.replicas` is set to `0`:
replicas: 0
- Verify that `write.resources` are set:
resources:
  limits:
    cpu: 300m
    memory: 2Gi
  requests:
    cpu: 300m
    memory: 2Gi
- Ensure that at the bottom of the `write:` block, there is a `podDisruptionBudget:` section:
## -- Application controller Pod Disruption Budget Configuration
## Ref: https://kubernetes.io/docs/tasks/run-application/configure-pdb/
podDisruptionBudget:
  # -- Number of pods that are available after eviction as number or percentage (eg.: 50%)
  # @default -- `""` (defaults to 0 if not specified)
  minAvailable: ""
  # -- Number of pods that are unavailable after eviction as number or percentage (eg.: 50%).
  ## Has higher precedence over `controller.pdb.minAvailable`
  maxUnavailable: "1"
- Ensure that `read.replicas` is set to `0`:
replicas: 0
- Make sure `read.resources` are set to:
resources:
  limits:
    cpu: 300m
    memory: 2Gi
  requests:
    cpu: 300m
    memory: 2Gi
- Ensure that at the bottom of the `read:` block, there is a `podDisruptionBudget` section:
## -- Application controller Pod Disruption Budget Configuration
## Ref: https://kubernetes.io/docs/tasks/run-application/configure-pdb/
podDisruptionBudget:
  # -- Number of pods that are available after eviction as number or percentage (eg.: 50%)
  # @default -- `""` (defaults to 0 if not specified)
  minAvailable: ""
  # -- Number of pods that are unavailable after eviction as number or percentage (eg.: 50%).
  ## Has higher precedence over `controller.pdb.minAvailable`
  maxUnavailable: "1"
- Ensure that `backend.replicas` is set to `0`:
replicas: 0
- Ensure that at the bottom of the `backend:` block, there is a `podDisruptionBudget` section:
## -- Application controller Pod Disruption Budget Configuration
## Ref: https://kubernetes.io/docs/tasks/run-application/configure-pdb/
podDisruptionBudget:
  # -- Number of pods that are available after eviction as number or percentage (eg.: 50%)
  # @default -- `""` (defaults to 0 if not specified)
  minAvailable: ""
  # -- Number of pods that are unavailable after eviction as number or percentage (eg.: 50%).
  ## Has higher precedence over `controller.pdb.minAvailable`
  maxUnavailable: "1"
- Ensure that `querier.resources` are set to:
resources:
  limits:
    cpu: 300m
    memory: 2Gi
  requests:
    cpu: 300m
    memory: 2Gi
- Ensure that `compactor.resources` are set to:
resources:
  limits:
    cpu: 300m
    memory: 2Gi
  requests:
    cpu: 300m
    memory: 2Gi
- Ensure that `compactor.serviceAccount.automountServiceAccountToken` is set to `false`:
serviceAccount:
  automountServiceAccountToken: false
- Ensure that `bloomGateway.serviceAccount.automountServiceAccountToken` is set to `false`:
serviceAccount:
  automountServiceAccountToken: false
- Ensure that `bloomCompactor.serviceAccount.automountServiceAccountToken` is set to `false`:
serviceAccount:
  automountServiceAccountToken: false
- Ensure that `patternIngestor.resources` are set to:
resources:
  limits:
    cpu: 100m
    memory: 256Mi
  requests:
    cpu: 100m
    memory: 256Mi
- Ensure that `patternIngestor.serviceAccount.automountServiceAccountToken` is set to `false`:
serviceAccount:
  automountServiceAccountToken: false
- Ensure that the values for `memcached.image.repository` and `memcached.image.tag` are set to valid values from registry1:
memcached:
  # -- Memcached Docker image repository
  image:
    # -- Memcached Docker image repository
    repository: registry1.dso.mil/ironbank/opensource/memcached/memcached
    # -- Memcached Docker image tag
    tag: X.X.X
- Ensure that `memcached.containerSecurityContext` includes the following:
fsGroup: 10001
runAsGroup: 10001
runAsNonRoot: true
runAsUser: 10001
- Ensure that `memcachedExporter.enabled` is set to `false`:
memcachedExporter:
  # -- Whether memcached metrics should be exported
  enabled: false
- Ensure that `memcachedExporter.containerSecurityContext` includes the following:
fsGroup: 10001
runAsGroup: 10001
runAsNonRoot: true
runAsUser: 10001
- Ensure that `resultsCache.enabled` is set to `false`:
resultsCache:
  # -- Specifies whether memcached based results-cache should be enabled
  enabled: false
- Ensure that `chunksCache.enabled` is set to `false`:
chunksCache:
  # -- Specifies whether memcached based chunks-cache should be enabled
  enabled: false
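The `loki.schemaConfig` entries listed earlier in this section are date-ranged: each entry applies from its `from` date until the next entry begins. The sketch below works through which store and schema apply on a given day. It is illustrative only; the function name `active_schema` is hypothetical, and the three periods are copied from the snippet in this guide rather than read from the chart.

```shell
# Hypothetical helper: print "<store> <schema>" for the schemaConfig period
# that covers a given day (YYYY-MM-DD), or "none" if the day predates them all.
active_schema() {
  day="$1"
  result="none"
  while read -r from store schema; do
    # ISO dates sort correctly as plain strings, so the period has started
    # when `from` sorts first (or is equal to the requested day).
    first="$(printf '%s\n%s\n' "$from" "$day" | sort | head -n 1)"
    if [ "$first" = "$from" ]; then
      result="$store $schema"
    fi
  done <<'EOF'
2022-01-11 boltdb-shipper v12
2023-08-01 tsdb v12
2024-04-01 tsdb v13
EOF
  echo "$result"
}

active_schema 2023-07-31   # prints: boltdb-shipper v12
active_schema 2024-04-01   # prints: tsdb v13
```

This is why older periods must stay in the list after an upgrade: data written before a newer `from` date is still read with the earlier store and schema.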
** Important **
Before following the step below, note that if there is only one `minio:` block, you shouldn’t remove it.
- Remove the minio block added by upstream.
- Ensure the following BB values are all set under the minio key:
minio:
  # -- Enable minio instance support, must have minio-operator installed
  enabled: false
  # Allow the address used by Loki to refer to Minio to be overridden
  address: "minio.logging.svc.cluster.local"
  # -- Minio root credentials
  secrets:
    name: "loki-objstore-creds"
    accessKey: "minio"
    secretKey: "minio123" # default key, change this!
  tenant:
    # -- Buckets to be provisioned to for tenant
    buckets:
      - name: loki
      - name: loki-admin
    # -- Users to to be provisioned to for tenant
    users:
      - name: minio-user
    # -- User credentials to create for above user. Otherwise password is randomly generated.
    # This auth is not required to be set or reclaimed for minio use with Loki
    defaultUserCredentials:
      username: "minio-user"
      password: ""
    ## Specification for MinIO Pool(s) in this Tenant.
    pools:
      - servers: 1
        volumesPerServer: 4
        size: 750Mi
        securityContext:
          runAsUser: 1001
          runAsGroup: 1001
          fsGroup: 1001
    metrics:
      enabled: false
      port: 9000
      memory: 128M
- Ensure the `sidecar.image` is set to the equivalent registry1 image:
sidecar:
  image:
    # -- The Docker registry and image for the k8s sidecar
    repository: registry1.dso.mil/ironbank/kiwigrid/k8s-sidecar
    # -- Docker image tag
    tag: X.X.X
    # -- Docker image sha. If empty, no sha will be used
    sha: ""
- Ensure the `sidecar.resources` are set to:
resources:
  limits:
    cpu: 100m
    memory: 100Mi
  requests:
    cpu: 100m
    memory: 100Mi
- Ensure the `sidecar.securityContext` is set to:
securityContext:
  allowPrivilegeEscalation: false
  capabilities:
    drop:
      - ALL
  seccompProfile:
    type: RuntimeDefault
- Ensure `sidecar.rules.enabled` is set to `false`:
rules:
  enabled: false
- At the end of the file, but before the `DEPRECATED VALUES` section (if that section is present), add/verify the following blocks:
domain: dev.bigbang.mil
# Default to false, override in openshift-test-values.yaml
openshift: false
fluentbit:
  enabled: false
promtail:
  enabled: false
istio:
  enabled: false
  hardened:
    enabled: false
    outboundTrafficPolicyMode: "REGISTRY_ONLY"
    customServiceEntries: []
    # - name: "allow-google"
    #   enabled: true
    #   spec:
    #     hosts:
    #       - google.com
    #       - www.google.com
    #     location: MESH_EXTERNAL
    #     ports:
    #       - number: 443
    #         protocol: TLS
    #         name: https
    #     resolution: DNS
    customAuthorizationPolicies: []
    # - name: "allow-nothing"
    #   enabled: true
    #   spec: {}
    monitoring:
      enabled: true
      namespaces:
        - monitoring
      principals:
        - cluster.local/ns/monitoring/sa/monitoring-grafana
        - cluster.local/ns/monitoring/sa/monitoring-monitoring-kube-alertmanager
        - cluster.local/ns/monitoring/sa/monitoring-monitoring-kube-operator
        - cluster.local/ns/monitoring/sa/monitoring-monitoring-kube-prometheus
        - cluster.local/ns/monitoring/sa/monitoring-monitoring-kube-state-metrics
        - cluster.local/ns/monitoring/sa/monitoring-monitoring-prometheus-node-exporter
    promtail:
      enabled: true
      namespaces:
        - promtail
      principals:
        - cluster.local/ns/promtail/sa/promtail-promtail
    fluentbit:
      enabled: true
      namespaces:
        - fluentbit
      principals:
        - cluster.local/ns/fluentbit/sa/fluentbit-fluent-bit
    minioOperator:
      enabled: true
      namespaces:
        - minio-operator
      principals:
        - cluster.local/ns/minio-operator/sa/minio-operator
  loki:
    enabled: false
    annotations: {}
    labels: {}
    gateways:
      - istio-system/public
    hosts:
      - "loki.{{ .Values.domain }}"
    service: ""
    port: ""
    exposeReadyEndpoint: false
  mtls:
    # STRICT = Allow only mutual TLS traffic
    # PERMISSIVE = Allow both plain text and mutual TLS traffic
    mode: STRICT
networkPolicies:
  enabled: false
  # -- Control Plane CIDR to allow init job communication to the Kubernetes API.
  # Use `kubectl get endpoints kubernetes` to get the CIDR range needed for your cluster
  controlPlaneCidr: 0.0.0.0/0
  ingressLabels:
    app: public-ingressgateway
    istio: ingressgateway
  additionalPolicies: []
bbtests:
  enabled: false
  cypress:
    artifacts: true
    envs:
      cypress_check_datasource: 'false'
      cypress_grafana_url: 'http://monitoring-grafana.monitoring.svc.cluster.local'
  scripts:
    image: registry1.dso.mil/ironbank/big-bang/base:2.1.0
    envs:
      LOKI_URL: 'http://{{ .Values.fullnameOverride }}.{{ .Release.Namespace }}.svc:3100'
      LOKI_VERSION: '{{ .Values.loki.image.tag }}'
- In the `DEPRECATED VALUES` section (if that section is present), set `monitoring.enabled` to `false`:
monitoring:
  # -- Enable BigBang integration of Monitoring components
  enabled: false
- In the `DEPRECATED VALUES` section (if that section is present), ensure all `monitoring:` sub-components are set to `enabled: false`:
  - Ensure `monitoring.dashboards.enabled` is set to `false`
  - Ensure `monitoring.rules.enabled` is set to `false`
  - Ensure `monitoring.serviceMonitor.enabled` is set to `false`
  - Ensure `monitoring.serviceMonitor.metricsInstance.enabled` is set to `false`
  - Ensure `monitoring.selfMonitoring.enabled` is set to `false`
- In the `DEPRECATED VALUES` section (if that section is present), set `monitoring.serviceMonitor.metricsInstance.enabled` to `false`:
metricsInstance:
  # -- If enabled, MetricsInstance resources for Grafana Agent Operator are created
  enabled: false
- In the `DEPRECATED VALUES` section (if that section is present), ensure `monitoring.selfMonitoring.grafanaAgent.installOperator` is set to `false`.
- In the `Chart Testing` section, ensure `monitoring.lokiCanary.enabled` is set to `false`:
lokiCanary:
  enabled: false
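Several of the checks in this section amount to confirming that image registries still point at registry1 after the kpt update. The following is a rough cross-check, not a replacement for reviewing each item: the function name `non_registry1_lines` is hypothetical, and it only inspects `registry:` keys, so images configured through a full `repository:` path (such as memcached and the sidecar) still need a manual look.

```shell
# Hypothetical helper: list `registry:` lines in a values file that do not
# point at registry1.dso.mil, prefixed with their line numbers.
non_registry1_lines() {
  values_file="${1:-chart/values.yaml}"
  # First grep finds uncommented `registry:` keys; the second drops the ones
  # already on registry1. Printing nothing is not treated as an error.
  grep -nE '^[[:space:]]*registry:' "$values_file" | grep -v 'registry1\.dso\.mil'
  return 0
}

# Example, run from the repo root:
#   non_registry1_lines chart/values.yaml
```

Lines printed by this helper are candidates for review, not necessarily mistakes; for example, an intentionally empty or null registry value would also be listed.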
chart/ci/
- In each of the 4 files in the `chart/ci` directory (`default-single-binary-values.yaml`, `default-values.yaml`, `ingress-values.yaml`, and `legacy-monitoring-values.yaml`), ensure that the `loki.storage.bucketNames` are set to:
storage:
  bucketNames:
    chunks: loki
    ruler: loki
    admin: loki-admin
    deletion: loki-deletion
chart/src/dashboards/
- `cd` into this directory and run the following command to update the logic so the Release name is captured:
  - Bash:
sed -i 's/(loki|enterprise-logs)/logging-loki/g' *.json
Note: On Mac, use GNU sed, which can be installed via `brew install gnu-sed`. By default, this version of the command is invoked using `gsed` instead of `sed`.
Note: This will cause changes in the following files if they haven’t already been updated:
- loki-chunks.json
- loki-deletion.json
- loki-operational.json
- loki-reads-resources.json
- loki-reads.json
- loki-retention.json
- loki-writes-resources.json
- loki-writes.json
- Modify the `loki-logs.json` dashboard to maintain the `expr` for log querying (lines 775 and 840):
  - 775:
"expr": "sum(rate({namespace=\"$namespace\", pod=~\"$deployment.*\", pod=~\"$pod\", container=~\"$container\" } |logfmt|= \"$filter\" [5m])) by (level)",
  - 840:
"expr": "{namespace=\"$namespace\", pod=~\"$deployment.*\", pod=~\"$pod\", container=~\"$container\"} | logfmt | level=\"$level\" |= \"$filter\"",
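The `sed` substitution used for the dashboards relies on basic regular expression syntax, where parentheses and the pipe are ordinary characters. The sketch below shows the same expression applied to a sample string on stdin rather than to files in place; the function name `rename_release` and the sample query fragment are made up for illustration.

```shell
# Hypothetical wrapper around the same expression as the dashboards command,
# reading stdin instead of editing files in place.
rename_release() {
  # In sed's default (basic) regex syntax, `(`, `|`, and `)` are literal,
  # so this matches the exact text "(loki|enterprise-logs)".
  sed 's/(loki|enterprise-logs)/logging-loki/g'
}

# Sample input only; real dashboards contain longer expressions.
printf '%s\n' 'job=~"(loki|enterprise-logs)/write"' | rename_release
# prints: job=~"logging-loki/write"
```

Running the expression on a sample first is a cheap way to confirm your `sed` behaves as expected before using `-i` on the dashboard files.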
chart/templates/backend/poddisruptionbudget-backend.yaml
- Ensure that there is no hard-coded spec for the PDB template:
{{- with .Values.backend.podDisruptionBudget.maxUnavailable }}
maxUnavailable: {{ . }}
{{- else }}
minAvailable: {{ .Values.backend.podDisruptionBudget.minAvailable | default 0 }}
{{- end }}
chart/templates/backend/query-scheduler-discovery.yaml
- Ensure that the `grpc` port specifies an `appProtocol` of `tcp`, as in:
- name: grpc
  port: {{ .Values.loki.server.grpc_listen_port }}
  targetPort: grpc
  appProtocol: tcp
  protocol: TCP
chart/templates/backend/service-backend-headless.yaml
- Ensure that the `grpc` port specifies an `appProtocol` of `tcp`, as in:
- name: grpc
  port: {{ .Values.loki.server.grpc_listen_port }}
  targetPort: grpc
  appProtocol: tcp
  protocol: TCP
chart/templates/backend/service-backend.yaml
- Ensure that the `grpc` port specifies an `appProtocol` of `tcp`, as in:
- name: grpc
  port: {{ .Values.loki.server.grpc_listen_port }}
  targetPort: grpc
  appProtocol: tcp
  protocol: TCP
chart/templates/gateway/poddisruptionbudget-gateway.yaml
- Ensure that there is no hard-coded spec for the PDB template:
{{- with .Values.gateway.podDisruptionBudget.maxUnavailable }}
maxUnavailable: {{ . }}
{{- else }}
minAvailable: {{ .Values.gateway.podDisruptionBudget.minAvailable | default 0 }}
{{- end }}
chart/templates/read/poddisruptionbudget-read.yaml
- Ensure that there is no hard-coded spec for the PDB template:
{{- with .Values.read.podDisruptionBudget.maxUnavailable }}
maxUnavailable: {{ . }}
{{- else }}
minAvailable: {{ .Values.read.podDisruptionBudget.minAvailable | default 0 }}
{{- end }}
chart/templates/backend/service-read.yaml
- Ensure that the `grpc` port specifies an `appProtocol` of `tcp`, as in:
- name: grpc
  port: {{ .Values.loki.server.grpc_listen_port }}
  targetPort: grpc
  appProtocol: tcp
  protocol: TCP
- Ensure that `spec.publishNotReadyAddresses` is set to `true`:
publishNotReadyAddresses: true
chart/templates/tokengen/job-tokengen.yaml
- At the top of the file, directly under the opening conditionals, add the following NetworkPolicy resources:
{{- if .Values.networkPolicies.enabled }}
{{- if .Values.minio.enabled }}
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: tokengen-ingress-minio
  namespace: {{ .Release.Namespace }}
  annotations:
    "helm.sh/hook": post-install
    "helm.sh/hook-weight": "-10"
    "helm.sh/hook-delete-policy": hook-succeeded,hook-failed,before-hook-creation
spec:
  podSelector:
    matchLabels:
      app: minio
      app.kubernetes.io/instance: {{ .Release.Name }}
  ingress:
    - from:
        - podSelector:
            matchLabels:
              {{- include "enterprise-logs.tokengenLabels" . | nindent 14 }}
              {{- with .Values.enterprise.tokengen.labels }}
              {{- toYaml . | nindent 14 }}
              {{- end }}
      ports:
        - port: 9000
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: tokengen-egress-minio
  namespace: {{ .Release.Namespace }}
  annotations:
    "helm.sh/hook": post-install
    "helm.sh/hook-weight": "-10"
    "helm.sh/hook-delete-policy": hook-succeeded,hook-failed,before-hook-creation
spec:
  podSelector:
    matchLabels:
      app: minio
      app.kubernetes.io/instance: {{ .Release.Name }}
  egress:
    - to:
        - podSelector:
            matchLabels:
              {{- include "enterprise-logs.tokengenLabels" . | nindent 14 }}
              {{- with .Values.enterprise.tokengen.labels }}
              {{- toYaml . | nindent 14 }}
              {{- end }}
      ports:
        - port: 9000
---
{{- end }}
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-egress-tokengen-job
  namespace: {{ .Release.Namespace }}
  annotations:
    "helm.sh/hook": post-install
    "helm.sh/hook-weight": "-10"
    "helm.sh/hook-delete-policy": hook-succeeded,hook-failed,before-hook-creation
spec:
  egress:
    - to:
        - ipBlock:
            cidr: {{ .Values.networkPolicies.controlPlaneCidr }}
            {{- if eq .Values.networkPolicies.controlPlaneCidr "0.0.0.0/0" }}
            # ONLY Block requests to AWS metadata IP
            except:
              - 169.254.169.254/32
            {{- end }}
  podSelector:
    matchLabels:
      {{- include "enterprise-logs.tokengenLabels" . | nindent 6 }}
      {{- with .Values.enterprise.tokengen.labels }}
      {{- toYaml . | nindent 6 }}
      {{- end }}
  policyTypes:
    - Egress
---
{{- end }}
chart/templates/write/poddisruptionbudget-write.yaml
- Ensure that there is no hard-coded spec for the PDB template:
{{- with .Values.write.podDisruptionBudget.maxUnavailable }}
maxUnavailable: {{ . }}
{{- else }}
minAvailable: {{ .Values.write.podDisruptionBudget.minAvailable | default 0 }}
{{- end }}
chart/templates/backend/service-write.yaml
- Ensure that the `grpc` port specifies an `appProtocol` of `tcp`, as in:
- name: grpc
  port: {{ .Values.loki.server.grpc_listen_port }}
  targetPort: grpc
  appProtocol: tcp
  protocol: TCP
chart/templates/_helpers.tpl
- On line 13 for the `$default` function, remove the `ternary` function and ensure the definition looks just like:
{{- $default := "loki" }}
- On line 201, ensure the following block for minio looks like:
{{- if .Values.minio.enabled -}}
s3:
  endpoint: {{ include "loki.minio" $ }}
  bucketnames: {{ $.Values.loki.storage.bucketNames.chunks }}
  secret_access_key: {{ $.Values.minio.secrets.secretKey }}
  access_key_id: {{ $.Values.minio.secrets.accessKey }}
  s3forcepathstyle: true
  insecure: true
- On line 349, ensure that `s3.bucketnames` looks like:
s3:
  bucketnames: {{ $.Values.loki.storage.bucketNames.ruler }}
chart/templates/service-memberlist.yaml
- Ensure that the `tcp` port specifies an `appProtocol` of `tcp`, as in:
- name: tcp
  port: 7946
  targetPort: http-memberlist
  protocol: TCP
  appProtocol: tcp
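The `appProtocol: tcp` requirement recurs across several service templates, so it is easy to miss one after a kpt update. Below is a minimal sketch of a file-level check; the name `missing_app_protocol` is hypothetical, and it only confirms the string appears somewhere in each file, not that it sits on the right port.

```shell
# Hypothetical helper: print each given template file that does not contain
# an `appProtocol: tcp` line at all.
missing_app_protocol() {
  for f in "$@"; do
    grep -q 'appProtocol: tcp' "$f" || echo "$f"
  done
}

# Example, run from the repo root with the templates named in this guide;
# no output means every file mentions appProtocol: tcp:
#   missing_app_protocol \
#     chart/templates/backend/query-scheduler-discovery.yaml \
#     chart/templates/backend/service-backend-headless.yaml \
#     chart/templates/backend/service-backend.yaml \
#     chart/templates/service-memberlist.yaml
```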
automountServiceAccountToken
The mutating Kyverno policy named `update-automountserviceaccounttokens` is leveraged to harden all ServiceAccounts in this package with `automountServiceAccountToken: false`.
This policy revokes access to the K8s API for Pods utilizing said ServiceAccounts. If a Pod truly requires access to the K8s API (for app functionality), the Pod is added to the `pods:` array of the same mutating policy. This grants the Pod access to the API, and creates a Kyverno PolicyException to prevent an alert.
Testing new Loki Version
NOTE: For these testing steps it is good to do them on both a clean install and an upgrade. For clean install, point Loki to your branch. For an upgrade do an install with Loki pointing to the latest tag, then perform a helm upgrade with Loki pointing to your branch.
Deploy Loki Scalable as a part of BigBang
You will want to install with:
- Loki, Promtail, Fluentbit, Tempo, Monitoring, MinioOperator and Istio packages enabled
overrides/loki.yaml
clusterAuditor:
  enabled: false
gatekeeper:
  enabled: false
istioOperator:
  enabled: true
istio:
  enabled: true
monitoring:
  enabled: true
  values:
    istio:
      enabled: true
loki:
  enabled: true
  values:
    istio:
      enabled: true
  git:
    tag: ""
    branch: "renovate/ironbank"
promtail:
  enabled: true
tempo:
  enabled: true
jaeger:
  enabled: false
twistlock:
  enabled: false
kyvernoPolicies:
  values:
    exclude:
      any:
        # Allows k3d load balancer to bypass policies.
        - resources:
            namespaces:
              - istio-system
            names:
              - svclb-*
    policies:
      restrict-host-path-mount-pv:
        parameters:
          allow:
            - /var/lib/rancher/k3s/storage/pvc-*
- Visit `https://grafana.dev.bigbang.mil` and login with default credentials.
- Navigate to `Connections -> Data Sources -> Loki`.
- Click `Save & Test` to ensure Data Source changes can be saved successfully.
- Search dashboards for `Loki Dashboard Quick Search` and confirm log data is being populated/no error messages.
Deploy Loki Monolith as a part of BigBang
Loki Monolith is tested during the “package tests” stage of loki pipelines.
You will want to install with:
- Loki, Promtail, Tempo, Monitoring and Istio packages enabled
overrides/loki.yaml
clusterAuditor:
  enabled: false
gatekeeper:
  enabled: false
istioOperator:
  enabled: true
istio:
  enabled: true
monitoring:
  enabled: true
loki:
  enabled: true
  git:
    tag: ""
    branch: "renovate/ironbank"
promtail:
  enabled: true
tempo:
  enabled: true
jaeger:
  enabled: false
twistlock:
  enabled: false
kyvernoPolicies:
  values:
    exclude:
      any:
        # Allows k3d load balancer to bypass policies.
        - resources:
            namespaces:
              - istio-system
            names:
              - svclb-*
    policies:
      restrict-host-path-mount-pv:
        parameters:
          allow:
            - /var/lib/rancher/k3s/storage/pvc-*
- Visit `https://grafana.dev.bigbang.mil` and login with default credentials.
- Navigate to `Connections -> Data Sources -> Loki`.
- Click `Save & Test` to ensure Data Source changes can be saved successfully.
- Search dashboards for `Loki Dashboard Quick Search` and confirm log data is being populated/no error messages.
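Alongside the Grafana checks, Loki's own readiness endpoint can be probed from inside the cluster. The sketch below only builds the URL, following the `LOKI_URL` pattern in the `bbtests` values and the `/ready` probe path shown earlier. The function name `loki_ready_url` is hypothetical, and the default namespace `logging` is an assumption; adjust it to wherever the release is installed.

```shell
# Hypothetical helper: build the in-cluster readiness URL for Loki.
# Defaults: service name from fullnameOverride; namespace is an assumption.
loki_ready_url() {
  name="${1:-logging-loki}"
  namespace="${2:-logging}"
  echo "http://${name}.${namespace}.svc:3100/ready"
}

# Example, run from a pod that is allowed to reach Loki (with Istio in
# STRICT mTLS mode, that pod needs to be part of the mesh):
#   curl -fsS "$(loki_ready_url)"
```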
When in doubt with any testing or upgrade steps, reach out to the CODEOWNERS for assistance.