Loki Development and Maintenance GuideπŸ“œ

To upgrade the Loki PackageπŸ“œ

  1. Navigate to the upstream chart repo and folder and find the tag that corresponds with the new chart version for this update.

  2. Checkout the renovate/ironbank branch

  3. From the root of the repo run kpt pkg update chart@<tag> --strategy alpha-git-patch, where tag is found in step 1 (loki ref: helm-loki-<tag>)

    • Run a KPT update against the main chart folder:
      # To find the chart version for the command below:
      # - Browse to the [upstream](https://github.com/grafana/loki/tree/main/production/helm/loki).
      # - Click on the drop-down menu on the upper left, then on Tags.
      # - Scroll/Search through the tags until you get to the Helm chart version tags (e.g. helm-loki-5.9.2, helm-loki-5.9.1, etc.).
      # - Starting with the most recent Helm chart version tag, open the Chart.yaml for the tag. If the appVersion value corresponds to the
      # version of Loki that Renovate detected for an upgrade, this is the correct version. So, for example, if you will be updating to chart
      # version helm-loki-5.9.2, your kpt command would be:
      #
      # kpt pkg update chart@helm-loki-5.9.2 --strategy alpha-git-patch
    
      kpt pkg update chart@helm-loki-<tag> --strategy alpha-git-patch
    
    • Restore all BigBang added templates and tests:
      git checkout chart/templates/bigbang/
      git checkout chart/tests/
      git checkout chart/dashboards
      git checkout chart/templates/tests
    
    • Follow the Update main chart section of this document for a per-file list of changes to be aware of, showing how Big Bang differs from upstream.
  4. Modify the version in Chart.yaml and append -bb.0 to the chart version from upstream.
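    The suffix rule from this step can be sketched in shell (the upstream version shown is a made-up example):

    ```shell
    # Take the upstream chart version and append the Big Bang suffix.
    # "6.6.2" is a hypothetical upstream chart version.
    upstream_version="6.6.2"
    bb_version="${upstream_version}-bb.0"
    echo "${bb_version}"   # prints: 6.6.2-bb.0
    ```

    If a later MR changes only Big Bang content against the same upstream version, the suffix increments instead (-bb.1, -bb.2, and so on).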

  5. Update dependencies and binaries using helm dependency update ./chart

    • Ensure that the minio version in chart/Chart.yaml matches the latest tag version of minio available in the Big Bang minio package Chart.yaml

    • If needed, log into registry1.

    # Note, if you are using Ubuntu on WSL and get an error about storing credentials or about how `The name org.freedesktop.secrets was not
    # provided by any .service files` when you run the command below, install the libsecret-1-dev and gnome-keyring packages. After doing this,
    # you'll be prompted to set a keyring password the first time you run this command.
    #
    helm registry login https://registry1.dso.mil -u ${registry1.username}
    
    • Pull assets and commit the binaries as well as the Chart.lock file that was generated.
    # Note: You may need to resolve merge conflicts in chart/values.yaml before these commands work. Refer to the "Modifications made to upstream"
    # section below for hints on how to resolve them. Also, you need to be logged in to registry1 through docker.
    export HELM_EXPERIMENTAL_OCI=1
    helm dependency update ./chart
    

    Then log out.

    helm registry logout https://registry1.dso.mil
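
    One way to spot-check the minio pin mentioned earlier in this step is to pull the dependency version straight out of Chart.yaml. A sketch against an inline snippet (the version shown is hypothetical; in the repo, point the grep at chart/Chart.yaml and compare the result by hand against the Big Bang minio package Chart.yaml):

    ```shell
    # Extract the pinned minio-instance dependency version from a
    # Chart.yaml-style snippet.
    cat > /tmp/chart-snippet.yaml <<'EOF'
    dependencies:
      - name: minio-instance
        alias: minio
        version: 5.0.12-bb.2
    EOF
    grep -A 2 'name: minio-instance' /tmp/chart-snippet.yaml | awk '/version:/ {print $2}'
    # prints: 5.0.12-bb.2
    ```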
    
  6. Update CHANGELOG.md adding an entry for the new version and noting all changes in a list (at minimum should include - Updated <chart or dependency> to x.x.x).

  7. Generate the README.md updates by following the guide in gluon.

  8. Push up your changes, add upgrade notices if applicable, validate that CI passes.

    • If there are any failures, follow the information in the pipeline to make the necessary updates.

    • Add the debug label to the MR for more detailed information.

    • Reach out to the CODEOWNERS if needed.

  9. As part of your MR that modifies Big Bang packages, you should modify bigbang/tests/test-values.yaml against your branch for the CI/CD MR testing by enabling your packages.

    • To do this, at a minimum, you will need to follow the instructions at bigbang/docs/developer/test-package-against-bb.md with changes for Loki enabled (the below is a reference; actual changes could be more depending on what changes were made to Loki in the package MR).

test-values.yamlπŸ“œ

```yaml
loki:
  enabled: true
  git:
    tag: null
    branch: <my-package-branch-that-needs-testing>
  values:
    istio:
      hardened:
        enabled: true
  ### Additional components of Loki should be changed to reflect testing changes introduced in the package MR
```
  10. Follow the Testing new Loki Version section of this document for manual testing.

Update main chartπŸ“œ

chart/Chart.yamlπŸ“œ

  • Update loki version and appVersion
  • Ensure Big Bang version suffix is appended to chart version
  • Ensure minio and gluon dependencies are present and up to date
version: $VERSION-bb.0
dependencies:
  - name: minio-instance
    alias: minio
    version: $MINIO_VERSION
    repository: file://./deps/minio
    condition: minio.enabled
  - name: grafana-agent-operator
    alias: grafana-agent-operator
    version: $GRAFANA_VERSION
    repository: https://grafana.github.io/helm-charts
    condition: monitoring.selfMonitoring.grafanaAgent.installOperator
  - name: gluon
    version: $GLUON_VERSION
    repository: "oci://registry.dso.mil/platform-one/big-bang/apps/library-charts/gluon"
annotations:
  bigbang.dev/applicationVersions: |
    - Loki: $LOKI_APP_VERSION

chart/values.yamlπŸ“œ

  • Verify that Renovate updated the loki: section with the correct value for tag. For example, if Renovate wants to update Loki to version 2.8.3, you should see:
loki:
  # Configures the readiness probe for all of the Loki pods
  readinessProbe:
    httpGet:
      path: /ready
      port: http-metrics
    initialDelaySeconds: 30
    timeoutSeconds: 1
  image:
    # -- The Docker registry
    registry: registry1.dso.mil
    # -- Docker image repository
    repository: ironbank/opensource/grafana/loki
    # -- Overrides the image tag whose default is the chart's appVersion
    tag: 2.8.3
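
A quick way to sanity-check the tag value is with awk. This is a sketch against an inline snippet; in the repo you would run it against chart/values.yaml, where a loose pattern like this can match more than one tag: key, so eyeball the output:

```shell
# Pull the image tag out of a values.yaml-style snippet.
cat > /tmp/values-snippet.yaml <<'EOF'
loki:
  image:
    registry: registry1.dso.mil
    repository: ironbank/opensource/grafana/loki
    tag: 2.8.3
EOF
awk '/tag:/ {print $2}' /tmp/values-snippet.yaml
# prints: 2.8.3
```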

chart/tests/*πŸ“œ

  • Verify that cypress testing configuration and tests are present here. You should see contents similar to this in chart/tests/cypress/:
drwxr-xr-x 2 ubuntu ubuntu 4096 Aug  1 12:24 ./
drwxr-xr-x 4 ubuntu ubuntu 4096 Aug  1 12:24 ../
-rw-r--r-- 1 ubuntu ubuntu   86 Aug  1 12:24 cypress.json
-rw-r--r-- 1 ubuntu ubuntu 1494 Aug  1 12:24 loki-health.spec.js

And this in chart/tests/scripts/:

drwxr-xr-x 2 ubuntu ubuntu 4096 Aug  1 12:24 ./
drwxr-xr-x 4 ubuntu ubuntu 4096 Aug  1 12:24 ../
-rw-r--r-- 1 ubuntu ubuntu 2192 Aug  1 12:24 test.sh

If you are unsure or if these directories do not exist or are empty, check with the code owners.

Modifications made to upstreamπŸ“œ

This is a high-level list of modifications that Big Bang has made to the upstream helm chart. You can use this as a cross-check to make sure that no modifications were lost during the upgrade process.
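
Most of the checks below are of the form "key X has value Y", so a grep against chart/values.yaml is often the fastest cross-check. A toy sketch against an inline snippet standing in for chart/values.yaml:

```shell
# Spot-check a couple of the Big Bang overrides listed in this section.
cat > /tmp/bb-values.yaml <<'EOF'
nameOverride: logging-loki
fullnameOverride: logging-loki
deploymentMode: SingleBinary
EOF
grep -c 'logging-loki' /tmp/bb-values.yaml   # both override keys present
grep -q 'deploymentMode: SingleBinary' /tmp/bb-values.yaml && echo "deploymentMode OK"
```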

chart/values.yamlπŸ“œ

  • Ensure nameOverride is set to logging-loki
nameOverride: logging-loki
  • Ensure fullnameOverride is set to logging-loki
fullnameOverride: logging-loki
  • Ensure private-registry IPS is present:
imagePullSecrets:
  - name: private-registry
  • Ensure deploymentMode is set to SingleBinary
deploymentMode: SingleBinary
  • Ensure the loki image is properly set:
loki:
  image:
    # -- The Docker registry
    registry: registry1.dso.mil
    # -- Docker image repository
    repository: ironbank/opensource/grafana/loki
    # -- Overrides the image tag whose default is the chart's appVersion
    tag: vX.X.X
  • Ensure loki.auth_enabled is set to false
auth_enabled: false
  • Ensure loki.commonConfig.replication_factor is set to 1
commonConfig:
  replication_factor: 1
  • Ensure loki.storage.bucketNames is set:
storage:
  bucketNames:
    chunks: loki
    ruler: loki
    admin: loki-admin
    deletion: loki-deletion
  • Ensure the following is present for loki.schemaConfig:
schemaConfig:
  configs:
    - from: 2022-01-11
      store: boltdb-shipper
      object_store: "{{ .Values.loki.storage.type }}"
      schema: v12
      index:
        prefix: loki_index_
        period: 24h
    - from: 2023-08-01
      store: tsdb
      object_store: "{{ .Values.loki.storage.type }}"
      schema: v12
      index:
        prefix: loki_tsdb_
        period: 24h
    - from: 2024-04-01
      store: tsdb
      object_store: "{{ .Values.loki.storage.type }}"
      schema: v13
      index:
        prefix: loki_tsdb_
        period: 24h
  • Ensure the 3 lines below are present within loki.storage_config.boltdb_shipper:
storage_config:
  boltdb_shipper:
    active_index_directory: /var/loki/boltdb-shipper-active
    cache_location: /var/loki/boltdb-shipper-cache
    cache_ttl: 24h
  • Ensure the 3 lines below are present within loki.storage_config.tsdb_shipper:
storage_config:
  tsdb_shipper:
    active_index_directory: /var/loki/tsdb-shipper-active
    cache_location: /var/loki/tsdb-shipper-cache
    cache_ttl: 24h
  • Ensure loki.analytics.reporting_enabled is set to false
analytics:
  reporting_enabled: false
  • Ensure loki.ingester configuration is set to:
ingester:
  chunk_target_size: 196608
  flush_check_period: 5s
  flush_op_timeout: 100m
  autoforget_unhealthy: true
  lifecycler:
    ring:
      kvstore:
        store: memberlist
      replication_factor: 1
  • Ensure enterprise.image is set to the registry1 image:
image:
  # -- The Docker registry
  registry: registry1.dso.mil
  # -- Docker image repository
  repository: ironbank/grafana/grafana-enterprise-logs
  # -- Overrides the image tag whose default is the chart's appVersion
  tag: vX.X.X
  • Ensure enterprise.tokengen.annotations includes:
annotations:
  sidecar.istio.io/inject: "false"
  • Ensure enterprise.provisioner.enabled is set to false
provisioner:
  # -- Whether the job should be part of the deployment
  enabled: false
  • Ensure kubectlImage is set to the registry1 image:
kubectlImage:
  # -- The Docker registry
  registry: registry1.dso.mil
  # -- Docker image repository
  repository: ironbank/opensource/kubernetes/kubectl
  # -- Overrides the image tag whose default is the chart's appVersion
  tag: vX.X.X
  • Ensure test.enabled is set to false and that test.prometheusAddress is set to "http://prometheus:9090"
test:
  enabled: false
  prometheusAddress: "http://prometheus:9090"
  • Ensure lokiCanary.enabled is set to false
lokiCanary:
  enabled: false
  • Ensure serviceAccount.automountServiceAccountToken is set to false:
serviceAccount:
  # -- Set this toggle to false to opt out of automounting API credentials for the service account
  automountServiceAccountToken: false
  • Ensure gateway.enabled is set to false
gateway:
  enabled: false
  • Ensure gateway.image is set to the registry1 image:
image:
  # -- The Docker registry for the gateway image
  registry: registry1.dso.mil
  # -- The gateway image repository
  repository: ironbank/opensource/nginx/nginx
  # -- The gateway image tag
  tag: vX.X.X
  • Ensure that at the bottom of the gateway: block, there is a podDisruptionBudget section
  ## -- Application controller Pod Disruption Budget Configuration
  ## Ref: https://kubernetes.io/docs/tasks/run-application/configure-pdb/
  podDisruptionBudget:
    # -- Number of pods that are available after eviction as number or percentage (eg.: 50%)
    # @default -- `""` (defaults to 0 if not specified)
    minAvailable: ""
    # -- Number of pods that are unavailable after eviction as number or percentage (eg.: 50%).
    ## Has higher precedence over `controller.pdb.minAvailable`
    maxUnavailable: "1"
  • Ensure singleBinary.replicas is set to 1
singleBinary:
  # -- Number of replicas for the single binary
  replicas: 1
  • Verify that singleBinary.resources is set to:
resources:
  limits:
    cpu: 100m
    memory: 256Mi
  requests:
    cpu: 100m
    memory: 256Mi
  • Ensure that singleBinary.persistence.enableStatefulAutoDeletePVC is set to false.

  • Ensure that singleBinary.persistence.size is set to 12Gi
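
Taken together, the two persistence settings above sit under the singleBinary key like this:

```yaml
singleBinary:
  persistence:
    enableStatefulAutoDeletePVC: false
    size: 12Gi
```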

  • Ensure that write.replicas is set to 0:

replicas: 0
  • Verify that write.resources are set:
resources:
  limits:
    cpu: 300m
    memory: 2Gi
  requests:
    cpu: 300m
    memory: 2Gi
  • Ensure that at the bottom of the write: block, there is a podDisruptionBudget: section
## -- Application controller Pod Disruption Budget Configuration
## Ref: https://kubernetes.io/docs/tasks/run-application/configure-pdb/
podDisruptionBudget:
  # -- Number of pods that are available after eviction as number or percentage (eg.: 50%)
  # @default -- `""` (defaults to 0 if not specified)
  minAvailable: ""
  # -- Number of pods that are unavailable after eviction as number or percentage (eg.: 50%).
  ## Has higher precedence over `controller.pdb.minAvailable`
  maxUnavailable: "1"
  • Ensure that read.replicas is set to 0:
replicas: 0
  • Make sure read.resources are set to:
resources:
  limits:
    cpu: 300m
    memory: 2Gi
  requests:
    cpu: 300m
    memory: 2Gi
  • Ensure that at the bottom of the read: block, there is a podDisruptionBudget section
## -- Application controller Pod Disruption Budget Configuration
## Ref: https://kubernetes.io/docs/tasks/run-application/configure-pdb/
podDisruptionBudget:
  # -- Number of pods that are available after eviction as number or percentage (eg.: 50%)
  # @default -- `""` (defaults to 0 if not specified)
  minAvailable: ""
  # -- Number of pods that are unavailable after eviction as number or percentage (eg.: 50%).
  ## Has higher precedence over `controller.pdb.minAvailable`
  maxUnavailable: "1"
  • Ensure that backend.replicas is set to 0:
replicas: 0
  • Ensure that at the bottom of the backend: block, there is a podDisruptionBudget section
## -- Application controller Pod Disruption Budget Configuration
## Ref: https://kubernetes.io/docs/tasks/run-application/configure-pdb/
podDisruptionBudget:
  # -- Number of pods that are available after eviction as number or percentage (eg.: 50%)
  # @default -- `""` (defaults to 0 if not specified)
  minAvailable: ""
  # -- Number of pods that are unavailable after eviction as number or percentage (eg.: 50%).
  ## Has higher precedence over `controller.pdb.minAvailable`
  maxUnavailable: "1"
  • Ensure that querier.resources are set to:
resources:
  limits:
    cpu: 300m
    memory: 2Gi
  requests:
    cpu: 300m
    memory: 2Gi
  • Ensure that compactor.resources are set to:
resources:
  limits:
    cpu: 300m
    memory: 2Gi
  requests:
    cpu: 300m
    memory: 2Gi
  • Ensure that compactor.serviceAccount.automountServiceAccountToken is set to false
serviceAccount:
  automountServiceAccountToken: false
  • Ensure that bloomGateway.serviceAccount.automountServiceAccountToken is set to false
serviceAccount:
  automountServiceAccountToken: false
  • Ensure that bloomCompactor.serviceAccount.automountServiceAccountToken is set to false
serviceAccount:
  automountServiceAccountToken: false
  • Ensure that patternIngester.resources are set to:
resources:
  limits:
    cpu: 100m
    memory: 256Mi
  requests:
    cpu: 100m
    memory: 256Mi
  • Ensure that patternIngester.serviceAccount.automountServiceAccountToken is set to false
serviceAccount:
  automountServiceAccountToken: false
  • Ensure that the values for memcached.image.repository and memcached.image.tag are set to valid values from registry1.
memcached:
  image:
    # -- Memcached Docker image repository
    repository: registry1.dso.mil/ironbank/opensource/memcached/memcached
    # -- Memcached Docker image tag
    tag: X.X.X
  • Ensure that memcached.containerSecurityContext includes the following:
fsGroup: 10001
runAsGroup: 10001
runAsNonRoot: true
runAsUser: 10001
  • Ensure that memcachedExporter.enabled is set to false.
memcachedExporter:
  # -- Whether memcached metrics should be exported
  enabled: false
  • Ensure that memcachedExporter.containerSecurityContext includes the following:
fsGroup: 10001
runAsGroup: 10001
runAsNonRoot: true
runAsUser: 10001
  • Ensure that resultsCache.enabled is set to false.
resultsCache:
  # -- Specifies whether memcached based results-cache should be enabled
  enabled: false
  • Ensure that chunksCache.enabled is set to false.
chunksCache:
  # -- Specifies whether memcached based chunks-cache should be enabled
  enabled: false

** Important **πŸ“œ

Before following the step below, note that if there is only one minio: block, you shouldn’t remove it.πŸ“œ

  • Remove minio block added by upstream

  • Ensure the following BB values are all set under minio key:

minio:
  # -- Enable minio instance support, must have minio-operator installed
  enabled: false
  # Allow the address used by Loki to refer to Minio to be overridden
  address: "minio.logging.svc.cluster.local"
  # -- Minio root credentials
  secrets:
    name: "loki-objstore-creds"
    accessKey: "minio"
    secretKey: "minio123" # default key, change this!
  tenant:
    # -- Buckets to be provisioned to for tenant
    buckets:
      - name: loki
      - name: loki-admin
    # -- Users to be provisioned for tenant
    users:
      - name: minio-user
    # -- User credentials to create for above user. Otherwise password is randomly generated.
    # This auth is not required to be set or reclaimed for minio use with Loki
    defaultUserCredentials:
      username: "minio-user"
      password: ""
    ## Specification for MinIO Pool(s) in this Tenant.
    pools:
      - servers: 1
        volumesPerServer: 4
        size: 750Mi
        securityContext:
          runAsUser: 1001
          runAsGroup: 1001
          fsGroup: 1001
    metrics:
      enabled: false
      port: 9000
      memory: 128M
  • Ensure the sidecar.image is set to the equivalent registry1 image:
sidecar:
  image:
    # -- The Docker registry and image for the k8s sidecar
    repository: registry1.dso.mil/ironbank/kiwigrid/k8s-sidecar
    # -- Docker image tag
    tag: X.X.X
    # -- Docker image sha. If empty, no sha will be used
    sha: ""
  • Ensure the sidecar.resources are set to:
resources:
  limits:
    cpu: 100m
    memory: 100Mi
  requests:
    cpu: 100m
    memory: 100Mi
  • Ensure the sidecar.securityContext is set to:
securityContext:
  allowPrivilegeEscalation: false
  capabilities:
    drop:
    - ALL
  seccompProfile:
    type: RuntimeDefault
  • Ensure sidecar.rules.enabled is set to false
rules:
  enabled: false
  • At the end of the file, but before the DEPRECATED VALUES section (if that section is present), add/verify the following blocks:
domain: dev.bigbang.mil

# Default to false, override in openshift-test-values.yaml
openshift: false

fluentbit:
  enabled: false
promtail:
  enabled: false

istio:
  enabled: false
  hardened:
    enabled: false
    outboundTrafficPolicyMode: "REGISTRY_ONLY"
    customServiceEntries: []
      # - name: "allow-google"
      #   enabled: true
      #   spec:
      #     hosts:
      #       - google.com
      #       - www.google.com
      #     location: MESH_EXTERNAL
      #     ports:
      #       - number: 443
      #         protocol: TLS
      #         name: https
      #         resolution: DNS
    customAuthorizationPolicies: []
    # - name: "allow-nothing"
    #   enabled: true
    #   spec: {}
    monitoring:
      enabled: true
      namespaces:
        - monitoring
      principals:
        - cluster.local/ns/monitoring/sa/monitoring-grafana
        - cluster.local/ns/monitoring/sa/monitoring-monitoring-kube-alertmanager
        - cluster.local/ns/monitoring/sa/monitoring-monitoring-kube-operator
        - cluster.local/ns/monitoring/sa/monitoring-monitoring-kube-prometheus
        - cluster.local/ns/monitoring/sa/monitoring-monitoring-kube-state-metrics
        - cluster.local/ns/monitoring/sa/monitoring-monitoring-prometheus-node-exporter
    promtail:
      enabled: true
      namespaces:
      - promtail
      principals:
      - cluster.local/ns/promtail/sa/promtail-promtail
    fluentbit:
      enabled: true
      namespaces:
      - fluentbit
      principals:
      - cluster.local/ns/fluentbit/sa/fluentbit-fluent-bit
    minioOperator:
      enabled: true
      namespaces:
      - minio-operator
      principals:
      - cluster.local/ns/minio-operator/sa/minio-operator
  loki:
    enabled: false
    annotations: {}
    labels: {}
    gateways:
      - istio-system/public
    hosts:
      - "loki.{{ .Values.domain }}"
    service: ""
    port: ""
    exposeReadyEndpoint: false
  mtls:
    # STRICT = Allow only mutual TLS traffic
    # PERMISSIVE = Allow both plain text and mutual TLS traffic
    mode: STRICT

networkPolicies:
  enabled: false
  # -- Control Plane CIDR to allow init job communication to the Kubernetes API.
  # Use `kubectl get endpoints kubernetes` to get the CIDR range needed for your cluster
  controlPlaneCidr: 0.0.0.0/0
  ingressLabels:
    app: public-ingressgateway
    istio: ingressgateway
  additionalPolicies: []

bbtests:
  enabled: false
  cypress:
    artifacts: true
    envs:
      cypress_check_datasource: 'false'
      cypress_grafana_url: 'http://monitoring-grafana.monitoring.svc.cluster.local'
  scripts:
    image: registry1.dso.mil/ironbank/big-bang/base:2.1.0
    envs:
      LOKI_URL: 'http://{{ .Values.fullnameOverride }}.{{ .Release.Namespace }}.svc:3100'
      LOKI_VERSION: '{{ .Values.loki.image.tag }}'
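
With the defaults above (fullnameOverride: logging-loki), the templated LOKI_URL renders as sketched here; the logging namespace is an assumption based on the minio address default earlier in this file:

```shell
# Hypothetical rendering of the templated LOKI_URL above, assuming
# fullnameOverride=logging-loki and release namespace=logging.
fullname="logging-loki"
namespace="logging"
echo "http://${fullname}.${namespace}.svc:3100"
# prints: http://logging-loki.logging.svc:3100
```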
  • In the DEPRECATED VALUES section (if that section is present), set monitoring.enabled to false

    monitoring:
      # -- Enable BigBang integration of Monitoring components
      enabled: false
    
  • In the DEPRECATED VALUES section (if that section is present), ensure all monitoring: sub-components are set to enabled: false

  • Ensure monitoring.dashboards.enabled is set to false
  • Ensure monitoring.rules.enabled is set to false
  • Ensure monitoring.serviceMonitor.enabled is set to false
  • Ensure monitoring.serviceMonitor.metricsInstance.enabled is set to false
  • Ensure monitoring.selfMonitoring.enabled is set to false

  • In the DEPRECATED VALUES section (if that section is present), set monitoring.serviceMonitor.metricsInstance.enabled to false

    metricsInstance:
      # -- If enabled, MetricsInstance resources for Grafana Agent Operator are created
      enabled: false
    
  • In the DEPRECATED VALUES section (if that section is present), ensure monitoring.selfMonitoring.grafanaAgent.installOperator is set to false

  • In the Chart Testing section, ensure monitoring.lokiCanary.enabled is set to false

lokiCanary:
  enabled: false

chart/ci/πŸ“œ

  • In each of the 4 files in the chart/ci directory (default-single-binary-values.yaml, default-values.yaml, ingress-values.yaml, and legacy-monitoring-values.yaml), ensure that the loki.storage.bucketNames are set to:

    storage:
      bucketNames:
        chunks: loki
        ruler: loki
        admin: loki-admin
        deletion: loki-deletion
    

chart/src/dashboards/πŸ“œ

  • cd into this directory and run the following command to update the logic so the Release name is captured:

  • Bash:

    sed -i 's/(loki|enterprise-logs)/logging-loki/g' *.json
    

    Note: On Mac, use GNU sed, which can be installed via brew install gnu-sed. By default, this version of the command is invoked using gsed instead of sed.

Note: This will cause changes in the following files if they haven't already been updated:
  - loki-chunks.json
  - loki-deletion.json
  - loki-operational.json
  - loki-reads-resources.json
  - loki-reads.json
  - loki-retention.json
  - loki-writes-resources.json
  - loki-writes.json
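
The substitution is literal, not regex alternation: in sed's default BRE syntax, ( | ) have no special meaning, so the command replaces the exact string (loki|enterprise-logs) that appears inside the dashboards' query selectors. A toy demonstration:

```shell
# Toy demo of the dashboard rename: the pattern matches the literal string
# "(loki|enterprise-logs)" because BRE treats ( | ) as plain characters.
echo 'job=~"(loki|enterprise-logs)"' \
  | sed 's/(loki|enterprise-logs)/logging-loki/g'
# prints: job=~"logging-loki"
```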

  • modify the loki-logs.json dashboard to maintain the expr for log querying (lines 775 and 840):
  • 775:

    "expr": "sum(rate({namespace=\"$namespace\", pod=~\"$deployment.*\", pod=~\"$pod\", container=~\"$container\" } |logfmt|= \"$filter\" [5m])) by (level)",
    
  • 840:

    "expr": "{namespace=\"$namespace\", pod=~\"$deployment.*\", pod=~\"$pod\", container=~\"$container\"} | logfmt | level=\"$level\" |= \"$filter\"",
    

chart/templates/backend/poddisruptionbudget-backend.yamlπŸ“œ

  • Ensure that there is no hard-coded spec for the PDB template
  {{- with .Values.backend.podDisruptionBudget.maxUnavailable }}
  maxUnavailable: {{ . }}
  {{- else }}
  minAvailable: {{ .Values.backend.podDisruptionBudget.minAvailable | default 0 }}
  {{- end }}

chart/templates/backend/query-scheduler-discovery.yamlπŸ“œ

  • Ensure that the grpc port specifies an appProtocol of tcp, as in:
- name: grpc
  port: {{ .Values.loki.server.grpc_listen_port }}
  targetPort: grpc
  appProtocol: tcp
  protocol: TCP

chart/templates/backend/service-backend-headless.yamlπŸ“œ

  • Ensure that the grpc port specifies an appProtocol of tcp, as in:
- name: grpc
  port: {{ .Values.loki.server.grpc_listen_port }}
  targetPort: grpc
  appProtocol: tcp
  protocol: TCP

chart/templates/backend/service-backend.yamlπŸ“œ

  • Ensure that the grpc port specifies an appProtocol of tcp, as in:
- name: grpc
  port: {{ .Values.loki.server.grpc_listen_port }}
  targetPort: grpc
  appProtocol: tcp
  protocol: TCP

chart/templates/gateway/poddisruptionbudget-gateway.yamlπŸ“œ

  • Ensure that there is no hard-coded spec for the PDB template
  {{- with .Values.gateway.podDisruptionBudget.maxUnavailable }}
  maxUnavailable: {{ . }}
  {{- else }}
  minAvailable: {{ .Values.gateway.podDisruptionBudget.minAvailable | default 0 }}
  {{- end }}

chart/templates/read/poddisruptionbudget-read.yamlπŸ“œ

  • Ensure that there is no hard-coded spec for the PDB template
  {{- with .Values.read.podDisruptionBudget.maxUnavailable }}
  maxUnavailable: {{ . }}
  {{- else }}
  minAvailable: {{ .Values.read.podDisruptionBudget.minAvailable | default 0 }}
  {{- end }}

chart/templates/backend/service-read.yamlπŸ“œ

  • Ensure that the grpc port specifies an appProtocol of tcp, as in:
- name: grpc
  port: {{ .Values.loki.server.grpc_listen_port }}
  targetPort: grpc
  appProtocol: tcp
  protocol: TCP
  • Ensure that spec.publishNotReadyAddresses is set to true
publishNotReadyAddresses: true

chart/templates/tokengen/job-tokengen.yamlπŸ“œ

  • At the top of the file, immediately under the conditionals at the very top, add the following NetworkPolicy resources:
{{- if .Values.networkPolicies.enabled }}
{{- if .Values.minio.enabled }}
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: tokengen-ingress-minio
  namespace: {{ .Release.Namespace }}
  annotations:
    "helm.sh/hook": post-install
    "helm.sh/hook-weight": "-10"
    "helm.sh/hook-delete-policy": hook-succeeded,hook-failed,before-hook-creation
spec:
  podSelector:
    matchLabels:
      app: minio
      app.kubernetes.io/instance: {{ .Release.Name }}
  ingress:
    - from:
      - podSelector:
          matchLabels:
            {{- include "enterprise-logs.tokengenLabels" . | nindent 14 }}
            {{- with .Values.enterprise.tokengen.labels }}
            {{- toYaml . | nindent 14 }}
            {{- end }}
      ports:
        - port: 9000
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: tokengen-egress-minio
  namespace: {{ .Release.Namespace }}
  annotations:
    "helm.sh/hook": post-install
    "helm.sh/hook-weight": "-10"
    "helm.sh/hook-delete-policy": hook-succeeded,hook-failed,before-hook-creation
spec:
  podSelector:
    matchLabels:
      app: minio
      app.kubernetes.io/instance: {{ .Release.Name }}
  egress:
    - to:
      - podSelector:
          matchLabels:
            {{- include "enterprise-logs.tokengenLabels" . | nindent 14 }}
            {{- with .Values.enterprise.tokengen.labels }}
            {{- toYaml . | nindent 14 }}
            {{- end }}
      ports:
        - port: 9000
---
{{- end }}
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-egress-tokengen-job
  namespace: {{ .Release.Namespace }}
  annotations:
    "helm.sh/hook": post-install
    "helm.sh/hook-weight": "-10"
    "helm.sh/hook-delete-policy": hook-succeeded,hook-failed,before-hook-creation
spec:
  egress:
  - to:
    - ipBlock:
        cidr: {{ .Values.networkPolicies.controlPlaneCidr }}
        {{- if eq .Values.networkPolicies.controlPlaneCidr "0.0.0.0/0" }}
        # ONLY Block requests to AWS metadata IP
        except:
        - 169.254.169.254/32
        {{- end }}
  podSelector:
    matchLabels:
      {{- include "enterprise-logs.tokengenLabels" . | nindent 6 }}
      {{- with .Values.enterprise.tokengen.labels }}
      {{- toYaml . | nindent 6 }}
      {{- end }}
  policyTypes:
  - Egress
---
{{- end }}

chart/templates/write/poddisruptionbudget-write.yamlπŸ“œ

  • Ensure that there is no hard-coded spec for the PDB template

      {{- with .Values.write.podDisruptionBudget.maxUnavailable }}
      maxUnavailable: {{ . }}
      {{- else }}
      minAvailable: {{ .Values.write.podDisruptionBudget.minAvailable | default 0 }}
      {{- end }}
    

chart/templates/backend/service-write.yamlπŸ“œ

  • Ensure that the grpc port specifies an appProtocol of tcp, as in:
- name: grpc
  port: {{ .Values.loki.server.grpc_listen_port }}
  targetPort: grpc
  appProtocol: tcp
  protocol: TCP

chart/templates/_helpers.tplπŸ“œ

  • On line 13 for the $default function, remove the ternary function and ensure the definition looks just like:
{{- $default := "loki" }}
  • On line 201, ensure the following block for minio looks like:
{{- if .Values.minio.enabled -}}
s3:
  endpoint: {{ include "loki.minio" $ }}
  bucketnames: {{ $.Values.loki.storage.bucketNames.chunks }}
  secret_access_key: {{ $.Values.minio.secrets.secretKey }}
  access_key_id: {{ $.Values.minio.secrets.accessKey }}
  s3forcepathstyle: true
  insecure: true
  • On line 349, ensure that s3.bucketnames looks like:
s3:
  bucketnames: {{ $.Values.loki.storage.bucketNames.ruler }}

chart/templates/service-memberlist.yamlπŸ“œ

  • Ensure that the tcp port specifies an appProtocol of tcp, as in:
- name: tcp
  port: 7946
  targetPort: http-memberlist
  protocol: TCP
  appProtocol: tcp

automountServiceAccountTokenπŸ“œ

The mutating Kyverno policy named update-automountserviceaccounttokens is leveraged to harden all ServiceAccounts in this package with automountServiceAccountToken: false.

This policy revokes access to the K8s API for Pods utilizing said ServiceAccounts. If a Pod truly requires access to the K8s API (for app functionality), the Pod is added to the pods: array of the same mutating policy. This grants the Pod access to the API, and creates a Kyverno PolicyException to prevent an alert.

Testing new Loki VersionπŸ“œ

NOTE: Perform these testing steps on both a clean install and an upgrade. For a clean install, point Loki to your branch. For an upgrade, do an install with Loki pointing to the latest tag, then perform a helm upgrade with Loki pointing to your branch.

Deploy Loki Scalable as a part of BigBangπŸ“œ

You will want to install with:

  • Loki, Promtail, Fluentbit, Tempo, Monitoring, MinioOperator and Istio packages enabled

overrides/loki.yaml

clusterAuditor:
  enabled: false

gatekeeper:
  enabled: false

istioOperator:
  enabled: true

istio:
  enabled: true
  loki:
    enabled: true

monitoring:
  enabled: true
  values:
    istio:
      enabled: true

loki:
  enabled: true
  values:
    istio:
      enabled: true
  git:
    tag: ""
    branch: "renovate/ironbank"

promtail:
  enabled: true

tempo:
  enabled: true

jaeger:
  enabled: false

twistlock:
  enabled: false

kyvernoPolicies:
  values:
    exclude:
      any:
      # Allows k3d load balancer to bypass policies.
      - resources:
          namespaces:
          - istio-system
          names:
          - svclb-*
    policies:
      restrict-host-path-mount-pv:
        parameters:
          allow:
          - /var/lib/rancher/k3s/storage/pvc-*
  • Visit https://grafana.dev.bigbang.mil and login with default credentials
  • Navigate to Connections -> Data Sources -> Loki
  • Click Save & Test to ensure Data Source changes can be saved successfully.
  • Search dashboards for Loki Dashboard Quick Search and confirm log data is being populated/no error messages.

Deploy Loki Monolith as a part of BigBangπŸ“œ

Loki Monolith is tested during the β€œpackage tests” stage of loki pipelines.

You will want to install with:

  • Loki, Promtail, Tempo, Monitoring and Istio packages enabled

overrides/loki.yaml

clusterAuditor:
  enabled: false

gatekeeper:
  enabled: false

istioOperator:
  enabled: true

istio:
  enabled: true

monitoring:
  enabled: true

loki:
  enabled: true
  git:
    tag: ""
    branch: "renovate/ironbank"

promtail:
  enabled: true

tempo:
  enabled: true

jaeger:
  enabled: false

twistlock:
  enabled: false

kyvernoPolicies:
  values:
    exclude:
      any:
      # Allows k3d load balancer to bypass policies.
      - resources:
          namespaces:
          - istio-system
          names:
          - svclb-*
    policies:
      restrict-host-path-mount-pv:
        parameters:
          allow:
          - /var/lib/rancher/k3s/storage/pvc-*
  • Visit https://grafana.dev.bigbang.mil and login with default credentials
  • Navigate to Connections -> Data Sources -> Loki
  • Click Save & Test to ensure Data Source changes can be saved successfully.
  • Search dashboards for Loki Dashboard Quick Search and confirm log data is being populated/no error messages.

When in doubt with any testing or upgrade steps, reach out to the CODEOWNERS for assistance.