Troubleshooting📜

AutoRollingUpgrade📜

Once upgraded, an Elasticsearch cluster cannot be rolled back to the previous version if the cluster is unhealthy after a minor version upgrade and the autoRollingUpgrade commands are attempted.

  • If your Elasticsearch pods are not restarting and you have 1 data node and 1 master node, the ECK-Operator will not automatically redeploy the pods: as soon as one node goes offline the cluster health status turns Red, so you need to manually delete the pods, starting with the data node first.

Check if your Elasticsearch cluster is unhealthy with a “Red” health status:

kubectl get elasticsearches -A
  • If the ek HelmRelease states “Upgrade tries Exhausted” and Elasticsearch or Kibana is in a bad state, check the logs for the ECK-Operator to see if nodes need to be manually restarted:

    kubectl logs elastic-operator-0 -n eck-operator
    
    If you see logs like the following:
    {"log.level":"info","@timestamp":"2021-04-16T20:57:24.771Z","log.logger":"driver","message":"Cannot restart some nodes for upgrade at this time","service.version":"1.3.0+6db1914b","service.type":"eck","ecs.version":"1.4.0","namespace":"logging","es_name":"logging-ek","failed_predicates":{"...":["logging-ek-es-data-0","logging-ek-es-master-0"]}}
    
    Manually delete the pods mentioned in the log, e.g. starting with “logging-ek-es-data-0”, and then “logging-ek-es-master-0” if it still isn’t rolling after the data pod is 2/2 Ready (see the example at the end of this section).

  • If Elasticsearch has upgraded and is showing a Green/Yellow health status but the new Kibana pods are stuck at 1/2 Ready, check the logs for Kibana and Elasticsearch:

    kubectl logs -l common.k8s.elastic.co/type=kibana -n logging -c kibana
    kubectl logs logging-ek-es-data-0 -n logging -c elasticsearch
    
    If Kibana shows the following logs:
    "message":"[search_phase_execution_exception]: all shards failed"}
    
    Check Elasticsearch logs for the troublesome indexes:
    "Caused by: org.elasticsearch.action.search.SearchPhaseExecutionException: Search rejected due to missing shards [[.kibana_task_manager_1][0]]. Consider using `allow_partial_search_results` setting to bypass this error."
    
    Run the following commands to delete the .kibana_task_manager_X index. WARNING: This will erase any Kibana configuration, Index Mappings, Dashboards, Role Mappings, etc.
    kubectl port-forward svc/logging-ek-es-http -n logging 9200:9200
    curl -XDELETE -ku "elastic:ELASTIC_USER_PASSWORD" "https://localhost:9200/.kibana_task_manager_1"
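
For the manual node restarts referenced earlier in this section, a minimal sketch using the pod names from the example operator log (adjust the names and namespace for your cluster):

# Delete the data pod first and wait for it to return to 2/2 Ready.
kubectl delete pod logging-ek-es-data-0 -n logging
kubectl get pods -n logging -w

# If the master pod still has not rolled after the data pod is 2/2 Ready, delete it as well.
kubectl delete pod logging-ek-es-master-0 -n logging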
    

Error Failed to Flush Chunk📜

The Fluentbit pods on the Release Cluster may have occasional issues reliably sending their 2000+ logs per minute to Elasticsearch when Elasticsearch is not tuned properly.

Warnings/Errors should look like:

[ warn] [engine] failed to flush chunk '1-1625056025.433132869.flb', retry in 257 seconds: task_id=788, input=storage_backlog.2 > output=es.0 (out_id=0)
[error] [output:es:es.0] HTTP status=429 URI=/_bulk, response:
{"error":{"root_cause":[{"type":"es_rejected_execution_exception","reason":"rejected execution of coordinating operation [coordinating_and_primary_bytes=105667381, replica_bytes=0, all_bytes=105667381, coordinating_operation_bytes=2480713, max_coordinating_and_primary_bytes=107374182]"}]

The fix involves increasing resources.requests, resources.limits, and heap for the Elasticsearch data pods in chart/values.yaml:

logging:
  values:
    elasticsearch:
      data:
        resources:
          requests:
            cpu: 2
            memory: 10Gi
          limits:
            cpu: 3
            memory: 14Gi
        heap:
          min: 4g
          max: 4g
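
The HTTP 429 above is Elasticsearch indexing-pressure backpressure. After increasing the resources, one rough way to confirm the rejections have stopped (a sketch, assuming the same logging-ek service and namespace used elsewhere in this document, and Elasticsearch 7.9+ where the indexing_pressure node stats are available):

kubectl port-forward svc/logging-ek-es-http -n logging 9200:9200

# Shows per-node indexing-pressure memory usage and rejection counters; the
# es_rejected_execution_exception errors in the Fluentbit logs correspond to
# coordinating rejections reported here.
curl -XGET -ku "elastic:ELASTIC_USER_PASSWORD" "https://localhost:9200/_nodes/stats/indexing_pressure?pretty"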

Error Cannot Increase Buffer📜

In a heavily utilized production cluster, an intermittent warning that the buffer could not be increased may appear.

Warning:

[ warn] [http_client] cannot increase buffer: current=32000 requested=64768 max=32000

The fix involves increasing the Buffer_Size within the Kubernetes Filter in fluentbit/chart/values.yaml:

fluentbit:
  values:
    config:
      filters: |
        [FILTER]
            Name kubernetes
            Match kube.*
            Kube_CA_File /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
            Kube_Token_File /var/run/secrets/kubernetes.io/serviceaccount/token
            Merge_Log On
            Merge_Log_Key log_processed
            K8S-Logging.Parser On
            K8S-Logging.Exclude Off
            Buffer_Size 1M
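
Once the override is applied and the Fluent Bit pods have rolled, the warning should stop. A quick check (a sketch; the namespace and label selector are assumptions and may differ in your deployment):

# Look for any remaining buffer warnings in the Fluentbit logs.
kubectl logs -n fluentbit -l app.kubernetes.io/name=fluent-bit --tail=500 | grep "cannot increase buffer"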

Yellow ES Health Status and Unassigned Shards📜

After a BigBang autoRollingUpgrade job, cluster shard allocation may not have been properly re-enabled, resulting in a yellow health status for the Elasticsearch cluster and unassigned shards.

To check cluster health, run:

kubectl get elasticsearch -A

To view the status of shards (with the port-forward from the fix below active), run:

curl -XGET -H 'Content-Type: application/json' -ku "elastic:$(kubectl get secrets -n logging logging-ek-es-elastic-user -o go-template='{{.data.elastic | base64decode}}')" "https://localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason"

To fix, run the following commands:

kubectl port-forward svc/logging-ek-es-http -n logging 9200:9200

curl -XPUT -H 'Content-Type: application/json' -ku "elastic:$(kubectl get secrets -n logging logging-ek-es-elastic-user -o go-template='{{.data.elastic | base64decode}}')" "https://localhost:9200/_cluster/settings" -d '{ "index.routing.allocation.disable_allocation": false }'

curl -XPUT -H 'Content-Type: application/json' -ku "elastic:$(kubectl get secrets -n logging logging-ek-es-elastic-user -o go-template='{{.data.elastic | base64decode}}')" "https://localhost:9200/_cluster/settings" -d '{ "transient" : { "cluster.routing.allocation.enable" : "all" } }'
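
Once allocation is re-enabled, the unassigned shards should begin assigning and the cluster should return to green. To confirm (reusing the same port-forward and elastic user secret):

# Watch "status" and "unassigned_shards" as the cluster recovers.
curl -XGET -ku "elastic:$(kubectl get secrets -n logging logging-ek-es-elastic-user -o go-template='{{.data.elastic | base64decode}}')" "https://localhost:9200/_cluster/health?pretty"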

CPU/Memory Limits and Heap📜

CPU/memory requests and limits and the Java heap must be configured together: requests should match limits, the pods need sufficient resources, and the heap min and max must be set to the same value in chart/values.yaml:

  master:
    resources:
      limits:
        cpu: 1
        memory: 4Gi
      requests:
        cpu: 1
        memory: 4Gi
    heap:
      # -- Elasticsearch master Java heap Xms setting
      min: 2g
      # -- Elasticsearch master Java heap Xmx setting
      max: 2g
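
To verify that a running node actually picked up the configured heap (a sketch, assuming the same logging-ek cluster and port-forward used elsewhere in this document):

kubectl port-forward svc/logging-ek-es-http -n logging 9200:9200

# heap.max should match the configured Xmx (2g for the master example above).
curl -XGET -ku "elastic:ELASTIC_USER_PASSWORD" "https://localhost:9200/_cat/nodes?v&h=name,node.role,heap.max,ram.max"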

Crash due to too low map count📜

vm.max_map_count must be raised, or Elasticsearch will crash because the default OS limit on mmap counts is too low. It must be set as root in /etc/sysctl.conf and can be verified by running sysctl vm.max_map_count. It is set automatically in k3d-dev.sh.

sysctl -w vm.max_map_count=262144
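
To persist the setting across reboots (per the note about /etc/sysctl.conf above), something like the following works on most Linux hosts:

# Append the setting, reload sysctl, and verify.
echo "vm.max_map_count=262144" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
sysctl vm.max_map_count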
