Troubleshooting
Error: Failed to Flush Chunk
Fluent Bit pods on the release cluster may intermittently fail to send logs to Elasticsearch when Elasticsearch is under-resourced.
The warnings and errors look like:
[ warn] [engine] failed to flush chunk '1-1625056025.433132869.flb', retry in 257 seconds: task_id=788, input=storage_backlog.2 > output=es.0 (out_id=0)
[error] [output:es:es.0] HTTP status=429 URI=/_bulk, response:
{"error":{"root_cause":[{"type":"es_rejected_execution_exception","reason":"rejected execution of coordinating operation [coordinating_and_primary_bytes=105667381, replica_bytes=0, all_bytes=105667381, coordinating_operation_bytes=2480713, max_coordinating_and_primary_bytes=107374182]"}]
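The HTTP 429 response means Elasticsearch rejected the bulk request because it hit its indexing memory limit. The fix is to increase resources.requests, resources.limits, and heap for the Elasticsearch data pods: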
elasticsearchKibana:
  values:
    elasticsearch:
      data:
        resources:
          requests:
            cpu: 2
            memory: 10Gi
          limits:
            cpu: 3
            memory: 14Gi
        heap:
          min: 4g
          max: 4g
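The `max_coordinating_and_primary_bytes=107374182` value in the rejection above is consistent with Elasticsearch's default indexing-pressure limit of 10% of the JVM heap applied to a 1 GiB heap, which is why a larger heap raises the ceiling. A minimal sketch of that arithmetic follows; the function name `indexing_limit_bytes` is hypothetical, and it assumes `indexing_pressure.memory.limit` has been left at its default.

```shell
#!/usr/bin/env bash
# Estimate the default indexing-pressure limit (10% of JVM heap) in bytes.
# Assumption: indexing_pressure.memory.limit is left at its default of 10%.
# Usage: indexing_limit_bytes <heap size in GiB>
indexing_limit_bytes() {
  local heap_gib="$1"
  # Integer arithmetic: GiB -> bytes, then take one tenth.
  echo $(( heap_gib * 1024 * 1024 * 1024 / 10 ))
}
```

Under these assumptions, `indexing_limit_bytes 1` prints 107374182 (the limit in the log), and `indexing_limit_bytes 4` prints 429496729 for the 4g heap configured above.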
Error: Cannot Increase Buffer
In a heavily utilized production cluster, an intermittent warning that the buffer could not be increased may appear:
[ warn] [http_client] cannot increase buffer: current=32000 requested=64768 max=32000
The max=32000 in the warning matches the Kubernetes filter's default Buffer_Size of 32k. The fix is to increase Buffer_Size in the Kubernetes filter in fluentbit/chart/values.yaml:
fluentbit:
  values:
    config:
      filters: |
        [FILTER]
            Name kubernetes
            Match kube.*
            Kube_CA_File /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
            Kube_Token_File /var/run/secrets/kubernetes.io/serviceaccount/token
            Merge_Log On
            Merge_Log_Key log_processed
            K8S-Logging.Parser On
            K8S-Logging.Exclude Off
            Buffer_Size 1M
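When choosing a new Buffer_Size, it can help to know the largest size Fluent Bit actually asked for. The sketch below scans log lines on stdin for the warning shown above and prints the largest `requested=` value in bytes; the function name `max_requested_buffer` is hypothetical, and it assumes the warning format matches the sample line.

```shell
#!/usr/bin/env bash
# Print the largest "requested=" size (in bytes) found in Fluent Bit
# "cannot increase buffer" warnings read from stdin.
# Assumption: log lines follow the format of the sample warning above.
max_requested_buffer() {
  grep 'cannot increase buffer' \
    | grep -o 'requested=[0-9]*' \
    | cut -d= -f2 \
    | sort -n \
    | tail -n 1
}
```

One possible use is `kubectl logs -n <namespace> <fluentbit-pod> | max_requested_buffer`, where the namespace and pod name are placeholders for your deployment. Any Buffer_Size above the printed value should clear the warning for the requests seen so far.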
Yellow ES Health Status and Unassigned Shards
To check cluster health:
kubectl get elasticsearch -A
To view the status of shards (this requires the Elasticsearch HTTP service to be port-forwarded to localhost:9200):
curl -XGET -H 'Content-Type: application/json' -ku "elastic:$(kubectl get secrets -n logging logging-ek-es-elastic-user -o go-template='{{.data.elastic | base64decode}}')" "https://localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason"
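On a cluster with many indices the shard listing is long, so it can be useful to keep only the unassigned rows. The following is a small sketch of such a filter; the function name `unassigned_shards` is hypothetical, and it assumes the column order requested in the command above (index, shard, prirep, state, unassigned.reason).

```shell
#!/usr/bin/env bash
# Keep only UNASSIGNED rows from `_cat/shards` output read on stdin.
# Assumption: columns are index, shard, prirep, state, unassigned.reason,
# as requested by the h= parameter in the command above.
unassigned_shards() {
  # Print index, shard number, primary/replica flag, and the reason.
  awk '$4 == "UNASSIGNED" { print $1, $2, $3, $5 }'
}
```

Piping the output of the shard status command into `unassigned_shards` leaves one line per unassigned shard, with the reason in the last column.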
To fix, port-forward the Elasticsearch HTTP service, then re-enable shard allocation:
kubectl port-forward svc/logging-ek-es-http -n logging 9200:9200
curl -XPUT -H 'Content-Type: application/json' -ku "elastic:$(kubectl get secrets -n logging logging-ek-es-elastic-user -o go-template='{{.data.elastic | base64decode}}')" "https://localhost:9200/_cluster/settings" -d '{ "index.routing.allocation.disable_allocation": false }'
curl -XPUT -H 'Content-Type: application/json' -ku "elastic:$(kubectl get secrets -n logging logging-ek-es-elastic-user -o go-template='{{.data.elastic | base64decode}}')" "https://localhost:9200/_cluster/settings" -d '{ "transient" : { "cluster.routing.allocation.enable" : "all" } }'
CPU/Memory Limits and Heap
Elasticsearch pods need sufficient CPU and memory limits, and heap min and max must be set to the same value:
elasticsearchKibana:
  values:
    elasticsearch:
      master:
        resources:
          limits:
            cpu: 1
            memory: 4Gi
          requests:
            cpu: 1
            memory: 4Gi
        heap:
          min: 2g
          max: 2g
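A quick sanity check on these values can be scripted before applying them. The sketch below verifies that heap min equals heap max, as required above, and additionally that the heap is no more than half the memory limit. The function name `check_heap` is hypothetical, and the 50% ceiling is general Elasticsearch sizing guidance rather than a rule stated on this page.

```shell
#!/usr/bin/env bash
# Check a heap setting against a container memory limit.
# Usage: check_heap <heap min GiB> <heap max GiB> <memory limit GiB>
# Returns 0 when min == max and the heap is at most half the memory limit.
# Assumption: the 50% ceiling is general Elasticsearch guidance,
# not a requirement taken from this document.
check_heap() {
  local min="$1" max="$2" limit="$3"
  [ "$min" -eq "$max" ] && [ $(( max * 2 )) -le "$limit" ]
}
```

For the master values above, `check_heap 2 2 4` succeeds; a mismatched pair such as `check_heap 2 4 8` fails.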
Crash Due to Low Map Count
vm.max_map_count must be raised, or Elasticsearch will crash because the default OS limit is too low. Set it as root in /etc/sysctl.conf so that it persists across reboots, and verify it by running sysctl vm.max_map_count. The k3d-dev.sh script sets it automatically. To apply it immediately on a running host:
sysctl -w vm.max_map_count=262144
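The verification step can be wrapped in a small check that compares the current value against the 262144 used above. This is a minimal sketch; the function name `map_count_ok` is hypothetical, and the persistence commands in the trailing comments assume a host that reads /etc/sysctl.conf.

```shell
#!/usr/bin/env bash
# Return 0 when vm.max_map_count is at least the required minimum.
# Usage: map_count_ok [current value]
# With no argument, the live kernel value is read via `sysctl -n`.
map_count_ok() {
  local required=262144
  local current="${1:-$(sysctl -n vm.max_map_count)}"
  [ "$current" -ge "$required" ]
}

# To persist the setting across reboots (run as root):
#   echo 'vm.max_map_count=262144' >> /etc/sysctl.conf
#   sysctl -p
```

For example, `map_count_ok || echo "vm.max_map_count is too low"` prints a message only when the host still needs the change.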