Skip to content

TroubleshootingπŸ“œ

Fluentbit did not restart after Elasticsearch re-deployment/secret update.πŸ“œ

If you are having issues with Fluentbit connectivity to Elastic (caused by an Elastic redeployment/cert update) you should be able to resolve them by restarting Fluentbit. For example, run the below to cycle all pods in the Fluentbit daemonset:

kubectl rollout restart daemonset/logging-fluent-bit -n logging

Too many open filesπŸ“œ

In some environments you may see errors such as the one below:

fluent-bit [2022/07/05 16:20:10] [error] [build/plugins/in_tail/CMakeFiles/flb-plugin-in_tail.dir/compiler_depend.ts:304 errno=24] Too many open files

Currently the best way to resolve this issue is to update each of your nodes’ number of inotify resources and file watches:

/proc/sys/fs/inotify/max_user_watches
/proc/sys/fs/inotify/max_user_instances

For example, BigBang testing environments have the following values set:

cat /proc/sys/fs/inotify/max_user_instances
1024
cat /proc/sys/fs/inotify/max_user_watches
1048576

You can update the limits temporarily, with the following commands (setting the values to 1024 and 1048576 respectively in this example):

sudo sysctl fs.inotify.max_user_instances=1024
sudo sysctl fs.inotify.max_user_watches=1048576
sudo sysctl -p

In order to make the changes permanent, i.e. to persist through a node reboot:

echo "fs.inotify.max_user_instances=1024" > /etc/sysctl.d/fs-inotify-max_user_instances.conf
sysctl -w fs.inotify.max_user_instances=1024
echo "fs.inotify.max_user_watches=1048576" > /etc/sysctl.d/fs-inotify-max_user_watches.conf
sysctl -w fs.inotify.max_user_watches=1048576
sysctl -p

Checkout https://github.com/fluent/fluent-bit/issues/1777#issuecomment-562178857 for more details

If updating the above does not help to resolve the issue you can test switching to stat watch for the input config. To do this set Inotify_Watcher to false, which can be done via the below values:

config:
  inputs: |
    [INPUT]
        Name tail
        Path /var/log/containers/*.log
        # -- Excluding fluentbit logs from sending to ECK, along with gatekeeper-audit logs which are shipped by clusterAuditor.
        Exclude_Path /var/log/containers/*fluent*.log
        Parser containerd
        Tag kube.*
        Mem_Buf_Limit 50MB
        Skip_Long_Lines On
        storage.type filesystem
        Inotify_Watcher false

    [INPUT]
        Name systemd
        Tag host.*
        Systemd_Filter _SYSTEMD_UNIT=kubelet.service
        Read_From_Tail On
        storage.type filesystem

Additional details on this configuration value can be reviewed on the tail plugin configuration page. Note that there has been report of possible memory issues with stat watcher. If using this method make sure you have an alert/adequate visibility into Fluentbit memory usage.