Bug Report
Describe the bug
When running fluent-bit as a DaemonSet in a K8s cluster, after 20-30 minutes the node reports a kernel soft lockup: soft lockup - CPU#3 stuck for 715s! [kubelet:15343]. The affected process fully saturates 1 of the 12 virtual CPUs and the virtual machine effectively freezes: a queue of processes builds up and it becomes impossible to run commands on the server. Moving to different servers didn't help; the problem persists. The pod logs show nothing useful, there are no errors even in debug mode, only the messages in the VM syslog.
To Reproduce
- Rubular link if applicable:
- Example log message if applicable:
Message from syslogd@server at Oct 30 17:06:13 ...
kernel:watchdog: BUG: soft lockup - CPU#5 stuck for 715s! [flb-pipeline:15343]
Message from syslogd@server at Oct 30 17:06:13 ...
kernel:watchdog: BUG: soft lockup - CPU#3 stuck for 715s! [kubelet:32048]
- Steps to reproduce the problem: unknown; it happens on its own after 20-30 minutes of normal operation (a minimal config I would use to try to isolate it is sketched below).
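Since I have no reliable steps, this is the minimal config I would run on a test cluster to check whether the tail input plus kubernetes filter path alone can trigger the lockup. It is only a sketch: it reuses pieces of my real config (listed under "Filters and plugins" below) and swaps the output for the built-in null plugin, which simply discards records.

[SERVICE]
    Flush      1
    Log_Level  debug

[INPUT]
    Name              tail
    Path              /var/log/containers/*.log
    multiline.parser  cri
    Tag               kube.*
    Mem_Buf_Limit     100MB
    Skip_Long_Lines   On
    Refresh_Interval  10
    db                /var/log/fluent-bit.db

[FILTER]
    Name                kubernetes
    Match               kube.*
    Merge_Log           On
    Keep_Log            Off
    K8S-Logging.Parser  On
    K8S-Logging.Exclude On

# The null output discards everything, so if the node still locks up with
# this config the problem is in the tail/kubernetes path, not the kafka output.
[OUTPUT]
    Name   null
    Match  kube.*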
Expected behavior
fluent-bit keeps shipping logs to Kafka without pinning a CPU or freezing the node.
Screenshots
Your Environment
- Version used: 4.1.0
- Configuration: DaemonSet for all nodes
- Environment name and version (e.g. Kubernetes? What version?): Kubeadm kubernetes 1.31.5
- Server type and version: Linux {serverName} 6.9.5-1.res7.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Jun 20 12:06:12 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
- Operating System and version: RHEL 7
- Filters and plugins:
[SERVICE]
    Daemon Off
    Flush 1
    Log_Level info
    Parsers_File /fluent-bit/etc/parsers.conf
    Parsers_File /fluent-bit/etc/conf/custom_parsers.conf
    HTTP_Server On
    HTTP_Listen 0.0.0.0
    HTTP_Port 2020
    Health_Check On

[INPUT]
    Name tail
    Path /var/log/containers/*.log
    multiline.parser cri
    Tag kube.*
    Mem_Buf_Limit 100MB
    Skip_Long_Lines On
    Refresh_Interval 10
    db /var/log/fluent-bit.db

[FILTER]
    Name kubernetes
    Match kube.*
    Merge_Log On
    Keep_Log Off
    K8S-Logging.Parser On
    K8S-Logging.Exclude On

[FILTER]
    Name modify
    Match kube.*
    Copy kubernetes k8s
    Add cluster production

[FILTER]
    Name nest
    Match kube.*
    Operation lift
    Nested_under k8s
    Add_prefix k8s_

[FILTER]
    Name nest
    Match kube.*
    Operation lift
    Nested_under k8s_annotations
    Add_prefix k8s_annotations_

[FILTER]
    Name nest
    Match kube.*
    Operation lift
    Nested_under k8s_labels
    Add_prefix k8s_labels_

[FILTER]
    Name modify
    Match kube.*
    Rename k8s_pod_name pod
    Rename k8s_pod_id pod_id
    Rename k8s_namespace_name namespace
    Rename k8s_host host
    Rename k8s_container_name container
    Rename k8s_container_image image
    Rename k8s_labels_app.kubernetes.io/name app
    Rename k8s_annotations_log/log_new_kafka_topic log_new_kafka_topic
    Rename k8s_annotations_log/log.new.kafka.topic log_new_kafka_topic
    Rename X-B3-TraceId traceId
    Rename x-b3-traceid traceId
    Remove time
    Remove _p
    Remove kubernetes
    Remove_wildcard k8s_

[OUTPUT]
    Name kafka
    Match kube.*
    Brokers broker1:9092,broker2:9092,broker3:9092
    Topics logs-unrouted
    Topic_key log_new_kafka_topic
    Dynamic_topic On
    Timestamp_Key @timestamp
    Timestamp_format iso8601
    Format json
    rdkafka.request.required.acks 1
    rdkafka.sasl.mechanism SCRAM-SHA-512
    rdkafka.sasl.username Fluentbit
    rdkafka.sasl.password Fluentbit
    rdkafka.security.protocol sasl_ssl
    rdkafka.ssl.ca.location /certs/ca.crt
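The opposite isolation step, which I have not verified tells anything yet, would be to keep the full filter chain above but swap the kafka output for the standard stdout plugin, to see whether librdkafka is involved at all:

# Same tail input and filters as above; only the output changes.
[OUTPUT]
    Name    stdout
    Match   kube.*
    Format  json_lines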
Additional context
It hurts because I can't ship logs to Kafka and the affected server effectively dies, so I have to drain the node to reroute the applications to other nodes.
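Since the pod logs are silent even at debug level, the next time it happens I can try raising verbosity on the Kafka side. This is a sketch only: rdkafka.debug is the standard librdkafka debug property passed through by the kafka output, and the context list shown is just an example.

[SERVICE]
    Log_Level debug

[OUTPUT]
    Name     kafka
    Match    kube.*
    Brokers  broker1:9092,broker2:9092,broker3:9092
    Topics   logs-unrouted
    # Enables librdkafka's internal debug contexts; output goes to fluent-bit's own log.
    rdkafka.debug broker,topic,msg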