
kernel:watchdog: BUG: soft lockup flb-pipeline 4.1.0 #11093

@DuraCHYo

Description

Bug Report

Describe the bug
When running fluent-bit as a DaemonSet in a Kubernetes cluster, the node starts reporting soft lockups after 20-30 minutes (e.g. CPU#3 stuck for 715s, with flb-pipeline and kubelet named as the stuck tasks; full messages below). The stuck process fully loads 1 of the 12 virtual processors and the virtual machine effectively freezes: a backlog of processes builds up and it becomes impossible to run commands on the server. Moving to different servers didn't help; the problem persists. The pod logs show nothing useful, there are no errors even in debug mode, only the messages in the VM syslog.
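Since Log_Level debug on the fluent-bit side shows nothing, one thing I have not tried yet is turning on librdkafka's own debug output through the rdkafka.* pass-through on the kafka output. A rough sketch of the change (the debug property and its categories come from librdkafka, everything else is unchanged from my config below):

[OUTPUT]
    Name kafka
    Match kube.*
    Brokers broker1:9092,broker2:9092,broker3:9092
    # librdkafka debug categories; "all" is also accepted
    rdkafka.debug broker,topic,msg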

To Reproduce

  • Rubular link if applicable:
  • Example log message if applicable:
Message from syslogd@server at Oct 30 17:06:13 ...
kernel:watchdog: BUG: soft lockup - CPU#5 stuck for 715s! [flb-pipeline:15343]
Message from syslogd@server at Oct 30 17:06:13 ...
kernel:watchdog: BUG: soft lockup - CPU#3 stuck for 715s! [kubelet:32048]
  • Steps to reproduce the problem: I don't have reliable steps; it happens on its own after the DaemonSet has been running for a while on a node. A reduced config I want to try for bisecting is sketched below.
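To help narrow it down, this is the reduced config I plan to run on one affected node: the same tail input as in my full config below, with the whole filter chain removed and the output switched to the built-in null plugin, so I can see whether the lockup still happens without the kubernetes filters and the kafka/librdkafka output:

[SERVICE]
    Daemon Off
    Flush 1
    Log_Level debug

[INPUT]
    Name tail
    Path /var/log/containers/*.log
    multiline.parser cri
    Tag kube.*
    Mem_Buf_Limit 100MB
    Skip_Long_Lines On
    Refresh_Interval 10
    db /var/log/fluent-bit.db

# no filters; discard everything to isolate the input side
[OUTPUT]
    Name null
    Match kube.*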

Expected behavior
fluent-bit keeps tailing and shipping logs to Kafka without pinning a CPU or locking up the node.

Screenshots

Your Environment

  • Version used: 4.1.0
  • Configuration: DaemonSet for all nodes
  • Environment name and version (e.g. Kubernetes? What version?): Kubeadm kubernetes 1.31.5
  • Server type and version: Linux {serverName} 6.9.5-1.res7.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Jun 20 12:06:12 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
  • Operating System and version: RHEL 7
  • Filters and plugins:
[SERVICE]
    Daemon Off
    Flush 1
    Log_Level info
    Parsers_File /fluent-bit/etc/parsers.conf
    Parsers_File /fluent-bit/etc/conf/custom_parsers.conf
    HTTP_Server On
    HTTP_Listen 0.0.0.0
    HTTP_Port 2020
    Health_Check On

[INPUT]
    Name tail
    Path /var/log/containers/*.log
    multiline.parser cri
    Tag kube.*
    Mem_Buf_Limit 100MB
    Skip_Long_Lines On
    Refresh_Interval 10
    db /var/log/fluent-bit.db

[FILTER]
    Name kubernetes
    Match kube.*
    Merge_Log On
    Keep_Log Off
    K8S-Logging.Parser On
    K8S-Logging.Exclude On

[FILTER]
    Name modify
    Match kube.*
    Copy kubernetes k8s
    Add cluster production

[FILTER]
    Name          nest
    Match         kube.*
    Operation     lift
    Nested_under  k8s
    Add_prefix    k8s_

[FILTER]
    Name          nest
    Match         kube.*
    Operation     lift
    Nested_under  k8s_annotations
    Add_prefix    k8s_annotations_

[FILTER]
    Name          nest
    Match         kube.*
    Operation     lift
    Nested_under  k8s_labels
    Add_prefix    k8s_labels_

[FILTER]
    Name modify
    Match kube.*
    Rename k8s_pod_name pod
    Rename k8s_pod_id pod_id
    Rename k8s_namespace_name namespace
    Rename k8s_host host
    Rename k8s_container_name container
    Rename k8s_container_image image
    Rename k8s_labels_app.kubernetes.io/name app
    Rename k8s_annotations_log/log_new_kafka_topic log_new_kafka_topic
    Rename k8s_annotations_log/log.new.kafka.topic log_new_kafka_topic    
    Rename X-B3-TraceId traceId
    Rename x-b3-traceid traceId
    Remove time
    Remove _p
    Remove kubernetes
    Remove_wildcard k8s_      

[OUTPUT]
    Name kafka
    Match kube.*
    Brokers broker1:9092,broker2:9092,broker3:9092
    Topics logs-unrouted
    Topic_key log_new_kafka_topic
    Dynamic_topic On
    Timestamp_Key @timestamp
    Timestamp_format iso8601
    Format json
    rdkafka.request.required.acks 1
    rdkafka.sasl.mechanism SCRAM-SHA-512
    rdkafka.sasl.username Fluentbit
    rdkafka.sasl.password Fluentbit
    rdkafka.security.protocol sasl_ssl
    rdkafka.ssl.ca.location /certs/ca.crt

Additional context
This is painful because I can't ship logs to Kafka and the server effectively dies, so I have to drain the node to move the applications onto other nodes.
