Bug Report
Describe the bug
Our Docker build runs fluent-bit with --dry-run at the end as a simple smoke test to catch config errors. This time the dry run ended in a segfault:
1.271 configuration test is successful
1.271 [2025/11/04 13:22:28] [engine] caught signal (SIGSEGV)
1.330 #0 0x4fc70f in flb_output_exit() at src/flb_output.c:548
1.330 #1 0x50bcb4 in flb_engine_shutdown() at src/flb_engine.c:1225
1.330 #2 0x4e6e14 in flb_destroy() at src/flb_lib.c:240
1.331 #3 0x459323 in flb_main_run() at src/fluent-bit.c:1436
1.331 #4 0x75752af2f5cf in ???() at ???:0
1.331 #5 0x75752af2f67f in ???() at ???:0
1.331 #6 0x456ce4 in ???() at ???:0
1.331 #7 0xffffffffffffffff in ???() at ???:0
To Reproduce
4.0.13 does not segfault; 4.1.0 and 4.1.1 both do.
Here's the cmake command we use to build fluent-bit:
cmake -DFLB_MINIMAL=Yes \
-DFLB_KAFKA=Off \
-DFLB_RELEASE=On \
-DFLB_TLS=ON \
-DFLB_BINARY=ON \
-DFLB_SIMD=On \
-DFLB_JEMALLOC=On \
-DFLB_SIGNV4=Yes \
-DFLB_CONFIG_YAML=Yes \
-DFLB_IN_DOCKER_EVENTS=ON \
-DFLB_IN_TCP=ON \
-DFLB_IN_SYSLOG=ON \
-DFLB_IN_EMITTER=ON \
-DFLB_FILTER_REWRITE_TAG=ON \
-DFLB_FILTER_MODIFY=ON \
-DFLB_FILTER_NEST=ON \
-DFLB_FILTER_PARSER=ON \
-DFLB_FILTER_GREP=ON \
-DFLB_OUT_LOKI=ON \
-DFLB_OUT_S3=ON \
-DFLB_OUT_SPLUNK=ON \
-DFLB_PROCESSOR_METRICS_SELECTOR=ON \
-DFLB_PROCESSOR_LABELS=ON \
../
make -j $(getconf _NPROCESSORS_ONLN)
I installed valgrind in the container image and ran the dry run under valgrind:
Valgrind run with FLB_RELEASE=On:
bash-5.1# valgrind -- /fluent-bit/bin/fluent-bit -c "/fluent-bit/fluentbit.yaml" --dry-run
==98== Memcheck, a memory error detector
==98== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==98== Using Valgrind-3.24.0 and LibVEX; rerun with -h for copyright info
==98== Command: /fluent-bit/bin/fluent-bit -c /fluent-bit/fluentbit.yaml --dry-run
==98==
Fluent Bit v4.1.0
* Copyright (C) 2015-2025 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io
______ _ _ ______ _ _ ___ __
| ___| | | | | ___ (_) | / | / |
| |_ | |_ _ ___ _ __ | |_ | |_/ /_| |_ __ __/ /| | `| |
| _| | | | | |/ _ \ '_ \| __| | ___ \ | __| \ \ / / /_| | | |
| | | | |_| | __/ | | | |_ | |_/ / | |_ \ V /\___ |__| |_
\_| |_|\__,_|\___|_| |_|\__| \____/|_|\__| \_/ |_(_)___/
[2025/11/04 13:41:26.631168127] [ warn] [env] variable ${...} is used but not set ( a few of those .... cut for brevity)
configuration test is successful
==98== Invalid read of size 8
==98== at 0x56F3E1: cb_s3_worker_exit (s3.c:1115)
==98== by 0x4FC70F: flb_output_exit (flb_output.c:548)
==98== by 0x50BCB4: flb_engine_shutdown (flb_engine.c:1225)
==98== by 0x4E6E14: flb_destroy (flb_lib.c:240)
==98== by 0x459323: flb_main_run (fluent-bit.c:1436)
==98== by 0x51C65CF: (below main) (in /usr/lib64/libc.so.6)
==98== Address 0x2c0 is not stack'd, malloc'd or (recently) free'd
==98==
[2025/11/04 13:41:26] [engine] caught signal (SIGSEGV)
#0 0x4fc70f in flb_output_exit() at src/flb_output.c:548
#1 0x50bcb4 in flb_engine_shutdown() at src/flb_engine.c:1225
#2 0x4e6e14 in flb_destroy() at src/flb_lib.c:240
#3 0x459323 in flb_main_run() at src/fluent-bit.c:1436
#4 0x51c65cf in ???() at ???:0
#5 0x51c667f in ???() at ???:0
#6 0x456ce4 in ???() at ???:0
#7 0xffffffffffffffff in ???() at ???:0
==98==
==98== Process terminating with default action of signal 6 (SIGABRT): dumping core
==98== at 0x5228E2C: __pthread_kill_implementation (in /usr/lib64/libc.so.6)
==98== by 0x51DBB45: raise (in /usr/lib64/libc.so.6)
==98== by 0x51C5832: abort (in /usr/lib64/libc.so.6)
==98== by 0x4576C7: flb_signal_handler (fluent-bit.c:636)
==98== by 0x51DBBEF: ??? (in /usr/lib64/libc.so.6)
==98== by 0x56F3E0: cb_s3_worker_exit (s3.c:1111)
==98== by 0x5463C5F: ???
==98== by 0x4FC70F: flb_output_exit (flb_output.c:548)
==98== by 0x50BCB4: flb_engine_shutdown (flb_engine.c:1225)
==98== by 0x4E6E14: flb_destroy (flb_lib.c:240)
==98== by 0x459323: flb_main_run (fluent-bit.c:1436)
==98== by 0x51C65CF: (below main) (in /usr/lib64/libc.so.6)
==98==
==98== HEAP SUMMARY:
==98== in use at exit: 254,788 bytes in 856 blocks
==98== total heap usage: 3,092 allocs, 2,236 frees, 444,857 bytes allocated
==98==
==98== LEAK SUMMARY:
==98== definitely lost: 0 bytes in 0 blocks
==98== indirectly lost: 0 bytes in 0 blocks
==98== possibly lost: 159,402 bytes in 828 blocks
==98== still reachable: 95,386 bytes in 28 blocks
==98== of which reachable via heuristic:
==98== newarray : 320 bytes in 4 blocks
==98== suppressed: 0 bytes in 0 blocks
==98== Rerun with --leak-check=full to see details of leaked memory
==98==
==98== For lists of detected and suppressed errors, rerun with: -s
==98== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Aborted (core dumped)
Valgrind run with FLB_RELEASE=Off:
bash-5.1# valgrind -- /fluent-bit/bin/fluent-bit -c "/fluent-bit/fluentbit.yaml" --dry-run
==97== Memcheck, a memory error detector
==97== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==97== Using Valgrind-3.24.0 and LibVEX; rerun with -h for copyright info
==97== Command: /fluent-bit/bin/fluent-bit -c /fluent-bit/fluentbit.yaml --dry-run
==97==
Fluent Bit v4.1.0
* Copyright (C) 2015-2025 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io
______ _ _ ______ _ _ ___ __
| ___| | | | | ___ (_) | / | / |
| |_ | |_ _ ___ _ __ | |_ | |_/ /_| |_ __ __/ /| | `| |
| _| | | | | |/ _ \ '_ \| __| | ___ \ | __| \ \ / / /_| | | |
| | | | |_| | __/ | | | |_ | |_/ / | |_ \ V /\___ |__| |_
\_| |_|\__,_|\___|_| |_|\__| \____/|_|\__| \_/ |_(_)___/
[2025/11/04 13:47:17.956698294] [ warn] [env] variable ${...} is used but not set
configuration test is successful
==97== Invalid read of size 8
==97== at 0x772AEC: cb_s3_worker_exit (s3.c:1115)
==97== by 0x53A690: flb_output_exit (flb_output.c:548)
==97== by 0x56AC63: flb_engine_shutdown (flb_engine.c:1225)
==97== by 0x4F6300: flb_destroy (flb_lib.c:240)
==97== by 0x45D7F5: flb_main_run (fluent-bit.c:1436)
==97== by 0x5D32D2: flb_supervisor_run (flb_supervisor.c:626)
==97== by 0x45DB68: flb_main (fluent-bit.c:1564)
==97== by 0x45DB8A: main (fluent-bit.c:1572)
==97== Address 0x2c0 is not stack'd, malloc'd or (recently) free'd
==97==
[2025/11/04 13:47:17] [engine] caught signal (SIGSEGV)
#0 0x772aec in cb_s3_worker_exit() at plugins/out_s3/s3.c:1115
#1 0x53a690 in flb_output_exit() at src/flb_output.c:548
#2 0x56ac63 in flb_engine_shutdown() at src/flb_engine.c:1225
#3 0x4f6300 in flb_destroy() at src/flb_lib.c:240
#4 0x45d7f5 in flb_main_run() at src/fluent-bit.c:1436
#5 0x5d32d2 in flb_supervisor_run() at src/flb_supervisor.c:626
#6 0x45db68 in flb_main() at src/fluent-bit.c:1564
#7 0x45db8a in main() at src/fluent-bit.c:1572
#8 0x51c65cf in ???() at ???:0
#9 0x51c667f in ???() at ???:0
#10 0x456a04 in ???() at ???:0
#11 0xffffffffffffffff in ???() at ???:0
==97==
==97== Process terminating with default action of signal 6 (SIGABRT): dumping core
==97== at 0x5228E2C: __pthread_kill_implementation (in /usr/lib64/libc.so.6)
==97== by 0x51DBB45: raise (in /usr/lib64/libc.so.6)
==97== by 0x51C5832: abort (in /usr/lib64/libc.so.6)
==97== by 0x45C226: flb_signal_handler (fluent-bit.c:636)
==97== by 0x51DBBEF: ??? (in /usr/lib64/libc.so.6)
==97== by 0x772AEB: cb_s3_worker_exit (s3.c:1115)
==97== by 0x53A690: flb_output_exit (flb_output.c:548)
==97== by 0x56AC63: flb_engine_shutdown (flb_engine.c:1225)
==97== by 0x4F6300: flb_destroy (flb_lib.c:240)
==97== by 0x45D7F5: flb_main_run (fluent-bit.c:1436)
==97== by 0x5D32D2: flb_supervisor_run (flb_supervisor.c:626)
==97== by 0x45DB68: flb_main (fluent-bit.c:1564)
==97==
==97== HEAP SUMMARY:
==97== in use at exit: 254,788 bytes in 856 blocks
==97== total heap usage: 3,092 allocs, 2,236 frees, 444,857 bytes allocated
==97==
==97== LEAK SUMMARY:
==97== definitely lost: 0 bytes in 0 blocks
==97== indirectly lost: 0 bytes in 0 blocks
==97== possibly lost: 159,402 bytes in 828 blocks
==97== still reachable: 95,386 bytes in 28 blocks
==97== of which reachable via heuristic:
==97== newarray : 320 bytes in 4 blocks
==97== suppressed: 0 bytes in 0 blocks
==97== Rerun with --leak-check=full to see details of leaked memory
==97==
==97== For lists of detected and suppressed errors, rerun with: -s
==97== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Aborted (core dumped)
I also reduced our config to a smaller one that I can share and that still produces the same error:
service:
  flush_interval: 1
  daemon: Off
  log_level: debug
pipeline:
  inputs:
    - name: tcp
      listen: 0.0.0.0
      port: 24224
      format: json
      tag: input.json_tcp
      buffer_size: 128 #kb
  outputs:
    - name: s3
      match: 'common_output'
      region: eu-west-1
      bucket: TEST
      store_dir_limit_size: 200M
      compression: gzip
      total_file_size: 50M
      s3_key_format: "/%Y/%m/%d/%H_%M_%S_$UUID.json"
It seems to be related to the s3 plugin, which also matches the valgrind output. Using a different output does not cause the segfault.
Expected behavior
The dry run should finish without a segfault.
Your Environment
- Version used: 4.1.0, 4.1.1
- Configuration: see above
- Environment name and version (e.g. Kubernetes? What version?):
- Server type and version:
- Operating System and version: Rocky Linux 9.6 on WSL (kernel 6.6.87), glibc 2.34-168.el9
- Filters and plugins: s3