dry-run segfaults on linux if config has s3 output #11104

@themasch

Bug Report

Describe the bug

Our Docker build runs a --dry-run at the end as a simple smoke test to catch config errors. This time it ran into a segfault:

1.271 configuration test is successful
1.271 [2025/11/04 13:22:28] [engine] caught signal (SIGSEGV)
1.330 #0  0x4fc70f            in  flb_output_exit() at src/flb_output.c:548
1.330 #1  0x50bcb4            in  flb_engine_shutdown() at src/flb_engine.c:1225
1.330 #2  0x4e6e14            in  flb_destroy() at src/flb_lib.c:240
1.331 #3  0x459323            in  flb_main_run() at src/fluent-bit.c:1436
1.331 #4  0x75752af2f5cf      in  ???() at ???:0
1.331 #5  0x75752af2f67f      in  ???() at ???:0
1.331 #6  0x456ce4            in  ???() at ???:0
1.331 #7  0xffffffffffffffff  in  ???() at ???:0

To Reproduce

4.0.13 does not segfault; 4.1.0 and 4.1.1 do.

Here's the cmake command we use to build fluent-bit:

cmake -DFLB_MINIMAL=Yes \
           -DFLB_KAFKA=Off \
           -DFLB_RELEASE=On \
           -DFLB_TLS=ON \
           -DFLB_BINARY=ON \
           -DFLB_SIMD=On \
           -DFLB_JEMALLOC=On \
           -DFLB_SIGNV4=Yes \
           -DFLB_CONFIG_YAML=Yes \
           -DFLB_IN_DOCKER_EVENTS=ON \
           -DFLB_IN_TCP=ON \
           -DFLB_IN_SYSLOG=ON \
           -DFLB_IN_EMITTER=ON \
           -DFLB_FILTER_REWRITE_TAG=ON \
           -DFLB_FILTER_MODIFY=ON \
           -DFLB_FILTER_NEST=ON \
           -DFLB_FILTER_PARSER=ON \
           -DFLB_FILTER_GREP=ON \
           -DFLB_OUT_LOKI=ON \
           -DFLB_OUT_S3=ON \
           -DFLB_OUT_SPLUNK=ON \
           -DFLB_PROCESSOR_METRICS_SELECTOR=ON \
           -DFLB_PROCESSOR_LABELS=ON \
           ../
make -j $(getconf _NPROCESSORS_ONLN)

I installed valgrind in the container image and ran the dry run under valgrind:

valgrind on FLB_RELEASE=ON
bash-5.1# valgrind -- /fluent-bit/bin/fluent-bit -c "/fluent-bit/fluentbit.yaml" --dry-run
==98== Memcheck, a memory error detector
==98== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==98== Using Valgrind-3.24.0 and LibVEX; rerun with -h for copyright info
==98== Command: /fluent-bit/bin/fluent-bit -c /fluent-bit/fluentbit.yaml --dry-run
==98== 
Fluent Bit v4.1.0
* Copyright (C) 2015-2025 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

______ _                  _    ______ _ _             ___   __  
|  ___| |                | |   | ___ (_) |           /   | /  | 
| |_  | |_   _  ___ _ __ | |_  | |_/ /_| |_  __   __/ /| | `| | 
|  _| | | | | |/ _ \ '_ \| __| | ___ \ | __| \ \ / / /_| |  | | 
| |   | | |_| |  __/ | | | |_  | |_/ / | |_   \ V /\___  |__| |_
\_|   |_|\__,_|\___|_| |_|\__| \____/|_|\__|   \_/     |_(_)___/


[2025/11/04 13:41:26.631168127] [ warn] [env] variable ${...} is used but not set         ( a few of those .... cut for brevity)
configuration test is successful
==98== Invalid read of size 8
==98==    at 0x56F3E1: cb_s3_worker_exit (s3.c:1115)
==98==    by 0x4FC70F: flb_output_exit (flb_output.c:548)
==98==    by 0x50BCB4: flb_engine_shutdown (flb_engine.c:1225)
==98==    by 0x4E6E14: flb_destroy (flb_lib.c:240)
==98==    by 0x459323: flb_main_run (fluent-bit.c:1436)
==98==    by 0x51C65CF: (below main) (in /usr/lib64/libc.so.6)
==98==  Address 0x2c0 is not stack'd, malloc'd or (recently) free'd
==98== 
[2025/11/04 13:41:26] [engine] caught signal (SIGSEGV)
#0  0x4fc70f            in  flb_output_exit() at src/flb_output.c:548
#1  0x50bcb4            in  flb_engine_shutdown() at src/flb_engine.c:1225
#2  0x4e6e14            in  flb_destroy() at src/flb_lib.c:240
#3  0x459323            in  flb_main_run() at src/fluent-bit.c:1436
#4  0x51c65cf           in  ???() at ???:0
#5  0x51c667f           in  ???() at ???:0
#6  0x456ce4            in  ???() at ???:0
#7  0xffffffffffffffff  in  ???() at ???:0
==98== 
==98== Process terminating with default action of signal 6 (SIGABRT): dumping core
==98==    at 0x5228E2C: __pthread_kill_implementation (in /usr/lib64/libc.so.6)
==98==    by 0x51DBB45: raise (in /usr/lib64/libc.so.6)
==98==    by 0x51C5832: abort (in /usr/lib64/libc.so.6)
==98==    by 0x4576C7: flb_signal_handler (fluent-bit.c:636)
==98==    by 0x51DBBEF: ??? (in /usr/lib64/libc.so.6)
==98==    by 0x56F3E0: cb_s3_worker_exit (s3.c:1111)
==98==    by 0x5463C5F: ???
==98==    by 0x4FC70F: flb_output_exit (flb_output.c:548)
==98==    by 0x50BCB4: flb_engine_shutdown (flb_engine.c:1225)
==98==    by 0x4E6E14: flb_destroy (flb_lib.c:240)
==98==    by 0x459323: flb_main_run (fluent-bit.c:1436)
==98==    by 0x51C65CF: (below main) (in /usr/lib64/libc.so.6)
==98== 
==98== HEAP SUMMARY:
==98==     in use at exit: 254,788 bytes in 856 blocks
==98==   total heap usage: 3,092 allocs, 2,236 frees, 444,857 bytes allocated
==98== 
==98== LEAK SUMMARY:
==98==    definitely lost: 0 bytes in 0 blocks
==98==    indirectly lost: 0 bytes in 0 blocks
==98==      possibly lost: 159,402 bytes in 828 blocks
==98==    still reachable: 95,386 bytes in 28 blocks
==98==                       of which reachable via heuristic:
==98==                         newarray           : 320 bytes in 4 blocks
==98==         suppressed: 0 bytes in 0 blocks
==98== Rerun with --leak-check=full to see details of leaked memory
==98== 
==98== For lists of detected and suppressed errors, rerun with: -s
==98== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Aborted (core dumped)

A valgrind run with FLB_RELEASE=Off
bash-5.1# valgrind -- /fluent-bit/bin/fluent-bit -c "/fluent-bit/fluentbit.yaml" --dry-run
==97== Memcheck, a memory error detector
==97== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==97== Using Valgrind-3.24.0 and LibVEX; rerun with -h for copyright info
==97== Command: /fluent-bit/bin/fluent-bit -c /fluent-bit/fluentbit.yaml --dry-run
==97== 
Fluent Bit v4.1.0
* Copyright (C) 2015-2025 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

______ _                  _    ______ _ _             ___   __  
|  ___| |                | |   | ___ (_) |           /   | /  | 
| |_  | |_   _  ___ _ __ | |_  | |_/ /_| |_  __   __/ /| | `| | 
|  _| | | | | |/ _ \ '_ \| __| | ___ \ | __| \ \ / / /_| |  | | 
| |   | | |_| |  __/ | | | |_  | |_/ / | |_   \ V /\___  |__| |_
\_|   |_|\__,_|\___|_| |_|\__| \____/|_|\__|   \_/     |_(_)___/


[2025/11/04 13:47:17.956698294] [ warn] [env] variable ${...} is used but not set
configuration test is successful
==97== Invalid read of size 8
==97==    at 0x772AEC: cb_s3_worker_exit (s3.c:1115)
==97==    by 0x53A690: flb_output_exit (flb_output.c:548)
==97==    by 0x56AC63: flb_engine_shutdown (flb_engine.c:1225)
==97==    by 0x4F6300: flb_destroy (flb_lib.c:240)
==97==    by 0x45D7F5: flb_main_run (fluent-bit.c:1436)
==97==    by 0x5D32D2: flb_supervisor_run (flb_supervisor.c:626)
==97==    by 0x45DB68: flb_main (fluent-bit.c:1564)
==97==    by 0x45DB8A: main (fluent-bit.c:1572)
==97==  Address 0x2c0 is not stack'd, malloc'd or (recently) free'd
==97== 
[2025/11/04 13:47:17] [engine] caught signal (SIGSEGV)
#0  0x772aec            in  cb_s3_worker_exit() at plugins/out_s3/s3.c:1115
#1  0x53a690            in  flb_output_exit() at src/flb_output.c:548
#2  0x56ac63            in  flb_engine_shutdown() at src/flb_engine.c:1225
#3  0x4f6300            in  flb_destroy() at src/flb_lib.c:240
#4  0x45d7f5            in  flb_main_run() at src/fluent-bit.c:1436
#5  0x5d32d2            in  flb_supervisor_run() at src/flb_supervisor.c:626
#6  0x45db68            in  flb_main() at src/fluent-bit.c:1564
#7  0x45db8a            in  main() at src/fluent-bit.c:1572
#8  0x51c65cf           in  ???() at ???:0
#9  0x51c667f           in  ???() at ???:0
#10 0x456a04            in  ???() at ???:0
#11 0xffffffffffffffff  in  ???() at ???:0
==97== 
==97== Process terminating with default action of signal 6 (SIGABRT): dumping core
==97==    at 0x5228E2C: __pthread_kill_implementation (in /usr/lib64/libc.so.6)
==97==    by 0x51DBB45: raise (in /usr/lib64/libc.so.6)
==97==    by 0x51C5832: abort (in /usr/lib64/libc.so.6)
==97==    by 0x45C226: flb_signal_handler (fluent-bit.c:636)
==97==    by 0x51DBBEF: ??? (in /usr/lib64/libc.so.6)
==97==    by 0x772AEB: cb_s3_worker_exit (s3.c:1115)
==97==    by 0x53A690: flb_output_exit (flb_output.c:548)
==97==    by 0x56AC63: flb_engine_shutdown (flb_engine.c:1225)
==97==    by 0x4F6300: flb_destroy (flb_lib.c:240)
==97==    by 0x45D7F5: flb_main_run (fluent-bit.c:1436)
==97==    by 0x5D32D2: flb_supervisor_run (flb_supervisor.c:626)
==97==    by 0x45DB68: flb_main (fluent-bit.c:1564)
==97== 
==97== HEAP SUMMARY:
==97==     in use at exit: 254,788 bytes in 856 blocks
==97==   total heap usage: 3,092 allocs, 2,236 frees, 444,857 bytes allocated
==97== 
==97== LEAK SUMMARY:
==97==    definitely lost: 0 bytes in 0 blocks
==97==    indirectly lost: 0 bytes in 0 blocks
==97==      possibly lost: 159,402 bytes in 828 blocks
==97==    still reachable: 95,386 bytes in 28 blocks
==97==                       of which reachable via heuristic:
==97==                         newarray           : 320 bytes in 4 blocks
==97==         suppressed: 0 bytes in 0 blocks
==97== Rerun with --leak-check=full to see details of leaked memory
==97== 
==97== For lists of detected and suppressed errors, rerun with: -s
==97== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Aborted (core dumped)
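
Both runs report the same small faulting address, 0x2c0. That is the pattern a member read through a NULL struct pointer usually produces: the address is simply the member's offset inside the struct. The sketch below only illustrates that arithmetic. `fake_ctx`, its layout, and `member_address` are invented for the example and are not taken from s3.c, so this is a reading of the valgrind output, not a confirmed diagnosis.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical context struct (NOT the real out_s3 struct). The padding is
 * chosen so that the pointer member lands at offset 0x2c0, matching the
 * address in the valgrind report. */
struct fake_ctx {
    char padding[0x2c0];
    void *member;          /* 8 bytes on a 64-bit platform */
};

/* Compute the address that a read of ctx->member would touch,
 * without actually dereferencing ctx. */
static uintptr_t member_address(const struct fake_ctx *ctx)
{
    return (uintptr_t) ctx + offsetof(struct fake_ctx, member);
}
```

On common platforms, where a null pointer converts to 0, `member_address(NULL)` evaluates to 0x2c0, which would explain an "Invalid read of size 8" at that address if the pointer used at s3.c:1115 were NULL.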

I also reduced our config to a smaller one that I can share and that still produces the same error:

service:
    flush_interval: 1
    daemon: Off
    log_level: debug

pipeline:
    inputs:
        -   name: tcp
            listen: 0.0.0.0
            port: 24224
            format: json
            tag: input.json_tcp
            buffer_size: 128 #kb

    outputs:
        -   name: s3
            match: 'common_output'
            region: eu-west-1
            bucket: TEST
            store_dir_limit_size: 200M
            compression: gzip
            total_file_size: 50M
            s3_key_format: "/%Y/%m/%d/%H_%M_%S_$UUID.json"

It seems to be related to the s3 plugin (this also matches the valgrind output). Using a different output does not cause the segfault.
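
If the exit callback is reached in --dry-run before the plugin's context was ever created (an assumption; the report does not confirm it), one possible shape for a fix is a NULL guard at the top of the callback. The following is a minimal sketch with hypothetical names (`demo_ctx`, `demo_worker_exit`); it does not reproduce the real cb_s3_worker_exit or its signature.

```c
#include <stddef.h>

/* Hypothetical plugin context, invented for illustration. */
struct demo_ctx {
    int workers_running;
};

/* Hypothetical exit callback. Returns 0 on success, including the case
 * where no context exists and there is nothing to tear down. */
static int demo_worker_exit(struct demo_ctx *ctx)
{
    if (ctx == NULL) {
        /* Initialization never ran (for example, a config-only test run),
         * so skip cleanup instead of dereferencing a NULL pointer. */
        return 0;
    }

    ctx->workers_running = 0;
    return 0;
}
```

With such a guard, calling the callback on an uninitialized instance becomes a no-op, while the normal shutdown path is unchanged.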

Expected behavior
No segfault during a dry run.

Your Environment

  • Version used: 4.1.0, 4.1.1
  • Configuration: see above
  • Environment name and version (e.g. Kubernetes? What version?):
  • Server type and version:
  • Operating System and version: rocky 9.6 on WSL kernel 6.6.87, glibc 2.34-168.el9
  • Filters and plugins: s3
