-
Notifications
You must be signed in to change notification settings - Fork 1.8k
router: extend router interface to support metrics (v3) #11108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Replace config routing mask fields with router context pointer. Add router struct with mask state and metrics counters. Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
Add router context creation, metrics initialization, and cleanup. Router owns routing mask buffers and maintains routing state. Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
Update all routing mask functions to accept router context. Fix mask size calculation and add helper getter functions. Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
Create router in config init and destroy in config exit. Ensure router lifecycle aligns with config lifecycle. Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
Record routing metrics for successful flushes, retries, and drops. Track records and bytes per input/output pair. Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
Replace all config-based mask operations with router-based calls. Add routing metrics for dropped chunks and fix mask size calculations. Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
Update mask get/set/clear operations to use router instead of config. Fix mask size calculations using router helper functions. Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
Include router context counters in metrics export. Enable routing metrics scraping without additional config. Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
Fix mask operations and size calculations to use router slot count. Prevent buffer overruns during backlog processing. Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
Replace config-based mask checks with router-based calls. Ensure mask evaluation uses current router state. Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
Create router context in test setup and update all mask operations. Add router cleanup in test teardown. Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
WalkthroughExtracts routing state into a new struct Changes
Sequence Diagram(s)sequenceDiagram
participant Config
participant Router
participant RoutesMask
participant Input
participant Engine
rect rgba(0,128,96,0.06)
Config->>Router: flb_router_create(config)
Router-->>Config: config->router = router
Router->>RoutesMask: flb_routes_mask_set_size(mask_size, router)
end
rect rgba(0,64,160,0.06)
Input->>Router: flb_routes_mask_get_size(router)
Input->>RoutesMask: flb_routes_mask_get_bit(mask, val, router)
Engine->>Router: record metrics (in_name / out_name) via router->cmt
end
rect rgba(128,0,0,0.04)
Config->>Router: flb_router_destroy(router)
Router-->>Config: cleanup metrics & masks
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🧰 Additional context used🧬 Code graph analysis (1)src/flb_config.c (5)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (30)
🔇 Additional comments (2)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
plugins/in_storage_backlog/sb.c (1)
662-673: Same over-allocation in init: allocate by mask size, not slots.Use element count from flb_routes_mask_get_size(config->router).
- size_t slots; + size_t mask_size; @@ - slots = flb_routes_mask_get_slots(config->router); - ctx->dummy_routes_mask = flb_calloc(slots, - sizeof(flb_route_mask_element)); + mask_size = flb_routes_mask_get_size(config->router); + ctx->dummy_routes_mask = flb_calloc(mask_size, + sizeof(flb_route_mask_element));src/flb_routes_mask.c (2)
97-116: Fix incorrect error message.The warning message at Line 108 says "Can't clear bit" but this function is
flb_routes_mask_set_bit. The message should say "Can't set bit" to match the function's purpose.Apply this diff:
- if (value < 0 || value >= router->route_mask_slots) { - flb_warn("[routes_mask] Can't clear bit (%d) past limits of bitfield", + if (value < 0 || value >= router->route_mask_slots) { + flb_warn("[routes_mask] Can't set bit (%d) past limits of bitfield", value);
125-144: Fix incorrect error message.The warning message at Line 136 says "Can't set bit" but this function is
flb_routes_mask_clear_bit. The message should say "Can't clear bit" to match the function's purpose.Apply this diff:
if (value < 0 || value >= router->route_mask_slots) { - flb_warn("[routes_mask] Can't set bit (%d) past limits of bitfield", + flb_warn("[routes_mask] Can't clear bit (%d) past limits of bitfield", value);
🧹 Nitpick comments (1)
src/flb_metrics_exporter.c (1)
312-319: Router CMetrics are correctly appended; consider unifying ret checks.The cmt_cat(ctx->router->cmt) path mirrors storage handling and guards on -1. The log metrics path above checks ret != 0; optional: normalize to ret == -1 for consistency with cmt_cat’s contract.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (14)
include/fluent-bit/flb_config.h(2 hunks)include/fluent-bit/flb_router.h(3 hunks)include/fluent-bit/flb_routes_mask.h(1 hunks)plugins/in_storage_backlog/sb.c(4 hunks)src/flb_config.c(2 hunks)src/flb_engine.c(10 hunks)src/flb_input_chunk.c(19 hunks)src/flb_input_log.c(7 hunks)src/flb_metrics_exporter.c(2 hunks)src/flb_router.c(2 hunks)src/flb_routes_mask.c(7 hunks)src/flb_task.c(2 hunks)tests/internal/conditional_routing.c(8 hunks)tests/internal/input_chunk_routes.c(5 hunks)
🧰 Additional context used
🧠 Learnings (8)
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.
Applied to files:
include/fluent-bit/flb_router.hsrc/flb_engine.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.
Applied to files:
include/fluent-bit/flb_router.hsrc/flb_engine.c
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.
Applied to files:
include/fluent-bit/flb_router.hsrc/flb_engine.c
📚 Learning: 2025-08-29T06:25:02.561Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:7-7
Timestamp: 2025-08-29T06:25:02.561Z
Learning: In Fluent Bit, ZSTD (zstandard) compression library is bundled directly in the source tree at `lib/zstd-1.5.7` and is built unconditionally as a static library. Unlike optional external dependencies, ZSTD does not use conditional compilation guards like `FLB_HAVE_ZSTD` and is always available. Headers like `<fluent-bit/flb_zstd.h>` can be included directly without guards.
Applied to files:
include/fluent-bit/flb_router.hsrc/flb_engine.c
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit, the correct CMake flag for using system librdkafka is `FLB_PREFER_SYSTEM_LIB_KAFKA=ON`.
Applied to files:
src/flb_engine.c
📚 Learning: 2025-08-29T06:24:44.797Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:26-26
Timestamp: 2025-08-29T06:24:44.797Z
Learning: In Fluent Bit, ZSTD support is always available and enabled by default. The build system automatically detects and uses either the system libzstd library or builds the bundled ZSTD version. Unlike other optional dependencies like Arrow which use conditional compilation guards (e.g., FLB_HAVE_ARROW), ZSTD does not require conditional includes or build flags.
Applied to files:
src/flb_engine.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components, ZSTD support is always available and doesn't need build-time conditionals.
Applied to files:
src/flb_engine.c
📚 Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like #ifdef FLB_HAVE_ZSTD) around ZSTD-related code declarations and implementations.
Applied to files:
src/flb_engine.c
🧬 Code graph analysis (12)
src/flb_metrics_exporter.c (1)
lib/cmetrics/src/cmt_cat.c (1)
cmt_cat(638-649)
include/fluent-bit/flb_router.h (1)
src/flb_router.c (3)
flb_router_metrics_create(356-361)flb_router_create(363-387)flb_router_destroy(389-402)
src/flb_router.c (3)
lib/cmetrics/src/cmt_counter.c (1)
cmt_counter_create(26-81)include/fluent-bit/flb_mem.h (2)
flb_calloc(84-96)flb_free(126-128)src/flb_routes_mask.c (1)
flb_routes_empty_mask_destroy(205-216)
tests/internal/input_chunk_routes.c (3)
src/flb_router.c (2)
flb_router_create(363-387)flb_router_destroy(389-402)src/flb_env.c (1)
flb_env_destroy(95-99)src/flb_routes_mask.c (2)
flb_routes_mask_set_size(218-235)flb_routes_mask_get_bit(154-173)
src/flb_input_log.c (1)
src/flb_routes_mask.c (3)
flb_routes_mask_get_size(27-34)flb_routes_mask_is_empty(175-185)flb_routes_mask_clear_bit(125-144)
src/flb_engine.c (3)
src/flb_input.c (1)
flb_input_name(790-797)src/flb_output.c (1)
flb_output_name(1087-1094)src/flb_routes_mask.c (1)
flb_routes_mask_set_size(218-235)
plugins/in_storage_backlog/sb.c (2)
src/flb_routes_mask.c (1)
flb_routes_mask_get_slots(36-43)include/fluent-bit/flb_mem.h (1)
flb_calloc(84-96)
src/flb_input_chunk.c (5)
src/flb_routes_mask.c (2)
flb_routes_mask_is_empty(175-185)flb_routes_mask_get_size(27-34)lib/cmetrics/src/cmt_counter.c (1)
cmt_counter_add(119-135)src/flb_input.c (1)
flb_input_name(790-797)src/flb_output.c (1)
flb_output_name(1087-1094)include/fluent-bit/flb_mem.h (1)
flb_calloc(84-96)
tests/internal/conditional_routing.c (3)
src/flb_input_chunk.c (1)
flb_input_chunk_create(1952-2065)src/flb_routes_mask.c (3)
flb_routes_mask_set_bit(97-116)flb_routes_mask_get_bit(154-173)flb_routes_mask_set_size(218-235)src/flb_router.c (2)
flb_router_create(363-387)flb_router_destroy(389-402)
include/fluent-bit/flb_routes_mask.h (1)
src/flb_routes_mask.c (10)
flb_routes_mask_set_by_tag(49-88)flb_routes_mask_get_bit(154-173)flb_routes_mask_set_bit(97-116)flb_routes_mask_clear_bit(125-144)flb_routes_mask_is_empty(175-185)flb_routes_empty_mask_create(187-203)flb_routes_empty_mask_destroy(205-216)flb_routes_mask_set_size(218-235)flb_routes_mask_get_size(27-34)flb_routes_mask_get_slots(36-43)
src/flb_routes_mask.c (1)
include/fluent-bit/flb_mem.h (2)
flb_calloc(84-96)flb_free(126-128)
src/flb_config.c (2)
src/flb_router.c (2)
flb_router_create(363-387)flb_router_destroy(389-402)src/flb_routes_mask.c (1)
flb_routes_mask_set_size(218-235)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (30)
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
- GitHub Check: PR - fuzzing test
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
- GitHub Check: pr-compile-without-cxx (3.31.6)
- GitHub Check: pr-compile-centos-7
🔇 Additional comments (21)
src/flb_metrics_exporter.c (1)
37-37: Header include is correct.Including flb_router.h here is required for ctx->router->cmt access.
include/fluent-bit/flb_config.h (2)
34-35: Forward declaration looks good.No header cycle; keeps struct pointer usage lightweight.
298-299: Replacing mask fields with router pointer: verify lifecycle.Make sure flb_config_init() creates config->router and flb_config_exit() destroys it to avoid leaks and null derefs.
src/flb_task.c (1)
735-739: Route-mask calls correctly migrated to router.Looks consistent with the new API. Please confirm config->router is always initialized before tasks are created; otherwise get_bit() returns 0 and routes would be skipped.
Also applies to: 791-794
plugins/in_storage_backlog/sb.c (1)
320-323: Correct router usage in get_bit.Passing backlog->ins->config->router to flb_routes_mask_get_bit() is correct.
src/flb_input_log.c (1)
182-201: Route-mask reset and emptiness check migrated correctly.Using flb_routes_mask_get_size() for memset and flb_routes_mask_is_empty(..., ins->config->router) is correct and aligns with the new router API.
include/fluent-bit/flb_router.h (2)
52-67: Router struct and routing metrics fields look good.Fields cover mask sizing and CMetrics context; drop counters included.
204-207: Public router lifecycle/metrics API OK; confirm label schema.Please verify router_metrics_create() initializes counters with 2 labels (input, output) to match usage in flb_input_chunk.c.
src/flb_input_chunk.c (3)
964-968: Mask allocation sized by elements is correct.Using flb_routes_mask_get_size(in->config->router) to allocate the mask is the right dimension.
1145-1147: memset size uses element count; good.Zeroing routes_mask with sizeof(flb_route_mask_element) * mask_size is correct.
2332-2342: Empty‑mask check migrated to router.Logic remains equivalent with the new API.
tests/internal/input_chunk_routes.c (3)
69-85: Test setup correctly initializes router.Creating router and setting mask size in init_test_config() is necessary for new API usage.
255-259: Test teardown destroys router.Prevents leaks across tests.
616-625: Assertions migrated to router-based get_bit().Checks remain valid with config.router.
Also applies to: 777-786
src/flb_routes_mask.c (7)
27-43: LGTM: Clean accessor functions with proper NULL guards.The new size and slots accessor functions provide a safe, consistent interface for retrieving router mask dimensions.
49-88: LGTM: Router-based refactoring is consistent and safe.The function correctly transitions to the router-based API with appropriate NULL guards and accessor usage.
154-173: LGTM: Correctly refactored with proper bounds validation.The function properly transitions to router-based bounds checking with appropriate NULL handling.
175-185: LGTM: Proper NULL guards and router-based comparison.The function correctly handles NULL cases and uses router fields for the empty mask comparison.
187-203: LGTM: Safe mask creation with proper cleanup and error handling.The function correctly handles router NULL checks, cleans up existing masks, and validates allocation results.
205-216: LGTM: Safe deallocation with proper NULL handling.The function correctly guards against NULL router and sets the pointer to NULL after freeing.
218-235: LGTM: Correct size computation with proper validation.The function properly validates input, applies ceiling division to compute element count, updates router fields, and delegates mask creation. The formula
(mask_size + BITS - 1) / BITScorrectly rounds up.
| size_t slots; | ||
|
|
||
| memset(&dummy_input_chunk, 0, sizeof(struct flb_input_chunk)); | ||
|
|
||
| slots = flb_routes_mask_get_slots(context->ins->config->router); | ||
| memset(context->dummy_routes_mask, | ||
| 0, | ||
| context->ins->config->route_mask_slots * sizeof(flb_route_mask_element)); | ||
| slots * sizeof(flb_route_mask_element)); | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Over-allocation: using slots (bits) instead of size (elements) for mask operations.
flb_routes_mask_get_slots() returns bit-count; you’re allocating/clearing 64× more than needed. Use flb_routes_mask_get_size() for element count.
Apply this fix:
- size_t slots;
+ size_t mask_size;
@@
- slots = flb_routes_mask_get_slots(context->ins->config->router);
- memset(context->dummy_routes_mask,
- 0,
- slots * sizeof(flb_route_mask_element));
+ mask_size = flb_routes_mask_get_size(context->ins->config->router);
+ memset(context->dummy_routes_mask, 0,
+ mask_size * sizeof(flb_route_mask_element));📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| size_t slots; | |
| memset(&dummy_input_chunk, 0, sizeof(struct flb_input_chunk)); | |
| slots = flb_routes_mask_get_slots(context->ins->config->router); | |
| memset(context->dummy_routes_mask, | |
| 0, | |
| context->ins->config->route_mask_slots * sizeof(flb_route_mask_element)); | |
| slots * sizeof(flb_route_mask_element)); | |
| size_t mask_size; | |
| memset(&dummy_input_chunk, 0, sizeof(struct flb_input_chunk)); | |
| mask_size = flb_routes_mask_get_size(context->ins->config->router); | |
| memset(context->dummy_routes_mask, 0, | |
| mask_size * sizeof(flb_route_mask_element)); |
🤖 Prompt for AI Agents
In plugins/in_storage_backlog/sb.c around lines 284 to 292, the code uses
flb_routes_mask_get_slots() (bit count) to compute the number of elements for
allocation/clearing, causing a 64x over-allocation; replace that call with
flb_routes_mask_get_size() to get the actual element count and use it when
computing the byte length for memset (elements * sizeof(flb_route_mask_element))
so the mask is cleared/allocated with the correct size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it could be an optimized way. But I have a question. If we use this way to allocate heap memories, it'll need to reallocate to fix the size of heap memories for route masks? This is because it could be using a possibly bigger size of allocation here, but the re-allocate costs could be high or any other re-allocating places are implemented to extend/shrink the size of heap memories that are using for route masks?
If there's no code path to re-allocate/adjust places for them, it could be worthless suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# First, let's examine the implementation of flb_routes_mask_get_slots() and flb_routes_mask_get_size()
echo "=== Implementation of flb_routes_mask_get_size() and flb_routes_mask_get_slots() ==="
ast-grep --pattern 'flb_routes_mask_get_size($$$) {
$$$
}'
ast-grep --pattern 'flb_routes_mask_get_slots($$$) {
$$$
}'
Length of output: 84
🏁 Script executed:
#!/bin/bash
# Let's look at the source file for these functions
echo "=== Contents of src/flb_routes_mask.c for these functions ==="
rg -A 10 "^size_t flb_routes_mask_get_(size|slots)" src/flb_routes_mask.c
Length of output: 444
🏁 Script executed:
#!/bin/bash
# Let's see how dummy_routes_mask is allocated in cb_sb_init
echo "=== Allocation of dummy_routes_mask in cb_sb_init ==="
rg -B 2 -A 5 "ctx->dummy_routes_mask = flb_calloc" plugins/in_storage_backlog/sb.c
Length of output: 316
🏁 Script executed:
#!/bin/bash
# Let's check other places in the codebase where routes_mask is allocated or sized
echo "=== Other allocations using flb_routes_mask_get_size ==="
rg -n "flb_routes_mask_get_size.*config->router" --type c -C 2
echo ""
echo "=== Other allocations using flb_routes_mask_get_slots ==="
rg -n "flb_routes_mask_get_slots.*config->router" --type c -C 2
Length of output: 2028
🏁 Script executed:
#!/bin/bash
# Let's examine the struct definition to understand what route_mask_slots and route_mask_size mean
echo "=== Router struct definition ==="
rg -A 15 "^struct flb_router" include/fluent-bit/flb_router.h
Length of output: 856
@cosmo0920, great question! Let me clarify - there's no reallocation happening here, and this is actually a bug fix, not just an optimization.
The Issue
The semantic difference between the two functions:
route_mask_slots= number of bits (i.e., max number of outputs/routes)route_mask_size= number of elements (uint64_t) =ceil(slots / 64)
Since each flb_route_mask_element (uint64_t) holds 64 bits, if you have 3 outputs, you need:
route_mask_slots= 3 (bits)route_mask_size= 1 (element, since 3 bits fit in one uint64_t)
Current Bug
The storage_backlog code currently:
- Allocates
slotselements (3 elements = 192 bytes) when it only needssizeelements (1 element = 8 bytes) - Clears
slotselements in the loop when it only needs to clearsizeelements
This is a ~64× over-allocation and over-clearing, not a design choice.
No Reallocation Occurs
The lifecycle is simple:
- Allocate once in
cb_sb_init(line 671) - Clear and reuse in
sb_append_chunk_to_segregated_backlogs(line 289) — called in a loop - Free once in
cb_sb_exit
The mask size is determined at router initialization and never changes. There's no code path that reallocates or adjusts this buffer.
Consistency with Rest of Codebase
Every other allocation/clear operation in the refactored code uses get_size():
src/flb_input_chunk.cline 964 & 2024: allocations useget_size()src/flb_input_log.cline 182: memset usesget_size()src/flb_routes_mask.cline 67: memset usesget_size()
The storage_backlog plugin is the only place still using get_slots() incorrectly.
Recommendation
Apply the fix to match the rest of the codebase. It will:
- ✅ Fix the over-allocation bug
- ✅ Reduce repeated memset overhead in the loop
- ✅ Make the code consistent with the router refactoring pattern
- ✅ No reallocation costs (because there's no reallocation)
🧬 Code Graph Analysis Results
[src/flb_routes_mask.c] lines 36-43
size_t flb_routes_mask_get_slots(struct flb_router *router)
{
if (router == NULL) {
return 0;
}
return router->route_mask_slots;
}
[src/flb_routes_mask.c] lines 49-88
int flb_routes_mask_set_by_tag(flb_route_mask_element *routes_mask,
const char *tag,
int tag_len,
struct flb_input_instance *in)
{
int has_routes = 0;
size_t size;
struct mk_list *o_head;
struct flb_output_instance *o_ins;
if (!in) {
return 0;
}
if (in->config == NULL || in->config->router == NULL) {
return 0;
}
/* Clear the bit field */
size = flb_routes_mask_get_size(in->config->router);
memset(routes_mask, 0, sizeof(flb_route_mask_element) * size);
/* Find all matching routes for the given tag */
mk_list_foreach(o_head, &in->config->outputs) {
o_ins = mk_list_entry(o_head,
struct flb_output_instance, _head);
if (flb_router_match(tag, tag_len, o_ins->match
#ifdef FLB_HAVE_REGEX
, o_ins->match_regex
#else
, NULL
#endif
)) {
flb_routes_mask_set_bit(routes_mask, o_ins->id, o_ins->config->router);
has_routes = 1;
}
}
return has_routes;
}
[src/flb_routes_mask.c] lines 154-173
int flb_routes_mask_get_bit(flb_route_mask_element *routes_mask, int value,
struct flb_router *router)
{
int index;
uint64_t bit;
if (router == NULL) {
return 0;
}
if (value < 0 || value >= router->route_mask_slots) {
flb_warn("[routes_mask] Can't get bit (%d) past limits of bitfield",
value);
return 0;
}
index = value / FLB_ROUTES_MASK_ELEMENT_BITS;
bit = 1ULL << (value % FLB_ROUTES_MASK_ELEMENT_BITS);
return (routes_mask[index] & bit) != 0ULL;
}
[src/include/fluent-bit/flb_mem.h] lines 84-96
void *flb_calloc(size_t n, const size_t size) {
if (size == 0) {
return NULL;
}
#ifdef FLB_HAVE_TESTS_OSSFUZZ
// Add chance of failure. Used by fuzzing to test error-handling code.
if (flb_fuzz_get_probability(1)) {
return NULL;
}
#endif
return calloc(n, size);
}
[src/include/fluent-bit/flb_mem.h] lines 126-128
static inline void flb_free(void *ptr) {
free(ptr);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@edsiper This @coderabbitai's point of view could be correct for this patch. So, could you take a look on it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cosmo0920, thank you for the review! I appreciate you confirming the analysis.
@edsiper, the issue is that both the allocation (line 671) and memset (line 289) in the storage_backlog plugin are using flb_routes_mask_get_slots() where they should use flb_routes_mask_get_size(). This causes a 64× over-allocation since slots counts bits while size counts uint64_t elements.
The fix is straightforward and makes the code consistent with the rest of the router refactoring. Happy to clarify anything if needed!
Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/flb_config.c(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/flb_config.c (4)
src/flb_router.c (2)
flb_router_create(356-382)flb_router_destroy(384-397)src/flb_kernel.c (1)
flb_kernel_destroy(151-161)src/config_format/flb_config_format.c (1)
flb_cf_destroy(151-157)src/flb_routes_mask.c (1)
flb_routes_mask_set_size(218-235)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (30)
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
- GitHub Check: pr-compile-without-cxx (3.31.6)
- GitHub Check: pr-compile-centos-7
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
- GitHub Check: PR - fuzzing test
🔇 Additional comments (1)
src/flb_config.c (1)
627-627: LGTM!The router destruction is correctly placed in the teardown sequence and
flb_router_destroysafely handles NULL internally.
3bd7997 to
56fb5cf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/flb_config.c(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/flb_config.c (5)
src/flb_router.c (2)
flb_router_create(356-382)flb_router_destroy(384-397)src/flb_kernel.c (1)
flb_kernel_destroy(151-161)src/config_format/flb_config_format.c (1)
flb_cf_destroy(151-157)include/fluent-bit/flb_mem.h (1)
flb_free(126-128)src/flb_routes_mask.c (1)
flb_routes_mask_set_size(218-235)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (30)
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
- GitHub Check: pr-compile-without-cxx (3.31.6)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
- GitHub Check: PR - fuzzing test
- GitHub Check: pr-compile-centos-7
🔇 Additional comments (1)
src/flb_config.c (1)
637-637: Proper router cleanup during config exit.The router destruction is correctly integrated into the config exit flow. The encapsulation is improved by having
flb_router_destroy()handle internal cleanup (routes mask, metrics) rather than calling these functions directly from config code.
| ret = flb_routes_mask_set_size(1, config->router); | ||
| if (ret != 0) { | ||
| flb_error("[config] routing mask dimensioning failed"); | ||
| flb_router_destroy(config->router); | ||
| if (config->kernel) { | ||
| flb_kernel_destroy(config->kernel); | ||
| } | ||
| flb_cf_destroy(cf); | ||
| flb_free(config); | ||
| return NULL; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previous review concern addressed, but memory leak remains.
The return value of flb_routes_mask_set_size() is now properly checked and the router is correctly destroyed on failure, addressing the past review comment. However, the same memory leak for http_listen and http_port exists in this error path.
Apply this diff to fix the leak:
ret = flb_routes_mask_set_size(1, config->router);
if (ret != 0) {
flb_error("[config] routing mask dimensioning failed");
flb_router_destroy(config->router);
+#ifdef FLB_HAVE_HTTP_SERVER
+ if (config->http_listen) {
+ flb_free(config->http_listen);
+ }
+ if (config->http_port) {
+ flb_free(config->http_port);
+ }
+#endif
if (config->kernel) {
flb_kernel_destroy(config->kernel);
}
flb_cf_destroy(cf);
flb_free(config);
return NULL;
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| ret = flb_routes_mask_set_size(1, config->router); | |
| if (ret != 0) { | |
| flb_error("[config] routing mask dimensioning failed"); | |
| flb_router_destroy(config->router); | |
| if (config->kernel) { | |
| flb_kernel_destroy(config->kernel); | |
| } | |
| flb_cf_destroy(cf); | |
| flb_free(config); | |
| return NULL; | |
| } | |
| ret = flb_routes_mask_set_size(1, config->router); | |
| if (ret != 0) { | |
| flb_error("[config] routing mask dimensioning failed"); | |
| flb_router_destroy(config->router); | |
| #ifdef FLB_HAVE_HTTP_SERVER | |
| if (config->http_listen) { | |
| flb_free(config->http_listen); | |
| } | |
| if (config->http_port) { | |
| flb_free(config->http_port); | |
| } | |
| #endif | |
| if (config->kernel) { | |
| flb_kernel_destroy(config->kernel); | |
| } | |
| flb_cf_destroy(cf); | |
| flb_free(config); | |
| return NULL; | |
| } |
🤖 Prompt for AI Agents
In src/flb_config.c around lines 317 to 327, the error path after
flb_routes_mask_set_size() fails correctly destroys the router and kernel but
leaks memory held by config->http_listen and config->http_port; free those two
string members (checking for non-NULL if desired) before calling
flb_free(config) (i.e., flb_free(config->http_listen);
flb_free(config->http_port);) so all allocated config resources are released on
this failure path.
Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
56fb5cf to
5c65379
Compare
This PR introduces routing metrics into the Fluent Bit router. The router API now uses dedicated contexts that extend the existing design to provide proper metric instrumentation, details:
Goals:
Here is an example of the new routing metrics:
Fluent Bit Pipeline
Metrics exposed
output:
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.
Summary by CodeRabbit