ROX-31012: Optimize memory for UpdateComputer connection deduper#17041
ROX-31012: Optimize memory for UpdateComputer connection deduper#17041
Conversation
|
This change is part of the following stack: Change managed by git-spice. |
|
Skipping CI for Draft Pull Request. |
|
Images are ready for the commit at 96be698. To use with deploy scripts, first |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #17041 +/- ##
=======================================
Coverage 48.80% 48.81%
=======================================
Files 2724 2724
Lines 203154 203120 -34
=======================================
- Hits 99157 99147 -10
+ Misses 96211 96197 -14
+ Partials 7786 7776 -10
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
f636ddd to
425a08d
Compare
784ee0f to
6fae477
Compare
6fae477 to
79b19fa
Compare
5e67e12 to
677d6ce
Compare
79b19fa to
a3d8491
Compare
677d6ce to
175c010
Compare
bf181fd to
41d6369
Compare
a3d8491 to
5f8e07c
Compare
There was a problem hiding this comment.
Hey there - I've reviewed your changes - here's some feedback:
- Consider aliasing set.Set[indicator.BinaryHash] to a descriptive type (e.g. BinaryHashSet) to improve readability and reduce type repetition.
- The NetworkConn_BinaryHash_Migration_Plan.md is quite large—consider moving it to an internal design document or wiki to keep this repo focused on code.
- If you’re logging or debugging connection keys, add a helper to format BinaryHash values as hex strings for readability.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Consider aliasing set.Set[indicator.BinaryHash] to a descriptive type (e.g. BinaryHashSet) to improve readability and reduce type repetition.
- The NetworkConn_BinaryHash_Migration_Plan.md is quite large—consider moving it to an internal design document or wiki to keep this repo focused on code.
- If you’re logging or debugging connection keys, add a helper to format BinaryHash values as hex strings for readability.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
I deliberately removed it. I am not a big fan of introducing type aliases if we don't define any functions on them.
Removed. Now it is attached to PR description.
Currently:
I am not adding it now because of:
|
2f90967 to
a8e0ccb
Compare
…hash, uint64) Migrated connection deduplication from string keys (~32B) to uint64 BinaryHash (8B), matching endpoint/process patterns. Updated dedupers, tests, and memory calculations. Partially AI-assisted.
a8e0ccb to
f6c26d4
Compare
|
/retest |
Replace NumericEndpoint (56 bytes) with BinaryHash (uint64) as map keys, following the pattern from PR #17041. This enables runtime.mapassign_fast64 for map operations instead of the slower generic mapassign. Implementation: - Add BinaryHash type and BinaryKey() method to NumericEndpoint - Use xxhash for fast, non-cryptographic hashing - Update all endpoint maps to use BinaryHash keys - Hash endpoints once before map operations - Update debug/logging to format hashes as hex Benchmark results (Phase 1 → Phase 2): Speed (sec/op): - Geometric mean: -51.66% (2x faster!) - Large_Incremental: 7.764ms → 3.863ms (-50.24%) - Medium_Incremental: 1.454ms → 834µs (-42.63%) - ApplySingleNoLock: -63% to -68% Memory (B/op): - Large workloads: 1.24MB → 932KB (-24.63%) - ApplyRepeated: 271KB → 196KB (-27.74%) Trade-off: - Allocations: +22% (hashing overhead) Combined improvement (Baseline → Phase 2): Speed: - Geometric mean: -51.66% (2x faster!) - Medium_Incremental: 1.833ms → 834µs (-54.47%) Memory: - Large workloads: 1.98MB → 932KB (-53.02%) - ApplyRepeated: 458KB → 196KB (-57.31%) Why this works: - uint64 keys use mapassign_fast64 (single CMP instruction for comparison) - NumericEndpoint keys use generic mapassign (56 bytes to hash and compare) - Reduced map storage overhead (8 bytes vs 56 bytes per key) - Better CPU cache utilization Trade-offs: - Public IP tracking optimization disabled (can't extract IP from hash) - Hash collision probability: 0.027% for 100M endpoints (negligible for K8s scale) - Debug output shows hashes instead of endpoint details Related to ROX-33201 Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Implemented with Cursor using the following implementation plan: NetworkConn_BinaryHash_Migration_Plan.md
Description
Migrated NetworkConn deduplication from string-based keys to BinaryHash (uint64) keys for memory efficiency, achieving ~75% memory reduction (from ~32 bytes to 8 bytes per connection).
What changed:
NetworkConn.BinaryKey()method using xxhashconnectionsDeduperfrom*set.StringSetto*set.Set[indicator.BinaryHash]closedConnTimestampsfrommap[string]closedConnEntrytomap[indicator.BinaryHash]closedConnEntrynewStringSetPtr()helper andBinaryHashSettype aliasWhy:
Connection deduplication was using hex string keys (16 chars + Go overhead), consuming significant memory. This change aligns NetworkConn with the existing BinaryHash pattern already used successfully for ContainerEndpoint and ProcessListening dedupers.
AI contribution:
User-facing documentation
Testing and quality
Automated testing
How I validated my change
sensor/common/networkflow/...packages - all passingThis is an internal optimization with no external API changes or functional behavior differences.