
⚡️ Speed up function _map_str_double by 7% #25

Open
codeflash-ai[bot] wants to merge 1 commit into main from codeflash/optimize-_map_str_double-mgup1aq3

Conversation


codeflash-ai bot commented Oct 17, 2025

📄 7% (0.07x) speedup for _map_str_double in src/deepgram/extensions/telemetry/proto_encoder.py

⏱️ Runtime: 10.5 milliseconds → 9.85 milliseconds (best of 176 runs)

📝 Explanation and details

Impact: low
Impact_explanation: Looking at this optimization report, I need to assess the impact based on the provided rubric.

Analysis:

  1. Overall Runtime Details:

    • Original: 10.5ms, Optimized: 9.85ms (6.6% speedup)
    • While the absolute runtime is above 100 microseconds, the relative speedup of 6.6% is below the 15% threshold mentioned in the rubric
  2. Generated Tests Performance:

    • Small operations (single entries, edge cases): Mostly showing minimal improvements or slight regressions (0-3% range)
    • Large-scale operations (1000 entries): Showing consistent 6-7% improvements
    • Many test cases show <2% improvements or even slight slowdowns, which the rubric considers low impact
  3. Hot Path Analysis:

    • The function _map_str_double is called from _encode_telemetry_event
    • This appears to be a telemetry encoding function, but there's no indication it's called in a tight loop
    • The calling context suggests it's used once per telemetry event, not in a multiplicative hot path
  4. Performance Consistency:

    • The optimization shows inconsistent results across test cases
    • Small dictionaries often perform worse or marginally better
    • Only large dictionaries (1000+ entries) show meaningful improvements
    • This pattern suggests the optimization is not consistently beneficial
  5. Asymptotic Complexity:

    • The optimization doesn't appear to change the algorithmic complexity
    • It's primarily about reducing function call overhead and byte concatenations

Assessment:

  • The 6.6% speedup is below the 15% threshold for significance
  • Performance gains are inconsistent across different input sizes
  • Many test cases show <2% improvements or regressions
  • No evidence of being in a multiplicative hot path
  • No improvement in asymptotic complexity

END OF IMPACT EXPLANATION

The optimized code achieves a 6% speedup through strategic elimination of byte concatenations and redundant operations:

Key Optimizations (illustrated in the sketch after this list):

  1. Efficient byte joining in _len_delimited: Replaced multiple + concatenations with a single b"".join() call, reducing intermediate byte object creation.

  2. Eliminated redundant function calls in _map_str_double: The original code called _string() and _double() helper functions, which then called _len_delimited() again. The optimized version inlines these operations using direct _key(), _varint(), and struct.pack() calls, avoiding the extra function call overhead and intermediate concatenations.

  3. Pre-computed bytearray.extend reference: Stores out.extend in a local variable append to avoid repeated attribute lookups during the loop.

  4. Single UTF-8 encoding per key: Uses k.encode("utf-8") directly in the join operation rather than encoding it separately in helper functions.
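
A minimal sketch of the optimized shape, reconstructed from the description above (the real code lives in src/deepgram/extensions/telemetry/proto_encoder.py; exact names and details may differ from the actual diff):

import struct

def _varint(value: int) -> bytes:
    # Unsigned LEB128 varint; this sketch assumes non-negative inputs.
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)

def _key(field_number: int, wire_type: int) -> bytes:
    return _varint((field_number << 3) | wire_type)

def _map_str_double(field_number: int, values) -> bytes:
    if not values:
        return b""
    out = bytearray()
    append = out.extend  # bind once to avoid per-entry attribute lookups
    entry_key = _key(field_number, 2)
    for k, v in values.items():
        kb = k.encode("utf-8")  # single UTF-8 encode per key
        # Map entry inlined: field 1 = key (len-delimited), field 2 = value (64-bit double).
        payload = b"".join((
            b"\x0a", _varint(len(kb)), kb,
            b"\x11", struct.pack("<d", float(v)),
        ))
        append(entry_key)
        append(_varint(len(payload)))
        append(payload)
    return bytes(out)

Under this sketch, _map_str_double(3, {"foo": 42.0}) yields b"\x1a\x0e\x0a\x03foo\x11" followed by struct.pack("<d", 42.0).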

Performance Impact by Test Case:

  • Large-scale operations show the biggest gains (6-7% faster on 1000+ entries) where the reduced function call overhead and concatenation optimizations compound
  • Small dictionaries show minimal or slight regression due to the added complexity of inlined operations
  • Unicode and edge cases maintain similar performance while preserving correctness

The optimization particularly excels when processing large maps with many entries, where the elimination of redundant function calls and more efficient byte building significantly reduces total execution time.

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 41 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 2 Passed |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
from __future__ import annotations

import struct
import typing

# imports
import pytest  # used for our unit tests
from src.deepgram.extensions.telemetry.proto_encoder import _map_str_double

# unit tests

# Helper functions for decoding protobuf wire format for test verification
def decode_varint(data, offset=0):
    """Decodes a varint from data[offset:] and returns (value, new_offset)."""
    shift = 0
    result = 0
    while True:
        b = data[offset]
        result |= ((b & 0x7F) << shift)
        offset += 1
        if not (b & 0x80):
            break
        shift += 7
    return result, offset
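
# Hypothetical sanity check for the helper above (not part of the generated
# suite): 150 is the canonical protobuf varint example, encoded as 0x96 0x01.
def test_decode_varint_helper():
    assert decode_varint(b"\x96\x01") == (150, 2)
    assert decode_varint(b"\x00") == (0, 1)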

def parse_map_entry(entry_bytes):
    """Parse a single map entry (len-delimited) and return (key, value)."""
    offset = 0
    offset += 1  # skip field-1 tag byte (0x0A: field 1, wire type 2)
    key_len, offset = decode_varint(entry_bytes, offset)
    key_bytes = entry_bytes[offset:offset+key_len]
    key = key_bytes.decode('utf-8')
    offset += key_len
    offset += 1  # skip field-2 tag byte (0x11: field 2, wire type 1)
    value_bytes = entry_bytes[offset:offset+8]
    value = struct.unpack('<d', value_bytes)[0]
    return key, value

def parse_map_field(field_number, encoded):
    """Parse the output of _map_str_double and return a dict of key->value."""
    offset = 0
    result = {}
    expected_tag = (field_number << 3) | 2  # len-delimited map entry
    while offset < len(encoded):
        assert encoded[offset] == expected_tag  # single-byte tag assumed (field_number < 16)
        offset += 1
        entry_len, offset = decode_varint(encoded, offset)
        entry_bytes = encoded[offset:offset+entry_len]
        key, value = parse_map_entry(entry_bytes)
        result[key] = value
        offset += entry_len
    return result

# 1. Basic Test Cases

def test_empty_dict_returns_empty_bytes():
    # Should return b"" for empty dict and for None
    codeflash_output = _map_str_double(1, {}) # 629ns -> 643ns (2.18% slower)
    assert codeflash_output == b""
    codeflash_output = _map_str_double(1, None) # 370ns -> 355ns (4.23% faster)
    assert codeflash_output == b""

def test_single_entry():
    # Test with a single key-value pair
    d = {"foo": 42.0}
    codeflash_output = _map_str_double(3, d); out = codeflash_output # 8.83μs -> 9.00μs (1.83% slower)
    # Should decode to the same mapping
    parsed = parse_map_field(3, out)
    assert parsed == d

def test_multiple_entries():
    # Test with multiple key-value pairs
    d = {"a": 1.5, "b": -3.2, "c": 0.0}
    codeflash_output = _map_str_double(2, d); out = codeflash_output # 14.4μs -> 14.0μs (2.53% faster)
    parsed = parse_map_field(2, out)
    assert parsed == d

def test_non_integer_field_number():
    # Test with a higher field number
    d = {"x": 7.7}
    codeflash_output = _map_str_double(15, d); out = codeflash_output # 8.72μs -> 8.71μs (0.172% faster)
    parsed = parse_map_field(15, out)
    assert parsed == d

def test_float_and_int_values():
    # Should coerce int values to float
    d = {"int": 5, "float": 3.14}
    codeflash_output = _map_str_double(1, d); out = codeflash_output # 12.0μs -> 12.0μs (0.175% faster)
    parsed = parse_map_field(1, out)
    assert parsed == {"int": 5.0, "float": 3.14}

# 2. Edge Test Cases

def test_unicode_keys():
    # Test with unicode characters in keys
    d = {"ключ": 1.23, "你好": 4.56}
    codeflash_output = _map_str_double(1, d); out = codeflash_output # 11.8μs -> 11.8μs (0.492% faster)
    parsed = parse_map_field(1, out)
    assert parsed == d

def test_empty_string_key():
    # Test with empty string as key
    d = {"": 99.9}
    codeflash_output = _map_str_double(1, d); out = codeflash_output # 8.66μs -> 8.70μs (0.494% slower)
    parsed = parse_map_field(1, out)
    assert parsed == d

def test_special_float_values():
    # Test with inf, -inf, nan values
    import math
    d = {"inf": float('inf'), "ninf": float('-inf'), "nan": float('nan')}
    codeflash_output = _map_str_double(1, d); out = codeflash_output # 14.1μs -> 14.1μs (0.142% faster)
    parsed = parse_map_field(1, out)
    assert parsed["inf"] == float('inf') and parsed["ninf"] == float('-inf')
    assert math.isnan(parsed["nan"])

def test_large_and_small_floats():
    # Test with very large and very small float values
    d = {"big": 1e308, "small": 1e-308}
    codeflash_output = _map_str_double(1, d); out = codeflash_output # 11.6μs -> 11.5μs (0.390% faster)
    parsed = parse_map_field(1, out)
    assert parsed == d

def test_negative_zero():
    # Test with -0.0 value
    d = {"negzero": -0.0}
    codeflash_output = _map_str_double(1, d); out = codeflash_output # 8.51μs -> 8.66μs (1.78% slower)
    parsed = parse_map_field(1, out)
    assert parsed == d
    assert str(parsed["negzero"]) == "-0.0"  # sign of zero must survive the round trip

def test_order_preserved():
    # Dict order should be preserved in Python 3.7+
    d = {"a": 1.0, "b": 2.0, "c": 3.0}
    codeflash_output = _map_str_double(1, d); out = codeflash_output # 14.3μs -> 14.0μs (1.79% faster)
    # The entries should appear in the same order as inserted
    # We'll check the order of the keys in the encoded bytes
    offset = 0
    keys_in_bytes = []
    while offset < len(out):
        offset += 1  # skip single-byte entry tag
        entry_len, offset = decode_varint(out, offset)
        entry_bytes = out[offset:offset+entry_len]
        k, _ = parse_map_entry(entry_bytes)
        keys_in_bytes.append(k)
        offset += entry_len
    assert keys_in_bytes == list(d.keys())

def test_non_string_key_raises():
    # Should raise AttributeError if key is not a string (int has no .encode)
    d = {1: 2.0}
    with pytest.raises(AttributeError):
        _map_str_double(1, d) # 3.93μs -> 6.14μs (36.0% slower)

def test_non_float_value_casts():
    # Should cast values to float
    class MyFloat(float): pass
    d = {"a": MyFloat(3.14)}
    codeflash_output = _map_str_double(1, d); out = codeflash_output # 9.07μs -> 9.23μs (1.74% slower)
    parsed = parse_map_field(1, out)
    assert parsed == {"a": 3.14}

# 3. Large Scale Test Cases

def test_large_number_of_entries():
    # Test with 1000 entries
    d = {f"key{i}": float(i) for i in range(1000)}
    codeflash_output = _map_str_double(1, d); out = codeflash_output # 2.53ms -> 2.38ms (6.45% faster)
    parsed = parse_map_field(1, out)
    assert parsed == d

def test_large_keys_and_values():
    # Test with long string keys and large float values
    d = {("x" * 250) + str(i): float(1e10 + i) for i in range(10)}
    codeflash_output = _map_str_double(1, d); out = codeflash_output # 38.6μs -> 36.7μs (5.20% faster)
    parsed = parse_map_field(1, out)
    for i in range(10):
        key = ("x" * 250) + str(i)
        assert parsed[key] == float(1e10 + i)

def test_performance_large_dict():
    # Test that function is reasonably fast for 1000 entries
    import time
    d = {f"k{i}": float(i) for i in range(1000)}
    start = time.time()
    codeflash_output = _map_str_double(1, d); out = codeflash_output # 2.54ms -> 2.37ms (7.25% faster)
    duration = time.time() - start
    assert duration < 1.0  # generous bound; the measured runtime is ~2.5ms

#------------------------------------------------
from __future__ import annotations

import struct
import typing

# imports
import pytest  # used for our unit tests
from src.deepgram.extensions.telemetry.proto_encoder import _map_str_double

# function to test
# --- Protobuf wire helpers (proto3) ---


def _varint(value: int) -> bytes:
    """Encode a non-negative int as an unsigned LEB128 varint."""
    if value < 0:
        # For this usage we only encode non-negative values
        value &= (1 << 64) - 1
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        value >>= 7
    out.append(value)
    return bytes(out)

def _key(field_number: int, wire_type: int) -> bytes:
    """Encode a field tag: (field_number << 3) | wire_type, as a varint."""
    return _varint((field_number << 3) | wire_type)
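
# Hypothetical sanity check tying the inlined helpers together (not part of
# the generated suite): 300 encodes as 0xAC 0x02, and the tag for
# field 1 / wire type 2 is the familiar 0x0A.
def test_varint_key_helpers():
    assert _varint(300) == b"\xac\x02"
    assert _key(1, 2) == b"\x0a"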
from src.deepgram.extensions.telemetry.proto_encoder import _map_str_double

# unit tests

# Helper to decode the protobuf output for map<string,double>
def decode_map_str_double(field_number, data):
    """
    Decodes the protobuf-encoded map<string,double> as produced by _map_str_double.
    Returns a list of (key, value) tuples.
    """
    res = []
    i = 0
    while i < len(data):
        # Skip the map-entry tag (may span multiple bytes for large field numbers)
        i += len(_key(field_number, 2))
        # Read the entry length (varint)
        length, shift = 0, 0
        while True:
            b = data[i]
            i += 1
            length |= (b & 0x7F) << shift
            if not (b & 0x80):
                break
            shift += 7
        i += 1  # skip field-1 tag byte (0x0A: field 1, wire type 2)
        # Read the string length (varint)
        str_len, shift = 0, 0
        while True:
            b = data[i]
            i += 1
            str_len |= (b & 0x7F) << shift
            if not (b & 0x80):
                break
            shift += 7
        key = data[i:i+str_len].decode('utf-8')
        i += str_len
        i += 1  # skip field-2 tag byte (0x11: field 2, wire type 1)
        value = struct.unpack("<d", data[i:i+8])[0]
        i += 8
        res.append((key, value))
    return res

# ------------------- BASIC TEST CASES -------------------

def test_empty_dict_returns_empty_bytes():
    # Should return empty bytes for empty dict
    codeflash_output = _map_str_double(1, {}) # 816ns -> 769ns (6.11% faster)
    assert codeflash_output == b""

def test_none_returns_empty_bytes():
    # Should return empty bytes for None input
    codeflash_output = _map_str_double(1, None) # 635ns -> 641ns (0.936% slower)
    assert codeflash_output == b""

def test_single_entry_simple():
    # Single key-value pair
    codeflash_output = _map_str_double(2, {"foo": 1.5}); result = codeflash_output # 9.07μs -> 9.14μs (0.766% slower)
    decoded = decode_map_str_double(2, result)
    assert decoded == [("foo", 1.5)]

def test_multiple_entries():
    # Multiple key-value pairs
    d = {"a": 1.0, "b": 2.5, "c": -3.75}
    codeflash_output = _map_str_double(3, d); result = codeflash_output # 14.3μs -> 14.2μs (0.798% faster)
    decoded = decode_map_str_double(3, result)
    assert dict(decoded) == d

def test_float_and_int_values():
    # Accepts both int and float values
    d = {"x": 42, "y": 3.14}
    codeflash_output = _map_str_double(4, d); result = codeflash_output # 12.2μs -> 11.7μs (3.95% faster)
    decoded = decode_map_str_double(4, result)
    assert dict(decoded) == {"x": 42.0, "y": 3.14}

def test_field_number_variation():
    # Changing field_number changes the encoding
    d = {"key": 0.0}
    codeflash_output = _map_str_double(1, d); out1 = codeflash_output # 8.44μs -> 8.64μs (2.27% slower)
    codeflash_output = _map_str_double(5, d); out2 = codeflash_output # 3.92μs -> 3.87μs (1.14% faster)
    assert out1 != out2

# ------------------- EDGE TEST CASES -------------------

def test_empty_string_key():
    # Key is empty string
    d = {"": 7.77}
    codeflash_output = _map_str_double(1, d); result = codeflash_output # 8.58μs -> 8.54μs (0.433% faster)
    decoded = decode_map_str_double(1, result)
    assert dict(decoded) == d

def test_zero_value():
    # Value is zero
    d = {"zero": 0.0}
    codeflash_output = _map_str_double(1, d); result = codeflash_output # 8.60μs -> 8.71μs (1.33% slower)
    decoded = decode_map_str_double(1, result)
    assert dict(decoded) == d

def test_negative_value():
    # Value is negative
    d = {"neg": -123.456}
    codeflash_output = _map_str_double(1, d); result = codeflash_output # 8.55μs -> 8.71μs (1.84% slower)
    decoded = decode_map_str_double(1, result)
    assert dict(decoded) == d

def test_large_double_value():
    # Very large double value
    d = {"big": 1.7e+308}
    codeflash_output = _map_str_double(1, d); result = codeflash_output # 8.50μs -> 8.64μs (1.55% slower)
    decoded = decode_map_str_double(1, result)
    assert dict(decoded) == d

def test_small_double_value():
    # Very small double value
    d = {"small": 5e-324}
    codeflash_output = _map_str_double(1, d); result = codeflash_output # 8.63μs -> 8.69μs (0.691% slower)
    decoded = decode_map_str_double(1, result)
    assert dict(decoded) == d

def test_nan_and_inf_values():
    # NaN and infinity
    d = {"nan": float('nan'), "inf": float('inf'), "ninf": float('-inf')}
    codeflash_output = _map_str_double(1, d); result = codeflash_output # 14.2μs -> 14.2μs (0.516% faster)
    decoded = decode_map_str_double(1, result)
    # NaN != NaN, so check using math.isnan
    import math
    keys = [k for k, _ in decoded]
    values = [v for _, v in decoded]
    assert math.isnan(values[keys.index("nan")])
    assert values[keys.index("inf")] == float('inf')
    assert values[keys.index("ninf")] == float('-inf')

def test_unicode_key():
    # Unicode key should be encoded as UTF-8
    d = {"ключ": 3.14, "你好": 2.71}
    codeflash_output = _map_str_double(1, d); result = codeflash_output # 11.8μs -> 11.7μs (0.426% faster)
    decoded = decode_map_str_double(1, result)
    assert dict(decoded) == d

def test_long_key():
    # Key is a long string
    key = "a" * 255
    d = {key: 1.23}
    codeflash_output = _map_str_double(1, d); result = codeflash_output # 10.3μs -> 10.3μs (0.097% slower)
    decoded = decode_map_str_double(1, result)
    assert decoded == [(key, 1.23)]

def test_zero_length_dict():
    # Explicitly test dict with no entries
    d = dict()
    codeflash_output = _map_str_double(1, d); result = codeflash_output # 658ns -> 643ns (2.33% faster)
    assert result == b""

def test_non_string_key_raises():
    # Non-string key should raise AttributeError during encoding (int has no .encode)
    d = {42: 1.0}
    with pytest.raises(AttributeError):
        _map_str_double(1, d) # 3.97μs -> 6.05μs (34.4% slower)

def test_non_float_value_castable():
    # Value is a string that can be cast to float
    d = {"a": "2.5"}
    codeflash_output = _map_str_double(1, d); result = codeflash_output # 9.56μs -> 9.58μs (0.167% slower)
    decoded = decode_map_str_double(1, result)
    assert decoded == [("a", 2.5)]

def test_non_float_value_not_castable():
    # Value is a string that cannot be cast to float
    d = {"a": "hello"}
    with pytest.raises(ValueError):
        _map_str_double(1, d) # 8.29μs -> 8.58μs (3.38% slower)

def test_duplicate_keys():
    # Dict cannot have duplicate keys, but test that only one is present
    d = {"dup": 1.0, "dup": 2.0}
    codeflash_output = _map_str_double(1, d); result = codeflash_output # 8.68μs -> 8.80μs (1.41% slower)
    decoded = decode_map_str_double(1, result)
    assert decoded == [("dup", 2.0)]

# ------------------- LARGE SCALE TEST CASES -------------------

def test_large_number_of_entries():
    # Test with 1000 entries
    d = {f"key{i}": float(i) for i in range(1000)}
    codeflash_output = _map_str_double(1, d); result = codeflash_output # 2.53ms -> 2.37ms (6.60% faster)
    decoded = decode_map_str_double(1, result)
    assert dict(decoded) == d

def test_large_keys_and_values():
    # Test with large keys and large float values
    d = {("k"*100): 1.1e308, ("v"*200): -1.1e308}
    codeflash_output = _map_str_double(1, d); result = codeflash_output # 13.1μs -> 12.9μs (1.35% faster)
    decoded = decode_map_str_double(1, result)
    assert dict(decoded) == d

def test_performance_large_map():
    # Test that encoding 1000 items does not take too long
    import time
    d = {f"k{i}": float(i) for i in range(1000)}
    start = time.time()
    codeflash_output = _map_str_double(1, d); result = codeflash_output # 2.54ms -> 2.37ms (7.23% faster)
    duration = time.time() - start
    assert duration < 1.0  # generous bound; the measured runtime is ~2.5ms
    decoded = decode_map_str_double(1, result)
    assert dict(decoded) == d

def test_order_preservation():
    # Map order should be preserved (since Python 3.7)
    d = {"first": 1.0, "second": 2.0, "third": 3.0}
    codeflash_output = _map_str_double(1, d); result = codeflash_output # 14.6μs -> 14.3μs (2.05% faster)
    decoded = decode_map_str_double(1, result)
    assert [k for k, _ in decoded] == list(d.keys())
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from src.deepgram.extensions.telemetry.proto_encoder import _map_str_double

def test__map_str_double():
    _map_str_double(0, {'': 0.0})

def test__map_str_double_2():
    _map_str_double(0, {})
🔎 Concolic Coverage Tests and Runtime

| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
| --- | --- | --- | --- |
| codeflash_concolic_5p92pe1r/tmp2ndo3u2x/test_concolic_coverage.py::test__map_str_double | 8.56μs | 8.73μs | -1.94%⚠️ |
| codeflash_concolic_5p92pe1r/tmp2ndo3u2x/test_concolic_coverage.py::test__map_str_double_2 | 600ns | 623ns | -3.69%⚠️ |

To edit these changes, run git checkout codeflash/optimize-_map_str_double-mgup1aq3 and push.

@codeflash-ai codeflash-ai bot requested a review from aseembits93 October 17, 2025 10:16
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Oct 17, 2025
