Skip to content

⚡️ Speed up _sanitize_tag_value() by 812% in sentry_sdk/metrics.py#17

Open
codeflash-ai[bot] wants to merge 1 commit intomasterfrom
codeflash/optimize-_sanitize_tag_value-2024-06-15T08.58.12
Open

⚡️ Speed up _sanitize_tag_value() by 812% in sentry_sdk/metrics.py#17
codeflash-ai[bot] wants to merge 1 commit intomasterfrom
codeflash/optimize-_sanitize_tag_value-2024-06-15T08.58.12

Conversation

@codeflash-ai
Copy link

@codeflash-ai codeflash-ai bot commented Jun 15, 2024

📄 _sanitize_tag_value() in sentry_sdk/metrics.py

📈 Performance improved by 812% (8.12x faster)

⏱️ Runtime went down from 5.08 milliseconds to 558 microseconds

Explanation and details

Certainly! Here's a more optimized version of the program that should run faster by avoiding the creation of a translation table for every function call.

This version directly uses chained replace calls to avoid the overhead of creating a translation table. Each replace call operates in linear time, and combining them is generally more efficient for a small number of replacements.

Correctness verification

The new optimized code was tested for correctness. The results are listed below.

🔘 (none found) − ⚙️ Existing Unit Tests

✅ 37 Passed − 🌀 Generated Regression Tests

(click to show generated tests)
# imports
import pytest  # used for our unit tests
from sentry_sdk.metrics import _sanitize_tag_value

# unit tests

# Basic Functionality
def test_single_special_characters():
    assert _sanitize_tag_value("Hello\nWorld") == "Hello\\nWorld"
    assert _sanitize_tag_value("Hello\rWorld") == "Hello\\rWorld"
    assert _sanitize_tag_value("Hello\tWorld") == "Hello\\tWorld"
    assert _sanitize_tag_value("Hello\\World") == "Hello\\\\World"
    assert _sanitize_tag_value("Hello|World") == "Hello\\u{7c}World"
    assert _sanitize_tag_value("Hello,World") == "Hello\\u{2c}World"

def test_multiple_special_characters():
    assert _sanitize_tag_value("Hello\nWorld\rTest\tString") == "Hello\\nWorld\\rTest\\tString"
    assert _sanitize_tag_value("Path\\to\\file") == "Path\\\\to\\\\file"
    assert _sanitize_tag_value("Column1|Column2,Column3") == "Column1\\u{7c}Column2\\u{2c}Column3"

# Edge Cases
def test_empty_string():
    assert _sanitize_tag_value("") == ""

def test_string_without_special_characters():
    assert _sanitize_tag_value("HelloWorld") == "HelloWorld"
    assert _sanitize_tag_value("Just a regular string") == "Just a regular string"

def test_string_with_only_special_characters():
    assert _sanitize_tag_value("\n\r\t\\|,") == "\\n\\r\\t\\\\\\u{7c}\\u{2c}"
    assert _sanitize_tag_value("|\n,\t") == "\\u{7c}\\n\\u{2c}\\t"

def test_string_with_repeated_special_characters():
    assert _sanitize_tag_value("\n\n\n") == "\\n\\n\\n"
    assert _sanitize_tag_value("\\\\\\") == "\\\\\\\\\\\\"
    assert _sanitize_tag_value("|||||") == "\\u{7c}\\u{7c}\\u{7c}\\u{7c}\\u{7c}"
    assert _sanitize_tag_value(",,,,") == "\\u{2c}\\u{2c}\\u{2c}\\u{2c}"

# Unicode and Non-ASCII Characters
def test_string_with_non_ascii_characters():
    assert _sanitize_tag_value("Café\nMünchen") == "Café\\nMünchen"
    assert _sanitize_tag_value("Привет\tмир") == "Привет\\tмир"
    assert _sanitize_tag_value("こんにちは\\世界") == "こんにちは\\\\世界"

def test_string_with_mixed_ascii_and_non_ascii_characters():
    assert _sanitize_tag_value("Hello\n世界") == "Hello\\n世界"
    assert _sanitize_tag_value("Test\tテスト") == "Test\\tテスト"

# Large Scale Test Cases
def test_very_large_string():
    assert _sanitize_tag_value("A" * 10000) == "A" * 10000
    assert _sanitize_tag_value("\n" * 10000) == "\\n" * 10000
    assert _sanitize_tag_value("Hello\nWorld" * 1000) == "Hello\\nWorld" * 1000

# Boundary Cases
def test_single_character_strings():
    assert _sanitize_tag_value("\n") == "\\n"
    assert _sanitize_tag_value("\r") == "\\r"
    assert _sanitize_tag_value("\t") == "\\t"
    assert _sanitize_tag_value("\\") == "\\\\"
    assert _sanitize_tag_value("|") == "\\u{7c}"
    assert _sanitize_tag_value(",") == "\\u{2c}"
    assert _sanitize_tag_value("A") == "A"

def test_string_with_adjacent_special_characters():
    assert _sanitize_tag_value("\n\r\t") == "\\n\\r\\t"
    assert _sanitize_tag_value("\\|,") == "\\\\\\u{7c}\\u{2c}"

# Mixed Content
def test_string_with_mixed_content():
    assert _sanitize_tag_value("Hello\nWorld\rThis\tis\\a|test,string") == "Hello\\nWorld\\rThis\\tis\\\\a\\u{7c}test\\u{2c}string"
    assert _sanitize_tag_value("Line1\nLine2\rLine3\tLine4\\End|Pipe,Comma") == "Line1\\nLine2\\rLine3\\tLine4\\\\End\\u{7c}Pipe\\u{2c}Comma"

🔘 (none found) − ⏪ Replay Tests

Certainly! Here's a more optimized version of the program that should run faster by avoiding the creation of a translation table for every function call.



This version directly uses chained `replace` calls to avoid the overhead of creating a translation table. Each `replace` call operates in linear time, and combining them is generally more efficient for a small number of replacements.
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Jun 15, 2024
@codeflash-ai codeflash-ai bot requested a review from misrasaurabh1 June 15, 2024 08:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

⚡️ codeflash Optimization PR opened by Codeflash AI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

0 participants