86 changes: 69 additions & 17 deletions InternalDocs/profiling_binary_format.md
@@ -272,33 +272,85 @@ byte.

## Frame Table

The frame table stores deduplicated frame entries:
The frame table stores deduplicated frame entries with full source position
information and the bytecode opcode:

```
+----------------------+
| filename_idx: varint |
| funcname_idx: varint |
| lineno: svarint |
+----------------------+ (repeated for each frame)
+----------------------------+
| filename_idx: varint |
| funcname_idx: varint |
| lineno: svarint |
| end_lineno_delta: svarint |
| column: svarint |
| end_column_delta: svarint |
| opcode: u8 |
+----------------------------+ (repeated for each frame)
```

Each unique (filename, funcname, lineno) combination gets one entry. Two
calls to the same function at different line numbers produce different
frame entries; two calls at the same line number share one entry.
### Field Definitions

| Field | Type | Description |
|------------------|---------------|----------------------------------------------------------|
| filename_idx | varint | Index into string table for file name |
| funcname_idx | varint | Index into string table for function name |
| lineno | zigzag varint | Start line number (-1 for synthetic frames) |
| end_lineno_delta | zigzag varint | Delta from lineno (end_lineno = lineno + delta) |
| column | zigzag varint | Start column offset in UTF-8 bytes (-1 if not available) |
| end_column_delta | zigzag varint | Delta from column (end_column = column + delta) |
| opcode | u8 | Python bytecode opcode (0-254) or 255 for None |

### Delta Encoding

Position end values use delta encoding for efficiency:

- `end_lineno = lineno + end_lineno_delta`
- `end_column = column + end_column_delta`

Typical values:
- `end_lineno_delta`: Usually 0 (single-line expressions) → encodes to 1 byte
- `end_column_delta`: Usually 5-20 (expression width) → encodes to 1 byte

This saves ~1-2 bytes per frame compared to absolute encoding. When the base
value (lineno or column) is -1 (not available), the delta is stored as 0 and
the reconstructed value is -1.
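
The delta rule above can be sketched in a few lines of Python. This is an illustration only, not the C serializer: the helper names `to_delta` and `from_delta` and the constant `NOT_AVAILABLE` are hypothetical, and the behavior when only the end value is missing is not specified in the text, so it is not handled here.

```python
NOT_AVAILABLE = -1  # assumed name; mirrors the documented -1 sentinel


def to_delta(base, end):
    """Return the delta that would be stored for an end position."""
    if base == NOT_AVAILABLE:
        # Documented rule: a missing base value stores a zero delta.
        return 0
    return end - base


def from_delta(base, delta):
    """Rebuild the end position from the base value and the stored delta."""
    if base == NOT_AVAILABLE:
        # Documented rule: the reconstructed value is -1 as well.
        return NOT_AVAILABLE
    return base + delta


# A three-line expression starting at line 10 stores a delta of 2.
delta = to_delta(10, 12)
end_lineno = from_delta(10, delta)
```

Because a missing base is -1 and its delta is 0, plain `base + delta` would also give -1; the explicit check only makes the rule visible.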

### Sentinel Values

- `opcode = 255`: No opcode captured
- `lineno = -1`: Synthetic frame (no source location)
- `column = -1`: Column offset not available

### Deduplication

Each unique (filename, funcname, lineno, end_lineno, column, end_column,
opcode) combination gets one entry. This enables instruction-level profiling
where multiple bytecode instructions on the same line can be distinguished.
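
As a rough sketch of this deduplication, the seven-field key can be modeled as a tuple used as a dictionary key. The `FrameTable` class and its `intern` method are hypothetical names for illustration; the actual writer uses a C hash table keyed on `FrameKey`.

```python
class FrameTable:
    """Toy frame table: maps each unique 7-field key to a stable index."""

    def __init__(self):
        self._index = {}   # key tuple -> frame index
        self.entries = []  # frame index -> key tuple

    def intern(self, filename_idx, funcname_idx, lineno, end_lineno,
               column, end_column, opcode):
        """Return the index for this frame, adding an entry if it is new."""
        key = (filename_idx, funcname_idx, lineno, end_lineno,
               column, end_column, opcode)
        idx = self._index.get(key)
        if idx is None:
            idx = len(self.entries)
            self._index[key] = idx
            self.entries.append(key)
        return idx


table = FrameTable()
first = table.intern(0, 1, 10, 10, 0, 5, 100)
again = table.intern(0, 1, 10, 10, 0, 5, 100)   # identical key, same entry
other = table.intern(0, 1, 10, 10, 0, 5, 101)   # differs only in opcode
```

Here `first` and `again` share one entry, while `other` gets its own entry even though it is on the same line, which is what makes instruction-level attribution possible.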

Strings and frames are deduplicated separately because they have different
cardinalities and reference patterns. A codebase might have hundreds of
unique source files but thousands of unique functions. Many functions share
the same filename, so storing the filename index in each frame entry (rather
than the full string) provides an additional layer of deduplication. A frame
entry is just three varints (typically 3-6 bytes) rather than two full
strings plus a line number.

Line numbers use signed varint (zigzag encoding) rather than unsigned to
handle edge cases. Synthetic frames—generated frames that don't correspond
directly to Python source code, such as C extension boundaries or internal
interpreter frames—use line number 0 or -1 to indicate the absence of a
source location. Zigzag encoding ensures these small negative values encode
entry is typically 7-9 bytes rather than two full strings plus location data.

### Size Analysis

Typical frame size with delta encoding:
- filename_idx: 1-2 bytes
- funcname_idx: 1-2 bytes
- lineno: 1-2 bytes
- end_lineno_delta: 1 byte (usually 0)
- column: 1 byte (usually < 64)
- end_column_delta: 1 byte (usually < 64)
- opcode: 1 byte

**Total: ~7-9 bytes per frame**

Line numbers and columns use signed varint (zigzag encoding) to handle
sentinel values efficiently. Synthetic frames—generated frames that don't
correspond directly to Python source code, such as C extension boundaries or
internal interpreter frames—use -1 to indicate the absence of a source
location. Zigzag encoding ensures these small negative values encode
efficiently (−1 becomes 1, which is one byte) rather than requiring the
maximum varint length.
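
To make the difference concrete, the sketch below compares the varint length of -1 after zigzag mapping with its length when reinterpreted as an unsigned 32-bit value. It assumes a 7-bits-per-byte varint and 32-bit fields; the helper names are illustrative.

```python
def zigzag_encode(n):
    """Interleave signed values: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return (n << 1) ^ (n >> 31)


def zigzag_decode(z):
    """Inverse of zigzag_encode."""
    return (z >> 1) ^ -(z & 1)


def varint_len(value):
    """Bytes needed for a non-negative int at 7 payload bits per byte."""
    length = 1
    while value >= 0x80:
        value >>= 7
        length += 1
    return length


sentinel = -1
with_zigzag = varint_len(zigzag_encode(sentinel))   # zigzag maps -1 to 1
without_zigzag = varint_len(sentinel & 0xFFFFFFFF)  # 0xFFFFFFFF as unsigned
```

With these assumptions the sentinel takes one byte with zigzag and five bytes without it.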

2 changes: 1 addition & 1 deletion Lib/profiling/sampling/cli.py
@@ -649,7 +649,7 @@ def _validate_args(args, parser):
)

# Validate --opcodes is only used with compatible formats
opcodes_compatible_formats = ("live", "gecko", "flamegraph", "heatmap")
opcodes_compatible_formats = ("live", "gecko", "flamegraph", "heatmap", "binary")
if getattr(args, 'opcodes', False) and args.format not in opcodes_compatible_formats:
parser.error(
f"--opcodes is only compatible with {', '.join('--' + f for f in opcodes_compatible_formats)}."
184 changes: 143 additions & 41 deletions Lib/test/test_profiling/test_sampling_profiler/test_binary_format.py
@@ -29,10 +29,17 @@
)


def make_frame(filename, lineno, funcname):
    """Create a FrameInfo struct sequence."""
    location = LocationInfo((lineno, lineno, -1, -1))
    return FrameInfo((filename, location, funcname, None))
def make_frame(filename, lineno, funcname, end_lineno=None, column=None,
               end_column=None, opcode=None):
    """Create a FrameInfo struct sequence with full location info and opcode."""
    if end_lineno is None:
        end_lineno = lineno
    if column is None:
        column = 0
    if end_column is None:
        end_column = 0
    location = LocationInfo((lineno, end_lineno, column, end_column))
    return FrameInfo((filename, location, funcname, opcode))


def make_thread(thread_id, frames, status=0):
@@ -54,6 +61,36 @@ def extract_lineno(location):
return location


def extract_location(location):
    """Extract full location info as dict from location tuple or None."""
    if location is None:
        return {"lineno": 0, "end_lineno": 0, "column": 0, "end_column": 0}
    if isinstance(location, tuple) and len(location) >= 4:
        return {
            "lineno": location[0] if location[0] is not None else 0,
            "end_lineno": location[1] if location[1] is not None else 0,
            "column": location[2] if location[2] is not None else 0,
            "end_column": location[3] if location[3] is not None else 0,
        }
    # Fallback for old-style location
    lineno = location[0] if isinstance(location, tuple) else location
    return {"lineno": lineno or 0, "end_lineno": lineno or 0, "column": 0, "end_column": 0}


def frame_to_dict(frame):
    """Convert a FrameInfo to a dict."""
    loc = extract_location(frame.location)
    return {
        "filename": frame.filename,
        "funcname": frame.funcname,
        "lineno": loc["lineno"],
        "end_lineno": loc["end_lineno"],
        "column": loc["column"],
        "end_column": loc["end_column"],
        "opcode": frame.opcode,
    }


class RawCollector:
"""Collector that captures all raw data grouped by thread."""

@@ -68,15 +105,7 @@ def collect(self, stack_frames, timestamps_us):
count = len(timestamps_us)
for interp in stack_frames:
for thread in interp.threads:
frames = []
for frame in thread.frame_info:
frames.append(
{
"filename": frame.filename,
"funcname": frame.funcname,
"lineno": extract_lineno(frame.location),
}
)
frames = [frame_to_dict(f) for f in thread.frame_info]
key = (interp.interpreter_id, thread.thread_id)
sample = {"status": thread.status, "frames": frames}
for _ in range(count):
@@ -93,15 +122,7 @@ def samples_to_by_thread(samples):
for sample in samples:
for interp in sample:
for thread in interp.threads:
frames = []
for frame in thread.frame_info:
frames.append(
{
"filename": frame.filename,
"funcname": frame.funcname,
"lineno": extract_lineno(frame.location),
}
)
frames = [frame_to_dict(f) for f in thread.frame_info]
key = (interp.interpreter_id, thread.thread_id)
by_thread[key].append(
{
@@ -187,25 +208,15 @@ def assert_samples_equal(self, expected_samples, collector):
for j, (exp_frame, act_frame) in enumerate(
zip(exp["frames"], act["frames"])
):
self.assertEqual(
exp_frame["filename"],
act_frame["filename"],
f"Thread ({interp_id}, {thread_id}), sample {i}, "
f"frame {j}: filename mismatch",
)
self.assertEqual(
exp_frame["funcname"],
act_frame["funcname"],
f"Thread ({interp_id}, {thread_id}), sample {i}, "
f"frame {j}: funcname mismatch",
)
self.assertEqual(
exp_frame["lineno"],
act_frame["lineno"],
f"Thread ({interp_id}, {thread_id}), sample {i}, "
f"frame {j}: lineno mismatch "
f"(expected {exp_frame['lineno']}, got {act_frame['lineno']})",
)
for field in ("filename", "funcname", "lineno", "end_lineno",
"column", "end_column", "opcode"):
self.assertEqual(
exp_frame[field],
act_frame[field],
f"Thread ({interp_id}, {thread_id}), sample {i}, "
f"frame {j}: {field} mismatch "
f"(expected {exp_frame[field]!r}, got {act_frame[field]!r})",
)


class TestBinaryRoundTrip(BinaryFormatTestBase):
@@ -484,6 +495,97 @@ def test_threads_interleaved_samples(self):
self.assertEqual(count, 60)
self.assert_samples_equal(samples, collector)

    def test_full_location_roundtrip(self):
        """Full source location (end_lineno, column, end_column) roundtrips."""
        frames = [
            make_frame("test.py", 10, "func1", end_lineno=12, column=4, end_column=20),
            make_frame("test.py", 20, "func2", end_lineno=20, column=8, end_column=45),
            make_frame("test.py", 30, "func3", end_lineno=35, column=0, end_column=100),
        ]
        samples = [[make_interpreter(0, [make_thread(1, frames)])]]
        collector, count = self.roundtrip(samples)
        self.assertEqual(count, 1)
        self.assert_samples_equal(samples, collector)

    def test_opcode_roundtrip(self):
        """Opcode values roundtrip exactly."""
        opcodes = [0, 1, 50, 100, 150, 200, 254]  # Valid Python opcodes
        samples = []
        for opcode in opcodes:
            frame = make_frame("test.py", 10, "func", opcode=opcode)
            samples.append([make_interpreter(0, [make_thread(1, [frame])])])
        collector, count = self.roundtrip(samples)
        self.assertEqual(count, len(opcodes))
        self.assert_samples_equal(samples, collector)

    def test_opcode_none_roundtrip(self):
        """Opcode=None (sentinel 255) roundtrips as None."""
        frame = make_frame("test.py", 10, "func", opcode=None)
        samples = [[make_interpreter(0, [make_thread(1, [frame])])]]
        collector, count = self.roundtrip(samples)
        self.assertEqual(count, 1)
        self.assert_samples_equal(samples, collector)

    def test_mixed_location_and_opcode(self):
        """Mixed full location and opcode data roundtrips."""
        frames = [
            make_frame("a.py", 10, "a", end_lineno=15, column=4, end_column=30, opcode=100),
            make_frame("b.py", 20, "b", end_lineno=20, column=0, end_column=50, opcode=None),
            make_frame("c.py", 30, "c", end_lineno=32, column=8, end_column=25, opcode=50),
        ]
        samples = [[make_interpreter(0, [make_thread(1, frames)])]]
        collector, count = self.roundtrip(samples)
        self.assertEqual(count, 1)
        self.assert_samples_equal(samples, collector)

    def test_delta_encoding_multiline(self):
        """Multi-line spans (large end_lineno delta) roundtrip correctly."""
        # This tests the delta encoding: end_lineno = lineno + delta
        frames = [
            make_frame("test.py", 1, "small", end_lineno=1, column=0, end_column=10),
            make_frame("test.py", 100, "medium", end_lineno=110, column=0, end_column=50),
            make_frame("test.py", 1000, "large", end_lineno=1500, column=0, end_column=200),
        ]
        samples = [[make_interpreter(0, [make_thread(1, frames)])]]
        collector, count = self.roundtrip(samples)
        self.assertEqual(count, 1)
        self.assert_samples_equal(samples, collector)

    def test_column_positions_preserved(self):
        """Various column positions are preserved exactly."""
        columns = [(0, 10), (4, 50), (8, 100), (100, 200)]
        samples = []
        for col, end_col in columns:
            frame = make_frame("test.py", 10, "func", column=col, end_column=end_col)
            samples.append([make_interpreter(0, [make_thread(1, [frame])])])
        collector, count = self.roundtrip(samples)
        self.assertEqual(count, len(columns))
        self.assert_samples_equal(samples, collector)

    def test_same_line_different_opcodes(self):
        """Same line with different opcodes creates distinct frames."""
        # This tests that opcode is part of the frame key
        frames = [
            make_frame("test.py", 10, "func", opcode=100),
            make_frame("test.py", 10, "func", opcode=101),
            make_frame("test.py", 10, "func", opcode=102),
        ]
        samples = [[make_interpreter(0, [make_thread(1, [f])]) for f in frames]]
        collector, count = self.roundtrip(samples)
        # Verify all three opcodes are preserved distinctly
        self.assertEqual(count, 3)

    def test_same_line_different_columns(self):
        """Same line with different columns creates distinct frames."""
        frames = [
            make_frame("test.py", 10, "func", column=0, end_column=10),
            make_frame("test.py", 10, "func", column=15, end_column=25),
            make_frame("test.py", 10, "func", column=30, end_column=40),
        ]
        samples = [[make_interpreter(0, [make_thread(1, [f])]) for f in frames]]
        collector, count = self.roundtrip(samples)
        self.assertEqual(count, 3)


class TestBinaryEdgeCases(BinaryFormatTestBase):
"""Tests for edge cases in binary format."""
24 changes: 19 additions & 5 deletions Modules/_remote_debugging/binary_io.h
@@ -25,6 +25,10 @@ extern "C" {
#define BINARY_FORMAT_MAGIC_SWAPPED 0x48434154 /* Byte-swapped magic for endianness detection */
#define BINARY_FORMAT_VERSION 1

/* Sentinel values for optional frame fields */
#define OPCODE_NONE 255 /* No opcode captured (u8 sentinel) */
#define LOCATION_NOT_AVAILABLE (-1) /* lineno/column not available (zigzag sentinel) */

/* Conditional byte-swap macros for cross-endian file reading.
* Uses Python's optimized byte-swap functions from pycore_bitutils.h */
#define SWAP16_IF(swap, x) ((swap) ? _Py_bswap16(x) : (x))
@@ -172,18 +176,28 @@ typedef struct {
size_t compressed_buffer_size;
} ZstdCompressor;

/* Frame entry - combines all frame data for better cache locality */
/* Frame entry - combines all frame data for better cache locality.
* Stores full source position (line, end_line, column, end_column) and opcode.
* Delta values are computed during serialization for efficiency. */
typedef struct {
    uint32_t filename_idx;
    uint32_t funcname_idx;
    int32_t lineno;
    int32_t lineno; /* Start line number (-1 for synthetic frames) */
    int32_t end_lineno; /* End line number (-1 if not available) */
    int32_t column; /* Start column in UTF-8 bytes (-1 if not available) */
    int32_t end_column; /* End column in UTF-8 bytes (-1 if not available) */
    uint8_t opcode; /* Python opcode (0-254) or OPCODE_NONE (255) */
} FrameEntry;

/* Frame key for hash table lookup */
/* Frame key for hash table lookup - includes all fields for proper deduplication */
typedef struct {
    uint32_t filename_idx;
    uint32_t funcname_idx;
    int32_t lineno;
    int32_t end_lineno;
    int32_t column;
    int32_t end_column;
    uint8_t opcode;
} FrameKey;

/* Pending RLE sample - buffered for run-length encoding */
@@ -305,8 +319,8 @@ typedef struct {
PyObject **strings;
uint32_t strings_count;

/* Parsed frame table: packed as [filename_idx, funcname_idx, lineno] */
uint32_t *frame_data;
/* Parsed frame table: array of FrameEntry structures */
FrameEntry *frames;
uint32_t frames_count;

/* Sample data region */