_remote_debugging: binary format only total_samples:u32 #151292

@maurycy

Bug report

Bug description:

The binary format defines total_samples as just u32:

uint32_t total_samples;

#define HDR_SIZE_SAMPLES 4
#define HDR_OFF_THREADS (HDR_OFF_SAMPLES + HDR_SIZE_SAMPLES)

That's not much. At just 100 kHz:

| Threads | Overflow after |
|---------|----------------|
| 1       | ~11.9 h        |
| 4       | ~3.0 h         |
| 10      | ~71 min        |
| 64      | ~11 min        |

This is especially limiting if we aim for continuous profiling of real production systems. But even on macOS, I'm already observing ~0.5-1 MHz on my mach_vm_remap branch...
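The figures above follow from dividing the 2^32 values a u32 can hold by the aggregate sample rate. Here is a minimal sketch of that arithmetic, assuming each thread contributes one counted sample per tick; the function name is made up for illustration:

```python
U32_VALUES = 2**32  # number of distinct values a uint32_t counter can hold


def seconds_until_overflow(rate_hz: float, threads: int = 1) -> float:
    """Seconds until a u32 sample counter is exhausted.

    Assumes every thread adds one counted sample per sampling tick,
    which is how the table in this issue appears to be computed.
    """
    return U32_VALUES / (rate_hz * threads)


for threads in (1, 4, 10, 64):
    secs = seconds_until_overflow(100_000, threads)
    print(f"{threads:>3} thread(s): {secs / 3600:6.2f} h ({secs / 60:7.1f} min)")
```

Plugging in a higher rate, such as the ~0.5-1 MHz mentioned above, shrinks these windows proportionally.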

To make matters worse, since #150349 it results in OverflowError and the binary file is just corrupted:

2026-06-11T02:03:53.357502000+0200 maurycy@gimel /Users/maurycy/src/github.com/maurycy/cpython (vmremap 0bdde7f?) % ls -l /tmp/overflow.bin
-rw-r--r--@ 1 root  wheel  4894411257 Jun 11 01:45 /tmp/overflow.bin
2026-06-11T02:03:58.920689000+0200 maurycy@gimel /Users/maurycy/src/github.com/maurycy/cpython (vmremap 0bdde7f?) % head -c 128 /tmp/overflow.bin | xxd
00000000: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000010: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000020: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000030: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000040: 28b5 2ffd 0058 5c1d 013a d500 1621 8025  (./..X\..:...!.%
00000050: 49d2 0133 3003 331a 3330 3d32 35a9 ccc0  I..30.3.30=25...
00000060: 3aa9 049c b5d6 b69d d296 3122 4488 0c6d  :.........1"D..m
00000070: 0148 0150 014f 0946 32bd b931 318c b6de  .H.P.O.F2..11...

as the header is written on finalize:


    if (FSEEK64(writer->fp, 0, SEEK_SET) < 0) {
        PyErr_SetFromErrno(PyExc_IOError);
        return -1;
    }
    /* Convert file offsets and counts to fixed-width types for portable header format.
     * This ensures correct behavior on both little-endian and big-endian systems. */
    uint64_t string_table_offset_u64 = (uint64_t)string_table_offset;
    uint64_t frame_table_offset_u64 = (uint64_t)frame_table_offset;
    uint32_t thread_count_u32 = (uint32_t)writer->thread_count;
    uint32_t compression_type_u32 = (uint32_t)writer->compression_type;
    uint8_t header[FILE_HEADER_SIZE] = {0};
    uint32_t magic = BINARY_FORMAT_MAGIC;
    uint32_t version = BINARY_FORMAT_VERSION;
    memcpy(header + HDR_OFF_MAGIC, &magic, HDR_SIZE_MAGIC);
    memcpy(header + HDR_OFF_VERSION, &version, HDR_SIZE_VERSION);
    header[HDR_OFF_PY_MAJOR] = PY_MAJOR_VERSION;
    header[HDR_OFF_PY_MINOR] = PY_MINOR_VERSION;
    header[HDR_OFF_PY_MICRO] = PY_MICRO_VERSION;
    memcpy(header + HDR_OFF_START_TIME, &writer->start_time_us, HDR_SIZE_START_TIME);
    memcpy(header + HDR_OFF_INTERVAL, &writer->sample_interval_us, HDR_SIZE_INTERVAL);
    memcpy(header + HDR_OFF_SAMPLES, &writer->total_samples, HDR_SIZE_SAMPLES);
    memcpy(header + HDR_OFF_THREADS, &thread_count_u32, HDR_SIZE_THREADS);
    memcpy(header + HDR_OFF_STR_TABLE, &string_table_offset_u64, HDR_SIZE_STR_TABLE);
    memcpy(header + HDR_OFF_FRAME_TABLE, &frame_table_offset_u64, HDR_SIZE_FRAME_TABLE);
    memcpy(header + HDR_OFF_COMPRESSION, &compression_type_u32, HDR_SIZE_COMPRESSION);
    if (fwrite_checked_allow_threads(header, FILE_HEADER_SIZE, writer->fp) < 0) {
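The hex dump fits this write-on-finalize pattern: the first 64 bytes are zero, and the bytes 28 b5 2f fd at offset 0x40 match the Zstandard frame magic, so compressed sample data was written but the header never was. The sketch below reproduces the pattern in Python under loud assumptions: the 64-byte header size is inferred from the dump, and the magic value and two-field layout are hypothetical, not the real format:

```python
import io
import struct

HEADER_SIZE = 64        # assumption: inferred from the 64 zero bytes in the dump
DEMO_MAGIC = 0x44454D4F  # hypothetical magic, NOT the real BINARY_FORMAT_MAGIC


def start_file(fp) -> None:
    """Reserve header space with zeros; the real values come later."""
    fp.write(bytes(HEADER_SIZE))


def finalize(fp, total_samples: int) -> None:
    """Seek back and fill in the header. Never runs if collection aborts."""
    header = bytearray(HEADER_SIZE)
    # "<II" = two little-endian u32 fields; raises struct.error above 2**32 - 1
    struct.pack_into("<II", header, 0, DEMO_MAGIC, total_samples)
    fp.seek(0)
    fp.write(header)


def looks_unfinalized(fp) -> bool:
    """True if the header region is still all zeros."""
    fp.seek(0)
    return fp.read(HEADER_SIZE) == bytes(HEADER_SIZE)


buf = io.BytesIO()
start_file(buf)
buf.write(b"\x28\xb5\x2f\xfd")   # stand-in for compressed sample data
print(looks_unfinalized(buf))    # header still zeros at this point
```

A reader could use a check like `looks_unfinalized` to report an aborted recording instead of failing on a missing magic number.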

Setting aside that u32 is simply too small, we should handle this gracefully.
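One possible shape for graceful handling is to stop accepting samples at the limit but still finalize, so the file keeps a valid header and the samples collected so far. This is a sketch only; `BoundedWriter`, `SampleLimitReached`, and `collect` are hypothetical names, not part of `_remote_debugging`:

```python
U32_MAX = 0xFFFFFFFF  # largest value a uint32_t total_samples field can store


class SampleLimitReached(Exception):
    """Hypothetical signal that the format cannot count any more samples."""


class BoundedWriter:
    """Toy stand-in for a binary writer with a bounded sample counter."""

    def __init__(self, limit: int = U32_MAX):
        self.limit = limit
        self.total_samples = 0
        self.finalized = False

    def write_sample(self, sample) -> None:
        if self.total_samples >= self.limit:
            raise SampleLimitReached("sample counter is full")
        self.total_samples += 1

    def finalize(self) -> None:
        # In a real writer this is where the header would be written.
        self.finalized = True


def collect(writer: BoundedWriter, samples) -> int:
    """Feed samples until exhausted or the limit is hit; always finalize."""
    try:
        for sample in samples:
            writer.write_sample(sample)
    except SampleLimitReached:
        pass  # stop sampling, but keep what was recorded
    finally:
        writer.finalize()
    return writer.total_samples
```

The key design choice is the `finally` clause: whatever ends the collection loop, the header is written.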

Other fields seem to be fine, but they need a double check. I think rotating files is only a stopgap, and we need chunking.
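To illustrate what chunking could buy, here is a minimal sketch of a purely hypothetical layout in which each chunk carries its own u32 count, so no single field has to hold the grand total. The record format (one u64 timestamp per sample) and all names are assumptions made for the example, not a proposal for the actual format:

```python
import io
import struct

COUNT = struct.Struct("<I")   # hypothetical per-chunk header: sample count (u32)
RECORD = struct.Struct("<Q")  # hypothetical fixed-size sample: one u64 timestamp


def write_chunked(fp, timestamps, max_per_chunk: int) -> None:
    """Write samples as a sequence of chunks, each with its own count."""
    for start in range(0, len(timestamps), max_per_chunk):
        chunk = timestamps[start:start + max_per_chunk]
        fp.write(COUNT.pack(len(chunk)))
        for ts in chunk:
            fp.write(RECORD.pack(ts))


def count_samples(fp) -> int:
    """Sum per-chunk counts; the total is a Python int, so it cannot wrap."""
    total = 0
    while True:
        raw = fp.read(COUNT.size)
        if not raw:
            return total
        if len(raw) < COUNT.size:
            raise ValueError("truncated chunk header")
        (n,) = COUNT.unpack(raw)
        fp.seek(n * RECORD.size, io.SEEK_CUR)  # skip the chunk body
        total += n
```

With a layout like this, an interrupted run would lose at most the last incomplete chunk rather than the whole file's header.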

Reproduction

(as part of routine stress tests of maurycy#3)

[130] 2026-06-10T23:35:58.225891000+0200 maurycy@gimel /Users/maurycy/src/github.com/maurycy/cpython (vmremap 92ad857*?) % sudo ./python.exe -m profiling.sampling run --binary -r 1000khz -d 25000 -o /tmp/overflow.bin --realtime-stats busywork.py
Stats: 647,833.9Hz (1.5µs) Min: 615,379.9Hz Max: 705,712.7Hz N=4294631818 Cache: 100.0% (4294631818+0/1)Traceback (most recent call last):
  File "<frozen runpy>", line 201, in _run_module_as_main
  File "<frozen runpy>", line 87, in _run_code
  File "/Users/maurycy/src/github.com/maurycy/cpython/Lib/profiling/sampling/__main__.py", line 65, in <module>
    main()
    ~~~~^^
  File "/Users/maurycy/src/github.com/maurycy/cpython/Lib/profiling/sampling/cli.py", line 977, in main
    _main()
    ~~~~~^^
  File "/Users/maurycy/src/github.com/maurycy/cpython/Lib/profiling/sampling/cli.py", line 1133, in _main
    handler(args)
    ~~~~~~~^^^^^^
  File "/Users/maurycy/src/github.com/maurycy/cpython/Lib/profiling/sampling/cli.py", line 1280, in _handle_run
    collector = sample(
        process.pid,
    ...<9 lines>...
        blocking=args.blocking,
    )
  File "/Users/maurycy/src/github.com/maurycy/cpython/Lib/profiling/sampling/sample.py", line 504, in sample
    profiler.sample(collector, duration_sec, async_aware=async_aware)
    ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/maurycy/src/github.com/maurycy/cpython/Lib/profiling/sampling/sample.py", line 167, in sample
    raise e from None
  File "/Users/maurycy/src/github.com/maurycy/cpython/Lib/profiling/sampling/sample.py", line 155, in sample
    collector.collect(stack_frames)
    ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^
  File "/Users/maurycy/src/github.com/maurycy/cpython/Lib/profiling/sampling/binary_collector.py", line 84, in collect
    self._writer.write_sample(stack_frames, timestamp_us)
    ~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OverflowError: too many samples for binary format
[1] 2026-06-11T01:47:42.249903000+0200 maurycy@gimel /Users/maurycy/src/github.com/maurycy/cpython (vmremap cd5a675*?) %

busywork.py:

def hot_a(n):
    return sum(i * i for i in range(n))


def hot_b(n):
    return sum(i + i for i in range(n))


def worker():
    while True:
        hot_a(6_000_000)
        hot_b(6_000_000)


worker()

CPython versions tested on:

CPython main branch

Operating systems tested on:

macOS
