chore(storage): add microbenchmark for MRD reads#17421
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a new microbenchmark script, benchmark_mrd_reads.py, to measure the performance of GCS Object Range Downloads using the AsyncMultiRangeDownloader (MRD). The review feedback highlights a potential ZeroDivisionError or incorrect slicing during random object generation if the upload chunk size exceeds the buffer size. Additionally, the reviewer suggests refactoring the upload and benchmark execution blocks to ensure that test objects are reliably cleaned up from the GCS bucket even if the upload or benchmark run fails.
| # Generate 10MiB of random buffer to slice from to avoid CPU urandom overhead | ||
| buffer_size = min(10 * 1024 * 1024, total_size_bytes) | ||
| random_buffer = os.urandom(buffer_size) |
There was a problem hiding this comment.
If chunk_size_bytes is larger than buffer_size (which is capped at 10 MiB), bytes_to_write can exceed buffer_size. This leads to two critical issues:
- If
bytes_to_writeis exactlybuffer_size + 1,buffer_size - bytes_to_write + 1becomes0, causing aZeroDivisionErrorwhen calculatingslice_startvia modulo. - If
bytes_to_writeis even larger,buffer_size - bytes_to_write + 1becomes negative, resulting in incorrect slicing ofrandom_bufferand silent data truncation/corruption during the upload.
To fix this, ensure buffer_size is at least chunk_size_bytes (up to total_size_bytes).
| # Generate 10MiB of random buffer to slice from to avoid CPU urandom overhead | |
| buffer_size = min(10 * 1024 * 1024, total_size_bytes) | |
| random_buffer = os.urandom(buffer_size) | |
| # Generate random buffer to slice from to avoid CPU urandom overhead. | |
| # Ensure buffer_size is at least chunk_size_bytes to prevent ZeroDivisionError or slicing beyond the buffer limit. | |
| buffer_size = min(max(10 * 1024 * 1024, chunk_size_bytes), total_size_bytes) | |
| random_buffer = os.urandom(buffer_size) |
| try: | ||
| chunk_size = min(upload_chunk_size_bytes, size_bytes) | ||
| await upload_random_object( | ||
| grpc_client, | ||
| args.bucket, | ||
| object_name, | ||
| size_bytes, | ||
| chunk_size, | ||
| ) | ||
| except Exception as e: | ||
| logging.error(f"Upload failed for size {size_str}: {e}") | ||
| sys.exit(1) | ||
|
|
||
| try: | ||
| chk_label = "Enabled" if enable_chk else "Disabled" |
There was a problem hiding this comment.
If upload_random_object fails or is interrupted, the benchmark script exits immediately without running the finally block that deletes the test object. This can leave orphaned objects in the GCS bucket.
By merging the upload phase into the main try block, we ensure that any partially uploaded or created object is always cleaned up by the finally block, even if the upload fails.
| try: | |
| chunk_size = min(upload_chunk_size_bytes, size_bytes) | |
| await upload_random_object( | |
| grpc_client, | |
| args.bucket, | |
| object_name, | |
| size_bytes, | |
| chunk_size, | |
| ) | |
| except Exception as e: | |
| logging.error(f"Upload failed for size {size_str}: {e}") | |
| sys.exit(1) | |
| try: | |
| chk_label = "Enabled" if enable_chk else "Disabled" | |
| try: | |
| chunk_size = min(upload_chunk_size_bytes, size_bytes) | |
| await upload_random_object( | |
| grpc_client, | |
| args.bucket, | |
| object_name, | |
| size_bytes, | |
| chunk_size, | |
| ) | |
| chk_label = "Enabled" if enable_chk else "Disabled" |
| finally: | ||
| try: | ||
| logging.info( | ||
| f"Cleaning up object gs://{args.bucket}/{object_name}..." | ||
| ) | ||
| await grpc_client.delete_object(args.bucket, object_name) | ||
| logging.info("Cleanup complete.") | ||
| except Exception as e: | ||
| logging.warning( | ||
| f"Warning: failed to delete test object {object_name}: {e}" | ||
| ) |
There was a problem hiding this comment.
This accompanies the change to merge the upload phase into the main try block. Adding an except block to the outer try ensures that any fatal benchmark or upload failures are logged nicely and exit with status code 1, while still executing the finally block to clean up the test object.
| finally: | |
| try: | |
| logging.info( | |
| f"Cleaning up object gs://{args.bucket}/{object_name}..." | |
| ) | |
| await grpc_client.delete_object(args.bucket, object_name) | |
| logging.info("Cleanup complete.") | |
| except Exception as e: | |
| logging.warning( | |
| f"Warning: failed to delete test object {object_name}: {e}" | |
| ) | |
| except Exception as e: | |
| logging.error(f"Benchmark run failed for size {size_str}: {e}") | |
| sys.exit(1) | |
| finally: | |
| try: | |
| logging.info( | |
| f"Cleaning up object gs://{args.bucket}/{object_name}..." | |
| ) | |
| await grpc_client.delete_object(args.bucket, object_name) | |
| logging.info("Cleanup complete.") | |
| except Exception as e: | |
| logging.warning( | |
| f"Warning: failed to delete test object {object_name}: {e}" | |
| ) |
References
- Avoid broad except Exception: blocks that silently return None. Instead, log the exception (e.g., using logger.warning) to aid in debugging and prevent masking underlying issues.
|
@chandra-siri, Please could you address the feedback from gemini-code-assist? |
Add a microbenchmark script for measuring multi-range downloader (MRD) reads performance.