@Awallky Awallky commented Nov 4, 2025

The smaps pipeline was too complicated: it tried to aggregate similarly named memory consumers, such as those matching the pattern "mem/shared_memory", among others. There are concerns that this masked or dropped smaps data, leading to low PSS and RSS counts. This change speculatively removes that aggregation in case it is the cause. It also fixes some bugs in the visualization script and adds testing to ensure memory leaks are properly accounted for.
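
For context, here is a minimal sketch of the kind of aggregation step being removed. The helper name collapse_similar_entries and the grouping rule shown are illustrative assumptions, not the pipeline's actual code:

```python
import re
from collections import defaultdict

# Illustrative only: the pattern list and helper name are assumptions.
# Previously, entries whose names matched a rule such as "mem/shared_memory..."
# were collapsed into a single bucket, which may have masked or dropped rows
# and produced low PSS/RSS totals.
AGGREGATION_PATTERNS = [
    (re.compile(r"^mem/shared_memory"), "mem/shared_memory (aggregated)"),
]

def collapse_similar_entries(entries):
    """entries: iterable of (name, pss_kb, rss_kb) tuples parsed from smaps."""
    totals = defaultdict(lambda: [0, 0])
    for name, pss_kb, rss_kb in entries:
        bucket = name
        for pattern, label in AGGREGATION_PATTERNS:
            if pattern.match(name):
                bucket = label
                break
        totals[bucket][0] += pss_kb
        totals[bucket][1] += rss_kb
    return [(name, pss, rss) for name, (pss, rss) in totals.items()]

# After this change, entries are reported as-is, with no collapsing, so each
# consumer appears individually in the analysis output.
```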

The smaps analysis pipeline failed on a couple of attempts due to visualize_smaps_analysis.py receiving invalid timestamp strings from analyze_smaps_logs.py. This occurred when analyze_smaps_logs.py encountered log files with names that did not conform to the expected timestamp format.

This change addresses the issue by:

  • Modifying analyze_smaps_logs.py's extract_timestamp function to return None for filenames that do not contain a valid timestamp, instead of a placeholder string (a sketch of this behavior follows the list).
  • Updating the file filtering logic in analyze_smaps_logs.py to skip any log files for which a valid timestamp cannot be extracted, preventing them from being passed to downstream visualization.
  • Correcting the regular expression in extract_timestamp to accurately match the _processed.txt suffix of the log files generated by read_smaps_batch.py.
  • Ensuring the run_analysis_pipeline.py script uses the correct log directory path.
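
The sketch below illustrates the intended behavior of the first three items. The filename shape (e.g. smaps_20251104_174600_processed.txt) and the helper collect_log_files are assumptions for illustration; the real code lives in analyze_smaps_logs.py:

```python
import re
from pathlib import Path

# Assumed filename shape for illustration; the exact format produced by
# read_smaps_batch.py may differ.
TIMESTAMP_RE = re.compile(r"(\d{8}_\d{6})_processed\.txt$")

def extract_timestamp(filename):
    """Return the timestamp embedded in a processed smaps log name, or None."""
    match = TIMESTAMP_RE.search(filename)
    if match is None:
        # Previously a placeholder string was returned here, which later crashed
        # visualize_smaps_analysis.py; returning None lets callers skip the file.
        return None
    return match.group(1)

def collect_log_files(log_dir):
    """Keep only files with a parseable timestamp, sorted chronologically."""
    files = []
    for path in sorted(Path(log_dir).glob("*_processed.txt")):
        timestamp = extract_timestamp(path.name)
        if timestamp is None:
            continue  # skip malformed names instead of passing them downstream
        files.append((timestamp, path))
    return files
```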

These changes make the smaps analysis pipeline more resilient to malformed log filenames and prevent crashes during visualization, improving its overall usability and correctness.

In addition, this change adds a test that hardens data integrity for leak detection. analyze_smaps_logs_test.py includes a test (test_analyze_logs_json_output) that simulates a memory leak: it creates dummy smaps files in which a component (<leaking_lib>) shows increasing PSS and RSS values over time, then asserts that this time-series memory growth is correctly captured and structured in the JSON output. This ensures that the foundational data required for identifying leaks is accurately processed.
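
A rough sketch of what such a test can look like is below. The dummy file contents, the JSON shape, and the analyze_logs entry point are assumptions based on the description above, not the actual contents of analyze_smaps_logs_test.py:

```python
import json
import unittest
from pathlib import Path
from tempfile import TemporaryDirectory

# Assumed entry point: the real test drives analyze_smaps_logs.py, whose public
# function and JSON schema may differ from what is shown here.
from analyze_smaps_logs import analyze_logs


class AnalyzeSmapsLogsTest(unittest.TestCase):

    def test_analyze_logs_json_output(self):
        with TemporaryDirectory() as tmp:
            # Three snapshots in which <leaking_lib> grows monotonically in PSS/RSS.
            snapshots = [
                ("smaps_20251104_170000_processed.txt", 1000, 1200),
                ("smaps_20251104_171000_processed.txt", 2000, 2400),
                ("smaps_20251104_172000_processed.txt", 3000, 3600),
            ]
            for name, pss_kb, rss_kb in snapshots:
                Path(tmp, name).write_text(
                    f"<leaking_lib> Pss: {pss_kb} kB Rss: {rss_kb} kB\n")

            output = json.loads(analyze_logs(tmp))

            # The time series for the leaking component must be present and increasing.
            pss_series = [point["pss_kb"] for point in output["<leaking_lib>"]]
            self.assertEqual(len(pss_series), len(snapshots))
            self.assertEqual(pss_series, sorted(pss_series))
            self.assertLess(pss_series[0], pss_series[-1])


if __name__ == "__main__":
    unittest.main()
```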

#vibe-coded
Bug: 456178181

@Awallky Awallky requested a review from johnxwork November 4, 2025 17:46
@johnxwork johnxwork enabled auto-merge (squash) November 4, 2025 17:51
@johnxwork johnxwork added the cp-26.android Cherry Pick to the 26.android branch label Nov 4, 2025
@johnxwork johnxwork merged commit 448df2e into youtube:main Nov 4, 2025
608 of 634 checks passed
cobalt-github-releaser-bot pushed a commit that referenced this pull request Nov 4, 2025
…7862)

(cherry picked from commit 448df2e)
@cobalt-github-releaser-bot

Caution

Creating the cherry pick PR failed! Check the log at https://github.com/youtube/cobalt/actions/runs/19086314321 for details.

johnxwork pushed a commit that referenced this pull request Nov 5, 2025
…st invalid timestamps (#7874)

Refer to the original PR: #7862


Co-authored-by: Adam Walls <avvall@google.com>