cobalt/tools: Robustify analysis pipeline against invalid timestamps (#7862)
The smaps pipeline was too complicated: it tried to aggregate similarly
named memory consumers, such as those matching the pattern
"mem/shared_memory", among others. There are concerns that this
aggregation masked or dropped smaps data, leading to low PSS and RSS
counts. This change speculatively removes the aggregation in case it is
the cause. It also fixes some bugs in the visualization script and adds
tests to ensure memory leaks are properly accounted for.
The smaps analysis pipeline failed on a couple of attempts due to
visualize_smaps_analysis.py receiving invalid timestamp strings from
analyze_smaps_logs.py. This occurred when analyze_smaps_logs.py
encountered log files with names that did not conform to the expected
timestamp format.
This change addresses the issue by:
- Modifying analyze_smaps_logs.py's extract_timestamp function to return
None for filenames that do not contain a valid timestamp, instead of a
placeholder string (a minimal sketch follows this list).
- Updating the file filtering logic in analyze_smaps_logs.py to skip any
log files for which a valid timestamp cannot be extracted, preventing
them from being passed to downstream visualization.
- Correcting the regular expression in extract_timestamp to accurately
match the _processed.txt suffix of the log files generated by
read_smaps_batch.py.
- Ensuring the run_analysis_pipeline.py script uses the correct log
directory path.
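The first three points amount to a stricter contract between the analyzer
and the visualizer: filenames without a parseable timestamp are dropped
before anything is handed downstream. A minimal sketch of that behavior,
assuming a YYYYMMDD_HHMMSS timestamp embedded in the _processed.txt
filenames (the exact format produced by read_smaps_batch.py is an
assumption here):

```python
import os
import re
from typing import Optional

# Assumed filename layout, e.g. "smaps_20240101_120000_processed.txt".
_TIMESTAMP_RE = re.compile(r'(\d{8}_\d{6})_processed\.txt$')


def extract_timestamp(filename: str) -> Optional[str]:
  """Returns the timestamp embedded in a processed smaps filename, or None."""
  match = _TIMESTAMP_RE.search(os.path.basename(filename))
  return match.group(1) if match else None


def select_log_files(filenames):
  """Keeps only files whose names carry a parseable timestamp."""
  return [name for name in filenames if extract_timestamp(name) is not None]
```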
In addition, this change adds a test hardening data integrity for leak
detection. analyze_smaps_logs_test.py includes a test
(test_analyze_logs_json_output) that simulates a memory leak. It creates
dummy smaps files where a component (<leaking_lib>) shows increasing PSS
and RSS values over time. This test then asserts that this time-series
memory growth is correctly captured and structured within the JSON
output. This ensures that the foundational data required for identifying
leaks is accurately processed.
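The assertion pattern in that test might look roughly like the sketch
below; the snapshot layout and field names are invented for illustration
and are not the exact schema produced by analyze_smaps_logs.py:

```python
import json
import unittest


class LeakTimeSeriesSketchTest(unittest.TestCase):

  def test_leak_growth_survives_json_round_trip(self):
    # Dummy time series for a leaking component: PSS and RSS grow by 100 kB
    # per snapshot, mimicking the increasing values in the dummy smaps files.
    snapshots = [
        {'timestamp': f'2024-01-01T00:0{i}:00',
         'pss_kb': 1000 + 100 * i,
         'rss_kb': 2000 + 100 * i}
        for i in range(3)
    ]
    blob = json.dumps({'<leaking_lib>': snapshots})

    # The growth must still be visible, and in order, after parsing the JSON.
    parsed = json.loads(blob)['<leaking_lib>']
    pss_values = [point['pss_kb'] for point in parsed]
    self.assertEqual(pss_values, [1000, 1100, 1200])


if __name__ == '__main__':
  unittest.main()
```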
#vibe-coded
Bug: 456178181
After processing a batch of smaps files, you can use `analyze_smaps_logs.py` to inspect the final memory state of the run. It reads a directory of processed smaps files and prints a detailed, non-aggregated memory breakdown from the last log file in the time series.
The report includes:
* The top 10 largest memory consumers by PSS and RSS at the end of the run.
* The top 10 memory regions that have grown the most in PSS and RSS over the duration of the run.
* The overall change in total PSS and RSS.
The primary purpose of this script is to either provide a snapshot of the final memory layout or to generate a JSON file containing the full time-series data. This JSON output is essential for the `visualize_smaps_analysis.py` script, which handles aggregation and visualization.
The script can also output a structured JSON file containing the time-series data for further analysis or visualization.
To simplify the analysis process, the `run_analysis_pipeline.py` script combines the batch processing, analysis, and visualization steps into a single command.
### `run_analysis_pipeline.py`
This script takes a directory of raw smaps logs and generates the final visualization PNG, handling all intermediate steps automatically.
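Conceptually the pipeline just chains the three existing scripts back to back. A rough sketch of that orchestration is shown below; the flag names passed to each step are assumptions for illustration, not the actual arguments defined by `run_analysis_pipeline.py`:

```python
import subprocess
import sys


def run_pipeline(raw_log_dir, output_png):
  """Runs batch processing, analysis, and visualization in sequence."""
  # Flag names below are illustrative only; run_analysis_pipeline.py defines
  # the real arguments it forwards to each script.
  subprocess.run(
      [sys.executable, 'read_smaps_batch.py', raw_log_dir], check=True)
  subprocess.run(
      [sys.executable, 'analyze_smaps_logs.py', raw_log_dir,
       '--json_output', 'analysis.json'], check=True)
  subprocess.run(
      [sys.executable, 'visualize_smaps_analysis.py', 'analysis.json',
       '--output', output_png], check=True)
```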
The accuracy of this toolchain depends on its aggregation rules, which are heuristics based on known memory patterns. As Cobalt, Android, and third-party libraries evolve, new memory region names can appear. It is crucial to periodically check for and categorize these new regions to prevent gaps in the analysis.
### How to Check for New Patterns
1. **Temporarily Disable Aggregation:** Open `run_analysis_pipeline.py` and remove the `-d` (or `--aggregate_android`) flag from the `batch_args` list. This will cause the batch processor to output a "raw" report with no special grouping.
2. **Run the Pipeline:** Execute the modified script on a recent and representative set of `smaps` logs.
3. **Examine the Raw Output:** The analysis printed to the console will now be much more detailed. Scan the "Top Largest Consumers" and "Top Memory Increases" lists. Look for patterns or repeated names that are not being grouped, such as:
* New `[anon:<name>]` labels (e.g., we discovered `[anon:scudo:*]`).
* Driver or shared memory regions (e.g., `/dev/ashmem/*`).
* JIT or code cache regions (e.g., `/memfd:jit-cache`).
* Any other large, unexplained region that appears frequently.
4. **Add New Aggregation Rules:** Open `read_smaps.py` and add new `re.sub()` rules within the `if args.aggregate_android:` block. Place more specific rules *before* more general ones.
```python
# Example for adding a new rule for Skia resources
if args.aggregate_android:
  key = re.sub(r'\[(anon:skia.*)\]', r'<\1>', key)  # New rule
```
5. **Re-enable Aggregation:** Add the `-d` flag back to `run_analysis_pipeline.py` and re-run the pipeline to confirm that your new categories appear correctly.
By following this process periodically, you can maintain a comprehensive and accurate view of the application's memory usage.
## Extending the Toolchain with New Fields
The toolchain is designed to be extensible, allowing you to add new fields from the raw smaps files to the analysis and visualization. The key is to follow the data pipeline from the processor (`read_smaps.py`) to the analyzer (`analyze_smaps_logs.py`) and visualizer (`visualize_smaps_analysis.py`).
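As a hedged illustration of the first hop only, pulling one extra per-region field such as `SwapPss` out of a raw smaps entry could look like the sketch below; the regex and function name are assumptions, and the corresponding analyzer and visualizer plumbing is not shown:

```python
import re

# Matches per-region metric lines such as "SwapPss:             12 kB".
_SWAP_PSS_RE = re.compile(r'^SwapPss:\s+(\d+) kB', re.MULTILINE)


def extract_swap_pss_kb(smaps_region_text: str) -> int:
  """Returns the SwapPss value in kB for one smaps region, or 0 if absent."""
  match = _SWAP_PSS_RE.search(smaps_region_text)
  return int(match.group(1)) if match else 0
```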