youtube
diff --git a/‎cobalt/tools/performance/smaps/README.md‎
Lines changed: 135 additions & 4 deletions b/‎cobalt/tools/performance/smaps/README.md‎
Lines changed: 135 additions & 4 deletions
diff --git a/‎cobalt/tools/performance/smaps/analyze_smaps_logs.py‎
Lines changed: 69 additions & 33 deletions b/‎cobalt/tools/performance/smaps/analyze_smaps_logs.py‎
Lines changed: 69 additions & 33 deletions
diff --git a/‎cobalt/tools/performance/smaps/analyze_smaps_logs_test.py‎
Lines changed: 3 additions & 6 deletions b/‎cobalt/tools/performance/smaps/analyze_smaps_logs_test.py‎
Lines changed: 3 additions & 6 deletions
@@ -131,22 +131,150 @@ python3 read_smaps_batch.py <SMAPS_FILES...> [OPTIONS]
 After processing a batch of smaps files, you can use this script to analyze the entire run. It reads a directory of processed smaps files, tracks memory usage over time, and generates a summary report.
 
 The report includes:
-*   The top 5 largest memory consumers by PSS and RSS at the end of the run.
-*   The top 5 memory regions that have grown the most in PSS and RSS over the duration of the run.
+*   The top 10 largest memory consumers by PSS and RSS at the end of the run.
+*   The top 10 memory regions that have grown the most in PSS and RSS over the duration of the run.
 *   The overall change in total PSS and RSS.
 
+The script can also output a structured JSON file containing the time-series data for further analysis or visualization.
+
 #### Usage
 
 ```bash
-python3 analyze_smaps_logs.py <PROCESSED_LOG_DIR>
+python3 analyze_smaps_logs.py <PROCESSED_LOG_DIR> [OPTIONS]
 ```
 
+#### Command-line Arguments
+
+*   `<PROCESSED_LOG_DIR>` (required positional argument)
+    The path to the directory containing the processed smaps log files.
+*   `--json_output` (type: `str`)
+    Optional: The path to a file where the JSON analysis output will be saved.
+
+#### Examples
+
+1.  **Print a text-based analysis to the console:**
+    ```bash
+    python3 analyze_smaps_logs.py processed_logs
+    ```
+
+2.  **Generate a JSON output file for visualization:**
+    ```bash
+    python3 analyze_smaps_logs.py processed_logs --json_output analysis_output.json
+    ```
+
+### `visualize_smaps_analysis.py`
+
+This script takes the JSON output from `analyze_smaps_logs.py` and generates a dashboard-style PNG image with three plots:
+1.  Total PSS and RSS memory usage over time.
+2.  A stacked area chart of the top 10 memory consumers.
+3.  A line chart of the top 10 memory growers.
+
+This provides a quick and intuitive way to visualize memory behavior and identify potential leaks.
+
+#### Prerequisites
+
+This script requires the `pandas` and `matplotlib` libraries. You can install them using pip:
+```bash
+pip install pandas matplotlib
+```
+
+#### Usage
+
+```bash
+python3 visualize_smaps_analysis.py <JSON_FILE> [OPTIONS]
+```
+
+#### Command-line Arguments
+
+*   `<JSON_FILE>` (required positional argument)
+    The path to the input JSON file generated by `analyze_smaps_logs.py`.
+*   `--output_image` (type: `str`, default: `smaps_analysis.png`)
+    The path where the output PNG image will be saved.
+
 #### Example
 
 ```bash
-python3 analyze_smaps_logs.py processed_logs
+python3 visualize_smaps_analysis.py analysis_output.json --output_image my_analysis.png
 ```
 
+## Extending the Toolchain with New Fields
+
+The toolchain is designed to be extensible, allowing you to add new fields from the raw smaps files to the analysis and visualization. The key is to follow the data pipeline from the processor (`read_smaps.py`) to the analyzer (`analyze_smaps_logs.py`) and visualizer (`visualize_smaps_analysis.py`).
+
+Here is a step-by-step guide using the `Locked` field as an example.
+
+### Step 1: Update the Processor (`read_smaps.py`)
+
+This is the most critical step to get the new data into the processed files.
+
+1.  **Add the new field to the `fields` tuple:**
+    In `cobalt/tools/performance/smaps/read_smaps.py`, add your new field (in lowercase) to the `fields` string.
+
+    ```python
+    # --- BEFORE ---
+    # fields = ('size rss pss ... swap swap_pss').split()
+
+    # --- AFTER ---
+    fields = ('size rss pss shr_clean shr_dirty priv_clean priv_dirty '
+              'referenced anonymous anonhuge swap swap_pss locked').split()
+    MemDetail = namedtuple('name', fields)
+    ```
+
+2.  **Update the `MemDetail` creation:**
+    The `parse_smaps_entry` function automatically parses all fields into a dictionary. You just need to use the new field when creating the `MemDetail` object.
+
+    ```python
+    # --- BEFORE ---
+    # d = MemDetail(..., data['swap'], data['swappss'])
+
+    # --- AFTER ---
+    d = MemDetail(
+        data['size'], data['rss'], data['pss'], data['sharedclean'],
+        data['shareddirty'], data['privateclean'], data['privatedirty'],
+        data['referenced'], data['anonymous'], data['anonhugepages'],
+        data['swap'], data['swappss'], data['locked'])
+    ```
+
+After these two changes, re-running `read_smaps_batch.py` will produce processed files that include a `locked` column.
+
+### Step 2: (Optional) Use the Field in the Analyzer (`analyze_smaps_logs.py`)
+
+The analyzer will now have access to the `locked` data. To display it, you can add it to the text report. For example, to show the total change in locked memory:
+
+```python
+# In analyze_logs in analyze_smaps_logs.py
+
+# Add this block to the "Overall Total Memory Change" section
+if 'locked' in total_history and len(total_history['locked']) > 1:
+  total_locked_change = total_history['locked'][-1] - total_history['locked'][0]
+  print(f'  Total Locked Change: {total_locked_change} kB')
+```
+
+### Step 3: (Optional) Use the Field in the Visualizer (`visualize_smaps_analysis.py`)
+
+To add the new field to the graph, you need to:
+
+1.  **Add the field to the JSON output in `analyze_smaps_logs.py`:**
+    ```python
+    # In the 'regions' list comprehension inside analyze_logs:
+    'regions': [{
+        'name': name,
+        'pss': data.get('pss', 0),
+        'rss': data.get('rss', 0),
+        'locked': data.get('locked', 0)  # Add this line
+    } for name, data in memory_data.items()]
+    ```
+
+2.  **Add a new plot in `visualize_smaps_analysis.py`:**
+    For example, to add a line for "Total Locked" memory to the first chart:
+    ```python
+    # In create_visualization in visualize_smaps_analysis.py:
+    total_locked = [d['total_memory'].get('locked', 0) for d in data]
+    ax1.plot(timestamps, total_locked, label='Total Locked', color='green')
+    ```
+
+By following this pattern, you can incorporate any field from the raw smaps files into the entire toolchain.
+
 ## Testing
 
 Unit tests are provided to ensure the functionality of the scripts. To run the tests, navigate to the project root directory and execute the following commands. Note that `__init__.py` handles Python path setup, so tests should always be run from the project root.
@@ -160,4 +288,7 @@ python3 -m unittest cobalt/tools/performance/smaps/read_smaps_test.py
 
 # For analyze_smaps_logs.py
 python3 -m unittest cobalt/tools/performance/smaps/analyze_smaps_logs_test.py
+
+# For visualize_smaps_analysis.py
+python3 -m unittest cobalt/tools/performance/smaps/visualize_smaps_analysis_test.py
 ```
@@ -17,6 +17,7 @@
 
 import argparse
 from collections import defaultdict, OrderedDict
+import json
 import os
 import re
 import sys
@@ -77,7 +78,20 @@ def parse_smaps_file(filepath):
             # This will skip non-integer lines, like the repeated header
             continue
 
-  return memory_data, total_data
+  # Second pass for aggregation
+  aggregated_data = OrderedDict()
+  shared_mem_total = defaultdict(int)
+  for name, data in memory_data.items():
+    if name.startswith('mem/shared_memory'):
+      for field, value in data.items():
+        shared_mem_total[field] += value
+    else:
+      aggregated_data[name] = data
+
+  if shared_mem_total:
+    aggregated_data['[mem/shared_memory]'] = dict(shared_mem_total)
+
+  return aggregated_data, total_data
 
 
 def extract_timestamp(filename):
@@ -104,7 +118,7 @@ def get_top_consumers(memory_data, metric='pss', top_n=5):
   return sorted_consumers[:top_n]
 
 
-def analyze_logs(log_dir):
+def analyze_logs(log_dir, json_output_filepath=None):
   """Analyzes a directory of processed smaps logs."""
   all_files = [
       os.path.join(log_dir, f)
@@ -119,13 +133,16 @@ def analyze_logs(log_dir):
 
   print(f'Analyzing {len(all_files)} processed smaps files...')
 
+  # List to store structured data for JSON output
+  analysis_data = []
+
   # Store data over time for each memory region
-  region_history = defaultdict(lambda: defaultdict(list))
   total_history = defaultdict(list)
 
   first_timestamp = None
   last_timestamp = None
   last_memory_data = None
+  first_memory_data = None
 
   for filepath in all_files:
     filename = os.path.basename(filepath)
@@ -140,11 +157,22 @@ def analyze_logs(log_dir):
       print(f'Warning: {e}')
       continue
 
+    if first_memory_data is None:
+      first_memory_data = memory_data
     last_memory_data = memory_data  # Keep track of the last data
 
-    for region_name, data in memory_data.items():
-      for metric, value in data.items():
-        region_history[region_name][metric].append(value)
+    current_snapshot = {
+        'timestamp':
+            timestamp,
+        'total_memory':
+            total_data,
+        'regions': [{
+            'name': name,
+            'pss': data.get('pss', 0),
+            'rss': data.get('rss', 0)
+        } for name, data in memory_data.items()]
+    }
+    analysis_data.append(current_snapshot)
 
     for metric, value in total_data.items():
       total_history[metric].append(value)
@@ -153,53 +181,57 @@ def analyze_logs(log_dir):
   print(f'Analysis from {first_timestamp} to {last_timestamp}')
   print('=' * 50)
 
+  # Output JSON data if requested
+  if json_output_filepath:
+    with open(json_output_filepath, 'w', encoding='utf-8') as f:
+      json.dump(analysis_data, f, indent=2)
+    print(f'JSON analysis saved to {json_output_filepath}')
+
   # 1. Largest Consumers by the end log
-  print('\nTop 5 Largest Consumers by the End Log (PSS):')
-  top_pss = get_top_consumers(last_memory_data, metric='pss', top_n=5)
+  print('\nOverall Total Memory Change:')
+  print('\nTop 10 Largest Consumers by the End Log (PSS):')
+  top_pss = get_top_consumers(last_memory_data, metric='pss', top_n=10)
   for name, data in top_pss:
     print(f"  - {name}: {data.get('pss', 0)} kB PSS, "
           f"{data.get('rss', 0)} kB RSS")
 
-  print('\nTop 5 Largest Consumers by the End Log (RSS):')
-  top_rss = get_top_consumers(last_memory_data, metric='rss', top_n=5)
+  print('\nTop 10 Largest Consumers by the End Log (RSS):')
+  top_rss = get_top_consumers(last_memory_data, metric='rss', top_n=10)
   for name, data in top_rss:
     print(f"  - {name}: {data.get('rss', 0)} kB RSS, "
           f"{data.get('pss', 0)} kB PSS")
 
-  # 2. Top 5 Increases in Memory Over Time
-  print('\nTop 5 Memory Increases Over Time (PSS):')
+  # 2. Top 10 Increases in Memory Over Time
+  print('\nTop 10 Memory Increases Over Time (PSS):')
   pss_growth = []
-  for region_name, history in region_history.items():
-    if 'pss' in history and len(history['pss']) > 1:
-      growth = history['pss'][-1] - history['pss'][0]
+  if last_memory_data and first_memory_data:
+    all_keys = set(first_memory_data.keys()) | set(last_memory_data.keys())
+    for r_name in all_keys:
+      initial_pss = first_memory_data.get(r_name, {}).get('pss', 0)
+      final_pss = last_memory_data.get(r_name, {}).get('pss', 0)
+      growth = final_pss - initial_pss
       if growth > 0:
-        pss_growth.append((region_name, growth))
+        pss_growth.append((r_name, growth))
 
   pss_growth.sort(key=lambda item: item[1], reverse=True)
-  for name, growth in pss_growth[:5]:
+  for name, growth in pss_growth[:10]:
     print(f'  - {name}: +{growth} kB PSS')
 
-  print('\nTop 5 Memory Increases Over Time (RSS):')
+  print('\nTop 10 Memory Increases Over Time (RSS):')
   rss_growth = []
-  for region_name, history in region_history.items():
-    if 'rss' in history and len(history['rss']) > 1:
-      growth = history['rss'][-1] - history['rss'][0]
+  if last_memory_data and first_memory_data:
+    all_keys = set(first_memory_data.keys()) | set(last_memory_data.keys())
+    for r_name in all_keys:
+      initial_rss = first_memory_data.get(r_name, {}).get('rss', 0)
+      final_rss = last_memory_data.get(r_name, {}).get('rss', 0)
+      growth = final_rss - initial_rss
       if growth > 0:
-        rss_growth.append((region_name, growth))
+        rss_growth.append((r_name, growth))
 
   rss_growth.sort(key=lambda item: item[1], reverse=True)
-  for name, growth in rss_growth[:5]:
+  for name, growth in rss_growth[:10]:
     print(f'  - {name}: +{growth} kB RSS')
 
-  # Overall Total Memory Change
-  print('\nOverall Total Memory Change:')
-  if 'pss' in total_history and len(total_history['pss']) > 1:
-    total_pss_change = total_history['pss'][-1] - total_history['pss'][0]
-    print(f'  Total PSS Change: {total_pss_change} kB')
-  if 'rss' in total_history and len(total_history['rss']) > 1:
-    total_rss_change = total_history['rss'][-1] - total_history['rss'][0]
-    print(f'  Total RSS Change: {total_rss_change} kB')
-
 
 def run_smaps_analysis_tool(argv=None):
   """Parses arguments and runs the smaps log analysis."""
@@ -211,8 +243,12 @@ def run_smaps_analysis_tool(argv=None):
       type=str,
       help='Path to the directory containing processed smaps log files.')
 
+  parser.add_argument(
+      '--json_output',
+      type=str,
+      help='Optional: Path to a file where JSON output will be saved.')
   args = parser.parse_args(argv)
-  analyze_logs(args.log_dir)
+  analyze_logs(args.log_dir, args.json_output)
 
 
 def main():
 
@@ -101,24 +101,21 @@ def test_parse_invalid_file(self):
   @patch('sys.stdout', new_callable=StringIO)
   def test_analyze_logs_output(self, mock_stdout):
     """Tests the main analysis function and captures its output."""
-    test_argv = [self.test_dir]
-    analyze_smaps_logs.run_smaps_analysis_tool(test_argv)
+    analyze_smaps_logs.analyze_logs(self.test_dir)
     output = mock_stdout.getvalue()
 
     # Check for top consumers in the end log
-    self.assertIn('Top 5 Largest Consumers by the End Log (PSS):', output)
+    self.assertIn('Top 10 Largest Consumers by the End Log (PSS):', output)
     self.assertIn('- <lib_B>: 1500 kB PSS', output)
     self.assertIn('- <lib_A>: 1200 kB PSS', output)
 
     # Check for memory growth
-    self.assertIn('Top 5 Memory Increases Over Time (PSS):', output)
+    self.assertIn('Top 10 Memory Increases Over Time (PSS):', output)
     self.assertIn('- <lib_B>: +1000 kB PSS', output)
     self.assertIn('- <lib_A>: +200 kB PSS', output)
 
     # Check for overall change
     self.assertIn('Overall Total Memory Change:', output)
-    self.assertIn('Total PSS Change: 1200 kB', output)
-    self.assertIn('Total RSS Change: 1300 kB', output)
 
   @patch('sys.stderr', new_callable=StringIO)
   def test_extract_timestamp_warning(self, mock_stderr):