FaultManager configuration for snapshots (freeze frame and rosbag) #26
Enable automatic rosbag capture when faults are confirmed:
- Configure fault_manager with rosbag recording enabled
- Record 10 seconds before + 2 seconds after fault confirmation
- Use MCAP format for cross-platform compatibility
- Store recordings in persistent Docker volume
Configuration:
- medkit_params.yaml: Add fault_manager parameters with snapshot
and rosbag configuration
- docker-compose.yml: Add medkit_data volume for persistent storage
- launch/demo.launch.py: Pass medkit_params to fault_manager node
- Dockerfile: Create /var/lib/ros2_medkit/rosbags directory
Recorded topics include odometry, pose, scan, velocity commands,
transforms, navigation status/feedback, costmaps, and diagnostics.
Access rosbags via REST API:
GET /faults/{code}/bulk-data/rosbags
GET /faults/{code}/snapshots
Add fault_manager configuration to sensor_diagnostics demo with:
- SQLite storage for fault persistence
- Freeze-frame snapshot capture on fault confirmation
- MCAP rosbag recording with ring buffer (10s pre + 2s post fault)
- Sensor topics included: /sensors/scan, imu, fix, image_raw
Infrastructure changes:
- Pass medkit_params to fault_manager node in launch file
- Create /var/lib/ros2_medkit/rosbags dir in Dockerfile
- Add sqlite3 package to Dockerfile
- Add persistent medkit_data volume to docker-compose
- Add bulk-data/snapshot demo steps to check-demo.sh
Pull request overview
This PR adds FaultManager configuration and Docker persistence to enable automatic snapshot + rosbag (MCAP) capture when faults are confirmed in the TurtleBot3 Integration and Sensor Diagnostics demos.
Changes:
- Pass medkit_params.yaml into ros2_medkit_fault_manager launch nodes so FaultManager can be configured via YAML.
- Add FaultManager storage + snapshot/rosbag settings to demo medkit_params.yaml files (SQLite DB + MCAP rosbags).
- Persist /var/lib/ros2_medkit via named Docker volumes and document the new rosbag/snapshot API usage.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| demos/turtlebot3_integration/launch/demo.launch.py | Loads medkit_params.yaml for FaultManager so it can use the new snapshot/rosbag configuration. |
| demos/turtlebot3_integration/docker-compose.yml | Adds a named volume to persist FaultManager DB and rosbags across runs. |
| demos/turtlebot3_integration/config/medkit_params.yaml | Adds FaultManager SQLite + snapshot/rosbag configuration (MCAP). |
| demos/turtlebot3_integration/README.md | Documents rosbag snapshot capture behavior and related API usage + volume persistence note. |
| demos/turtlebot3_integration/Dockerfile | Creates storage directory (but see PR comment about volume masking). |
| demos/sensor_diagnostics/launch/demo.launch.py | Loads medkit_params.yaml for FaultManager so it can use the new snapshot/rosbag configuration. |
| demos/sensor_diagnostics/docker-compose.yml | Adds a named volume to persist FaultManager DB and rosbags across runs. |
| demos/sensor_diagnostics/config/medkit_params.yaml | Adds FaultManager SQLite + snapshot/rosbag configuration (MCAP). |
| demos/sensor_diagnostics/check-demo.sh | Extends demo script to call snapshot/bulk-data endpoints when faults exist. |
| demos/sensor_diagnostics/Dockerfile | Installs sqlite3 and creates storage directory (but see PR comment about volume masking). |
    # Create storage directories for faults and rosbags
    RUN mkdir -p /var/lib/ros2_medkit/rosbags
Copilot
AI
Feb 6, 2026
The image creates /var/lib/ros2_medkit/rosbags at build time, but docker-compose.yml mounts a named volume on /var/lib/ros2_medkit, which will hide this pre-created directory on first run. If fault_manager doesn’t create the subdirectory itself, rosbag recording may fail. Consider creating the directory at container start (e.g., in the compose command/entrypoint) or mounting a subpath volume specifically for rosbags.
demos/sensor_diagnostics/Dockerfile
Outdated
    # Create storage directories for faults and rosbags
    RUN mkdir -p /var/lib/ros2_medkit/rosbags
Copilot
AI
Feb 6, 2026
Same issue as the TurtleBot3 demo: this directory is created during the image build, but docker-compose.yml mounts a named volume at /var/lib/ros2_medkit, so the directory won’t exist inside the mounted volume on first run. Ensure /var/lib/ros2_medkit/rosbags is created at container startup (or mount a subpath volume) so rosbag capture doesn’t fail.
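Both Dockerfile comments point at creating the directory when the container starts, after the named volume has been mounted. A minimal sketch of that idea follows. The function name `ensure_storage_dirs` is hypothetical, and a scratch directory stands in for the mounted volume so the snippet can be run anywhere; only the `rosbags` subdirectory name comes from the PR.

```shell
#!/bin/sh
# Hypothetical helper: create the rosbags subdirectory under a storage root.
# mkdir -p is idempotent, so this is safe on first run (empty volume)
# and on later runs (directory already present in the volume).
ensure_storage_dirs() {
    mkdir -p "$1/rosbags"
}

# Scratch directory standing in for the volume mounted at /var/lib/ros2_medkit.
scratch=$(mktemp -d)

ensure_storage_dirs "$scratch"   # first start: creates the directory
ensure_storage_dirs "$scratch"   # later starts: nothing to do, still succeeds

ls "$scratch"
```

In an actual entrypoint or compose command the call would presumably be `ensure_storage_dirs /var/lib/ros2_medkit` followed by launching the demo; whether fault_manager also creates the directory itself is not stated in this PR.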
    # Get fault details with environment data (includes snapshots)
    curl http://localhost:8080/api/v1/faults/NAVIGATION_GOAL_ABORTED | jq

    # Clear a specific fault
    curl -X DELETE http://localhost:8080/api/v1/apps/diagnostic-bridge/faults/TURTLEBOT3_NODE
    ```
Copilot
AI
Feb 6, 2026
The README mixes identifiers in the fault endpoints: here it uses a fault code (NAVIGATION_GOAL_ABORTED) as the path segment, but later in the same doc it refers to {fault_id} for DELETE. Please clarify whether these endpoints take code or id, and make the examples consistent (use placeholders if needed).
    curl http://localhost:8080/api/v1/faults/NAVIGATION_GOAL_ABORTED/bulk-data | jq

    # List rosbag files available for download
    curl http://localhost:8080/api/v1/faults/NAVIGATION_GOAL_ABORTED/bulk-data/rosbags | jq

    # Download a rosbag file (returns MCAP format)
    curl -O http://localhost:8080/api/v1/faults/NAVIGATION_GOAL_ABORTED/bulk-data/rosbags/{bulk_data_id}

    # Get fault snapshots (freeze frames)
    curl http://localhost:8080/api/v1/faults/NAVIGATION_GOAL_ABORTED/snapshots | jq
Copilot
AI
Feb 6, 2026
These bulk-data/snapshots examples also use NAVIGATION_GOAL_ABORTED in the URL path; if the API actually requires a fault ID (as implied elsewhere in the README), these requests will 404. Align these paths with the correct identifier and keep the examples consistent across the Fault Management and Bulk Data sections.
Suggested change:

    - curl http://localhost:8080/api/v1/faults/NAVIGATION_GOAL_ABORTED/bulk-data | jq
    - # List rosbag files available for download
    - curl http://localhost:8080/api/v1/faults/NAVIGATION_GOAL_ABORTED/bulk-data/rosbags | jq
    - # Download a rosbag file (returns MCAP format)
    - curl -O http://localhost:8080/api/v1/faults/NAVIGATION_GOAL_ABORTED/bulk-data/rosbags/{bulk_data_id}
    - # Get fault snapshots (freeze frames)
    - curl http://localhost:8080/api/v1/faults/NAVIGATION_GOAL_ABORTED/snapshots | jq
    + curl http://localhost:8080/api/v1/faults/{fault_id}/bulk-data | jq
    + # List rosbag files available for download
    + curl http://localhost:8080/api/v1/faults/{fault_id}/bulk-data/rosbags | jq
    + # Download a rosbag file (returns MCAP format)
    + curl -O http://localhost:8080/api/v1/faults/{fault_id}/bulk-data/rosbags/{bulk_data_id}
    + # Get fault snapshots (freeze frames)
    + curl http://localhost:8080/api/v1/faults/{fault_id}/snapshots | jq
    curl -s "${API_BASE}/faults" | jq '.'

    # If there are faults, demonstrate snapshot / bulk-data endpoints
    FAULT_COUNT=$(curl -s "${API_BASE}/faults" | jq '.items | length')
    if [ "$FAULT_COUNT" -gt 0 ]; then
        FIRST_FAULT=$(curl -s "${API_BASE}/faults" | jq -r '.items[0].code')
        FIRST_ENTITY=$(curl -s "${API_BASE}/faults" | jq -r '.items[0].entity_id')

        echo_step "10. Fault Detail with Environment Data (Snapshots)"
        echo "Fetching fault ${FIRST_FAULT} on entity ${FIRST_ENTITY}..."
        curl -s "${API_BASE}/${FIRST_ENTITY}/faults/${FIRST_FAULT}" | jq '{
            code: .item.code,
            status: .item.status,
            environment_data: {
                extended_data_records: .environment_data.extended_data_records,
                snapshot_count: (.environment_data.snapshots | length)
            }
        }'

        echo_step "11. Bulk-Data Categories (Rosbag Recordings)"
        echo "Checking available bulk-data categories..."
        curl -s "${API_BASE}/${FIRST_ENTITY}/bulk-data" | jq '.'

        echo_step "12. Bulk-Data Descriptors (Rosbag Files)"
        echo "Listing available rosbag recordings..."
        curl -s "${API_BASE}/${FIRST_ENTITY}/bulk-data/rosbags" | jq '.items[] | {
            id: .id,
            name: .name,
            size: .size,
            mimetype: .mimetype,
            "x-medkit": ."x-medkit"
        }'
Copilot
AI
Feb 6, 2026
This script assumes /api/v1/faults items contain an entity_id field; if it’s absent (or null), FIRST_ENTITY becomes null and the subsequent calls will hit invalid URLs. It would be more robust to parse the /faults response once, validate the needed fields are present, and fall back (or skip the snapshot/bulk-data demo) when they aren’t.
Suggested change:

    - curl -s "${API_BASE}/faults" | jq '.'
    - # If there are faults, demonstrate snapshot / bulk-data endpoints
    - FAULT_COUNT=$(curl -s "${API_BASE}/faults" | jq '.items | length')
    - if [ "$FAULT_COUNT" -gt 0 ]; then
    -     FIRST_FAULT=$(curl -s "${API_BASE}/faults" | jq -r '.items[0].code')
    -     FIRST_ENTITY=$(curl -s "${API_BASE}/faults" | jq -r '.items[0].entity_id')
    -     echo_step "10. Fault Detail with Environment Data (Snapshots)"
    -     echo "Fetching fault ${FIRST_FAULT} on entity ${FIRST_ENTITY}..."
    -     curl -s "${API_BASE}/${FIRST_ENTITY}/faults/${FIRST_FAULT}" | jq '{
    -         code: .item.code,
    -         status: .item.status,
    -         environment_data: {
    -             extended_data_records: .environment_data.extended_data_records,
    -             snapshot_count: (.environment_data.snapshots | length)
    -         }
    -     }'
    -     echo_step "11. Bulk-Data Categories (Rosbag Recordings)"
    -     echo "Checking available bulk-data categories..."
    -     curl -s "${API_BASE}/${FIRST_ENTITY}/bulk-data" | jq '.'
    -     echo_step "12. Bulk-Data Descriptors (Rosbag Files)"
    -     echo "Listing available rosbag recordings..."
    -     curl -s "${API_BASE}/${FIRST_ENTITY}/bulk-data/rosbags" | jq '.items[] | {
    -         id: .id,
    -         name: .name,
    -         size: .size,
    -         mimetype: .mimetype,
    -         "x-medkit": ."x-medkit"
    -     }'
    + FAULTS_JSON=$(curl -s "${API_BASE}/faults")
    + echo "$FAULTS_JSON" | jq '.'
    + # If there are faults, demonstrate snapshot / bulk-data endpoints
    + FAULT_COUNT=$(echo "$FAULTS_JSON" | jq '.items | length')
    + if [ "$FAULT_COUNT" -gt 0 ]; then
    +     # Find the first fault that has both a non-null entity_id and code
    +     FIRST_FAULT_ENTRY=$(echo "$FAULTS_JSON" | jq -r '.items[] | select(.entity_id != null and .code != null) | "\(.entity_id) \(.code)"' | head -n 1)
    +     if [ -z "$FIRST_FAULT_ENTRY" ]; then
    +         echo ""
    +         echo " Faults exist but none provide both 'entity_id' and 'code'."
    +         echo " Skipping snapshot and bulk-data demonstration."
    +     else
    +         FIRST_ENTITY=$(echo "$FIRST_FAULT_ENTRY" | awk '{print $1}')
    +         FIRST_FAULT=$(echo "$FIRST_FAULT_ENTRY" | awk '{print $2}')
    +         echo_step "10. Fault Detail with Environment Data (Snapshots)"
    +         echo "Fetching fault ${FIRST_FAULT} on entity ${FIRST_ENTITY}..."
    +         curl -s "${API_BASE}/${FIRST_ENTITY}/faults/${FIRST_FAULT}" | jq '{
    +             code: .item.code,
    +             status: .item.status,
    +             environment_data: {
    +                 extended_data_records: .environment_data.extended_data_records,
    +                 snapshot_count: (.environment_data.snapshots | length)
    +             }
    +         }'
    +         echo_step "11. Bulk-Data Categories (Rosbag Recordings)"
    +         echo "Checking available bulk-data categories..."
    +         curl -s "${API_BASE}/${FIRST_ENTITY}/bulk-data" | jq '.'
    +         echo_step "12. Bulk-Data Descriptors (Rosbag Files)"
    +         echo "Listing available rosbag recordings..."
    +         curl -s "${API_BASE}/${FIRST_ENTITY}/bulk-data/rosbags" | jq '.items[] | {
    +             id: .id,
    +             name: .name,
    +             size: .size,
    +             mimetype: .mimetype,
    +             "x-medkit": ."x-medkit"
    +         }'
    +     fi
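The failure mode the review comment describes comes from `jq -r` printing the literal string `null` for a missing or null field, so a plain emptiness check is not enough. A small guard like the one below is one way to test extracted values before building a URL from them. The helper name `is_present` and the sample value `lidar-sim` are hypothetical, and the loop uses hard-coded strings in place of real jq output.

```shell
#!/bin/sh
# Hypothetical helper: succeed only for a non-empty value that is not the
# literal string "null" (what `jq -r` prints for a missing or null field).
is_present() {
    [ -n "$1" ] && [ "$1" != "null" ]
}

# Sample values standing in for jq output; "lidar-sim" is a made-up entity id.
for value in "lidar-sim" "null" ""; do
    if is_present "$value"; then
        echo "use '$value'"
    else
        echo "skip '$value'"
    fi
done
```

The jq `select(...)` filter in the suggested change does the same filtering on the JSON side; a shell-side guard like this is only a complement for values that have already been extracted.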
- Move mkdir from Dockerfile build to runtime (volume mount masks build-time dirs)
- Remove --recurse-submodules from git clone (broken submodule ref)
- Fix check-demo.sh: null-safe fault parsing, use entity-scoped API paths
- Fix turtlebot3 README: use correct entity-scoped bulk-data endpoints
- Add mkdir to all docker-compose commands for both demos
The mv command was missing destination directory '.' and '&&' separators between individual moves, causing 'mv: invalid option -- r' error because 'rm -rf ros2_medkit' was interpreted as mv arguments.
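To illustrate the shape of this fix, here is a sketch run in a scratch directory. The package names `pkg_a` and `pkg_b` are hypothetical and the exact command in the Dockerfile is not shown in this PR; only the `ros2_medkit` directory name, the `.` destination, and the `&&` separators come from the commit message.

```shell
#!/bin/sh
# Hypothetical layout: a cloned "ros2_medkit" directory holding two packages.
work=$(mktemp -d)
mkdir -p "$work/ros2_medkit/pkg_a" "$work/ros2_medkit/pkg_b"
cd "$work" || exit 1

# One plausible broken shape (left commented out): without a '.' destination
# and without '&&', the continued lines collapse into a single mv invocation,
# so '-rf' is parsed as mv options and mv rejects 'r'.
#   mv ros2_medkit/pkg_a \
#      mv ros2_medkit/pkg_b \
#      rm -rf ros2_medkit

# Fixed shape: every mv names its destination, and '&&' terminates each
# command so rm only runs after both moves have succeeded.
mv ros2_medkit/pkg_a . && \
mv ros2_medkit/pkg_b . && \
rm -rf ros2_medkit

ls
```

Chaining with `&&` rather than `;` also means a failed move leaves the cloned directory in place instead of deleting it.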
Description
Adds FaultManager configuration and Docker persistence to enable automatic snapshot + rosbag (MCAP) capture when faults are confirmed in the TurtleBot3 Integration and Sensor Diagnostics demos.
Changes:
Related Issue
closes #25
Checklist