@bburda bburda commented Feb 5, 2026

Description

Adds FaultManager configuration and Docker persistence to enable automatic snapshot + rosbag (MCAP) capture when faults are confirmed in the TurtleBot3 Integration and Sensor Diagnostics demos.

Changes:

  • Pass medkit_params.yaml into ros2_medkit_fault_manager launch nodes so FaultManager can be configured via YAML.
  • Add FaultManager storage + snapshot/rosbag settings to demo medkit_params.yaml files (SQLite DB + MCAP rosbags).
  • Persist /var/lib/ros2_medkit via named Docker volumes and document the new rosbag/snapshot API usage.

Related Issue

closes #25

Checklist

  • Tested locally
  • README updated (if needed)

Enable automatic rosbag capture when faults are confirmed:
- Configure fault_manager with rosbag recording enabled
- Record 10 seconds before + 2 seconds after fault confirmation
- Use MCAP format for cross-platform compatibility
- Store recordings in persistent Docker volume

Configuration:
- medkit_params.yaml: Add fault_manager parameters with snapshot
  and rosbag configuration
- docker-compose.yml: Add medkit_data volume for persistent storage
- launch/demo.launch.py: Pass medkit_params to fault_manager node
- Dockerfile: Create /var/lib/ros2_medkit/rosbags directory

Recorded topics include odometry, pose, scan, velocity commands,
transforms, navigation status/feedback, costmaps, and diagnostics.

Access rosbags via REST API:
  GET /faults/{code}/bulk-data/rosbags
  GET /faults/{code}/snapshots

Add fault_manager configuration to sensor_diagnostics demo with:
- SQLite storage for fault persistence
- Freeze-frame snapshot capture on fault confirmation
- MCAP rosbag recording with ring buffer (10s pre + 2s post fault)
- Sensor topics included: /sensors/scan, imu, fix, image_raw

Infrastructure changes:
- Pass medkit_params to fault_manager node in launch file
- Create /var/lib/ros2_medkit/rosbags dir in Dockerfile
- Add sqlite3 package to Dockerfile
- Add persistent medkit_data volume to docker-compose
- Add bulk-data/snapshot demo steps to check-demo.sh
@bburda bburda self-assigned this Feb 5, 2026
@bburda bburda added the enhancement label Feb 5, 2026
@bburda bburda marked this pull request as ready for review February 6, 2026 16:10
Copilot AI review requested due to automatic review settings February 6, 2026 16:10

Copilot AI left a comment

Pull request overview

This PR adds FaultManager configuration and Docker persistence to enable automatic snapshot + rosbag (MCAP) capture when faults are confirmed in the TurtleBot3 Integration and Sensor Diagnostics demos.

Changes:

  • Pass medkit_params.yaml into ros2_medkit_fault_manager launch nodes so FaultManager can be configured via YAML.
  • Add FaultManager storage + snapshot/rosbag settings to demo medkit_params.yaml files (SQLite DB + MCAP rosbags).
  • Persist /var/lib/ros2_medkit via named Docker volumes and document the new rosbag/snapshot API usage.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| demos/turtlebot3_integration/launch/demo.launch.py | Loads medkit_params.yaml for FaultManager so it can use the new snapshot/rosbag configuration. |
| demos/turtlebot3_integration/docker-compose.yml | Adds a named volume to persist FaultManager DB and rosbags across runs. |
| demos/turtlebot3_integration/config/medkit_params.yaml | Adds FaultManager SQLite + snapshot/rosbag configuration (MCAP). |
| demos/turtlebot3_integration/README.md | Documents rosbag snapshot capture behavior and related API usage + volume persistence note. |
| demos/turtlebot3_integration/Dockerfile | Creates storage directory (but see PR comment about volume masking). |
| demos/sensor_diagnostics/launch/demo.launch.py | Loads medkit_params.yaml for FaultManager so it can use the new snapshot/rosbag configuration. |
| demos/sensor_diagnostics/docker-compose.yml | Adds a named volume to persist FaultManager DB and rosbags across runs. |
| demos/sensor_diagnostics/config/medkit_params.yaml | Adds FaultManager SQLite + snapshot/rosbag configuration (MCAP). |
| demos/sensor_diagnostics/check-demo.sh | Extends demo script to call snapshot/bulk-data endpoints when faults exist. |
| demos/sensor_diagnostics/Dockerfile | Installs sqlite3 and creates storage directory (but see PR comment about volume masking). |

Comment on lines 71 to 72
# Create storage directories for faults and rosbags
RUN mkdir -p /var/lib/ros2_medkit/rosbags

Copilot AI Feb 6, 2026

The image creates /var/lib/ros2_medkit/rosbags at build time, but docker-compose.yml mounts a named volume on /var/lib/ros2_medkit, which will hide this pre-created directory on first run. If fault_manager doesn’t create the subdirectory itself, rosbag recording may fail. Consider creating the directory at container start (e.g., in the compose command/entrypoint) or mounting a subpath volume specifically for rosbags.
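
A minimal sketch of the "create the directory at container start" option follows. Whether a directory created at build time survives a mount depends on the mount type, so creating it at start-up is one way to avoid relying on that behavior. The `ensure_storage_dirs` helper and the `MEDKIT_DATA_DIR` variable are hypothetical names used only for illustration; the default path is the one from this PR.

```shell
#!/bin/sh
# Sketch of a container entrypoint helper. MEDKIT_DATA_DIR is a hypothetical
# variable; its default is the mount point used in this PR.
ensure_storage_dirs() {
    data_dir="${MEDKIT_DATA_DIR:-/var/lib/ros2_medkit}"
    # Runs after the volume is mounted, so the directory is created inside
    # the mounted volume. The -p flag makes repeated runs harmless.
    mkdir -p "${data_dir}/rosbags"
}

# In an entrypoint script this would typically be followed by:
#   ensure_storage_dirs && exec "$@"
```

Because `mkdir -p` succeeds when the directory already exists, the same helper can run on every container start without a first-run check.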

Comment on lines 52 to 53
# Create storage directories for faults and rosbags
RUN mkdir -p /var/lib/ros2_medkit/rosbags

Copilot AI Feb 6, 2026

Same issue as the TurtleBot3 demo: this directory is created during the image build, but docker-compose.yml mounts a named volume at /var/lib/ros2_medkit, so the directory won’t exist inside the mounted volume on first run. Ensure /var/lib/ros2_medkit/rosbags is created at container startup (or mount a subpath volume) so rosbag capture doesn’t fail.

Comment on lines +218 to 223
# Get fault details with environment data (includes snapshots)
curl http://localhost:8080/api/v1/faults/NAVIGATION_GOAL_ABORTED | jq

# Clear a specific fault
curl -X DELETE http://localhost:8080/api/v1/apps/diagnostic-bridge/faults/TURTLEBOT3_NODE
```

Copilot AI Feb 6, 2026

The README mixes identifiers in the fault endpoints: here it uses a fault code (NAVIGATION_GOAL_ABORTED) as the path segment, but later in the same doc it refers to {fault_id} for DELETE. Please clarify whether these endpoints take code or id, and make the examples consistent (use placeholders if needed).

Comment on lines 233 to 242
curl http://localhost:8080/api/v1/faults/NAVIGATION_GOAL_ABORTED/bulk-data | jq

# List rosbag files available for download
curl http://localhost:8080/api/v1/faults/NAVIGATION_GOAL_ABORTED/bulk-data/rosbags | jq

# Download a rosbag file (returns MCAP format)
curl -O http://localhost:8080/api/v1/faults/NAVIGATION_GOAL_ABORTED/bulk-data/rosbags/{bulk_data_id}

# Get fault snapshots (freeze frames)
curl http://localhost:8080/api/v1/faults/NAVIGATION_GOAL_ABORTED/snapshots | jq

Copilot AI Feb 6, 2026

These bulk-data/snapshots examples also use NAVIGATION_GOAL_ABORTED in the URL path; if the API actually requires a fault ID (as implied elsewhere in the README), these requests will 404. Align these paths with the correct identifier and keep the examples consistent across the Fault Management and Bulk Data sections.

Suggested change
- curl http://localhost:8080/api/v1/faults/NAVIGATION_GOAL_ABORTED/bulk-data | jq
- # List rosbag files available for download
- curl http://localhost:8080/api/v1/faults/NAVIGATION_GOAL_ABORTED/bulk-data/rosbags | jq
- # Download a rosbag file (returns MCAP format)
- curl -O http://localhost:8080/api/v1/faults/NAVIGATION_GOAL_ABORTED/bulk-data/rosbags/{bulk_data_id}
- # Get fault snapshots (freeze frames)
- curl http://localhost:8080/api/v1/faults/NAVIGATION_GOAL_ABORTED/snapshots | jq
+ curl http://localhost:8080/api/v1/faults/{fault_id}/bulk-data | jq
+ # List rosbag files available for download
+ curl http://localhost:8080/api/v1/faults/{fault_id}/bulk-data/rosbags | jq
+ # Download a rosbag file (returns MCAP format)
+ curl -O http://localhost:8080/api/v1/faults/{fault_id}/bulk-data/rosbags/{bulk_data_id}
+ # Get fault snapshots (freeze frames)
+ curl http://localhost:8080/api/v1/faults/{fault_id}/snapshots | jq

Comment on lines 91 to 122
curl -s "${API_BASE}/faults" | jq '.'

# If there are faults, demonstrate snapshot / bulk-data endpoints
FAULT_COUNT=$(curl -s "${API_BASE}/faults" | jq '.items | length')
if [ "$FAULT_COUNT" -gt 0 ]; then
    FIRST_FAULT=$(curl -s "${API_BASE}/faults" | jq -r '.items[0].code')
    FIRST_ENTITY=$(curl -s "${API_BASE}/faults" | jq -r '.items[0].entity_id')

    echo_step "10. Fault Detail with Environment Data (Snapshots)"
    echo "Fetching fault ${FIRST_FAULT} on entity ${FIRST_ENTITY}..."
    curl -s "${API_BASE}/${FIRST_ENTITY}/faults/${FIRST_FAULT}" | jq '{
        code: .item.code,
        status: .item.status,
        environment_data: {
            extended_data_records: .environment_data.extended_data_records,
            snapshot_count: (.environment_data.snapshots | length)
        }
    }'

    echo_step "11. Bulk-Data Categories (Rosbag Recordings)"
    echo "Checking available bulk-data categories..."
    curl -s "${API_BASE}/${FIRST_ENTITY}/bulk-data" | jq '.'

    echo_step "12. Bulk-Data Descriptors (Rosbag Files)"
    echo "Listing available rosbag recordings..."
    curl -s "${API_BASE}/${FIRST_ENTITY}/bulk-data/rosbags" | jq '.items[] | {
        id: .id,
        name: .name,
        size: .size,
        mimetype: .mimetype,
        "x-medkit": ."x-medkit"
    }'

Copilot AI Feb 6, 2026

This script assumes /api/v1/faults items contain an entity_id field; if it’s absent (or null), FIRST_ENTITY becomes null and the subsequent calls will hit invalid URLs. It would be more robust to parse the /faults response once, validate the needed fields are present, and fall back (or skip the snapshot/bulk-data demo) when they aren’t.

Suggested change
- curl -s "${API_BASE}/faults" | jq '.'
- # If there are faults, demonstrate snapshot / bulk-data endpoints
- FAULT_COUNT=$(curl -s "${API_BASE}/faults" | jq '.items | length')
- if [ "$FAULT_COUNT" -gt 0 ]; then
-     FIRST_FAULT=$(curl -s "${API_BASE}/faults" | jq -r '.items[0].code')
-     FIRST_ENTITY=$(curl -s "${API_BASE}/faults" | jq -r '.items[0].entity_id')
-     echo_step "10. Fault Detail with Environment Data (Snapshots)"
-     echo "Fetching fault ${FIRST_FAULT} on entity ${FIRST_ENTITY}..."
-     curl -s "${API_BASE}/${FIRST_ENTITY}/faults/${FIRST_FAULT}" | jq '{
-         code: .item.code,
-         status: .item.status,
-         environment_data: {
-             extended_data_records: .environment_data.extended_data_records,
-             snapshot_count: (.environment_data.snapshots | length)
-         }
-     }'
-     echo_step "11. Bulk-Data Categories (Rosbag Recordings)"
-     echo "Checking available bulk-data categories..."
-     curl -s "${API_BASE}/${FIRST_ENTITY}/bulk-data" | jq '.'
-     echo_step "12. Bulk-Data Descriptors (Rosbag Files)"
-     echo "Listing available rosbag recordings..."
-     curl -s "${API_BASE}/${FIRST_ENTITY}/bulk-data/rosbags" | jq '.items[] | {
-         id: .id,
-         name: .name,
-         size: .size,
-         mimetype: .mimetype,
-         "x-medkit": ."x-medkit"
-     }'
+ FAULTS_JSON=$(curl -s "${API_BASE}/faults")
+ echo "$FAULTS_JSON" | jq '.'
+ # If there are faults, demonstrate snapshot / bulk-data endpoints
+ FAULT_COUNT=$(echo "$FAULTS_JSON" | jq '.items | length')
+ if [ "$FAULT_COUNT" -gt 0 ]; then
+     # Find the first fault that has both a non-null entity_id and code
+     FIRST_FAULT_ENTRY=$(echo "$FAULTS_JSON" | jq -r '.items[] | select(.entity_id != null and .code != null) | "\(.entity_id) \(.code)"' | head -n 1)
+     if [ -z "$FIRST_FAULT_ENTRY" ]; then
+         echo ""
+         echo " Faults exist but none provide both 'entity_id' and 'code'."
+         echo " Skipping snapshot and bulk-data demonstration."
+     else
+         FIRST_ENTITY=$(echo "$FIRST_FAULT_ENTRY" | awk '{print $1}')
+         FIRST_FAULT=$(echo "$FIRST_FAULT_ENTRY" | awk '{print $2}')
+         echo_step "10. Fault Detail with Environment Data (Snapshots)"
+         echo "Fetching fault ${FIRST_FAULT} on entity ${FIRST_ENTITY}..."
+         curl -s "${API_BASE}/${FIRST_ENTITY}/faults/${FIRST_FAULT}" | jq '{
+             code: .item.code,
+             status: .item.status,
+             environment_data: {
+                 extended_data_records: .environment_data.extended_data_records,
+                 snapshot_count: (.environment_data.snapshots | length)
+             }
+         }'
+         echo_step "11. Bulk-Data Categories (Rosbag Recordings)"
+         echo "Checking available bulk-data categories..."
+         curl -s "${API_BASE}/${FIRST_ENTITY}/bulk-data" | jq '.'
+         echo_step "12. Bulk-Data Descriptors (Rosbag Files)"
+         echo "Listing available rosbag recordings..."
+         curl -s "${API_BASE}/${FIRST_ENTITY}/bulk-data/rosbags" | jq '.items[] | {
+             id: .id,
+             name: .name,
+             size: .size,
+             mimetype: .mimetype,
+             "x-medkit": ."x-medkit"
+         }'
+     fi

@bburda bburda changed the title from "Feat/rosbag generation" to "FaultManager configuration for snapshots (freeze frame and rosbag)" Feb 6, 2026
- Move mkdir from Dockerfile build to runtime (volume mount masks build-time dirs)
- Remove --recurse-submodules from git clone (broken submodule ref)
- Fix check-demo.sh: null-safe fault parsing, use entity-scoped API paths
- Fix turtlebot3 README: use correct entity-scoped bulk-data endpoints
- Add mkdir to all docker-compose commands for both demos
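
As a rough illustration of the entity-scoped paths mentioned in the list above, the sketch below builds URLs in the same shape check-demo.sh uses (`${API_BASE}/${entity}/...`). The helper names `fault_url` and `rosbags_url` are hypothetical, and the entity path in the usage comment is borrowed from the README's DELETE example as an assumption; whether that entity exposes bulk data is not confirmed here.

```shell
#!/bin/sh
# Assumed default: the host, port, and prefix used in the README examples.
API_BASE="${API_BASE:-http://localhost:8080/api/v1}"

# Hypothetical helpers that build entity-scoped URLs:
#   fault_url   <entity-path> <fault-code>
#   rosbags_url <entity-path>
fault_url() {
    printf '%s/%s/faults/%s\n' "$API_BASE" "$1" "$2"
}

rosbags_url() {
    printf '%s/%s/bulk-data/rosbags\n' "$API_BASE" "$1"
}

# Example (needs the demo gateway to be running; entity path is an assumption):
#   curl -s "$(rosbags_url apps/diagnostic-bridge)" | jq '.items[].id'
```

Keeping the URL construction in one place means a later change to the path scheme touches two functions rather than every curl call.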
@bburda bburda requested a review from mfaferek93 February 6, 2026 17:20
The mv command was missing destination directory '.' and '&&' separators
between individual moves, causing 'mv: invalid option -- r' error because
'rm -rf ros2_medkit' was interpreted as mv arguments.
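
The failure mode described above can be sketched as follows. The directory layout and package names (`pkg_a`, `pkg_b`) are hypothetical stand-ins; the actual Dockerfile command is not shown in this conversation, so the broken line in the comment is only an approximation of its shape.

```shell
#!/bin/sh
# Hypothetical layout standing in for the cloned repository: the package
# names below are illustrative, not taken from the Dockerfile.
work=$(mktemp -d)
mkdir -p "$work/ros2_medkit/pkg_a" "$work/ros2_medkit/pkg_b"

# Broken shape (not executed): with no '.' destination and no '&&', every
# word lands in the argument list of the first mv, including "-rf":
#   mv ros2_medkit/pkg_a mv ros2_medkit/pkg_b rm -rf ros2_medkit

# Fixed shape: each mv names '.' as its destination and '&&' ends the
# command, so rm only runs once both moves have succeeded.
(
    cd "$work" &&
    mv ros2_medkit/pkg_a . &&
    mv ros2_medkit/pkg_b . &&
    rm -rf ros2_medkit
)
```

Chaining with `&&` also means a failed move leaves the source directory in place instead of deleting it.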
Development

Successfully merging this pull request may close these issues.

Enable snapshot and rosbag recording in demo scenarios

1 participant