I want to set up a watchdog that checks whether the io_context workers can pick up tasks within a reasonable time and are not stuck running long or blocking operations.
To achieve this, I've implemented a check that verifies the io_context queue is functioning properly by scheduling a task every 30 seconds. This task simply sets a flag to true so we can confirm that the queue is still responsive.
// Periodic task that marks the io_context as alive every 30 seconds.
using PeriodicTask = BasicScheduledTask<boost::asio::steady_timer, true>;
std::shared_ptr<PeriodicTask> io_context_alive_task_ = std::make_shared<PeriodicTask>(
    io_context_,
    [this](const auto& ec) {
        if (ec) {
            print_error("Could not report io_context as alive: {}", ec.message());
            return;
        }
        print_debug("Marking io_context as alive");
        is_context_alive_ = true;
    },
    30s);
The watchdog runs in its own independent thread outside io_context and checks every 2 minutes whether the flag has been set to true.
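std::unique_ptr<std::thread> context_watchdog_ = std::make_unique<std::thread>([this] {
    // The snippet as posted used `lock` without declaring it. The mutex name
    // io_context_alive_mutex_ is an assumption; any std::mutex member paired with the cv works.
    std::unique_lock<std::mutex> lock(io_context_alive_mutex_);
    while (!io_context_.stopped()) {
        // Sleep for up to 2 minutes; wake early only if the io_context has stopped.
        io_context_alive_cv_.wait_for(lock, 2min, [this] { return io_context_.stopped(); });
        if (!is_context_alive_) {
            print_critical("io_context is not responding");
            std::abort();
        }
        print_debug("io_context is ok, setting back to false");
        is_context_alive_ = false;
    }
    print_debug("io_context stopped. stopping thread");
});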
I've noticed that I get some false alarms when the system wakes from sleep.
This happens because the 30-second periodic task that marks the context as alive does not get to run within the watchdog's 2-minute window. As a result, the watchdog assumes the io_context is unresponsive and aborts the service.
I wonder whether io_context_alive_cv_, which is of type std::condition_variable, keeps ticking during sleep mode while the Boost-based timer of the keep-alive task stays idle during that time. If so, could you suggest a way to resolve it?
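One direction I am considering, shown here only as a minimal sketch: replace the boolean flag with a timestamp of the last heartbeat, so the watchdog compares elapsed time on one clock instead of trusting how long its own wait took. `HeartbeatMonitor` and its member names are hypothetical and not part of my real code. The sketch uses std::chrono::steady_clock, which as far as I know is also the clock behind boost::asio::steady_timer. Whether that clock advances while the machine is suspended depends on the platform, so this may reduce the false alarms rather than remove them.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>

// Hypothetical helper: stores the time of the last heartbeat instead of a bool flag.
class HeartbeatMonitor {
public:
    using Clock = std::chrono::steady_clock;

    // max_silence: how long the io_context may go without a heartbeat.
    // start: treated as the first heartbeat (injectable for testing).
    explicit HeartbeatMonitor(Clock::duration max_silence,
                              Clock::time_point start = Clock::now())
        : max_silence_(max_silence),
          last_beat_(start.time_since_epoch().count()) {
        assert(max_silence > Clock::duration::zero());
    }

    // Called from the periodic task running on the io_context.
    void beat(Clock::time_point now = Clock::now()) noexcept {
        last_beat_.store(now.time_since_epoch().count(), std::memory_order_release);
    }

    // Called from the watchdog thread: true if the last heartbeat is older
    // than max_silence, measured on the same clock the heartbeat was taken on.
    bool is_stalled(Clock::time_point now = Clock::now()) const noexcept {
        const Clock::time_point last{
            Clock::duration{last_beat_.load(std::memory_order_acquire)}};
        return (now - last) > max_silence_;
    }

private:
    Clock::duration max_silence_;
    std::atomic<Clock::rep> last_beat_;  // ticks since the clock's epoch
};
```

With something like this, the periodic task would call `beat()` where it currently sets `is_context_alive_ = true`, and the watchdog would test `is_stalled()` where it currently reads the flag. Storing the tick count in a std::atomic also avoids sharing a plain bool between the two threads.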
Thanks
#define ENABLE_MONITOR_EXECUTION_CONTEXT_PROGRESS 1) and your "best effort" diagnostics are fine! If they put their system into sleep during testing, they will figure it out?