
I'm encountering an issue with the Boost.Asio library. My program uses a single io_context instance and multiple worker threads, each of which calls io_context::run(). Tasks are posted with spawn or post, either to the global io_context or to a strand derived from it, and are processed by the worker threads.
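
For reference, this is roughly the structure (a minimal sketch only; the thread count, the names and the explicit work guard are illustrative, not the actual code):

    #include <boost/asio.hpp>
    #include <thread>
    #include <vector>

    int main() {
        boost::asio::io_context io;

        // Keep run() from returning while the task queue is momentarily empty.
        auto guard = boost::asio::make_work_guard(io);

        // Strand used to serialize a subset of the handlers.
        auto strand = boost::asio::make_strand(io);

        // Worker pool: every thread calls io_context::run().
        std::vector<std::thread> workers;
        for (int i = 0; i < 4; ++i)
            workers.emplace_back([&io] { io.run(); });

        // Tasks are posted either to the io_context itself or to the strand.
        boost::asio::post(io,     [] { /* independent task */ });
        boost::asio::post(strand, [] { /* serialized task  */ });

        guard.reset();                      // allow run() to exit once the queue drains
        for (auto& t : workers) t.join();
    }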

After running for several days, the system occasionally stops accepting new tasks. At that point, all worker threads appear to be idle, waiting for new work, but no new tasks are executed or accepted. The following call stack describes their state:

    + 7794 thread_start  (in libsystem_pthread.dylib) + 8  [0xptr]
    +   7794 _pthread_start  (in libsystem_pthread.dylib) + 136  [0xptr]
    +     7794 boost::(anonymous namespace)::thread_proxy(void*) (in myprog1.dsym) + 188 + 3450312  [0xptr]
    +       7794 boost::detail::thread_data<Service::ServiceImpl::Run(std::__1::function<void ()> const&)::$_1>::run() (in myprog1.dsym) + 176 + 10518812  [0xptr]
    +         7794 boost::asio::io_context::run() (in myprog1.dsym) + 32 + 10147768  [0xptr]
    +           7794 boost::asio::detail::scheduler::run(boost::system::error_code&) (in myprog1.dsym) + 264 + 7744452  [0xptr]
    +             7794 boost::asio::detail::scheduler::do_run_one(boost::asio::detail::conditionally_enabled_mutex::scoped_lock&, boost::asio::detail::scheduler_thread_info&, boost::system::error_code const&) (in myprog1.dsym) + 304 + 7745104  [0xptr]
    +               7794 _pthread_cond_wait  (in libsystem_pthread.dylib) + 984  [0xptr]
    +                 7794 __psynch_cvwait  (in libsystem_kernel.dylib) + 8  [0xptr]

Needless to say, tasks continue to be posted to the io_context.

I would appreciate any theory that might explain this anomaly, or guidance on how to debug it properly. For example, is there a way to inspect or dump the internal state of the io_context when this scenario occurs?

  • Did it run out of work, and did you forget to restart() it? live.boost.org/doc/libs/1_88_0/doc/html/boost_asio/reference/…. If not (and the stack trace, if relevant, suggests so), then you are running blocking operations in completion handlers, which causes the queue to lock up. Commented Jul 9 at 14:41
  • For tracking: live.boost.org/doc/libs/1_88_0/doc/html/boost_asio/overview/… Commented Jul 9 at 14:42
  • @sehe, I've got an async accept task that always waits for new connections, so I can guarantee the io_context never runs out of work, and the stack trace proves it. Any idea how to detect/avoid/resolve the situation of blocking operations in completion handlers locking up the queue? Commented Jul 10 at 11:53
  • Yeah. Use a debugger. When it "locks up", dump all the stacks. It should tell you what everything is blocked on. In the "impossible" event that worker threads are just waiting idle, check for UB (are the concurrency hint and threading configuration valid?) and perhaps add a "sentinel" timer job to detect whether IT does proceed. Because if the timer proceeds independently, you know something is blocking your acceptor (e.g. when it erroneously shares a strand executor with some other task that IS blocking). Commented Jul 10 at 12:38
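
Following up on the sentinel-timer suggestion above, a minimal sketch of such a watchdog (the interval, the logging and the helper names are illustrative assumptions, not part of the original program):

    #include <boost/asio.hpp>
    #include <chrono>
    #include <iostream>
    #include <memory>

    // Periodically re-arm a timer on the shared io_context. As long as it keeps
    // firing, the scheduler is still dispatching handlers.
    void schedule_tick(std::shared_ptr<boost::asio::steady_timer> timer) {
        timer->expires_after(std::chrono::seconds(5));
        timer->async_wait([timer](const boost::system::error_code& ec) {
            if (ec) return;                 // timer was cancelled, stop
            std::cout << "sentinel: scheduler is still dispatching handlers\n";
            schedule_tick(timer);           // re-arm
        });
    }

    void start_sentinel(boost::asio::io_context& io) {
        schedule_tick(std::make_shared<boost::asio::steady_timer>(io));
    }

If the sentinel keeps firing while new connections are no longer accepted, the scheduler is still making progress and the likely culprit is a handler blocking on the same strand as the acceptor; if the sentinel goes silent too, all workers really are wedged, and building with BOOST_ASIO_ENABLE_HANDLER_TRACKING (the tracking page linked above) will show which handlers were started but never completed.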
