Description:
I have a process that consumes a large amount of memory (100+ GB RSS). Some of it is backed by huge pages, but most of it is allocated naively via malloc().
When this process crashes, its parent process restarts it immediately upon receiving SIGCHLD. However, we've observed a significant delay between the crash and the actual execution of the new process instance.
For example:
- A process with 90GB RSS takes ~6 seconds to restart.
- A process with 130GB RSS takes ~9 seconds to restart.
This delay is consistent and scales with the memory size of the process.
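For reference, here is a minimal sketch of how the delay could be measured outside the real application. It is not our production code. The names `now_seconds` and `measure_teardown_seconds` are hypothetical, and the child kills itself with SIGKILL rather than actually crashing, on the assumption that this keeps core-dump handling out of the measurement. The child touches one byte per page so the memory counts toward RSS, sends a timestamp over a pipe, and dies. The parent records when waitpid() returns.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* CLOCK_MONOTONIC is system-wide, so timestamps taken in the child
 * and in the parent can be compared directly. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Hypothetical helper: fork a child that malloc()s and touches `mib` MiB,
 * reports a timestamp, then kills itself. Returns the seconds between that
 * timestamp and waitpid() returning in the parent, or -1.0 on error. */
double measure_teardown_seconds(size_t mib)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1.0;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1.0;
    }

    if (pid == 0) {                       /* child */
        close(fds[0]);
        size_t bytes = mib * 1024 * 1024;
        long page = sysconf(_SC_PAGESIZE);
        char *buf = malloc(bytes);
        if (buf == NULL || page <= 0)
            _exit(1);

        /* Volatile writes, one per page, so the compiler cannot drop
         * them and every page is actually faulted in. */
        volatile char *p = buf;
        for (size_t i = 0; i < bytes; i += (size_t)page)
            p[i] = 1;

        double t = now_seconds();
        if (write(fds[1], &t, sizeof t) != (ssize_t)sizeof t)
            _exit(1);

        /* Assumption of this sketch: SIGKILL stands in for the crash. */
        raise(SIGKILL);
        _exit(1);                         /* not reached */
    }

    /* parent */
    close(fds[1]);
    double t_crash = 0.0;
    ssize_t n = read(fds[0], &t_crash, sizeof t_crash);
    close(fds[0]);

    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1.0;
    double t_reaped = now_seconds();

    if (n != (ssize_t)sizeof t_crash)
        return -1.0;
    return t_reaped - t_crash;
}
```

Calling `measure_teardown_seconds()` from a small `main()` with a few different sizes and printing the results should show whether the time between the child's death and the parent being able to reap it grows with the amount of touched memory, independently of anything specific to our application.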
Questions:
- What factors contribute to this delay?
- Is it related to memory deallocation, kernel cleanup, or something else?
- Are there ways to reduce this delay and make the restart faster?
OS: Ubuntu 14
Kernel version: 6.1.21