Commit dc653bf

jkloetzke authored and poettering committed
service: handle abort stops with dedicated timeout
When shooting down a service with SIGABRT the user might want to have a much longer stop timeout than on regular stops/shutdowns. Especially in the face of short stop timeouts the time might not be sufficient to write huge core dumps before the service is killed.

This commit adds a dedicated (Default)TimeoutAbortSec= timer that is used when stopping a service via SIGABRT. In all other cases the existing TimeoutStopSec= is used. The timer value is unset by default to skip the special handling and use TimeoutStopSec= for state 'stop-watchdog' to keep the old behaviour.

If the service is in state 'stop-watchdog' and the service should be stopped explicitly we still go to 'stop-sigterm' and re-apply the usual TimeoutStopSec= timeout.
1 parent 1ace223 commit dc653bf
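
The timeout selection described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `SketchState`, `SketchService` and `sketch_stop_timeout()` are hypothetical names, and the struct carries only the three fields needed here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t usec_t;

/* Hypothetical, trimmed-down stand-ins for the real service state and struct. */
typedef enum {
        STATE_STOP_WATCHDOG,   /* service was sent SIGABRT after a watchdog timeout */
        STATE_STOP_SIGTERM,    /* regular stop, or explicit stop while aborting */
} SketchState;

typedef struct {
        usec_t timeout_stop_usec;
        usec_t timeout_abort_usec;
        bool timeout_abort_set;    /* false: no dedicated abort timeout configured */
} SketchService;

/* Pick the timer value to arm when entering the given stop state. */
static usec_t sketch_stop_timeout(const SketchService *s, SketchState state) {
        assert(s);

        /* Only the watchdog abort path uses the dedicated timeout, and only if it is set. */
        if (state == STATE_STOP_WATCHDOG && s->timeout_abort_set)
                return s->timeout_abort_usec;

        /* Everything else, including an unset abort timeout, uses the stop timeout. */
        return s->timeout_stop_usec;
}
```

With an unset abort timeout the function returns the stop timeout for both states, which corresponds to the old behaviour the message says is preserved by default.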

16 files changed, 190 additions and 9 deletions

docs/TRANSIENT-SETTINGS.md

Lines changed: 1 addition & 0 deletions
@@ -286,6 +286,7 @@ Most service unit settings are available for transient units.
 ✓ RestartSec=
 ✓ TimeoutStartSec=
 ✓ TimeoutStopSec=
+✓ TimeoutAbortSec=
 ✓ TimeoutSec=
 ✓ RuntimeMaxSec=
 ✓ WatchdogSec=

man/systemd-system.conf.xml

Lines changed: 8 additions & 4 deletions
@@ -239,13 +239,15 @@
 <varlistentry>
 <term><varname>DefaultTimeoutStartSec=</varname></term>
 <term><varname>DefaultTimeoutStopSec=</varname></term>
+<term><varname>DefaultTimeoutAbortSec=</varname></term>
 <term><varname>DefaultRestartSec=</varname></term>

-<listitem><para>Configures the default timeouts for starting
-and stopping of units, as well as the default time to sleep
+<listitem><para>Configures the default timeouts for starting,
+stopping and aborting of units, as well as the default time to sleep
 between automatic restarts of units, as configured per-unit in
 <varname>TimeoutStartSec=</varname>,
-<varname>TimeoutStopSec=</varname> and
+<varname>TimeoutStopSec=</varname>,
+<varname>TimeoutAbortSec=</varname> and
 <varname>RestartSec=</varname> (for services, see
 <citerefentry><refentrytitle>systemd.service</refentrytitle><manvolnum>5</manvolnum></citerefentry>
 for details on the per-unit settings). Disabled by default, when
@@ -255,7 +257,9 @@
 <varname>TimeoutSec=</varname>
 value. <varname>DefaultTimeoutStartSec=</varname> and
 <varname>DefaultTimeoutStopSec=</varname> default to
-90s. <varname>DefaultRestartSec=</varname> defaults to
+90s. <varname>DefaultTimeoutAbortSec=</varname> is not set by default
+so that all units fall back to <varname>TimeoutStopSec=</varname>.
+<varname>DefaultRestartSec=</varname> defaults to
 100ms.</para></listitem>
 </varlistentry>

man/systemd.service.xml

Lines changed: 29 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -573,6 +573,35 @@
573573
</para></listitem>
574574
</varlistentry>
575575

576+
<varlistentry>
577+
<term><varname>TimeoutAbortSec=</varname></term>
578+
<listitem><para>This option configures the time to wait for the service to terminate when it was aborted due to a
579+
watchdog timeout (see <varname>WatchdogSec=</varname>). If the service has a short <varname>TimeoutStopSec=</varname>
580+
this option can be used to give the system more time to write a core dump of the service. Upon expiration the service
581+
will be forcibly terminated by <constant>SIGKILL</constant> (see <varname>KillMode=</varname> in
582+
<citerefentry><refentrytitle>systemd.kill</refentrytitle><manvolnum>5</manvolnum></citerefentry>). The core file will
583+
be truncated in this case. Use <varname>TimeoutAbortSec=</varname> to set a sensible timeout for the core dumping per
584+
service that is large enough to write all expected data while also being short enough to handle the service failure
585+
in due time.
586+
</para>
587+
588+
<para>Takes a unit-less value in seconds, or a time span value such as "5min 20s". Pass an empty value to skip
589+
the dedicated watchdog abort timeout handling and fall back <varname>TimeoutStopSec=</varname>. Pass
590+
<literal>infinity</literal> to disable the timeout logic. Defaults to <varname>DefaultTimeoutAbortSec=</varname> from
591+
the manager configuration file (see
592+
<citerefentry><refentrytitle>systemd-system.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>).
593+
</para>
594+
595+
<para>If a service of <varname>Type=notify</varname> handles <constant>SIGABRT</constant> itself (instead of relying
596+
on the kernel to write a core dump) it can send <literal>EXTEND_TIMEOUT_USEC=…</literal> to
597+
extended the abort time beyond <varname>TimeoutAbortSec=</varname>. The first receipt of this message
598+
must occur before <varname>TimeoutAbortSec=</varname> is exceeded, and once the abort time has exended beyond
599+
<varname>TimeoutAbortSec=</varname>, the service manager will allow the service to continue to abort, provided
600+
the service repeats <literal>EXTEND_TIMEOUT_USEC=…</literal> within the interval specified, or terminates itself
601+
(see <citerefentry><refentrytitle>sd_notify</refentrytitle><manvolnum>3</manvolnum></citerefentry>).
602+
</para></listitem>
603+
</varlistentry>
604+
576605
<varlistentry>
577606
<term><varname>TimeoutSec=</varname></term>
578607
<listitem><para>A shorthand for configuring both
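
As a rough illustration of the `EXTEND_TIMEOUT_USEC=…` message mentioned in the added man page text, the sketch below only builds the notification string. `format_extend_timeout()` is a hypothetical helper, not part of systemd; actually sending the string (for example via `sd_notify()` from libsystemd) is left out so the block stays self-contained.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: write "EXTEND_TIMEOUT_USEC=<n>" into buf.
 * Returns 0 on success, -1 if the buffer is too small. */
static int format_extend_timeout(char *buf, size_t size, uint64_t extend_usec) {
        int n;

        assert(buf);

        n = snprintf(buf, size, "EXTEND_TIMEOUT_USEC=%" PRIu64, extend_usec);
        if (n < 0 || (size_t) n >= size)
                return -1;   /* output error or truncated output */

        return 0;
}
```

A `Type=notify` service that handles `SIGABRT` itself would presumably pass the resulting string to `sd_notify(0, buf)` from its abort handling path, and repeat it before each announced interval runs out, as the man page text requires.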

src/core/dbus-manager.c

Lines changed: 22 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -287,6 +287,27 @@ static int property_set_runtime_watchdog(
287287
return watchdog_set_timeout(t);
288288
}
289289

290+
static int property_get_default_timeout_abort_usec(
291+
sd_bus *bus,
292+
const char *path,
293+
const char *interface,
294+
const char *property,
295+
sd_bus_message *reply,
296+
void *userdata,
297+
sd_bus_error *error) {
298+
299+
Manager *m = userdata;
300+
usec_t t;
301+
302+
assert(bus);
303+
assert(reply);
304+
assert(m);
305+
306+
t = manager_default_timeout_abort_usec(m);
307+
308+
return sd_bus_message_append(reply, "t", t);
309+
}
310+
290311
static int bus_get_unit_by_name(Manager *m, sd_bus_message *message, const char *name, Unit **ret_unit, sd_bus_error *error) {
291312
Unit *u;
292313
int r;
@@ -2410,6 +2431,7 @@ const sd_bus_vtable bus_manager_vtable[] = {
24102431
SD_BUS_PROPERTY("DefaultTimerAccuracyUSec", "t", bus_property_get_usec, offsetof(Manager, default_timer_accuracy_usec), SD_BUS_VTABLE_PROPERTY_CONST),
24112432
SD_BUS_PROPERTY("DefaultTimeoutStartUSec", "t", bus_property_get_usec, offsetof(Manager, default_timeout_start_usec), SD_BUS_VTABLE_PROPERTY_CONST),
24122433
SD_BUS_PROPERTY("DefaultTimeoutStopUSec", "t", bus_property_get_usec, offsetof(Manager, default_timeout_stop_usec), SD_BUS_VTABLE_PROPERTY_CONST),
2434+
SD_BUS_PROPERTY("DefaultTimeoutAbortUSec", "t", property_get_default_timeout_abort_usec, 0, 0),
24132435
SD_BUS_PROPERTY("DefaultRestartUSec", "t", bus_property_get_usec, offsetof(Manager, default_restart_usec), SD_BUS_VTABLE_PROPERTY_CONST),
24142436
SD_BUS_PROPERTY("DefaultStartLimitIntervalUSec", "t", bus_property_get_usec, offsetof(Manager, default_start_limit_interval), SD_BUS_VTABLE_PROPERTY_CONST),
24152437
/* The following two items are obsolete alias */

src/core/dbus-service.c

Lines changed: 22 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -29,6 +29,27 @@ static BUS_DEFINE_PROPERTY_GET_ENUM(property_get_restart, service_restart, Servi
2929
static BUS_DEFINE_PROPERTY_GET_ENUM(property_get_notify_access, notify_access, NotifyAccess);
3030
static BUS_DEFINE_PROPERTY_GET_ENUM(property_get_emergency_action, emergency_action, EmergencyAction);
3131

32+
static int property_get_timeout_abort_usec(
33+
sd_bus *bus,
34+
const char *path,
35+
const char *interface,
36+
const char *property,
37+
sd_bus_message *reply,
38+
void *userdata,
39+
sd_bus_error *error) {
40+
41+
Service *s = userdata;
42+
usec_t t;
43+
44+
assert(bus);
45+
assert(reply);
46+
assert(s);
47+
48+
t = service_timeout_abort_usec(s);
49+
50+
return sd_bus_message_append(reply, "t", t);
51+
}
52+
3253
static int property_get_exit_status_set(
3354
sd_bus *bus,
3455
const char *path,
@@ -103,6 +124,7 @@ const sd_bus_vtable bus_service_vtable[] = {
103124
SD_BUS_PROPERTY("RestartUSec", "t", bus_property_get_usec, offsetof(Service, restart_usec), SD_BUS_VTABLE_PROPERTY_CONST),
104125
SD_BUS_PROPERTY("TimeoutStartUSec", "t", bus_property_get_usec, offsetof(Service, timeout_start_usec), SD_BUS_VTABLE_PROPERTY_CONST),
105126
SD_BUS_PROPERTY("TimeoutStopUSec", "t", bus_property_get_usec, offsetof(Service, timeout_stop_usec), SD_BUS_VTABLE_PROPERTY_CONST),
127+
SD_BUS_PROPERTY("TimeoutAbortUSec", "t", property_get_timeout_abort_usec, 0, 0),
106128
SD_BUS_PROPERTY("RuntimeMaxUSec", "t", bus_property_get_usec, offsetof(Service, runtime_max_usec), SD_BUS_VTABLE_PROPERTY_CONST),
107129
SD_BUS_PROPERTY("WatchdogUSec", "t", bus_property_get_usec, offsetof(Service, watchdog_usec), SD_BUS_VTABLE_PROPERTY_CONST),
108130
BUS_PROPERTY_DUAL_TIMESTAMP("WatchdogTimestamp", offsetof(Service, watchdog_timestamp), 0),

src/core/load-fragment-gperf.gperf.m4

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -308,6 +308,7 @@ Service.RestartSec, config_parse_sec, 0,
308308
Service.TimeoutSec, config_parse_service_timeout, 0, 0
309309
Service.TimeoutStartSec, config_parse_service_timeout, 0, 0
310310
Service.TimeoutStopSec, config_parse_sec_fix_0, 0, offsetof(Service, timeout_stop_usec)
311+
Service.TimeoutAbortSec, config_parse_service_timeout_abort, 0, 0
311312
Service.RuntimeMaxSec, config_parse_sec, 0, offsetof(Service, runtime_max_usec)
312313
Service.WatchdogSec, config_parse_sec, 0, offsetof(Service, watchdog_usec)
313314
m4_dnl The following five only exist for compatibility, they moved into Unit, see above

src/core/load-fragment.c

Lines changed: 36 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1894,6 +1894,42 @@ int config_parse_service_timeout(
18941894
return 0;
18951895
}
18961896

1897+
int config_parse_service_timeout_abort(
1898+
const char *unit,
1899+
const char *filename,
1900+
unsigned line,
1901+
const char *section,
1902+
unsigned section_line,
1903+
const char *lvalue,
1904+
int ltype,
1905+
const char *rvalue,
1906+
void *data,
1907+
void *userdata) {
1908+
1909+
Service *s = userdata;
1910+
int r;
1911+
1912+
assert(filename);
1913+
assert(lvalue);
1914+
assert(rvalue);
1915+
assert(s);
1916+
1917+
rvalue += strspn(rvalue, WHITESPACE);
1918+
if (isempty(rvalue)) {
1919+
s->timeout_abort_set = false;
1920+
return 0;
1921+
}
1922+
1923+
r = parse_sec(rvalue, &s->timeout_abort_usec);
1924+
if (r < 0) {
1925+
log_syntax(unit, LOG_ERR, filename, line, r, "Failed to parse TimeoutAbortSec= setting, ignoring: %s", rvalue);
1926+
return 0;
1927+
}
1928+
1929+
s->timeout_abort_set = true;
1930+
return 0;
1931+
}
1932+
18971933
int config_parse_sec_fix_0(
18981934
const char *unit,
18991935
const char *filename,
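
The three input cases handled by the parser above (empty value, parse error, valid value) can be tried out with a simplified stand-in. The sketch below is an assumption-laden approximation: `sketch_parse_timeout_abort()` and `SketchService` are hypothetical, and instead of the real `parse_sec()` it accepts only unit-less seconds and `infinity`, not time spans such as "5min 20s". Unlike the real parser, which logs the error and returns 0, it reports a parse failure to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint64_t usec_t;
#define USEC_PER_SEC ((usec_t) 1000000ULL)
#define USEC_INFINITY ((usec_t) UINT64_MAX)

/* Hypothetical stand-in carrying only the two fields the parser touches. */
typedef struct {
        usec_t timeout_abort_usec;
        bool timeout_abort_set;
} SketchService;

/* Simplified parser: handles "", "infinity" and unit-less seconds only. */
static int sketch_parse_timeout_abort(const char *rvalue, SketchService *s) {
        char *end = NULL;
        unsigned long long v;

        assert(rvalue);
        assert(s);

        rvalue += strspn(rvalue, " \t\n\r");   /* skip leading whitespace */
        if (*rvalue == '\0') {
                s->timeout_abort_set = false;  /* empty: fall back to TimeoutStopSec= */
                return 0;
        }

        if (strcmp(rvalue, "infinity") == 0) {
                s->timeout_abort_usec = USEC_INFINITY;
                s->timeout_abort_set = true;
                return 0;
        }

        errno = 0;
        v = strtoull(rvalue, &end, 10);
        if (errno != 0 || end == rvalue || *end != '\0' || v > UINT64_MAX / USEC_PER_SEC)
                return -EINVAL;   /* value and flag are left untouched on error */

        s->timeout_abort_usec = (usec_t) v * USEC_PER_SEC;
        s->timeout_abort_set = true;
        return 0;
}
```

Keeping a separate `timeout_abort_set` flag rather than a sentinel value means every `usec_t` value, including infinity, stays available as a real setting.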

src/core/load-fragment.h

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -24,6 +24,7 @@ CONFIG_PARSER_PROTOTYPE(config_parse_exec_nice);
2424
CONFIG_PARSER_PROTOTYPE(config_parse_exec_oom_score_adjust);
2525
CONFIG_PARSER_PROTOTYPE(config_parse_exec);
2626
CONFIG_PARSER_PROTOTYPE(config_parse_service_timeout);
27+
CONFIG_PARSER_PROTOTYPE(config_parse_service_timeout_abort);
2728
CONFIG_PARSER_PROTOTYPE(config_parse_service_type);
2829
CONFIG_PARSER_PROTOTYPE(config_parse_service_restart);
2930
CONFIG_PARSER_PROTOTYPE(config_parse_socket_bindtodevice);

src/core/main.c

Lines changed: 39 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -112,6 +112,8 @@ static ExecOutput arg_default_std_error = EXEC_OUTPUT_INHERIT;
112112
static usec_t arg_default_restart_usec = DEFAULT_RESTART_USEC;
113113
static usec_t arg_default_timeout_start_usec = DEFAULT_TIMEOUT_USEC;
114114
static usec_t arg_default_timeout_stop_usec = DEFAULT_TIMEOUT_USEC;
115+
static usec_t arg_default_timeout_abort_usec = DEFAULT_TIMEOUT_USEC;
116+
static bool arg_default_timeout_abort_set = false;
115117
static usec_t arg_default_start_limit_interval = DEFAULT_START_LIMIT_INTERVAL;
116118
static unsigned arg_default_start_limit_burst = DEFAULT_START_LIMIT_BURST;
117119
static usec_t arg_runtime_watchdog = 0;
@@ -668,6 +670,40 @@ static int config_parse_crash_chvt(
668670
return 0;
669671
}
670672

673+
static int config_parse_timeout_abort(
674+
const char* unit,
675+
const char *filename,
676+
unsigned line,
677+
const char *section,
678+
unsigned section_line,
679+
const char *lvalue,
680+
int ltype,
681+
const char *rvalue,
682+
void *data,
683+
void *userdata) {
684+
685+
int r;
686+
687+
assert(filename);
688+
assert(lvalue);
689+
assert(rvalue);
690+
691+
rvalue += strspn(rvalue, WHITESPACE);
692+
if (isempty(rvalue)) {
693+
arg_default_timeout_abort_set = false;
694+
return 0;
695+
}
696+
697+
r = parse_sec(rvalue, &arg_default_timeout_abort_usec);
698+
if (r < 0) {
699+
log_syntax(unit, LOG_ERR, filename, line, r, "Failed to parse DefaultTimeoutAbortSec= setting, ignoring: %s", rvalue);
700+
return 0;
701+
}
702+
703+
arg_default_timeout_abort_set = true;
704+
return 0;
705+
}
706+
671707
static int parse_config_file(void) {
672708

673709
const ConfigTableItem items[] = {
@@ -697,6 +733,7 @@ static int parse_config_file(void) {
697733
{ "Manager", "DefaultStandardError", config_parse_output_restricted,0, &arg_default_std_error },
698734
{ "Manager", "DefaultTimeoutStartSec", config_parse_sec, 0, &arg_default_timeout_start_usec },
699735
{ "Manager", "DefaultTimeoutStopSec", config_parse_sec, 0, &arg_default_timeout_stop_usec },
736+
{ "Manager", "DefaultTimeoutAbortSec", config_parse_timeout_abort, 0, NULL },
700737
{ "Manager", "DefaultRestartSec", config_parse_sec, 0, &arg_default_restart_usec },
701738
{ "Manager", "DefaultStartLimitInterval", config_parse_sec, 0, &arg_default_start_limit_interval }, /* obsolete alias */
702739
{ "Manager", "DefaultStartLimitIntervalSec",config_parse_sec, 0, &arg_default_start_limit_interval },
@@ -765,6 +802,8 @@ static void set_manager_defaults(Manager *m) {
765802
m->default_std_error = arg_default_std_error;
766803
m->default_timeout_start_usec = arg_default_timeout_start_usec;
767804
m->default_timeout_stop_usec = arg_default_timeout_stop_usec;
805+
m->default_timeout_abort_usec = arg_default_timeout_abort_usec;
806+
m->default_timeout_abort_set = arg_default_timeout_abort_set;
768807
m->default_restart_usec = arg_default_restart_usec;
769808
m->default_start_limit_interval = arg_default_start_limit_interval;
770809
m->default_start_limit_burst = arg_default_start_limit_burst;

src/core/manager.h

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -330,6 +330,8 @@ struct Manager {
330330
ExecOutput default_std_output, default_std_error;
331331

332332
usec_t default_restart_usec, default_timeout_start_usec, default_timeout_stop_usec;
333+
usec_t default_timeout_abort_usec;
334+
bool default_timeout_abort_set;
333335

334336
usec_t default_start_limit_interval;
335337
unsigned default_start_limit_burst;
@@ -417,6 +419,10 @@ struct Manager {
417419
bool honor_device_enumeration;
418420
};
419421

422+
static inline usec_t manager_default_timeout_abort_usec(Manager *m) {
423+
return m->default_timeout_abort_set ? m->default_timeout_abort_usec : m->default_timeout_stop_usec;
424+
}
425+
420426
#define MANAGER_IS_SYSTEM(m) ((m)->unit_file_scope == UNIT_FILE_SYSTEM)
421427
#define MANAGER_IS_USER(m) ((m)->unit_file_scope != UNIT_FILE_SYSTEM)
422428
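
The fallback in `manager_default_timeout_abort_usec()` can be exercised in isolation with a cut-down struct. `SketchManager` and the `sketch_` prefix are hypothetical names for this illustration; the 90s figure is the `DefaultTimeoutStopSec=` default stated in the man page hunk above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t usec_t;
#define USEC_PER_SEC ((usec_t) 1000000ULL)

/* Hypothetical stand-in holding only the three fields the helper reads. */
typedef struct {
        usec_t default_timeout_stop_usec;
        usec_t default_timeout_abort_usec;
        bool default_timeout_abort_set;
} SketchManager;

/* Same ternary as the helper in the diff: use the abort default only when it was set. */
static usec_t sketch_manager_default_timeout_abort_usec(const SketchManager *m) {
        assert(m);

        return m->default_timeout_abort_set ? m->default_timeout_abort_usec
                                            : m->default_timeout_stop_usec;
}
```

For a manager with a 90s stop default and no `DefaultTimeoutAbortSec=`, the helper returns 90s; once the abort default is set, for instance to 10 minutes, that value is returned instead.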
