Commit bf76080

jkloetzke authored and poettering committed
core: let user define start-/stop-timeout behaviour
The usual behaviour when a timeout expires is to terminate/kill the service. This is what users usually want in production systems. To debug services that fail to start/stop (especially sporadic failures) it might be necessary to trigger the watchdog machinery and write core dumps, though. Likewise, it is usually just a waste of time to gracefully stop a stuck service. Instead it might save time to go directly into kill mode.

This commit adds two new options to services: TimeoutStartFailureMode= and TimeoutStopFailureMode=. Both take the same values and tweak the behaviour of systemd when a start/stop timeout expires:

* 'terminate': the default behaviour as it has always been,
* 'abort': triggers the watchdog machinery and sends SIGABRT (unless WatchdogSignal was changed), and
* 'kill': sends SIGKILL directly.

To handle the stop failure mode in the stop-post state too, a new final-watchdog state needs to be introduced.
1 parent 8b5616f commit bf76080

10 files changed

+200
-42
lines changed

man/systemd.service.xml

Lines changed: 30 additions & 11 deletions
@@ -560,16 +560,12 @@
 
 <varlistentry>
 <term><varname>TimeoutStartSec=</varname></term>
-<listitem><para>Configures the time to wait for start-up. If a
-daemon service does not signal start-up completion within the
-configured time, the service will be considered failed and
-will be shut down again. Takes a unit-less value in seconds,
-or a time span value such as "5min 20s". Pass
-<literal>infinity</literal> to disable the timeout logic. Defaults to
-<varname>DefaultTimeoutStartSec=</varname> from the manager
-configuration file, except when
-<varname>Type=oneshot</varname> is used, in which case the
-timeout is disabled by default (see
+<listitem><para>Configures the time to wait for start-up. If a daemon service does not signal start-up
+completion within the configured time, the service will be considered failed and will be shut down again. The
+precise action depends on the <varname>TimeoutStartFailureMode=</varname> option. Takes a unit-less value in
+seconds, or a time span value such as "5min 20s". Pass <literal>infinity</literal> to disable the timeout logic.
+Defaults to <varname>DefaultTimeoutStartSec=</varname> from the manager configuration file, except when
+<varname>Type=oneshot</varname> is used, in which case the timeout is disabled by default (see
 <citerefentry><refentrytitle>systemd-system.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>).
 </para>
 
@@ -588,7 +584,8 @@
 <listitem><para>This option serves two purposes. First, it configures the time to wait for each
 <varname>ExecStop=</varname> command. If any of them times out, subsequent <varname>ExecStop=</varname> commands
 are skipped and the service will be terminated by <constant>SIGTERM</constant>. If no <varname>ExecStop=</varname>
-commands are specified, the service gets the <constant>SIGTERM</constant> immediately. Second, it configures the time
+commands are specified, the service gets the <constant>SIGTERM</constant> immediately. This default behavior
+can be changed by the <varname>TimeoutStopFailureMode=</varname> option. Second, it configures the time
 to wait for the service itself to stop. If it doesn't terminate in the specified time, it will be forcibly terminated
 by <constant>SIGKILL</constant> (see <varname>KillMode=</varname> in
 <citerefentry><refentrytitle>systemd.kill</refentrytitle><manvolnum>5</manvolnum></citerefentry>).
@@ -646,6 +643,28 @@
 </para></listitem>
 </varlistentry>
 
+<varlistentry>
+<term><varname>TimeoutStartFailureMode=</varname></term>
+<term><varname>TimeoutStopFailureMode=</varname></term>
+
+<listitem><para>These options configure the action that is taken in case a daemon service does not signal
+start-up within its configured <varname>TimeoutStartSec=</varname>, respectively if it does not stop within
+<varname>TimeoutStopSec=</varname>. Takes one of <option>terminate</option>, <option>abort</option> and
+<option>kill</option>. Both options default to <option>terminate</option>.</para>
+
+<para>If <option>terminate</option> is set the service will be gracefully terminated by sending the signal
+specified in <varname>KillSignal=</varname> (defaults to <constant>SIGTERM</constant>, see
+<citerefentry><refentrytitle>systemd.kill</refentrytitle><manvolnum>5</manvolnum></citerefentry>). If the
+service does not terminate the <varname>FinalKillSignal=</varname> is sent after
+<varname>TimeoutStopSec=</varname>. If <option>abort</option> is set, <varname>WatchdogSignal=</varname> is sent
+instead and <varname>TimeoutAbortSec=</varname> applies before sending <varname>FinalKillSignal=</varname>.
+This setting may be used to analyze services that fail to start-up or shut-down intermittently.
+By using <option>kill</option> the service is immediately terminated by sending
+<varname>FinalKillSignal=</varname> without any further timeout. This setting can be used to expedite the
+shutdown of failing services.
+</para></listitem>
+</varlistentry>
+
 <varlistentry>
 <term><varname>RuntimeMaxSec=</varname></term>
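The escalation described in the new man page text can be sketched in C roughly as follows. This is an illustration only, not code from this commit: `DemoFailureMode`, `DemoSignals`, `demo_first_signal()` and `demo_grace_timeout()` are hypothetical names, and the defaults assume `KillSignal=` is SIGTERM, `WatchdogSignal=` is SIGABRT and `FinalKillSignal=` is SIGKILL.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical illustration types; not the definitions used by systemd. */
typedef enum {
        DEMO_MODE_TERMINATE,
        DEMO_MODE_ABORT,
        DEMO_MODE_KILL,
} DemoFailureMode;

typedef struct {
        int kill_signal;        /* KillSignal= */
        int watchdog_signal;    /* WatchdogSignal= */
        int final_kill_signal;  /* FinalKillSignal= */
} DemoSignals;

/* Assumed defaults, see the lead-in above. */
static const DemoSignals demo_default_signals = { SIGTERM, SIGABRT, SIGKILL };

/* First signal sent once a start/stop timeout has expired. */
static int demo_first_signal(DemoFailureMode mode, const DemoSignals *sig) {
        assert(sig);

        switch (mode) {
        case DEMO_MODE_ABORT:
                return sig->watchdog_signal;
        case DEMO_MODE_KILL:
                return sig->final_kill_signal;
        case DEMO_MODE_TERMINATE:
        default:
                return sig->kill_signal;
        }
}

/* Name of the timeout that applies before FinalKillSignal= is sent,
 * or NULL if the final signal is sent right away. */
static const char *demo_grace_timeout(DemoFailureMode mode) {
        switch (mode) {
        case DEMO_MODE_TERMINATE:
                return "TimeoutStopSec=";
        case DEMO_MODE_ABORT:
                return "TimeoutAbortSec=";
        default:
                return NULL;
        }
}
```

With the assumed defaults, `terminate` maps to SIGTERM followed by a `TimeoutStopSec=` wait, `abort` to SIGABRT followed by a `TimeoutAbortSec=` wait, and `kill` to SIGKILL with no further wait.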

src/basic/unit-def.c

Lines changed: 1 addition & 0 deletions
@@ -185,6 +185,7 @@ static const char* const service_state_table[_SERVICE_STATE_MAX] = {
         [SERVICE_STOP_SIGTERM] = "stop-sigterm",
         [SERVICE_STOP_SIGKILL] = "stop-sigkill",
         [SERVICE_STOP_POST] = "stop-post",
+        [SERVICE_FINAL_WATCHDOG] = "final-watchdog",
         [SERVICE_FINAL_SIGTERM] = "final-sigterm",
         [SERVICE_FINAL_SIGKILL] = "final-sigkill",
         [SERVICE_FAILED] = "failed",
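The table above maps enum values to their string names through designated initializers. A minimal, self-contained sketch of how such a table can back both directions of the lookup might look like this. The `Demo*` names and the reduced set of states are assumptions made for illustration; systemd generates its real lookup helpers from its own macros, which are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Reduced, illustrative subset of the states touched by this hunk. */
typedef enum {
        DEMO_STOP_POST,
        DEMO_FINAL_WATCHDOG,
        DEMO_FINAL_SIGTERM,
        DEMO_FINAL_SIGKILL,
        DEMO_STATE_MAX,
        DEMO_STATE_INVALID = -1,
} DemoState;

/* Designated initializers keep each name next to its enum value, so
 * inserting a new state in the middle cannot shift the other entries. */
static const char *const demo_state_table[DEMO_STATE_MAX] = {
        [DEMO_STOP_POST]      = "stop-post",
        [DEMO_FINAL_WATCHDOG] = "final-watchdog",
        [DEMO_FINAL_SIGTERM]  = "final-sigterm",
        [DEMO_FINAL_SIGKILL]  = "final-sigkill",
};

/* Returns the name of a state, or NULL if the value is out of range. */
static const char *demo_state_to_string(DemoState s) {
        if (s < 0 || s >= DEMO_STATE_MAX)
                return NULL;
        return demo_state_table[s];
}

/* Returns the state for a name, or DEMO_STATE_INVALID if unknown. */
static DemoState demo_state_from_string(const char *name) {
        assert(name);

        for (int i = 0; i < DEMO_STATE_MAX; i++)
                if (demo_state_table[i] && strcmp(demo_state_table[i], name) == 0)
                        return (DemoState) i;

        return DEMO_STATE_INVALID;
}
```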

src/basic/unit-def.h

Lines changed: 1 addition & 0 deletions
@@ -127,6 +127,7 @@ typedef enum ServiceState {
         SERVICE_STOP_SIGTERM,
         SERVICE_STOP_SIGKILL,
         SERVICE_STOP_POST,
+        SERVICE_FINAL_WATCHDOG, /* In case the STOP_POST executable needs to be aborted. */
         SERVICE_FINAL_SIGTERM, /* In case the STOP_POST executable hangs, we shoot that down, too */
         SERVICE_FINAL_SIGKILL,
         SERVICE_FAILED,

src/core/dbus-service.c

Lines changed: 10 additions & 0 deletions
@@ -29,6 +29,7 @@ static BUS_DEFINE_PROPERTY_GET_ENUM(property_get_notify_access, notify_access, N
 static BUS_DEFINE_PROPERTY_GET_ENUM(property_get_emergency_action, emergency_action, EmergencyAction);
 static BUS_DEFINE_PROPERTY_GET(property_get_timeout_abort_usec, "t", Service, service_timeout_abort_usec);
 static BUS_DEFINE_PROPERTY_GET(property_get_watchdog_usec, "t", Service, service_get_watchdog_usec);
+static BUS_DEFINE_PROPERTY_GET_ENUM(property_get_timeout_failure_mode, service_timeout_failure_mode, ServiceTimeoutFailureMode);
 
 static int property_get_exit_status_set(
                 sd_bus *bus,
@@ -101,6 +102,8 @@ const sd_bus_vtable bus_service_vtable[] = {
         SD_BUS_PROPERTY("TimeoutStartUSec", "t", bus_property_get_usec, offsetof(Service, timeout_start_usec), SD_BUS_VTABLE_PROPERTY_CONST),
         SD_BUS_PROPERTY("TimeoutStopUSec", "t", bus_property_get_usec, offsetof(Service, timeout_stop_usec), SD_BUS_VTABLE_PROPERTY_CONST),
         SD_BUS_PROPERTY("TimeoutAbortUSec", "t", property_get_timeout_abort_usec, 0, 0),
+        SD_BUS_PROPERTY("TimeoutStartFailureMode", "s", property_get_timeout_failure_mode, offsetof(Service, timeout_start_failure_mode), SD_BUS_VTABLE_PROPERTY_CONST),
+        SD_BUS_PROPERTY("TimeoutStopFailureMode", "s", property_get_timeout_failure_mode, offsetof(Service, timeout_stop_failure_mode), SD_BUS_VTABLE_PROPERTY_CONST),
         SD_BUS_PROPERTY("RuntimeMaxUSec", "t", bus_property_get_usec, offsetof(Service, runtime_max_usec), SD_BUS_VTABLE_PROPERTY_CONST),
         SD_BUS_PROPERTY("WatchdogUSec", "t", property_get_watchdog_usec, 0, 0),
         BUS_PROPERTY_DUAL_TIMESTAMP("WatchdogTimestamp", offsetof(Service, watchdog_timestamp), 0),
@@ -259,6 +262,7 @@ static BUS_DEFINE_SET_TRANSIENT_PARSE(service_type, ServiceType, service_type_fr
 static BUS_DEFINE_SET_TRANSIENT_PARSE(service_restart, ServiceRestart, service_restart_from_string);
 static BUS_DEFINE_SET_TRANSIENT_PARSE(oom_policy, OOMPolicy, oom_policy_from_string);
 static BUS_DEFINE_SET_TRANSIENT_STRING_WITH_CHECK(bus_name, sd_bus_service_name_is_valid);
+static BUS_DEFINE_SET_TRANSIENT_PARSE(timeout_failure_mode, ServiceTimeoutFailureMode, service_timeout_failure_mode_from_string);
 
 static int bus_service_set_transient_property(
                 Service *s,
@@ -316,6 +320,12 @@ static int bus_service_set_transient_property(
                 return r;
         }
 
+        if (streq(name, "TimeoutStartFailureMode"))
+                return bus_set_transient_timeout_failure_mode(u, name, &s->timeout_start_failure_mode, message, flags, error);
+
+        if (streq(name, "TimeoutStopFailureMode"))
+                return bus_set_transient_timeout_failure_mode(u, name, &s->timeout_stop_failure_mode, message, flags, error);
+
         if (streq(name, "RuntimeMaxUSec"))
                 return bus_set_transient_usec(u, name, &s->runtime_max_usec, message, flags, error);
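The last hunk dispatches on the property name and hands the matching struct field to a shared setter, so both options reuse one parser. The following stand-alone sketch shows that pattern without any D-Bus plumbing. All `Demo*` and `demo_*` names and the return-value convention are assumptions made for illustration and do not mirror the real `bus_set_transient_timeout_failure_mode()` helper.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for ServiceTimeoutFailureMode and Service. */
typedef enum {
        DEMO_MODE_TERMINATE,
        DEMO_MODE_ABORT,
        DEMO_MODE_KILL,
        DEMO_MODE_MAX,
        DEMO_MODE_INVALID = -1,
} DemoMode;

typedef struct {
        DemoMode timeout_start_failure_mode;
        DemoMode timeout_stop_failure_mode;
} DemoService;

static const char *const demo_mode_table[DEMO_MODE_MAX] = {
        [DEMO_MODE_TERMINATE] = "terminate",
        [DEMO_MODE_ABORT]     = "abort",
        [DEMO_MODE_KILL]      = "kill",
};

/* Parses one of the three documented values. */
static DemoMode demo_mode_from_string(const char *s) {
        assert(s);

        for (int i = 0; i < DEMO_MODE_MAX; i++)
                if (strcmp(demo_mode_table[i], s) == 0)
                        return (DemoMode) i;

        return DEMO_MODE_INVALID;
}

/* Returns 1 if the property was set, 0 if the name is not handled here,
 * and -EINVAL if the value is not a known mode. */
static int demo_set_transient_property(DemoService *s, const char *name, const char *value) {
        DemoMode *target, m;

        assert(s);
        assert(name);
        assert(value);

        /* Pick the field by name; the parsing below is shared. */
        if (strcmp(name, "TimeoutStartFailureMode") == 0)
                target = &s->timeout_start_failure_mode;
        else if (strcmp(name, "TimeoutStopFailureMode") == 0)
                target = &s->timeout_stop_failure_mode;
        else
                return 0;

        m = demo_mode_from_string(value);
        if (m < 0)
                return -EINVAL;

        *target = m;
        return 1;
}
```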

src/core/load-fragment-gperf.gperf.m4

Lines changed: 2 additions & 0 deletions
@@ -322,6 +322,8 @@ Service.TimeoutSec, config_parse_service_timeout, 0,
 Service.TimeoutStartSec, config_parse_service_timeout, 0, 0
 Service.TimeoutStopSec, config_parse_sec_fix_0, 0, offsetof(Service, timeout_stop_usec)
 Service.TimeoutAbortSec, config_parse_service_timeout_abort, 0, 0
+Service.TimeoutStartFailureMode, config_parse_service_timeout_failure_mode, 0, offsetof(Service, timeout_start_failure_mode)
+Service.TimeoutStopFailureMode, config_parse_service_timeout_failure_mode, 0, offsetof(Service, timeout_stop_failure_mode)
 Service.RuntimeMaxSec, config_parse_sec, 0, offsetof(Service, runtime_max_usec)
 Service.WatchdogSec, config_parse_sec, 0, offsetof(Service, watchdog_usec)
 m4_dnl The following five only exist for compatibility, they moved into Unit, see above

src/core/load-fragment.c

Lines changed: 2 additions & 0 deletions
@@ -123,6 +123,7 @@ DEFINE_CONFIG_PARSE_ENUM(config_parse_protect_system, protect_system, ProtectSys
 DEFINE_CONFIG_PARSE_ENUM(config_parse_runtime_preserve_mode, exec_preserve_mode, ExecPreserveMode, "Failed to parse runtime directory preserve mode");
 DEFINE_CONFIG_PARSE_ENUM(config_parse_service_type, service_type, ServiceType, "Failed to parse service type");
 DEFINE_CONFIG_PARSE_ENUM(config_parse_service_restart, service_restart, ServiceRestart, "Failed to parse service restart specifier");
+DEFINE_CONFIG_PARSE_ENUM(config_parse_service_timeout_failure_mode, service_timeout_failure_mode, ServiceTimeoutFailureMode, "Failed to parse timeout failure mode");
 DEFINE_CONFIG_PARSE_ENUM(config_parse_socket_bind, socket_address_bind_ipv6_only_or_bool, SocketAddressBindIPv6Only, "Failed to parse bind IPv6 only value");
 DEFINE_CONFIG_PARSE_ENUM(config_parse_oom_policy, oom_policy, OOMPolicy, "Failed to parse OOM policy");
 DEFINE_CONFIG_PARSE_ENUM_WITH_DEFAULT(config_parse_ip_tos, ip_tos, int, -1, "Failed to parse IP TOS value");
@@ -4941,6 +4942,7 @@ void unit_dump_config_items(FILE *f) {
                 { config_parse_exec, "PATH [ARGUMENT [...]]" },
                 { config_parse_service_type, "SERVICETYPE" },
                 { config_parse_service_restart, "SERVICERESTART" },
+                { config_parse_service_timeout_failure_mode, "TIMEOUTMODE" },
                 { config_parse_kill_mode, "KILLMODE" },
                 { config_parse_signal, "SIGNAL" },
                 { config_parse_socket_listen, "SOCKET [...]" },

src/core/load-fragment.h

Lines changed: 1 addition & 0 deletions
@@ -30,6 +30,7 @@ CONFIG_PARSER_PROTOTYPE(config_parse_exec_coredump_filter);
 CONFIG_PARSER_PROTOTYPE(config_parse_exec);
 CONFIG_PARSER_PROTOTYPE(config_parse_service_timeout);
 CONFIG_PARSER_PROTOTYPE(config_parse_service_timeout_abort);
+CONFIG_PARSER_PROTOTYPE(config_parse_service_timeout_failure_mode);
 CONFIG_PARSER_PROTOTYPE(config_parse_service_type);
 CONFIG_PARSER_PROTOTYPE(config_parse_service_restart);
 CONFIG_PARSER_PROTOTYPE(config_parse_socket_bindtodevice);
