This document provides information about how Valkey reacts to
different POSIX signals such as SIGTERM and SIGSEGV.
The SIGTERM and SIGINT signals tell Valkey to shut down gracefully.
When the server receives either signal, it does not exit immediately.
Instead, it schedules a shutdown similar to the one performed by the
SHUTDOWN command. The scheduled shutdown starts as soon as possible:
once the command currently executing (if any) terminates, with a
possible additional delay of 0.1 seconds or less.
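The deferred behavior described above can be illustrated with a minimal Python sketch. It is not Valkey's implementation: the class name DeferredShutdown, the run loop, and the demo function are hypothetical, and the 0.1-second tick only mirrors the delay mentioned in the text. The signal handler records the request, and the main loop acts on it at its next check.

```python
import os
import signal
import time


class DeferredShutdown:
    """Hypothetical stand-in for a server that defers shutdown to its main loop."""

    def __init__(self):
        self.shutdown_requested = False

    def _on_signal(self, signum, frame):
        # The handler only records the request; it never exits directly.
        self.shutdown_requested = True

    def install(self):
        # Treat SIGTERM and SIGINT the same way.
        for sig in (signal.SIGTERM, signal.SIGINT):
            signal.signal(sig, self._on_signal)

    def run(self, tick_seconds=0.1, max_ticks=50):
        """Poll the flag once per tick; return True once a shutdown was requested."""
        for _ in range(max_ticks):
            if self.shutdown_requested:
                return True
            time.sleep(tick_seconds)  # stands in for serving commands
        return self.shutdown_requested


def demo():
    """Send SIGTERM to this process and report whether the loop noticed it."""
    server = DeferredShutdown()
    server.install()
    os.kill(os.getpid(), signal.SIGTERM)
    return server.run()
```

Calling demo() sends SIGTERM to the current process. The loop notices the request within one tick instead of the process being terminated by the signal itself.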
If the server is blocked by a long-running Lua script, kill the
script with SCRIPT KILL if possible. The scheduled shutdown will run
just after the script is killed or terminates spontaneously.
This shutdown process includes the following actions:
- Pause clients attempting to write, using CLIENT PAUSE and the WRITE option.
- Wait up to the configured shutdown-timeout (default 10 seconds) for replicas to catch up with the primary’s replication offset.
- Call the fsync system call on the AOF file descriptor to flush the buffers to disk.
If the RDB file can’t be saved, the shutdown fails, and the server
continues to run in order to ensure no data loss. Likewise, if the user
just turned on AOF, and the server triggered the first AOF rewrite in
order to create the initial AOF file but this file can’t be saved, the
shutdown fails and the server continues to run. No further attempt to
shut down will be made unless a new SIGTERM is received or the
SHUTDOWN command is issued.
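The failure rule above (a failed save aborts the shutdown, and only a new request triggers another attempt) can be sketched as follows. This is an illustration under assumptions, not Valkey's code: ShutdownSketch, request_shutdown, periodic_check, and the save_rdb callable are hypothetical names, and a failed save is modeled as an OSError.

```python
class ShutdownSketch:
    """Hypothetical model of a shutdown that is abandoned when the final save fails."""

    def __init__(self, save_rdb):
        # save_rdb is an assumed callable that raises OSError when the save fails.
        self.save_rdb = save_rdb
        self.shutdown_requested = False

    def request_shutdown(self):
        # Stands in for receiving SIGTERM or the SHUTDOWN command.
        self.shutdown_requested = True

    def periodic_check(self):
        """Return True if the process may exit now, False if it keeps running."""
        if not self.shutdown_requested:
            return False
        try:
            self.save_rdb()
        except OSError:
            # Save failed: drop the pending request and keep serving.
            # Only a new request leads to another attempt.
            self.shutdown_requested = False
            return False
        return True
```

With a save_rdb that always raises, periodic_check returns False and the request is cleared, so later checks do nothing until request_shutdown is called again.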
Since Redis OSS 7.0, the server waits for lagging replicas up to a
configurable shutdown-timeout, 10 seconds by default, before shutting
down. This provides a best effort to minimize the risk of data loss in
a situation where no save points are configured and AOF is deactivated.
Before version 7.0, shutting down a heavily loaded primary node in a
diskless setup was more likely to result in data loss. To minimize the
risk of data loss in such setups, trigger a manual FAILOVER (or
CLUSTER FAILOVER) to demote the primary to a replica and promote one of
the replicas to a new primary before shutting down a primary node.
The following signals are handled as a Valkey crash:
Once one of these signals is trapped, Valkey stops any current operation and performs the following actions:
When the child performing the Append Only File rewrite gets killed by a signal, Valkey handles this as an error and discards the (probably partial or corrupted) AOF file. It will attempt the rewrite again later.
When the child performing an RDB save is killed, Valkey handles the condition as a more severe error. While the failure of an AOF file rewrite can cause AOF file enlargement, failed RDB file creation reduces durability.
If the child producing the RDB file is killed by a signal, or exits with an error (non-zero exit code), Valkey enters a special error condition in which no further write commands are accepted: write commands are answered with a MISCONFIG error.
This error condition persists until an RDB file can be created successfully.
Sometimes the user may want to kill the RDB-saving child process
without generating an error. This can be done using the signal
SIGUSR1. This signal is handled in a special way: it kills the child
process like any other signal, but the parent process will not detect
this as a critical error and will continue to serve write requests.
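The distinction between a child killed by SIGUSR1 and a child that fails in any other way can be sketched in Python. This is a rough illustration, not Valkey's code: classify_child_exit and run_child_and_signal are hypothetical helpers, the sleeping child is a stand-in for the RDB-saving process, and the labels are invented for this example. It relies on subprocess reporting death by signal as a negative return code.

```python
import signal
import subprocess
import sys


def classify_child_exit(returncode):
    """Map a child's exit status to how a parent might treat it.

    subprocess reports death by signal N as the return code -N.
    """
    if returncode == 0:
        return "ok"
    if returncode == -signal.SIGUSR1:
        # Killed on purpose: not treated as a critical error.
        return "stopped-on-request"
    # Any other signal or a non-zero exit code counts as a failure.
    return "error"


def run_child_and_signal(sig):
    """Start a stand-in child process, kill it with `sig`, and classify the result."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    child.send_signal(sig)
    child.wait()
    return classify_child_exit(child.returncode)
```

For example, run_child_and_signal(signal.SIGUSR1) yields "stopped-on-request", while run_child_and_signal(signal.SIGKILL) yields "error".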