Signal handling

Description

This document provides information about how Valkey reacts to different POSIX signals such as SIGTERM and SIGSEGV.

SIGTERM and SIGINT

The SIGTERM and SIGINT signals tell Valkey to shut down gracefully. When the server receives either signal, it does not exit immediately. Instead, it schedules a shutdown similar to the one performed by the SHUTDOWN command. The scheduled shutdown starts as soon as the command currently executing (if any) terminates, with a possible additional delay of 0.1 seconds or less.

If the server is blocked by a long-running Lua script, kill the script with SCRIPT KILL if possible. The scheduled shutdown will run just after the script is killed or terminates spontaneously.
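The deferred behavior described above can be illustrated with a toy command loop. This is a minimal sketch, not Valkey's implementation: the ToyServer class, its shutdown_requested flag, and the command names are hypothetical. The signal handler only records the request, and the loop acts on it once the command in progress has finished.

```python
import os
import signal


class ToyServer:
    """Toy command loop that defers shutdown until the running command ends.

    Hypothetical model for illustration only; not Valkey's actual code.
    """

    def __init__(self):
        self.shutdown_requested = False  # written by the handler, read by the loop
        self.log = []

    def on_signal(self, signum, frame):
        # Keep the handler trivial: only record that a shutdown was requested.
        self.shutdown_requested = True

    def run(self, commands):
        for name in commands:
            self.log.append("start " + name)
            if name == "SLOW":
                # Simulate SIGTERM arriving while this command is executing.
                os.kill(os.getpid(), signal.SIGTERM)
            self.log.append("end " + name)
            # The flag is checked only between commands, never in the middle of one.
            if self.shutdown_requested:
                self.log.append("shutdown")
                break
        return self.log


def demo():
    server = ToyServer()
    previous = signal.signal(signal.SIGTERM, server.on_signal)
    try:
        return server.run(["GET", "SLOW", "SET"])
    finally:
        # Restore whatever handler was installed before the demo.
        signal.signal(signal.SIGTERM, previous if previous is not None else signal.SIG_DFL)
```

In this sketch the SLOW command runs to completion even though the signal arrives while it is executing, and the SET command that follows is never started.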

This shutdown process includes the following actions:

If the RDB file can’t be saved, the shutdown fails, and the server continues to run in order to ensure no data loss. Likewise, if the user just turned on AOF, and the server triggered the first AOF rewrite in order to create the initial AOF file but this file can’t be saved, the shutdown fails and the server continues to run. No further attempt to shut down will be made unless a new SIGTERM is received or the SHUTDOWN command is issued.
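Because a shutdown can fail and leave the server running, a supervising script should not assume that sending SIGTERM ends the process. The following is a minimal sketch of such a check, assuming a POSIX system and that the caller already knows the server's process ID. The helper name sigterm_and_wait and its timeout values are hypothetical.

```python
import os
import signal
import time


def sigterm_and_wait(pid, timeout=15.0, poll=0.1):
    """Send SIGTERM to pid and report whether the process exited in time.

    Returns True if the process is gone before the timeout, False if it is
    still running (for example because the shutdown failed).
    """
    os.kill(pid, signal.SIGTERM)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Signal 0 delivers nothing; it only checks that the PID exists.
            os.kill(pid, 0)
        except ProcessLookupError:
            return True
        except PermissionError:
            # The process exists but belongs to another user: still running.
            pass
        time.sleep(poll)
    return False
```

A False result means the caller has to decide what to do next, for example inspect the server log and then send a new SIGTERM. Note that a process started by the caller itself remains visible as a zombie until it is reaped, so this check is intended for processes that another supervisor owns.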

Since Redis OSS 7.0, the server waits for lagging replicas up to a configurable shutdown-timeout, 10 seconds by default, before shutting down. This provides a best effort to minimize the risk of data loss in a situation where no save points are configured and AOF is deactivated. Before version 7.0, shutting down a heavily loaded primary node in a diskless setup was more likely to result in data loss. To minimize the risk of data loss in such setups, trigger a manual FAILOVER (or CLUSTER FAILOVER) to demote the primary to a replica and promote one of the replicas to a new primary before shutting down a primary node.
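The bounded wait for lagging replicas can be pictured as a polling loop with a deadline. The sketch below is an illustration under stated assumptions rather than the server's code: it assumes that a replica counts as caught up when its replication offset is at least the primary's, and the function name, the read_replica_offsets callback, and the polling interval are all hypothetical.

```python
import time


def wait_for_replicas(primary_offset, read_replica_offsets, timeout=10.0, poll=0.1):
    """Wait until every replica has reached primary_offset, or until timeout.

    read_replica_offsets is a caller-supplied function (hypothetical) that
    returns the current list of replica offsets each time it is called.
    Returns True if all replicas caught up, False if the timeout expired.
    """
    deadline = time.monotonic() + timeout
    while True:
        if all(offset >= primary_offset for offset in read_replica_offsets()):
            return True
        if time.monotonic() >= deadline:
            # Best effort only: give up and let the shutdown proceed.
            return False
        time.sleep(poll)
```

The wait is best effort by design: once the deadline passes, the loop gives up instead of blocking the shutdown indefinitely. That is why a manual failover before shutting down remains the safer choice for setups without persistence.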

SIGSEGV, SIGBUS, SIGFPE and SIGILL

The following signals are handled as a Valkey crash: SIGSEGV, SIGBUS, SIGFPE and SIGILL.

Once one of these signals is trapped, Valkey stops any current operation and performs the following actions:

What happens when a child process gets killed

When the child performing the Append Only File rewrite gets killed by a signal, Valkey handles this as an error and discards the (probably partial or corrupted) AOF file. It will attempt the rewrite again later.

When the child performing an RDB save is killed, Valkey handles the condition as a more severe error. While the failure of an AOF file rewrite can cause AOF file enlargement, failed RDB file creation reduces durability.

When the child producing the RDB file is killed by a signal or exits with an error (non-zero exit code), Valkey enters a special error condition in which no further write commands are accepted.

This error condition will persist until it becomes possible to create an RDB file successfully.
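The difference between the two failure modes can be summarized as a small state model. This is a hedged sketch for illustration only: the PersistenceModel class and its attribute names are hypothetical, and the return-code convention (0 for success, anything else for a failure or a kill by signal) is an assumption borrowed from Python's subprocess module.

```python
class PersistenceModel:
    """Hypothetical model of how child failures affect the parent's state."""

    def __init__(self):
        self.last_rdb_save_ok = True      # False blocks writes
        self.aof_rewrite_pending = False  # True means retry the rewrite later

    def on_aof_child_exit(self, returncode):
        # A failed rewrite is discarded and retried later; writes stay enabled.
        self.aof_rewrite_pending = returncode != 0

    def on_rdb_child_exit(self, returncode):
        # A failed save blocks writes until a later save succeeds.
        self.last_rdb_save_ok = returncode == 0

    def accepts_writes(self):
        return self.last_rdb_save_ok
```

In this model, a killed AOF child only schedules another rewrite, while a killed RDB child makes accepts_writes() return False until on_rdb_child_exit(0) is recorded.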

Kill the RDB-saving child without errors

Sometimes the user may want to kill the RDB-saving child process without generating an error. This can be done using the signal SIGUSR1. This signal is handled in a special way: it kills the child process like any other signal, but the parent process will not detect this as a critical error and will continue to serve write requests.
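A parent process can tell which signal ended a child and treat one signal as intentional. The sketch below shows the general technique on a POSIX system, using a throwaway Python child in place of the RDB-saving child. The function names are hypothetical, and the classification rule is an illustration of the behavior described above, not Valkey's code.

```python
import signal
import subprocess
import sys


def child_exit_is_error(returncode):
    """Classify a child's exit status the way the text above describes.

    subprocess reports death by signal N as the negative value -N.
    """
    if returncode == 0:
        return False  # normal, successful exit
    if returncode == -signal.SIGUSR1:
        return False  # killed on purpose: not a critical error
    return True       # any other signal or a non-zero exit code


def kill_child_with(sig):
    """Start a sleeping child, kill it with sig, and return its exit status."""
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    child.send_signal(sig)
    return child.wait(timeout=10)
```

For example, child_exit_is_error(kill_child_with(signal.SIGUSR1)) evaluates to False, while the same call with signal.SIGKILL evaluates to True.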