Forces a replica to perform a manual failover of its primary.
CLUSTER FAILOVER [FORCE | TAKEOVER]
This command can only be sent to a Valkey Cluster replica node. It forces the replica to start a manual failover of its primary instance.
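As a small illustration of the syntax, the sketch below builds the argument list for the command and rejects anything other than the two documented options. The helper name `build_failover_command` is hypothetical and not part of any client library.

```python
from typing import Optional, Tuple

# The only two modifiers the syntax allows; at most one may be given.
VALID_OPTIONS = ("FORCE", "TAKEOVER")


def build_failover_command(option: Optional[str] = None) -> Tuple[str, ...]:
    """Return the argument tuple for CLUSTER FAILOVER [FORCE | TAKEOVER].

    Hypothetical helper for illustration; it only builds the arguments
    and does not talk to a server.
    """
    args = ("CLUSTER", "FAILOVER")
    if option is None:
        return args
    normalized = option.upper()
    if normalized not in VALID_OPTIONS:
        raise ValueError(f"unsupported option: {option!r}")
    return args + (normalized,)


print(build_failover_command())            # ('CLUSTER', 'FAILOVER')
print(build_failover_command("takeover"))  # ('CLUSTER', 'FAILOVER', 'TAKEOVER')
```

The resulting tuple would typically be handed to a client's generic "send raw command" method on a connection opened to the replica itself, since sending it to a primary returns an error.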
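A manual failover is a special kind of failover, usually executed when there is no actual failure but we wish to swap the current primary with one of its replicas (the node we send the command to) safely, without any window for data loss. It works in the following way:
1. The replica tells the primary to stop processing queries from clients.
2. The primary replies to the replica with the current replication offset.
3. The replica waits for the replication offset to match on its side, to make sure it has processed all the data from the primary before it continues.
4. The replica starts a failover, obtains a new configuration epoch from the majority of the primaries, and broadcasts the new configuration.
5. The old primary receives the configuration update: it unblocks its clients and starts replying with redirection messages so that they continue with the new primary.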
This way clients are moved away from the old primary to the new primary atomically and only when the replica that is turning into the new primary has processed all of the replication stream from the old primary.
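The "no window for data loss" property can be sketched with a toy model: the primary stops accepting writes and reports its replication offset, and the replica is promoted only after it has caught up to that offset. The classes `ToyPrimary` and `ToyReplica` and the function `manual_failover` are made up for this illustration and do not reflect how Valkey represents nodes internally.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyPrimary:
    """Toy stand-in for a primary; the log plays the role of the replication stream."""
    log: List[str] = field(default_factory=list)
    paused: bool = False

    def write(self, entry: str) -> bool:
        # While paused for a manual failover, client writes are not accepted.
        if self.paused:
            return False
        self.log.append(entry)
        return True

    def pause_and_report_offset(self) -> int:
        # Stop processing client queries and report the current offset.
        self.paused = True
        return len(self.log)


@dataclass
class ToyReplica:
    log: List[str] = field(default_factory=list)

    def replicate_from(self, primary: ToyPrimary, max_entries: int) -> None:
        # Copy at most `max_entries` pending entries from the primary's stream.
        start = len(self.log)
        self.log.extend(primary.log[start:start + max_entries])


def manual_failover(primary: ToyPrimary, replica: ToyReplica) -> bool:
    """Return True if the replica holds every write once it is ready to take over."""
    target_offset = primary.pause_and_report_offset()
    # Wait until the replica's offset matches the one the primary reported.
    while len(replica.log) < target_offset:
        replica.replicate_from(primary, max_entries=1)
    # Only at this point would the replica start the actual failover.
    return replica.log == primary.log


primary = ToyPrimary()
for entry in ("a", "b", "c"):
    primary.write(entry)
replica = ToyReplica()
replica.replicate_from(primary, max_entries=1)  # replica is lagging behind

print(manual_failover(primary, replica))  # True
print(primary.write("d"))                 # False: writes are paused during the switch
```

Because the primary is paused before the offset is read, no write can be accepted by the old primary that the replica has not seen by the time it takes over.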
The command behavior can be modified by two options: FORCE and TAKEOVER.
If the FORCE option is given, the replica does not perform any handshake with the primary, which may be unreachable, but instead starts a failover as soon as possible, starting from point 4. This is useful when we want to start a manual failover while the primary is no longer reachable.
However, even with FORCE, the majority of primaries must still be available in order to authorize the failover and generate a new configuration epoch for the replica that is going to become primary.
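The majority requirement can be checked with simple arithmetic. The sketch below assumes the usual definition of a strict majority (more than half of all primaries, counting the unreachable one); the function name `failover_authorized` is hypothetical.

```python
def failover_authorized(total_primaries: int, votes: int) -> bool:
    """Return True if `votes` is a strict majority of `total_primaries`.

    Hypothetical helper; `total_primaries` counts every primary in the
    cluster, including the one being failed over.
    """
    needed = total_primaries // 2 + 1
    return votes >= needed


# Three primaries, one unreachable: the two remaining ones can authorize.
print(failover_authorized(total_primaries=3, votes=2))  # True
# Five primaries, three unreachable: two votes are not a majority.
print(failover_authorized(total_primaries=5, votes=2))  # False
```

In the second case FORCE alone cannot succeed, which is the kind of situation the TAKEOVER option is meant for.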
There are situations where this is not enough, and we want a replica to fail over without any agreement with the rest of the cluster. A real-world use case for this is mass-promoting replicas in a different data center to primaries in order to perform a data center switch, while all the primaries are down or partitioned away.
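The TAKEOVER option implies everything FORCE implies, but also does not use any cluster authorization in order to fail over. A replica receiving CLUSTER FAILOVER TAKEOVER will instead:
1. Generate a new configEpoch unilaterally, just taking the current greatest epoch available and incrementing it if its local configuration epoch is not already the greatest.
2. Assign itself all the hash slots of its primary, and propagate the new configuration to every reachable node as soon as possible, and eventually to every other node.
Note that TAKEOVER violates the last-failover-wins principle of Valkey Cluster, since the configuration epoch generated by the replica violates the normal generation of configuration epochs in several ways:
1. There is no guarantee that it is actually the highest configuration epoch, since, for example, TAKEOVER can be used within a minority partition, and no message exchange is performed to generate the new configuration epoch.
2. If the generated configuration epoch happens to collide with that of another instance, eventually one of the two will be moved away by the configuration epoch collision resolution algorithm.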
Because of this the TAKEOVER option should be used with care.
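To see why a unilaterally generated epoch needs care, here is a minimal sketch of the rule described above: take the greatest epoch the replica knows about and increment it, unless the local epoch is already the greatest. The function `takeover_config_epoch` is a hypothetical simplification, not the server's actual code. Two replicas that cannot see each other may end up with the same epoch.

```python
from typing import Iterable


def takeover_config_epoch(local_epoch: int, known_epochs: Iterable[int]) -> int:
    """Pick a configuration epoch without asking any other node.

    Hypothetical simplification: `known_epochs` is whatever this replica
    happens to have seen, which may be stale inside a partition.
    """
    greatest = max([local_epoch, *known_epochs])
    if local_epoch == greatest:
        # The local epoch is already the greatest known one: keep it.
        return local_epoch
    return greatest + 1


# Two replicas in the same partition, each promoted with TAKEOVER,
# both having last seen epochs 5 and 6 from the rest of the cluster.
epoch_a = takeover_config_epoch(local_epoch=3, known_epochs=[5, 6])
epoch_b = takeover_config_epoch(local_epoch=4, known_epochs=[5, 6])
print(epoch_a, epoch_b)    # 7 7
print(epoch_a == epoch_b)  # True: a collision that must be resolved later
```

Since no votes are exchanged, nothing prevents such a collision at promotion time; it can only be detected and repaired afterwards.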
CLUSTER FAILOVER, unless the TAKEOVER option is specified, does not execute a failover synchronously. It only schedules a manual failover, bypassing the failure detection stage.
An OK reply is no guarantee that the failover will succeed.
To check that the primaries are aware of the replica, send CLUSTER NODES or CLUSTER REPLICAS to each of the primary nodes and check that it appears as a replica, before sending CLUSTER FAILOVER to the replica.
To check that the failover has actually happened, use ROLE, INFO REPLICATION (which indicates “role:master” after a successful failover), or CLUSTER NODES to verify that the state of the cluster has changed sometime after the command was sent.
Simple string reply: OK if the command was accepted and a manual failover is going to be attempted. An error if the operation cannot be executed, for example if the client is connected to a node that is already a primary.
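Since an OK reply only means the failover was scheduled, the outcome has to be checked separately. The sketch below parses text in the shape of INFO REPLICATION and CLUSTER NODES output; the sample strings, node IDs, and helper names are invented for illustration. It also assumes that CLUSTER NODES marks replicas with the literal flag `slave`, which is worth confirming against your server version.

```python
from typing import Dict, List, Optional


def role_from_info(info_text: str) -> Optional[str]:
    """Extract the value of the `role:` line from INFO REPLICATION text."""
    for line in info_text.splitlines():
        line = line.strip()
        if line.startswith("role:"):
            return line.split(":", 1)[1]
    return None


def node_flags(cluster_nodes_text: str) -> Dict[str, List[str]]:
    """Map node ID to its flag list, assuming flags are the third field."""
    flags: Dict[str, List[str]] = {}
    for line in cluster_nodes_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            flags[fields[0]] = fields[2].split(",")
    return flags


def known_as_replica(cluster_nodes_text: str, node_id: str) -> bool:
    # Assumption: replicas carry the literal flag "slave" in this output.
    return "slave" in node_flags(cluster_nodes_text).get(node_id, [])


# Invented sample data; real output has longer node IDs and more fields.
SAMPLE_INFO = "# Replication\r\nrole:master\r\nconnected_slaves:1\r\n"
SAMPLE_NODES = (
    "aaa111 127.0.0.1:7000@17000 master - 0 1700000000000 1 connected 0-5460\n"
    "bbb222 127.0.0.1:7003@17003 slave aaa111 0 1700000000000 1 connected\n"
)

print(known_as_replica(SAMPLE_NODES, "bbb222"))  # True: safe to request a failover
print(role_from_info(SAMPLE_INFO))               # master
```

In practice the first check would run against each primary before sending the command, and the second against the promoted node some time afterwards.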
Complexity: O(1)
ACL categories: @admin @dangerous @slow
See also: ASKING, CLUSTER, CLUSTER ADDSLOTS, CLUSTER ADDSLOTSRANGE, CLUSTER BUMPEPOCH, CLUSTER COUNT-FAILURE-REPORTS, CLUSTER COUNTKEYSINSLOT, CLUSTER DELSLOTS, CLUSTER DELSLOTSRANGE, CLUSTER FLUSHSLOTS, CLUSTER FORGET, CLUSTER GETKEYSINSLOT, CLUSTER HELP, CLUSTER INFO, CLUSTER KEYSLOT, CLUSTER LINKS, CLUSTER MEET, CLUSTER MYID, CLUSTER MYSHARDID, CLUSTER NODES, CLUSTER REPLICAS, CLUSTER REPLICATE, CLUSTER RESET, CLUSTER SAVECONFIG, CLUSTER SET-CONFIG-EPOCH, CLUSTER SETSLOT, CLUSTER SHARDS, CLUSTER SLOT-STATS, CLUSTER SLOTS, READONLY, READWRITE.