About FireCluster Failover

The FireCluster failover process is the same for an active/active cluster or an active/passive cluster. With both types of clusters, each cluster member maintains state and session information at all times. When failover occurs, the packet filter connections, branch office VPN tunnels, and user sessions from the failed device fail over automatically to the other device in the cluster.

One Firebox is the cluster master and the other device is the backup master. The backup master uses the primary cluster interface to synchronize connection and session information with the cluster master. If the primary cluster interface fails or is disconnected, the backup master uses the backup cluster interface to communicate with the cluster master. The cluster master also uses both the primary and backup cluster interfaces to send a heartbeat packet once per second to the backup master. We recommend that you always configure both a primary cluster interface and a backup cluster interface.

Events that Trigger a Failover

There are three types of events that can trigger a failover of the cluster master.

Health index of the cluster master is lower than the health index of the backup master

Each cluster member has a calculated health index that indicates the overall health of the device. If the health index of the cluster master is lower than the health index of the backup master, this triggers failover of the cluster master.

For more information about the cluster health index, go to Monitor Cluster Health.

Lost heartbeat from the cluster master

The cluster master sends a heartbeat packet through the primary and backup cluster interfaces once per second. If the backup master misses a number of consecutive heartbeats equal to the lost heartbeat threshold, this triggers failover of the cluster master. The default threshold is three. You can increase the threshold in the FireCluster Advanced settings.

For more information about the lost heartbeat threshold, go to Configure FireCluster Advanced Settings.
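The lost heartbeat rule can be modeled as a simple counter. The following Python sketch is illustrative only and is not Firebox code. The class and function names are hypothetical. It assumes that one heartbeat is expected per one-second interval, that a heartbeat received on either cluster interface counts as received, and that failover triggers when the number of consecutive missed heartbeats reaches the threshold.

```python
class HeartbeatMonitor:
    """Hypothetical model of the backup master's lost heartbeat counter."""

    def __init__(self, threshold=3):
        # The default threshold of three comes from the text above.
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        self.missed = 0

    def tick(self, received):
        """Record one heartbeat interval. Return True if failover triggers."""
        if received:
            # Any received heartbeat resets the consecutive-miss counter.
            self.missed = 0
        else:
            self.missed += 1
        return self.missed >= self.threshold


def first_failover_interval(intervals, threshold=3):
    """Return the index of the interval that triggers failover, or None."""
    monitor = HeartbeatMonitor(threshold)
    for index, received in enumerate(intervals):
        if monitor.tick(received):
            return index
    return None
```

In this model, two missed heartbeats followed by a received heartbeat do not trigger a failover, because only consecutive misses count toward the threshold.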

Cluster receives the Failover Master command

In Firebox System Manager, when you select Tools > Cluster > Failover Master, you force a failover from the cluster master to the backup master.

For more information about this command, go to Force a Failover of the Cluster Master.

For interfaces included in multi-WAN or link aggregation configurations:

Multi-WAN — FireCluster failover is triggered when the physical interface is down or does not respond. FireCluster failover is not triggered if multi-WAN failover occurs because of a link monitor failure.
Link Aggregation — FireCluster failover is triggered if all Link Aggregation member interfaces fail. FireCluster failover is not triggered if only some Link Aggregation member interfaces fail.
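The two interface rules above can be written as small predicates. This Python sketch is a simplified model, not the Firebox implementation. The function and parameter names are hypothetical, and a single boolean stands in for "the physical interface is down or does not respond".

```python
from typing import Sequence


def multi_wan_triggers_failover(physical_link_up: bool, link_monitor_ok: bool) -> bool:
    """Model the multi-WAN rule.

    Only the state of the physical interface matters. The link_monitor_ok
    argument is accepted but deliberately ignored, because a link monitor
    failure causes multi-WAN failover, not FireCluster failover.
    """
    return not physical_link_up


def link_aggregation_triggers_failover(members_up: Sequence[bool]) -> bool:
    """Model the link aggregation rule.

    Failover triggers only when every member interface has failed.
    An empty member list is treated as "no trigger" (an assumption).
    """
    return len(members_up) > 0 and not any(members_up)
```

For example, a link aggregation interface with one failed member and one working member does not trigger a FireCluster failover in this model.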

What Happens When a Failover Occurs

When a failover of the cluster master occurs, the backup master becomes the cluster master. Then, the original cluster master rejoins the cluster as the backup master. When a failover occurs, the cluster maintains all packet filter connections, branch office VPN tunnels, and user sessions. This behavior is the same for an active/active or an active/passive FireCluster.

In an active/active cluster, if the backup master fails, the cluster master maintains all packet filter connections, branch office VPN tunnels, and user sessions. Proxy connections and Mobile VPN connections can be interrupted, as described in this table. In an active/passive cluster, if the backup master fails, there is no interruption of connections or sessions because no traffic is assigned to the backup master.

Connection/Session Type	Impact of a Failover Event
Packet filter connections	Connections fail over to the other cluster member.
Branch Office VPN tunnels	Tunnels fail over to the other cluster member. Some third-party devices might encounter a tunnel failure after a FireCluster failover event.
User sessions	Sessions fail over to the other cluster member.
Proxy connections	Connections assigned to the failed device (master or backup master) must be restarted. Connections assigned to the other device are not interrupted.
Access Portal	Access Portal user sessions and connections to Access Portal web applications remain active after a failover. RDP and SSH connections initiated through the Access Portal are disconnected after a failover.
Mobile VPN with IPSec	If the cluster master fails over, all sessions must be restarted. If the backup master fails, only the sessions assigned to the backup master must be restarted. Sessions assigned to the cluster master are not interrupted.
Mobile VPN with SSL	If either device fails over, all sessions must be restarted.
Mobile VPN with L2TP	All L2TP sessions are assigned to the cluster master, even for an active/active cluster. If the cluster master fails over, all sessions must be restarted. If the backup master fails, L2TP sessions are not interrupted.
Mobile VPN with IKEv2	If the cluster master fails over, all sessions must be restarted. If the backup master fails, only the sessions assigned to the backup master must be restarted. Sessions assigned to the cluster master are not interrupted.
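
For planning purposes, the table can be encoded as a lookup. This Python sketch is an illustrative restatement of the table for an active/active cluster, not a WatchGuard tool. The names Restart, IMPACT, and restart_scope are hypothetical. Access Portal is omitted because its impact depends on the connection type (RDP or SSH) and not on which member failed.

```python
from enum import Enum


class Restart(Enum):
    NONE = "no sessions must be restarted"
    FAILED_DEVICE_ONLY = "sessions assigned to the failed device must be restarted"
    ALL = "all sessions must be restarted"


# Keys are the connection/session types from the table.
# "master" means the cluster master failed over; "backup" means the
# backup master failed.
IMPACT = {
    "Packet filter connections": {"master": Restart.NONE, "backup": Restart.NONE},
    "Branch Office VPN tunnels": {"master": Restart.NONE, "backup": Restart.NONE},
    "User sessions": {"master": Restart.NONE, "backup": Restart.NONE},
    "Proxy connections": {
        "master": Restart.FAILED_DEVICE_ONLY,
        "backup": Restart.FAILED_DEVICE_ONLY,
    },
    "Mobile VPN with IPSec": {"master": Restart.ALL, "backup": Restart.FAILED_DEVICE_ONLY},
    "Mobile VPN with SSL": {"master": Restart.ALL, "backup": Restart.ALL},
    "Mobile VPN with L2TP": {"master": Restart.ALL, "backup": Restart.NONE},
    "Mobile VPN with IKEv2": {"master": Restart.ALL, "backup": Restart.FAILED_DEVICE_ONLY},
}


def restart_scope(session_type, failed_role):
    """Return which sessions must be restarted after the given member fails."""
    return IMPACT[session_type][failed_role]
```

A call such as restart_scope("Mobile VPN with L2TP", "backup") returns Restart.NONE, because all L2TP sessions are assigned to the cluster master.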

FireCluster Failover and Server Load Balancing

If you use server load balancing to balance connections between your internal servers, real-time synchronization of server load balancing information does not occur when a FireCluster failover event occurs. After a failover, the new cluster master sends connections to all servers in the server load balancing list to discover which servers are available. It then applies the server load balancing algorithm to all available servers.

For information about server load balancing, go to Configure Server Load Balancing.

FireCluster Failover and Dynamic Routing

When you enable dynamic routing on a FireCluster, only the cluster master participates directly in the dynamic routing domain. The cluster master synchronizes dynamic route information to the other cluster member. When a failover occurs, the new cluster master initially uses the previously learned dynamic routes. The new cluster master then participates in the dynamic routing domain and uses the configured dynamic routing protocol to discover the latest routes to all destination networks. When the new cluster master discovers the updated dynamic routes, the old dynamic routes are purged and replaced with the new ones.
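
The route handling described above can be pictured with a small model. This Python sketch is a conceptual illustration only and does not represent how the Firebox stores routes. The RouteTable class, its methods, and the example addresses are hypothetical.

```python
class RouteTable:
    """Hypothetical model of the new cluster master's dynamic routes."""

    def __init__(self, synced_routes):
        # Start from the routes synchronized from the previous cluster master.
        self.routes = dict(synced_routes)
        self.converged = False

    def next_hop(self, destination):
        """Return the next hop for a destination network, or None."""
        return self.routes.get(destination)

    def apply_discovered(self, discovered_routes):
        """Purge the old routes and replace them with newly discovered ones."""
        self.routes = dict(discovered_routes)
        self.converged = True


# Example: the new master forwards with synchronized routes first.
table = RouteTable({"10.0.1.0/24": "192.168.1.1", "10.0.2.0/24": "192.168.1.2"})
hop_before = table.next_hop("10.0.2.0/24")

# After the routing protocol converges, only the discovered routes remain.
table.apply_discovered({"10.0.1.0/24": "192.168.1.3"})
hop_after = table.next_hop("10.0.2.0/24")
```

In this example, hop_before is "192.168.1.2" and hop_after is None, because the route to 10.0.2.0/24 was not rediscovered and was purged with the old routes.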

The time it takes for the new cluster master and all connected routers to agree on a common set of routes (the convergence time) depends on the dynamic routing protocol.

RIPv1 and RIPv2

The peer RIP router does not detect the FireCluster failover event if the connection itself is not interrupted during the failover.

OSPFv2

The peer router detects the FireCluster failover event. The convergence time for OSPF is from 10 to 40 seconds. The convergence time could be shorter, because the new cluster master uses a set of known dynamic routes synchronized from the previous cluster master until it discovers the updated dynamic routes.

BGPv4

The peer router detects the FireCluster failover event. The convergence time for BGP is from 1 to 3 minutes. The convergence time could be shorter, because the new cluster master uses a set of known dynamic routes synchronized from the previous cluster master until it discovers the updated dynamic routes.

FireCluster Failover and Third-Party BOVPN Endpoints

When a FireCluster failover occurs, the new active cluster member initiates a new Phase 1 negotiation with remote BOVPN gateways. If the new active cluster member discovers that a remote gateway is a third-party device that does not support childless initiation of the IKEv2 Security Association (SA), as defined in RFC 6023, it terminates the Phase 1 negotiation and automatically recreates the BOVPN tunnel. The time it takes to re-establish the BOVPN tunnel depends on the Firebox traffic load and on whether traffic is waiting to go through the BOVPN tunnel.

Monitor the Cluster During a Failover

The role of each device in the cluster appears after the member name on the Firebox System Manager Front Panel tab. If you look at the Front Panel tab during a failover of the cluster master, you can see the cluster master role move from one device to another. During a failover, you see:

The role of the old backup master changes from backup master to master.
The role of the old cluster master changes to inactive and then to idle while the device restarts.
The role of the old cluster master changes to backup master after the device restarts.

For more information, go to Monitor and Control FireCluster Members.

FireCluster Failover and Subscription Services

If you enable licensed subscription services for your FireCluster, the services continue to operate after the failover, as long as you have purchased the required subscription services for FireCluster members. The requirements are different for an active/active FireCluster than for an active/passive FireCluster.

Active/Active — You must have the same subscription services enabled in the feature keys for both cluster members. Each cluster member applies the services from its own feature key.
Active/Passive — The subscription services need to be enabled in the feature key of only one cluster member. The active cluster member uses the subscription services that are active in the feature key of either cluster member.
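
The two licensing rules can be expressed as a short check. This Python sketch is an illustrative model and does not read real feature keys. The function name service_applies and the representation of a feature key as a set of service names are assumptions, and the service names in the example are placeholders.

```python
def service_applies(mode, own_key, peer_key, service):
    """Return True if a member can apply a subscription service.

    mode     -- "active/active" or "active/passive"
    own_key  -- set of services enabled in this member's feature key
    peer_key -- set of services enabled in the other member's feature key
    """
    if mode == "active/active":
        # Each member applies the services from its own feature key only.
        return service in own_key
    if mode == "active/passive":
        # The active member can use a service enabled in either feature key.
        return service in own_key or service in peer_key
    raise ValueError("unknown cluster mode: " + repr(mode))


# Example: only the peer's feature key includes the service.
own = set()
peer = {"example-service"}
passive_result = service_applies("active/passive", own, peer, "example-service")
active_result = service_applies("active/active", own, peer, "example-service")
```

In this example, passive_result is True and active_result is False, which shows why an active/active cluster needs the same services in both feature keys.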

For more information about feature keys and FireCluster, go to About Feature Keys and FireCluster.