Failover cluster host switch

Job details

Name:	Failover cluster host switch
Platform:	Sqlserver
Category:	Cluster and Replication
Description:	Checks if an instance switched to a different host in a Windows Server Failover Cluster (WSFC).
Long description:	A Windows Server Failover Cluster (WSFC) is a group of independent servers that work together to increase the availability of applications and services. A warning or an alarm can be triggered if an instance switched to a different host in a WSFC.
Version:	1.2
Default schedule:	* * * *
Requires engine install:	Yes
Compatibility tag:	.[type='instance' & databasetype='sqlserver']/instance[ maj_version > '2014' & hasengine='YES' & engine_edition = 'Microsoft SQL Server' ]

Parameters

Name	Default value	Description
preferred host		A node on which the instance prefers to run. Ignored if empty.
return status	1	Return status value (ALARM – 2, WARNING – 1, or OK – 0) when the instance switches to a different host in a Windows Server Failover Cluster.
return status when preferred host	1	Return status value (ALARM – 2, WARNING – 1, or OK – 0) when the instance is not running on the preferred host (parameter “preferred host”).
keep status	30	How long (in minutes) to keep the warning/alarm status after a switch.
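
The way these parameters might combine into a single status can be sketched in T-SQL. This is an illustration only, not the job's actual code: the variable names, the sample host names, and the rule that the higher of the two statuses wins are all assumptions.

```sql
-- Parameter values: defaults from the table above; the preferred host is a made-up example.
DECLARE @preferred_host      NVARCHAR(128) = N'ServerA'; -- "preferred host"; empty string means ignore
DECLARE @return_status       INT = 1;                    -- "return status"
DECLARE @return_status_pref  INT = 1;                    -- "return status when preferred host"
DECLARE @keep_status_minutes INT = 30;                   -- "keep status"

-- Hypothetical inputs that a real check would read from the instance and the history table.
DECLARE @current_host     NVARCHAR(128) = N'ServerB';
DECLARE @last_switch_time DATETIME2(0)  = DATEADD(MINUTE, -10, SYSDATETIME());

DECLARE @status INT = 0; -- 0 = OK

-- A switch happened recently: still inside the keep-status window.
IF DATEDIFF(MINUTE, @last_switch_time, SYSDATETIME()) < @keep_status_minutes
    SET @status = @return_status;

-- The instance is not on the preferred host (checked only when one is configured).
-- Assumption: the more severe of the two statuses is reported.
IF @preferred_host <> N''
   AND @current_host <> @preferred_host
   AND @return_status_pref > @status
    SET @status = @return_status_pref;

SELECT @status AS status; -- 0 = OK, 1 = WARNING, 2 = ALARM
```

With the sample values the switch is 10 minutes old, so the result is 1 (WARNING). Setting @return_status_pref to 2 would raise it to ALARM for as long as the instance stays off the preferred host.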

Job Summary

Purpose: The job “Failover cluster host switch” monitors host switches within a Windows Server Failover Cluster (WSFC). It supports application and service availability by detecting and reporting when an instance moves to a different node in the cluster.

Why: Monitoring failovers is critical to maintaining the high availability and reliability of services that clusters support. It helps in quick identification and resolution of issues associated with unexpected host switches which could lead to performance degradation or downtime.

Manual checking: You can manually check the recorded failover events and the current primary host in the database using the following query:

SELECT TOP 20
    switch_no        "Switch #",
    primary_host     "Primary host",
    monitoring_start "Start as Primary",
    monitoring_end   "End as Primary"
FROM dbw_failover_cluster_switch_tab
ORDER BY monitoring_end DESC;
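
To compare the recorded history with what the instance itself reports right now, a query along the following lines may help. It is a sketch that is not part of the job; it relies only on the standard SERVERPROPERTY function and the sys.dm_os_cluster_nodes dynamic management view, and the column aliases are chosen here for readability.

```sql
-- Node currently hosting this instance, as reported by SQL Server itself.
SELECT
    SERVERPROPERTY('IsClustered')                 AS is_clustered,  -- 1 for a failover cluster instance
    SERVERPROPERTY('ComputerNamePhysicalNetBIOS') AS current_host,  -- node the instance is running on now
    SERVERPROPERTY('ServerName')                  AS instance_name;

-- Cluster nodes known to the instance (no rows if the instance is not clustered).
SELECT NodeName, status_description, is_current_owner
FROM sys.dm_os_cluster_nodes;
```

If current_host differs from the most recent "Primary host" row in the history query, the instance has moved since the job last recorded its host.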

Implementation Details

The job creates a history table and a stored procedure. On each run, the procedure compares the current host with the last recorded host; a difference indicates a host switch.
The procedure inserts a new record into the history table whenever a host switch is detected.
It raises a warning or an alarm according to the configuration parameters: the status to return after a switch, and the status to return when the current host is not the preferred host.
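
The detection step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the body of dbw_failover_cluster_switch_proc: it uses a temporary demo table rather than the real history table, the column names are taken from the manual-check query, and the data types and the behaviour of extending monitoring_end while the host is unchanged are assumptions.

```sql
-- Hypothetical demo table; column names follow the manual-check query, types are assumed.
CREATE TABLE #switch_history_demo (
    switch_no        INT IDENTITY(1,1) PRIMARY KEY,
    primary_host     NVARCHAR(128) NOT NULL,
    monitoring_start DATETIME2(0)  NOT NULL,
    monitoring_end   DATETIME2(0)  NOT NULL
);

DECLARE @current_host NVARCHAR(128) =
    CAST(SERVERPROPERTY('ComputerNamePhysicalNetBIOS') AS NVARCHAR(128));
DECLARE @now DATETIME2(0) = SYSDATETIME();
DECLARE @last_switch_no INT, @last_host NVARCHAR(128);

-- The most recent row holds the host that was primary at the previous check.
SELECT TOP (1) @last_switch_no = switch_no, @last_host = primary_host
FROM #switch_history_demo
ORDER BY switch_no DESC;

IF @last_host IS NULL OR @last_host <> @current_host
    -- First run, or the host changed: record a new period for the current host.
    INSERT INTO #switch_history_demo (primary_host, monitoring_start, monitoring_end)
    VALUES (@current_host, @now, @now);
ELSE
    -- Same host as before: extend the end of the current period.
    UPDATE #switch_history_demo
    SET monitoring_end = @now
    WHERE switch_no = @last_switch_no;

SELECT switch_no, primary_host, monitoring_start, monitoring_end
FROM #switch_history_demo
ORDER BY switch_no DESC;
```

Running the batch from the DECLARE statements onward a second time in the same session, without a failover in between, takes the UPDATE branch and leaves a single row whose monitoring_end has moved forward.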

Dependencies and Database Changes

Main Object: The job revolves around a procedure named “dbw_failover_cluster_switch_proc” and a table “dbw_failover_cluster_switch_tab”.
History Table: This table is used to record the details about the switches, including the primary host and timestamps marking the start and end of primary status after each switch.
Clean-up on Fail: True. If the task fails, the job attempts to clean up so that incomplete or inaccurate entries are not left behind.

Report Output and Presentation

Report: The job provides a report titled “WSFC status” which lists the most recent host switches including details such as switch number, primary host, and timings.
Table Format: The output is presented in a table format with columns for Switch Number, Primary Host, Start and End times as Primary.

Switch #	Primary Host	Start as Primary	End as Primary
1	ServerA	2023-01-01 12:00:00	2023-01-02 12:00:00
2	ServerB	2023-01-02 13:00:00	2023-01-03 13:00:00

Scheduled Reports: The reports are generated on an hourly basis by default.

Installation and Configuration Requirements

Compatibility: This job is meant for SQL Server instances that match the compatibility tag: major version greater than 2014, engine installed (hasengine = YES), and engine edition Microsoft SQL Server.
Installation Type: The force-install attribute is set to true, so installation proceeds even if checks are unsuccessful and the job setup is not skipped.

This job is essential for maintaining high availability in clustered server environments by actively monitoring and reporting on failover events.