High activity monitor

Job details

Name:	High activity monitor
Platform:	Sqlserver
Category:	Performance
Description:	Collects SQL instance CPU usage, CPU usage of other processes, logical and physical reads, and active processes count on the SQL Server instance.
Long description:	Collects SQL instance CPU usage, CPU usage of other processes, logical and physical reads, and active processes count on the SQL Server instance. Collects SQL instance CPU usage, CPU usage of other processes, logical and physical reads, and active processes count on the SQL Server instance.
Version:	2
Default schedule:	0 * * *
Requires engine install:	Yes
Compatibility tag:	.[type=‘instance’ & databasetype=‘sqlserver’]/instance[maj_version > ‘2005′ & hasengine=‘YES’ & (engine_edition = ‘Microsoft SQL Server’	engine_edition = ‘Azure SQL Managed Instance’)]

Parameters

Name	Default value	Description
history threshold	48	The number of hours the statistics are hold in the history table.
logical reads threshold	250000	Avg. logical reads per second. If this threshold is reached over a period of time (controlled by two parameters, “warning threshold” and “alarm threshold”), a warning/alarm is returned.
alarm threshold	60	A period of time (in minutes) which must be passed (combined with average values exceeding the threshold values) before an alarm is returned by the alert procedure.
SQL Server CPU threshold	85	Avg. CPU usage by SQL Server. If this threshold is reached over a period of time (controlled by two parameters, “warning threshold” and “alarm threshold”), a warning/alarm is returned.
other CPU threshold	60	Avg. CPU usage (excluding SQL Server). If this threshold is reached over a period of time (controlled by two parameters, “warning threshold” and “alarm threshold”), a warning/alarm is returned.
physical reads threshold	20000	Avg. physical reads per second. If this threshold is reached over a period of time (controlled by two parameters, “warning threshold” and “alarm threshold”), a warning/alarm is returned.
active processes threshold	25	Avg. active processes. If this threshold is reached over a period of time (controlled by parameter “period span”), a warning/alarm is returned.
warning threshold	15	A period of time (in minutes) which must be passed (combined with average values exceeding the threshold values) before a warning is returned by the alert procedure.
enable warnings and alarms	NO	If set to “NO” (default), the job will only collect performance statistics without returning status warning or alarm. Value “YES“ will activate the job.
graph time range	2	This parameter determines how many hours back from the current time the data should be visualized in the report graphs regardless of how many hours statistics are collected for.

Job Summary

Purpose: The purpose of this job is to monitor and collect key performance metrics such as SQL instance CPU usage, logical and physical reads, and the number of active processes on a SQL Server instance. It provides timely warnings and alarms based on predefined thresholds to maintain optimal performance and availability.
Why: This monitoring job is crucial for ensuring the SQL Server’s health and performance. By setting thresholds on various metrics, it helps in proactive detection of potential issues such as high CPU usage, excessive read operations, or too many active sessions, which could affect the application’s performance and stability.
Manual checking: You can check these manually in the database by issuing SQL commands related to performance counters or by reviewing the monitoring tables created and updated by this job. Some example checks include:

SELECT * FROM high_act_mon_tab;

SELECT * FROM sys.dm_os_performance_counters WHERE counter_name = 'Batch Requests/sec';

Key Dependencies

The job has multiple dependencies, including other monitoring tables and support functions that either host the collected data or assist in the data collection process.
Dependencies include tables such as “high_act_mon_proc”, “high_act_mon_tab”, and “high_act_mon_last”, as well as functions like “dbwf_data_cache_ratio” and “dbwf_logical_reads”.

Parameters Handled

The job involves several parameters to control the monitoring logic, such as:

history threshold: Duration to keep the statistics in the history table.

logical reads threshold: Average number of logical reads per second that triggers a warning or alarm if exceeded over a certain period.

SQL Server CPU threshold: Average SQL Server CPU usage that triggers a warning or alarm if exceeded over a specified time frame.

other CPU threshold: Monitors the CPU usage by other processes, with set limits for warnings and alerts.

physical reads threshold: Average number of physical reads per second triggering an alert if exceeded.

active processes threshold: Tracks the number of active processes and triggers an alert if the specified number is exceeded.

Reporting and Visualization

The job includes several presentations for reporting and visualizing the data collected:

CPU Utilization: Graphical presentation of SQL Server and other CPU usages over the specified time range.
Logical and Physical Reads: Shows the historical data of logical and physical reads frequency.
Active Sessions: Reports on the number of active sessions over time.

Implementation Details

Several SQL procedures and functions are defined within the job to collect data, such as “high_act_mon_proc” which aggregates the necessary data into monitoring tables.
Functions like “dbwf_logical_reads” calculate average readings over specific intervals.
Cleanup procedures ensure the data does not grow indefinitely, maintaining tables like “high_act_mon_tab”.

Upgrades and Maintenance

The job has provisions for upgrades, ensuring new features or changes in thresholds can be easily deployed without disrupting existing setup.
Changes are carefully scripted to avoid data loss and ensure that the new features integrate seamlessly with the existing data structures.

This monitoring job is vital for database administrators to ensure that the SQL Server instance runs efficiently without performative bottlenecks or resource constraints, using a meticulous reporting and alert system to manage the database health proactively.