Instance error log (AWS)

Job details

Name:	Instance error log (AWS)
Platform:	Sqlserver
Category:	Availability
Description:	Reads and checks the Instance error log file by using the AWS rdsadmin.dbo.rds_read_error_log stored procedure.
Long description:	The rdsadmin.dbo.rds_read_error_log stored procedure allows to read the contents of the SQL Server error log file. This procedure checks for errors defined by the ‘error text’ parameter.
Version:	1.4
Default schedule:	6,26,46 * * *
Requires engine install:	Yes
Compatibility tag:	.[type=‘instance’ & databasetype=‘sqlserver’]/instance[maj_version > ‘2005′ & hasengine=‘YES’ & engine_edition = ‘Microsoft SQL Server’ & eng_inst_priv = 1 & eng_inst_aws_priv = 0 & rds_read_error_log_exists = 1]

Name	Default value	Description
error text	failed, error	Defines which strings (errors) the Check must look for. Values must be separated with commas (,).
error text allowed	Log\ERRORLOG, informational message only,log has been reinitialized, 0 errors	Specifies the text strings which make errors (specified by the “error text” parameter) to be excluded when found in the same record (error log line).
history threshold	1000	The maximum numbers of error messages the history table will keep (error_log_err_histr_tab table).
last run		The last time the error log has been checked.
max lines per execution	5000	The maximum number of lines the Alert will read during execution before a warning/alarm is raised.
return status	0	Return status value (ALARM – 2, WARNING – 1, or OK – 0) when “error text” is found (or “max lines per execution” is reached.
max elap time	180	The maximum execution time (in seconds) for the Alert. If this threshold is reached a warning is raised.

Purpose: The purpose of this job is to monitor, extract, and analyze error log entries from Microsoft SQL Server instances on AWS.
Why: Monitoring error logs is crucial for maintaining the uptime and availability of databases by quickly identifying and responding to issues. This job specifically targets error log analysis in SQL Server instances hosted on AWS to ensure that any emerging issues are promptly addressed, thus mitigating potential impacts on database availability and performance.
Manual checking: You can check the error logs manually in the database using the following SQL command:

EXEC rdsadmin.dbo.rds_read_error_log 0, 1;

This dbWatch job utilizes the AWS RDS specific stored procedure ‘rdsadmin.dbo.rds_read_error_log’ to read the SQL Server error logs. It filters logs based on text strings defined as errors and excludes those defined as non-errors based on other parameterized text strings.
The job includes various key configurations like history threshold to maintain the size of error history, the maximum number of lines the job should read per execution, and a timing threshold to ensure prompt execution, thereby preventing long-running job executions.

The job is configured to check instances of SQL Server (version greater than 2005) hosted on AWS with specific properties like being an instance having an engine with permission levels appropriate for reading the error logs.
It runs automatically at a default schedule of every 20 minutes (6, 26, 46 minutes past each hour).

The job outputs include execution history, a count of detected errors, and diagnostic messages that describe each detected issue along with its occurrence time. Errors that do not meet the exclusion criteria are stored in a history table and a test table.

Error Text	Allowed Text	Message	Date
error example	allowed tag	This is a sample error message	2023-01-01 12:30:00

It includes metrics such as the number of records checked, elapsed time per check, and frequency of error log entries over time.

The dbscripts involved in this job create necessary local temporary and history tables such as error_log_histr_tab, error_log_err_histr_tab, and error_log_aggr_test_tab among others. These tables facilitate the logging and cyclical analysis of errors over different executions.

The monitoring job can trigger alerts based on the severity and type of error encountered. Parameters such as ‘return status’ might change based on the found errors which can lead to different alert statuses: ALARM, WARNING, or OK.
The job is also equipped with mechanisms to handle and log exceptions during the execution, ensuring that any job execution issues are captured for troubleshooting.

Dependencies such as the main procedure ‘error_log_proc’ and associated tables are managed automatically with cleanup options configured to true, ensuring that temporary data used during execution does not persist unnecessarily.

Potential future enhancements may include more granular parameterization of error texts or integration with more advanced AI-based analysis tools for predictive error handling and automated error resolutions.