Instance error log (AWS)
Job details
Name: |
Instance error log (AWS) |
Platform: |
Sqlserver |
Category: |
Availability |
Description: |
Reads and checks the Instance error log file by using the AWS rdsadmin.dbo.rds_read_error_log stored procedure. |
Long description: |
The rdsadmin.dbo.rds_read_error_log stored procedure allows to read the contents of the SQL Server error log file. This procedure checks for errors defined by the ‘error text’ parameter. |
Version: |
1.4 |
Default schedule: |
6,26,46 * * * |
Requires engine install: |
Yes |
Compatibility tag: |
.[type=‘instance’ & databasetype=‘sqlserver’]/instance[maj_version > ‘2005′ & hasengine=‘YES’ & engine_edition = ‘Microsoft SQL Server’ & eng_inst_priv = 1 & eng_inst_aws_priv = 0 & rds_read_error_log_exists = 1] |
Parameters
Name |
Default value |
Description |
error text |
failed, error |
Defines which strings (errors) the Check must look for. Values must be separated with commas (,). |
error text allowed |
Log\ERRORLOG, informational message only,log has been reinitialized, 0 errors |
Specifies the text strings which make errors (specified by the “error text” parameter) to be excluded when found in the same record (error log line). |
history threshold |
1000 |
The maximum numbers of error messages the history table will keep (error_log_err_histr_tab table). |
last run |
|
The last time the error log has been checked. |
max lines per execution |
5000 |
The maximum number of lines the Alert will read during execution before a warning/alarm is raised. |
return status |
0 |
Return status value (ALARM – 2, WARNING – 1, or OK – 0) when “error text” is found (or “max lines per execution” is reached. |
max elap time |
180 |
The maximum execution time (in seconds) for the Alert. If this threshold is reached a warning is raised. |
Job Summary
- Purpose: The purpose of this job is to monitor, extract, and analyze error log entries from Microsoft SQL Server instances on AWS.
- Why: Monitoring error logs is crucial for maintaining the uptime and availability of databases by quickly identifying and responding to issues. This job specifically targets error log analysis in SQL Server instances hosted on AWS to ensure that any emerging issues are promptly addressed, thus mitigating potential impacts on database availability and performance.
- Manual checking: You can check the error logs manually in the database using the following SQL command:
EXEC rdsadmin.dbo.rds_read_error_log 0, 1;
Detailed Description
- This dbWatch job utilizes the AWS RDS specific stored procedure ‘rdsadmin.dbo.rds_read_error_log’ to read the SQL Server error logs. It filters logs based on text strings defined as errors and excludes those defined as non-errors based on other parameterized text strings.
- The job includes various key configurations like history threshold to maintain the size of error history, the maximum number of lines the job should read per execution, and a timing threshold to ensure prompt execution, thereby preventing long-running job executions.
Job Configuration
- The job is configured to check instances of SQL Server (version greater than 2005) hosted on AWS with specific properties like being an instance having an engine with permission levels appropriate for reading the error logs.
- It runs automatically at a default schedule of every 20 minutes (6, 26, 46 minutes past each hour).
Output Management
- The job outputs include execution history, a count of detected errors, and diagnostic messages that describe each detected issue along with its occurrence time. Errors that do not meet the exclusion criteria are stored in a history table and a test table.
Table: Error Log Output Sample
Error Text |
Allowed Text |
Message |
Date |
error example |
allowed tag |
This is a sample error message |
2023-01-01 12:30:00 |
Performance Metrics
- It includes metrics such as the number of records checked, elapsed time per check, and frequency of error log entries over time.
Execution Procedures
- The dbscripts involved in this job create necessary local temporary and history tables such as error_log_histr_tab, error_log_err_histr_tab, and error_log_aggr_test_tab among others. These tables facilitate the logging and cyclical analysis of errors over different executions.
Automation and Alerts
- The monitoring job can trigger alerts based on the severity and type of error encountered. Parameters such as ‘return status’ might change based on the found errors which can lead to different alert statuses: ALARM, WARNING, or OK.
- The job is also equipped with mechanisms to handle and log exceptions during the execution, ensuring that any job execution issues are captured for troubleshooting.
Dependencies and Cleanup
- Dependencies such as the main procedure ‘error_log_proc’ and associated tables are managed automatically with cleanup options configured to true, ensuring that temporary data used during execution does not persist unnecessarily.
Future Enhancements
- Potential future enhancements may include more granular parameterization of error texts or integration with more advanced AI-based analysis tools for predictive error handling and automated error resolutions.