SQL Server Assessment Script

I finally get to cross an item off my back burner project list.  Last February, I was at a client assessing their entire SQL Server database footprint for migration to the cloud.  Not having done this kind of work before, I quickly cobbled together scripts that gave me a basic overview of each database.  I then combined the results into a spreadsheet.

Wouldn’t it have been nice to get this information in a single script?  You could use this script not only to get a better understanding of resource usage in your SQL Server instance, but also to determine the resources needed for migrating to either other hardware or the cloud.

I have already produced a lengthy health check script (here) that gives quite a bit of information.  It did not, however, take measurements of CPU usage and IOPS values over a period of time.  My new SQL Server assessment script not only gathers a smaller but more critical set of information than my health check script, but also takes hourly snapshots of CPU usage and IOPS values, including individual read and write values, over the course of one day.  Execute this script during a normal working day or during a period of end-of-month processing, if one exists.

Here is the script in its latest form.  To adjust the CPU and IOPS snapshots taken and the interval between snapshots, edit these two lines:

while (@cntr) < 24 -- This is the number of snapshots to take before results are produced.

waitfor delay '01:00:00' -- This is the time delay between snapshots.  It is currently set to one hour.
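
For example, to take a snapshot every 15 minutes across the same 24-hour window (96 snapshots x 15 minutes = 24 hours), those two lines would become:

while (@cntr) < 96 -- 96 snapshots

waitfor delay '00:15:00' -- 15-minute interval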

--sql_server_assessment.sql
 --Lists the resources and their usage for a single SQL Server instance
 --Intended to act as a tool for migrations to other hardware or the cloud.

use master
 GO
 ----------------------------------------------------------Node name

SELECT SERVERPROPERTY('ComputerNamePhysicalNetBIOS') AS [CurrentNodeName];
 ----------------------------------------------------------Product Version, level, and Edition

SELECT
 CASE
 WHEN CONVERT(VARCHAR(128), SERVERPROPERTY ('productversion')) like '8%' THEN 'SQL2000'
 WHEN CONVERT(VARCHAR(128), SERVERPROPERTY ('productversion')) like '9%' THEN 'SQL2005'
 WHEN CONVERT(VARCHAR(128), SERVERPROPERTY ('productversion')) like '10.0%' THEN 'SQL2008'
 WHEN CONVERT(VARCHAR(128), SERVERPROPERTY ('productversion')) like '10.5%' THEN 'SQL2008 R2'
 WHEN CONVERT(VARCHAR(128), SERVERPROPERTY ('productversion')) like '11%' THEN 'SQL2012'
 WHEN CONVERT(VARCHAR(128), SERVERPROPERTY ('productversion')) like '12%' THEN 'SQL2014'
 WHEN CONVERT(VARCHAR(128), SERVERPROPERTY ('productversion')) like '13%' THEN 'SQL2016'
 WHEN CONVERT(VARCHAR(128), SERVERPROPERTY ('productversion')) like '14%' THEN 'SQL2017'
 ELSE 'unknown'
 END AS MajorVersion,
 SERVERPROPERTY('ProductLevel') AS ProductLevel,
 SERVERPROPERTY('Edition') AS Edition,
 SERVERPROPERTY('ProductVersion') AS ProductVersion

----------------------------------------------------------License information

print N'License information';

SELECT SERVERPROPERTY('LicenseType') AS LICENSE_TYPE,
 ISNULL(SERVERPROPERTY('NumLicenses'),0) AS NUM_LICENCES;
 ----------------------------------------------------------Database listing

print N'Database list with Status and Recovery Model';
 SELECT substring(name,1,40) AS name, substring(state_desc,1,20) AS STATE,
 substring(recovery_model_desc,1,20) AS RECOVERY_MODEL
 FROM sys.databases
 order by name;
 ----------------------------------------------------------Logical CPUs
 select cpu_count as logical_cpu_count
 from sys.dm_os_sys_info;
 ----------------------------------------------------------Physical Memory

--For SQL Server versions 2005 through 2008 R2
 select physical_memory_in_bytes/1024/1024/1024 as GB
 from sys.dm_os_sys_info;

--For SQL Server version 2012 and later
 select physical_memory_kb/1024/1024 as GB
 from sys.dm_os_sys_info;

----------------------------------------------------------Memory usage per database

print N'Memory Usage per User Database';
 SELECT
 substring(DB_NAME(database_id),1,40) AS [Database Name]
 ,COUNT(*) * 8/1024.0 AS [Cached Size (MB)]
 FROM
 sys.dm_os_buffer_descriptors
 WHERE
 database_id > 4
 AND database_id <> 32767
 AND db_name(database_id) <> 'SSISDB'
 GROUP BY DB_NAME(database_id)
 ORDER BY [Cached Size (MB)] DESC OPTION (RECOMPILE);

----------------------------------------------------------Database file size and growth settings

print N'Size and Growth';
 select substring(b.name,1,40) AS DB_Name, substring(a.name,1,40) AS Logical_name,
 substring(a.filename,1,100) AS File_Name,
 cast((a.size * 8.00) / 1024 as numeric(12,2)) as DB_Size_in_MB,
 case when a.growth > 100 then 'In MB' else 'In Percentage' end File_Growth,
 cast(case when a.growth > 100 then (a.growth * 8.00) / 1024
 else (((a.size * a.growth) / 100) * 8.00) / 1024
 end as numeric(12,2)) File_Growth_Size_in_MB,
 case when ( maxsize = -1 or maxsize=268435456 ) then 'AutoGrowth Not Restricted' else 'AutoGrowth Restricted' end AutoGrowth_Status
 from sysaltfiles a
 join sysdatabases b on a.dbid = b.dbid
 where DATABASEPROPERTYEX(b.name, 'status') = 'ONLINE'
 order by b.name;

----------------------------------------------------------Database data and log file total size
 SELECT
 SUM(CASE WHEN type = 0 THEN MF.size * 8 / 1024.0 /1024.0 ELSE 0 END) +
 SUM(CASE WHEN type = 1 THEN MF.size * 8 / 1024.0 /1024.0 ELSE 0 END) AS Total_GB
 FROM
 sys.master_files MF
 JOIN sys.databases DB ON DB.database_id = MF.database_id
 WHERE DB.source_database_id is null;
 ----------------------------------------------------------License feature usage

IF OBJECT_ID('tempdb.dbo.##enterprise_features') IS NOT NULL
 DROP TABLE ##enterprise_features

CREATE TABLE ##enterprise_features
 (
 dbname SYSNAME,
 feature_name VARCHAR(100),
 feature_id INT
 )

EXEC sp_msforeachdb
 N' USE [?]
 IF (SELECT COUNT(*) FROM sys.dm_db_persisted_sku_features) >0
 BEGIN
 INSERT INTO ##enterprise_features
 SELECT dbname=DB_NAME(),feature_name,feature_id
 FROM sys.dm_db_persisted_sku_features
 END '
 SELECT *
 FROM ##enterprise_features;

----------------------------------------------------------CPU and IOPS for 24 hours (avg. work day or end of month)

create table #CPU (snap_time varchar(30),
 row_num int,
 db_name varchar(50),
 cpu_ms int,
 cpu_pct int)

create table #IOPS (snap_time varchar(30),
 db_id int,
 db_name varchar(50),
 reads int,
 writes int,
 total_io int)

declare @cntr int
 set @cntr = 0
 while (@cntr) < 24 --set number of 1-hour loops to perform here.
 begin;

WITH DB_CPU_Stats
 AS
 (
 SELECT convert(varchar, getdate(), 120) AS ts,
 DatabaseID, DB_Name(DatabaseID) AS [DatabaseName],
 SUM(total_worker_time) AS [CPU_Time_Ms]
 FROM sys.dm_exec_query_stats AS qs
 CROSS APPLY (
 SELECT CONVERT(int, value) AS [DatabaseID]
 FROM sys.dm_exec_plan_attributes(qs.plan_handle)
 WHERE attribute = N'dbid') AS F_DB
 GROUP BY DatabaseID
 )
 insert into #CPU (row_num,snap_time,db_name,cpu_ms,cpu_pct)
 SELECT ROW_NUMBER() OVER(ORDER BY [CPU_Time_Ms] DESC) AS [row_num],
 ts, DatabaseName,
 [CPU_Time_Ms],
 CAST([CPU_Time_Ms] * 1.0 / SUM([CPU_Time_Ms]) OVER() * 100.0 AS DECIMAL(5, 2)) AS [CPUPercent]
 FROM DB_CPU_Stats
 --WHERE DatabaseID > 4 -- system databases
 --AND DatabaseID <> 32767 -- ResourceDB
 ORDER BY row_num OPTION (RECOMPILE);

WITH D
 AS(
 SELECT
 convert(varchar, getdate(), 120) AS ts,
 database_id,DB_NAME(database_id) AS Name,
 SUM(num_of_reads) AS IO_Reads,
 SUM(num_of_writes) AS IO_Writes,
 SUM(num_of_reads + num_of_writes) AS Total_IO
 FROM sys.dm_io_virtual_file_stats(NULL, NULL)
 GROUP BY database_id,DB_NAME(database_id))
 insert into #IOPS (snap_time,db_id,db_name,reads,writes,total_io)
 SELECT
 D.ts, D.database_id, D.Name,
 SUM(D.IO_Reads) - SUM(L.num_of_reads) AS Data_Reads,
 SUM(D.IO_Writes) - SUM(L.num_of_writes) AS Data_Writes,
 SUM(D.IO_Reads+D.IO_Writes) - SUM(L.num_of_reads+L.num_of_writes) AS Total_IO
 FROM D
 JOIN sys.dm_io_virtual_file_stats(NULL, 2) L
 ON D.database_id = L.database_id
 GROUP BY D.ts,D.database_id, D.Name
 ORDER BY D.database_id;

waitfor delay '01:00:00' --change this to determine the interval between snapshots
 set @cntr = @cntr + 1

end

select db_name, avg(cpu_pct) as AVG_CPU
 from #CPU
 where db_name is not null
 group by db_name
 order by db_name;

select DB_NAME, snap_time, MAX(cpu_pct) as MAX_CPU
 from #CPU
 group by db_name, snap_time
 having MAX(cpu_pct) > 0
 order by MAX(cpu_pct) desc

select db_name, min(total_io) AS MIN_IO, max(total_io) AS MAX_IO, avg(total_io) AS AVG_IO
 from #IOPS
 group by db_name
 order by db_name;

select snap_time, db_name, reads, writes, total_io
 from #IOPS
 where total_io = (select max(total_io) from #IOPS);

drop table #CPU;

drop table #IOPS;

Keep your Oracle binaries!

I am a pack rat.  I admit it.  I rarely throw away old software or IT books.  I still have Oracle version 6 books.  The last time I got rid of software was when I threw out 3 boxes of 3.5″ floppy disks.  I still have old versions of Windows and other software on CD, though.  Not DVD, but CD.  Anyone need a copy of Windows NT 3.51?

My point is that we typically don’t think about preserving Oracle installation media files after the installation is concluded.  More often they are removed to reclaim space in a staging area with the thought that we can always download them again.

After the installation, flash forward to the point where upgrades have not occurred and the hardware is starting to show its age.  Then, just before the work day begins, a failure on a legacy storage appliance causes all the production databases to go down.

If you are lucky, there is a disaster recovery plan in place.  You are even luckier if it has been tested.  At the end of the failure, there are going to be lessons learned and plan revisions.

I recently experienced just such an occurrence at a client site.  A legacy storage appliance did fail, taking the production databases with it.  You are probably thinking that something like this very rarely, if ever, happens but I have seen network storage appliance failure of this kind twice within the last five years at different places and with different storage appliance makers.

After the initial smoke cleared, three options for recovery were available.  In order of preference, they were recover the storage appliance, rebuild servers and databases and recover from backups, and fail over to a disaster recovery site.

While the first option was in progress, it was discovered that the disaster recovery site option would only bring designated critical systems back online, not every production database.  It was further discovered that this option would only allow failing over, not failing back, and would not allow systems to run from any site other than the DR site.  As such, this option was abandoned.

This left the option of rebuilding servers and databases and recovering from backups while the storage appliance was being worked on.  Unfortunately, a recent acquisition had brought many databases running on unsupported versions of Oracle into the environment.  Now there was a scramble to secure the installation media for these versions.

Installation media is available through Oracle Technology Network (https://www.oracle.com/technetwork) or Oracle Software Delivery Cloud (https://edelivery.oracle.com).  However, as of the writing of this blog, versions 11g release 1 and 10g release 2 are no longer available for direct download.  You can request previous versions of the installation media using the instructions in My Oracle Support (MOS) note 1071023.1.  Oracle provides this statement (links included) on the Oracle Technology Network site when viewing Oracle Database software downloads:

Oracle Database 10.2 and 11.1 are no longer available for download. The software is available as a media or FTP request for those customers who own a valid Oracle Database product license for any edition. To request access to these releases, follow the instructions in Oracle Support Document 1071023.1 (Requesting Physical Shipment or Download URL for Software Media) from My Oracle Support. NOTE: for Oracle Database 10.2, you should request 10.2.0.1 even if you want to install a later patch set. Once you install 10.2.0.1 you can then apply any 10.2 patch set. Similarly, for 11.1 request 11.1.0.6 which must be applied before installing 11.1.0.7. Patch sets can be downloaded from the Patches and Updates tab on My Oracle Support.

In most of the recent client sites I have been to, virtual machines are the standard, replacing individual servers.  These VMs are hosted on large appliances that allow near instantaneous recovery in the event of a failure.  In non-virtual environments, I have seen backups performed at the OS level, where the Oracle binaries are typically installed.  In either of these cases, it may not be necessary to keep copies of the Oracle installation media.

My recommendation is to keep not only the installation media of every Oracle software product in your environment, but also copies of all patches applied.  With storage as cheap and accessible as it is now, there is no real reason not to.  You never know when you will need it again.

SQL Server Health Check v3

My health check script continues to evolve.

--sql_server_health_check_v3.sql
--v3
--This is a non-intrusive script written to give an overall informational view of a
--SQL Server 2005-2016(CTP3) server instance for the purpose of first-time review.

--Version 3 additions
--User statistics identification
--License information
--Database data and log size with total

/*
Information gathered
-Node name
-License information
-Product Version, level, and Edition
-Instance parameters
-Start time and up time
-Database listing
-Database uptime
-Database data and log size with total
-Database file size and growth settings
-Is this a Cluster Node?
-Nodes in Cluster
-Is AlwaysOn enabled (2012 and above)?
-AlwaysOn status
-Memory usage per database
-Memory usage of in-memory OLTP tables
-Last backup time per database
-No log backups for FULL or BULK_LOGGED recovery model databases in last 30 days
-Databases with no backups at all in the last 30 days
-Backups for the previous week per database
-Jobs that failed in the last 24 hours
-Missing indexes
-Duplicate indexes
-High index fragmentation check
-Wait stats
-Users and roles
-Job information
-Existing linked server listing
-User statistics identification

*/
use master
GO
----------------------------------------------------------Node name

SELECT SERVERPROPERTY('ComputerNamePhysicalNetBIOS') AS [CurrentNodeName];
----------------------------------------------------------Product Version, level, and Edition

SELECT SERVERPROPERTY('productversion') AS ProductVersion,
SERVERPROPERTY ('productlevel') AS ProductLevel,
SERVERPROPERTY ('edition') AS Edition;

----------------------------------------------------------License information

print N'License information';

SELECT SERVERPROPERTY('LicenseType') AS LICENSE_TYPE,
ISNULL(SERVERPROPERTY('NumLicenses'),0) AS NUM_LICENCES;

----------------------------------------------------------Instance parameters

print N'Instance Parameter';
SELECT *
FROM sys.configurations
ORDER BY name ;
----------------------------------------------------------Database listing

print N'Database list with Status and Recovery Model';

----------------------------------------------------------Database startup time

print N'Start time';

SELECT DATEADD(ms,-sample_ms,GETDATE() )AS StartTime
FROM sys.dm_io_virtual_file_stats(1,1);
----------------------------------------------------------Database uptime

print N'Up time';
DECLARE @server_start_time DATETIME,
@seconds_diff INT,
@years_online INT,
@days_online INT,
@hours_online INT,
@minutes_online INT,
@seconds_online INT ;

SELECT @server_start_time = login_time
FROM master.sys.sysprocesses
WHERE spid = 1 ;

SELECT @seconds_diff = DATEDIFF(SECOND, @server_start_time, GETDATE()),
@years_online = @seconds_diff / 31536000,
@seconds_diff = @seconds_diff % 31536000,
@days_online = @seconds_diff / 86400,
@seconds_diff = @seconds_diff % 86400,
@hours_online = @seconds_diff / 3600,
@seconds_diff = @seconds_diff % 3600,
@minutes_online = @seconds_diff / 60,
@seconds_online = @seconds_diff % 60 ;

SELECT @server_start_time AS server_start_time,
@years_online AS years_online,
@days_online AS days_online,
@hours_online AS hours_online,
@minutes_online AS minutes_online,
@seconds_online AS seconds_online ;
SELECT substring(name,1,40) AS name, substring(state_desc,1,20) AS STATE,
substring(recovery_model_desc,1,20) AS RECOVERY_MODEL
FROM sys.databases
order by name;

----------------------------------------------------------Database data and log size with total

print N'Data and Log Size with Total';
with fs
as
(
select database_id, type, size * 8.0 / 1024 size
from sys.master_files
)
select
name,
(select sum(size) from fs where type = 0 and fs.database_id = db.database_id) DataFileSizeMB,
(select sum(size) from fs where type = 1 and fs.database_id = db.database_id) LogFileSizeMB,
(select sum(size) from fs where type = 0 and fs.database_id = db.database_id) +
(select sum(size) from fs where type = 1 and fs.database_id = db.database_id) AS Total_MB
from sys.databases db;

--Grand total of ALL data and log files as one value
select SUM(size*8.0)/1024 AS Total_MB
from sys.master_files;
----------------------------------------------------------Database file size and growth settings

print N'Size and Growth';
select substring(b.name,1,40) AS DB_Name, substring(a.name,1,40) AS Logical_name,
substring(a.filename,1,100) AS File_Name,
cast((a.size * 8.00) / 1024 as numeric(12,2)) as DB_Size_in_MB,
case when a.growth > 100 then 'In MB' else 'In Percentage' end File_Growth,
cast(case when a.growth > 100 then (a.growth * 8.00) / 1024
else (((a.size * a.growth) / 100) * 8.00) / 1024
end as numeric(12,2)) File_Growth_Size_in_MB,
case when ( maxsize = -1 or maxsize=268435456 ) then 'AutoGrowth Not Restricted' else 'AutoGrowth Restricted' end AutoGrowth_Status
from sysaltfiles a
join sysdatabases b on a.dbid = b.dbid
where DATABASEPROPERTYEX(b.name, 'status') = 'ONLINE'
order by b.name;
----------------------------------------------------------Is this a Cluster Node?

SELECT 'Clustered', case when SERVERPROPERTY('IsClustered') = 0 then 'No'
else 'Yes' end;

----------------------------------------------------------Nodes in Cluster

print N'Cluster Nodes';

SELECT * FROM fn_virtualservernodes();
----------------------------------------------------------Is AlwaysOn enabled (2012 and above)?

SELECT 'AlwaysOn', case when SERVERPROPERTY('IsHadrEnabled') = 0 then 'No'
when SERVERPROPERTY('IsHadrEnabled') = 1 then 'Yes'
else SERVERPROPERTY('IsHadrEnabled') end;
----------------------------------------------------------AlwaysOn status

declare @c int;
declare @rd nvarchar(60);
declare @osd nvarchar(60);
declare @rhd nvarchar(60);
declare @shd nvarchar(60);
declare @csd nvarchar(60);
select @c = COUNT(name)
from sys.all_objects
where name = 'dm_hadr_availability_replica_states';
if @c = 0
print N'No AlwaysOn Status';
else
select @rd = role_desc, @osd= case when operational_state_desc is null then 'Replica is not local'
else operational_state_desc end,
@rhd = recovery_health_desc, @shd = synchronization_health_desc,
@csd = connected_state_desc
from sys.dm_hadr_availability_replica_states;
print @rd
print @osd
print @rhd
print @shd
print @csd
----------------------------------------------------------Memory usage per database

print N'Memory Usage per User Database';
SELECT
substring(DB_NAME(database_id),1,40) AS [Database Name]
,COUNT(*) * 8/1024.0 AS [Cached Size (MB)]
FROM
sys.dm_os_buffer_descriptors
WHERE
database_id > 4
AND database_id <> 32767
AND db_name(database_id) <> 'SSISDB'
GROUP BY DB_NAME(database_id)
ORDER BY [Cached Size (MB)] DESC OPTION (RECOMPILE);

----------------------------------------------------------Memory usage of in-memory OLTP tables

print N'In-memory OLTP table usage';
SELECT object_name(object_id) AS Name, *
FROM sys.dm_db_xtp_table_memory_stats;
----------------------------------------------------------Last backup time per database

SELECT substring(sdb.Name,1,40) AS DatabaseName,
COALESCE(CONVERT(VARCHAR(12), MAX(bus.backup_finish_date), 101),'-') AS LastBackUpTime
FROM sys.sysdatabases sdb
LEFT OUTER JOIN msdb.dbo.backupset bus ON bus.database_name = sdb.name
where sdb.Name <> 'tempdb'
GROUP BY sdb.Name;
----------------------------------------------------------No log backups for FULL or BULK_LOGGED recovery model databases in last 30 days
print N'Databases with FULL or BULK_LOGGED recovery model and no log backups in last 30 days';
SELECT name AS at_risk_database
FROM sys.databases
where recovery_model_desc in('FULL','BULK_LOGGED')
and name not in(
SELECT
msdb.dbo.backupset.database_name AS DBName
FROM msdb.dbo.backupmediafamily
INNER JOIN msdb.dbo.backupset ON msdb.dbo.backupmediafamily.media_set_id = msdb.dbo.backupset.media_set_id
WHERE (CONVERT(datetime, msdb.dbo.backupset.backup_start_date, 102) >= GETDATE() - 30)
and msdb.dbo.backupset.type = 'L'
group by msdb.dbo.backupset.database_name
);

----------------------------------------------------------Databases with no backups at all in the last 30 days

print N'Databases with NO backups in last 30 days';

SELECT name AS at_risk_database
FROM sys.databases
where name <> 'tempdb'
and name not in(
SELECT
substring(msdb.dbo.backupset.database_name,1,40) AS DBName
FROM msdb.dbo.backupmediafamily
INNER JOIN msdb.dbo.backupset ON msdb.dbo.backupmediafamily.media_set_id = msdb.dbo.backupset.media_set_id
WHERE (CONVERT(datetime, msdb.dbo.backupset.backup_start_date, 102) >= GETDATE() - 30)
group by msdb.dbo.backupset.database_name
);
----------------------------------------------------------Backups for the previous week per database

print N'All backups for previous week';

SELECT
CONVERT(CHAR(40), SERVERPROPERTY('Servername')) AS Server,
substring(msdb.dbo.backupset.database_name,1,40) AS DBName,
msdb.dbo.backupset.backup_start_date,
msdb.dbo.backupset.backup_finish_date,
msdb.dbo.backupset.expiration_date,
CASE msdb.dbo.backupset.type
WHEN 'D' THEN 'Database'
WHEN 'L' THEN 'Log'
WHEN 'F' THEN 'File'
WHEN 'P' THEN 'Partial'
WHEN 'I' THEN 'Differential database'
WHEN 'G' THEN 'Differential file'
WHEN 'Q' THEN 'Differential partial'
ELSE msdb.dbo.backupset.type --a WHEN NULL branch never matches; ELSE catches unknown types
END AS backup_type,
msdb.dbo.backupset.backup_size,
substring(msdb.dbo.backupmediafamily.logical_device_name,1,50) AS logical_device_name,
substring(msdb.dbo.backupmediafamily.physical_device_name,1,50) AS physical_device_name,
substring(msdb.dbo.backupset.name,1,50) AS backupset_name,
substring(msdb.dbo.backupset.description,1,50) AS description
FROM msdb.dbo.backupmediafamily
INNER JOIN msdb.dbo.backupset ON msdb.dbo.backupmediafamily.media_set_id = msdb.dbo.backupset.media_set_id
WHERE (CONVERT(datetime, msdb.dbo.backupset.backup_start_date, 102) >= GETDATE() - 7)
ORDER BY
msdb.dbo.backupset.database_name,
msdb.dbo.backupset.backup_finish_date;
----------------------------------------------------------Jobs that failed in the last 24 hours

print N'Jobs Failing in last 24 hours';

----------------------------------------------------------Variable Declarations
DECLARE @PreviousDate datetime
DECLARE @Year VARCHAR(4)
DECLARE @Month VARCHAR(2)
DECLARE @MonthPre VARCHAR(2)
DECLARE @Day VARCHAR(2)
DECLARE @DayPre VARCHAR(2)
DECLARE @FinalDate INT
----------------------------------------------------------Initialize Variables
SET @PreviousDate = DATEADD(dd, -1, GETDATE()) --Last 1 day
SET @Year = DATEPART(yyyy, @PreviousDate)
SELECT @MonthPre = CONVERT(VARCHAR(2), DATEPART(mm, @PreviousDate))
SELECT @Month = RIGHT(CONVERT(VARCHAR, (@MonthPre + 1000000000)),2)
SELECT @DayPre = CONVERT(VARCHAR(2), DATEPART(dd, @PreviousDate))
SELECT @Day = RIGHT(CONVERT(VARCHAR, (@DayPre + 1000000000)),2)
SET @FinalDate = CAST(@Year + @Month + @Day AS INT)
--Final Logic
SELECT substring(j.[name],1,40) AS JOB,
substring(s.step_name,1,40) AS Step,
h.step_id,
substring(h.step_name,1,40) AS Step,
h.run_date,
h.run_time,
h.sql_severity,
substring(h.message,1,100) AS Message,
h.server
FROM msdb.dbo.sysjobhistory h
INNER JOIN msdb.dbo.sysjobs j
ON h.job_id = j.job_id
INNER JOIN msdb.dbo.sysjobsteps s
ON j.job_id = s.job_id AND h.step_id = s.step_id
WHERE h.run_status = 0 --Failure
AND h.run_date > @FinalDate
ORDER BY h.instance_id DESC;

----------------------------------------------------------Missing indexes

print N'Missing Indexes';
SELECT substring(so.name,1,40) AS Name
, (avg_total_user_cost * avg_user_impact) * (user_seeks + user_scans) as Impact
, ddmid.equality_columns
, ddmid.inequality_columns
, ddmid.included_columns
FROM sys.dm_db_missing_index_group_stats AS ddmigs
INNER JOIN sys.dm_db_missing_index_groups AS ddmig
ON ddmigs.group_handle = ddmig.index_group_handle
INNER JOIN sys.dm_db_missing_index_details AS ddmid
ON ddmig.index_handle = ddmid.index_handle
INNER JOIN sys.objects so WITH (nolock)
ON ddmid.object_id = so.object_id
WHERE ddmigs.group_handle IN (
SELECT TOP (5000) group_handle
FROM sys.dm_db_missing_index_group_stats WITH (nolock)
ORDER BY (avg_total_user_cost * avg_user_impact)*(user_seeks+user_scans) DESC);

----------------------------------------------------------Duplicate indexes

print N'Duplicate Indexes';

DECLARE @SCHEMANAME VARCHAR(30);
DECLARE @TABLENAME VARCHAR(127);
SET @SCHEMANAME = '%'; --match all schemas; a LIKE against NULL would return no rows
SET @TABLENAME = '%';  --match all tables
WITH ind_list AS(
select o.schema_id, i.object_id, i.index_id,
i.name, i.type_desc,
i.is_unique, i.is_primary_key,
STUFF( (SELECT ',' + tc.name
FROM sys.index_columns ic
JOIN sys.columns tc
ON tc.column_id = ic.column_id AND
tc.object_id = ic.object_id
WHERE ic.object_id = i.object_id AND
ic.index_id = i.index_id
AND ic.is_included_column = 0
ORDER BY ic.index_column_id
FOR XML PATH ('') ),1,1,'' ) index_columns,
STUFF( (SELECT ',' + tc.name
FROM sys.index_columns ic
JOIN sys.columns tc
ON tc.column_id = ic.column_id AND
tc.object_id = ic.object_id
WHERE ic.object_id = i.object_id AND
ic.index_id = i.index_id
AND ic.is_included_column = 1
ORDER BY ic.index_column_id
FOR XML PATH ('') ),1,1,'' ) include_columns
FROM sys.indexes i
JOIN sys.objects o ON o.object_id = i.object_id
WHERE i.index_id > 0 AND i.type_desc <> 'XML'
AND object_name(i.object_id) LIKE @TABLENAME
AND i.is_disabled = 0
AND schema_name(o.schema_id) LIKE @SCHEMANAME )
SELECT substring(schema_name(included_indexes.schema_id),1,30) AS owner,
object_name(included_indexes.object_id) table_name,
(SELECT SUM(st.row_count) FROM sys.dm_db_partition_stats st
WHERE st.object_id = included_indexes.object_id
AND st.index_id < 2 ) num_rows,
included_indexes.name included_index_name,
included_indexes.index_columns included_index_columns,
included_indexes.include_columns
included_index_include_columns,
included_indexes.type_desc included_index_type,
included_indexes.is_unique included_index_uniqueness,
included_indexes.is_primary_key included_index_PK,
(SELECT SUM(a.total_pages) * 8 FROM sys.allocation_units a
JOIN sys.partitions p ON a.container_id = p.partition_id
WHERE p.object_id = included_indexes.object_id AND
p.index_id = included_indexes.index_id
) included_index_size_kb,
including_indexes.name including_index_name,
including_indexes.index_columns including_index_columns,
including_indexes.include_columns
including_index_include_columns,
including_indexes.type_desc including_index_type,
including_indexes.is_unique including_index_uniqueness,
including_indexes.is_primary_key including_index_PK,
(SELECT SUM(a.total_pages) * 8 FROM sys.allocation_units a
JOIN sys.partitions p ON a.container_id = p.partition_id
WHERE p.object_id = including_indexes.object_id AND
p.index_id = including_indexes.index_id
) including_index_size_kb
FROM ind_list included_indexes
JOIN ind_list including_indexes
ON including_indexes.object_id = included_indexes.object_id
JOIN sys.partitions ing_p
ON ing_p.object_id = including_indexes.object_id AND
ing_p.index_id = including_indexes.index_id
JOIN sys.allocation_units ing_a
ON ing_a.container_id = ing_p.partition_id
WHERE including_indexes.index_id <> included_indexes.index_id
AND LEN(included_indexes.index_columns) <=
LEN(including_indexes.index_columns)
AND included_indexes.index_columns + ',' =
SUBSTRING(including_indexes.index_columns,1,
LEN(included_indexes.index_columns + ','))
ORDER BY 2 DESC;

----------------------------------------------------------High index fragmentation check

print N'Index with HIGH Fragmentation';

EXEC sp_MSforeachdb '
USE [?]
SELECT ''?'' AS DB_NAME,
QUOTENAME(sysind.name) AS [index_name],
indstat.*
FROM sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL, NULL, ''LIMITED'')
AS indstat
INNER JOIN sys.indexes sysind ON indstat.object_id = sysind.object_id AND
indstat.index_id = sysind.index_id
where avg_fragmentation_in_percent >= 30
ORDER BY avg_fragmentation_in_percent DESC;'

use master
GO
----------------------------------------------------------Wait stats

print N'Wait Stats';

SELECT *
FROM sys.dm_os_wait_stats
where wait_time_ms > 10000
ORDER BY wait_time_ms DESC;
----------------------------------------------------------Users and roles

print N'Users and Roles';

WITH Roles_CTE(Role_Name, Username)
AS
(
SELECT
User_Name(sm.[groupuid]) as [Role_Name],
user_name(sm.[memberuid]) as [Username]
FROM [sys].[sysmembers] sm
)

SELECT
Roles_CTE.Role_Name,
[DatabaseUserName] = princ.[name],
[UserType] = CASE princ.[type]
WHEN 'S' THEN 'SQL User'
WHEN 'U' THEN 'Windows User'
WHEN 'G' THEN 'Windows Group'
WHEN 'A' THEN 'Application Role'
WHEN 'R' THEN 'Database Role'
WHEN 'C' THEN 'User mapped to a certificate'
WHEN 'K' THEN 'User mapped to an asymmetric key'
END
FROM
sys.database_principals princ
JOIN Roles_CTE on Username = princ.name
where princ.type in ('S', 'U', 'G', 'A', 'R', 'C', 'K')
ORDER BY princ.name;
----------------------------------------------------------Job information

print N'Job Information';
SELECT [JobName] = [jobs].[name]
,[Category] = [categories].[name]
,[Owner] = SUSER_SNAME([jobs].[owner_sid])
,[Enabled] = CASE [jobs].[enabled] WHEN 1 THEN 'Yes' ELSE 'No' END
,[Scheduled] = CASE [schedule].[enabled] WHEN 1 THEN 'Yes' ELSE 'No' END
,[Description] = [jobs].[description]
,[Occurs] =
CASE [schedule].[freq_type]
WHEN 1 THEN 'Once'
WHEN 4 THEN 'Daily'
WHEN 8 THEN 'Weekly'
WHEN 16 THEN 'Monthly'
WHEN 32 THEN 'Monthly relative'
WHEN 64 THEN 'When SQL Server Agent starts'
WHEN 128 THEN 'Start whenever the CPU(s) become idle'
ELSE ''
END
,[Occurs_detail] =
CASE [schedule].[freq_type]
WHEN 1 THEN 'O'
WHEN 4 THEN 'Every ' + CONVERT(VARCHAR, [schedule].[freq_interval]) + ' day(s)'
WHEN 8 THEN 'Every ' + CONVERT(VARCHAR, [schedule].[freq_recurrence_factor]) + ' week(s) on ' +
LEFT(
CASE WHEN [schedule].[freq_interval] & 1 = 1 THEN 'Sunday, ' ELSE '' END +
CASE WHEN [schedule].[freq_interval] & 2 = 2 THEN 'Monday, ' ELSE '' END +
CASE WHEN [schedule].[freq_interval] & 4 = 4 THEN 'Tuesday, ' ELSE '' END +
CASE WHEN [schedule].[freq_interval] & 8 = 8 THEN 'Wednesday, ' ELSE '' END +
CASE WHEN [schedule].[freq_interval] & 16 = 16 THEN 'Thursday, ' ELSE '' END +
CASE WHEN [schedule].[freq_interval] & 32 = 32 THEN 'Friday, ' ELSE '' END +
CASE WHEN [schedule].[freq_interval] & 64 = 64 THEN 'Saturday, ' ELSE '' END ,
LEN(
CASE WHEN [schedule].[freq_interval] & 1 = 1 THEN 'Sunday, ' ELSE '' END +
CASE WHEN [schedule].[freq_interval] & 2 = 2 THEN 'Monday, ' ELSE '' END +
CASE WHEN [schedule].[freq_interval] & 4 = 4 THEN 'Tuesday, ' ELSE '' END +
CASE WHEN [schedule].[freq_interval] & 8 = 8 THEN 'Wednesday, ' ELSE '' END +
CASE WHEN [schedule].[freq_interval] & 16 = 16 THEN 'Thursday, ' ELSE '' END +
CASE WHEN [schedule].[freq_interval] & 32 = 32 THEN 'Friday, ' ELSE '' END +
CASE WHEN [schedule].[freq_interval] & 64 = 64 THEN 'Saturday, ' ELSE '' END
) - 1
)
WHEN 16 THEN 'Day ' + CONVERT(VARCHAR, [schedule].[freq_interval]) + ' of every ' + CONVERT(VARCHAR, [schedule].[freq_recurrence_factor]) + ' month(s)'
WHEN 32 THEN 'The ' +
CASE [schedule].[freq_relative_interval]
WHEN 1 THEN 'First'
WHEN 2 THEN 'Second'
WHEN 4 THEN 'Third'
WHEN 8 THEN 'Fourth'
WHEN 16 THEN 'Last'
END +
CASE [schedule].[freq_interval]
WHEN 1 THEN ' Sunday'
WHEN 2 THEN ' Monday'
WHEN 3 THEN ' Tuesday'
WHEN 4 THEN ' Wednesday'
WHEN 5 THEN ' Thursday'
WHEN 6 THEN ' Friday'
WHEN 7 THEN ' Saturday'
WHEN 8 THEN ' Day'
WHEN 9 THEN ' Weekday'
WHEN 10 THEN ' Weekend Day'
END + ' of every ' + CONVERT(VARCHAR, [schedule].[freq_recurrence_factor]) + ' month(s)'
ELSE ''
END
,[Frequency] =
CASE [schedule].[freq_subday_type]
WHEN 1 THEN 'Occurs once at ' +
STUFF(STUFF(RIGHT('000000' + CONVERT(VARCHAR(8), [schedule].[active_start_time]), 6), 5, 0, ':'), 3, 0, ':')
WHEN 2 THEN 'Occurs every ' +
CONVERT(VARCHAR, [schedule].[freq_subday_interval]) + ' Second(s) between ' +
STUFF(STUFF(RIGHT('000000' + CONVERT(VARCHAR(8), [schedule].[active_start_time]), 6), 5, 0, ':'), 3, 0, ':') + ' and ' +
STUFF(STUFF(RIGHT('000000' + CONVERT(VARCHAR(8), [schedule].[active_end_time]), 6), 5, 0, ':'), 3, 0, ':')
WHEN 4 THEN 'Occurs every ' +
CONVERT(VARCHAR, [schedule].[freq_subday_interval]) + ' Minute(s) between ' +
STUFF(STUFF(RIGHT('000000' + CONVERT(VARCHAR(8), [schedule].[active_start_time]), 6), 5, 0, ':'), 3, 0, ':') + ' and ' +
STUFF(STUFF(RIGHT('000000' + CONVERT(VARCHAR(8), [schedule].[active_end_time]), 6), 5, 0, ':'), 3, 0, ':')
WHEN 8 THEN 'Occurs every ' +
CONVERT(VARCHAR, [schedule].[freq_subday_interval]) + ' Hour(s) between ' +
STUFF(STUFF(RIGHT('000000' + CONVERT(VARCHAR(8), [schedule].[active_start_time]), 6), 5, 0, ':'), 3, 0, ':') + ' and ' +
STUFF(STUFF(RIGHT('000000' + CONVERT(VARCHAR(8), [schedule].[active_end_time]), 6), 5, 0, ':'), 3, 0, ':')
ELSE ''
END
,[AvgDurationInSec] = CONVERT(DECIMAL(10, 2), [jobhistory].[AvgDuration])
,[Next_Run_Date] =
CASE [jobschedule].[next_run_date]
WHEN 0 THEN CONVERT(DATETIME, '1900/1/1')
ELSE CONVERT(DATETIME, CONVERT(CHAR(8), [jobschedule].[next_run_date], 112) + ' ' +
STUFF(STUFF(RIGHT('000000' + CONVERT(VARCHAR(8), [jobschedule].[next_run_time]), 6), 5, 0, ':'), 3, 0, ':'))
END
FROM [msdb].[dbo].[sysjobs] AS [jobs] WITH(NOLOCK)
LEFT OUTER JOIN [msdb].[dbo].[sysjobschedules] AS [jobschedule] WITH(NOLOCK)
ON [jobs].[job_id] = [jobschedule].[job_id]
LEFT OUTER JOIN [msdb].[dbo].[sysschedules] AS [schedule] WITH(NOLOCK)
ON [jobschedule].[schedule_id] = [schedule].[schedule_id]
INNER JOIN [msdb].[dbo].[syscategories] [categories] WITH(NOLOCK)
ON [jobs].[category_id] = [categories].[category_id]
LEFT OUTER JOIN
( SELECT [job_id], [AvgDuration] = (SUM((([run_duration] / 10000 * 3600) +
(([run_duration] % 10000) / 100 * 60) +
([run_duration] % 10000) % 100)) * 1.0) / COUNT([job_id])
FROM [msdb].[dbo].[sysjobhistory] WITH(NOLOCK)
WHERE [step_id] = 0
GROUP BY [job_id]
) AS [jobhistory]
ON [jobhistory].[job_id] = [jobs].[job_id];
----------------------------------------------------------Existing linked server listing
print N'Linked Server Information';
declare @x int;
select @x = COUNT(name)
from sys.all_objects
where name = 'Servers';
if @x <> 0
SELECT *
--c.name, provider, data_source, is_remote_login_enabled, b.modify_date
FROM sys.Servers a
LEFT OUTER JOIN sys.linked_logins b ON b.server_id = a.server_id
LEFT OUTER JOIN sys.server_principals c ON c.principal_id = b.local_principal_id
where a.server_id <> 0;
else
exec sp_linkedservers;
----------------------------------------------------------User statistics identification

print N'User statistics';

select s.name as STATS_NAME, SCHEMA_NAME(ob.Schema_id) AS SCHEMA_NAME, OBJECT_NAME(s.object_id) AS TABLE_NAME
FROM sys.stats s
INNER JOIN sys.Objects ob ON ob.Object_id = s.object_id
WHERE SCHEMA_NAME(ob.Schema_id) <> 'sys'
AND Auto_Created = 0 AND User_Created = 1;

Evaluating a SQL Server environment

This is a topic with more than one meaning, so I will narrow the focus to the aspect of just looking at an existing SQL Server environment for the first time. If you are experiencing performance issues with an existing environment, I recommend you start with the Microsoft (MS) SQL Server performance monitor to concentrate your effort, rather than reviewing the SQL Server as a whole. This is especially true if there are only a few problematic processes.

It is good to start asking basic questions for better understanding of the environment to evaluate. A short checklist of the items to inquire about before proceeding includes:

  • Is this a 24×7 environment?
  • What are the peak times of usage and critical availability?
  • Are there any current performance concerns?
  • What is the current backup configuration?
  • When was the last time a recovery was performed? Practice recovery? Cloning using backups?

This information can help you determine the best approach for your evaluation, such as the best times to execute any of the more resource-intensive examinations. There are many tools to choose from when evaluating a SQL Server environment. Some of these are MS native tools and some are third-party. As a precaution, verify the presence of a recent backup or even execute a one-off full backup before the evaluation.
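
If you do take a one-off backup first, a copy-only backup is a simple option because it does not disturb the existing backup chain. A minimal sketch (the database name and path are placeholders):

BACKUP DATABASE YourDatabase
TO DISK = 'D:\Backup\YourDatabase_preassessment.bak' --placeholder path
WITH COPY_ONLY, CHECKSUM;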

The most recognized MS built-in tool for evaluation is DBCC CHECKDB, which is included with every SQL Server version. DBCC CHECKDB is a Transact-SQL-based utility that checks the logical and physical integrity of all the objects in the specified database. This process also calls other DBCC programs and performs these steps while executing:

  • Runs DBCC CHECKALLOC on the database to check the consistency of disk space allocation structures.
  • Runs DBCC CHECKTABLE on every table and view in the database to check the integrity of all the pages and structures that make up tables or indexed views.
  • Runs DBCC CHECKCATALOG on the database to check for catalog consistency within the database.
  • Validates the contents of every indexed view in the database.
  • Validates link-level consistency between table metadata and file system directories and files when storing varbinary(max) data in the file system using FILESTREAM.
  • Validates the Service Broker data in the database.

The DBCC CHECKDB utility includes options to repair issues it finds, but I do not recommend using these options until any existing issues have been identified and understood. All of these checks and validations can take a long time to execute against a sizeable database, so there are options to exclude one or more of the above steps for time and resource considerations. MS SQL Server versions 2014 through 2017 even offer the ability to execute this utility with a degree of parallelism. Consult the MS DBCC CHECKDB documentation for the version you are evaluating for more specific information.

Basic execution (from a SQL Server Management Studio prompt):

use <db_name>

go

dbcc checkdb

go

 

Note – For larger database executions, use the “Results To” option under the Query menu item in SQL Server Management Studio to spool the results for later review.
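
For sizeable databases, the time-saving options mentioned above can be combined with the basic execution. A sketch; PHYSICAL_ONLY skips the deeper logical checks, and the MAXDOP option (added in SQL Server 2014 SP2) caps the parallelism:

use <db_name>

go

--faster, physical checks only
dbcc checkdb with physical_only, no_infomsgs

go

--full check, capped at 4 CPUs (SQL Server 2014 SP2 and later)
dbcc checkdb with maxdop = 4

go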

Brent Ozar, a well-regarded SQL Server expert, created a tool called the First Responder Kit, or sp_Blitz for short.   It is available for free from https://www.brentozar.com/first-aid/ and provides a list of its findings prioritized for immediate action or informational purposes. The lower the number a finding is given, the higher its priority. This tool does install required stored procedures and an output table in the database, but it is free, and you can use parameters at execution to skip checking certain objects or to ignore findings over a certain priority level.

Basic execution (from a SQL Server Management Studio prompt):

After downloading the First Responder Kit compressed archive and placing it in an accessible location, execute the sp_blitz.sql file to install the database objects. Then, from a SQL Server Management Studio prompt, execute:

sp_blitz

go

Note – By default, this script examines ALL databases present on the current SQL Server and groups the findings by order of priority, sorting by database name.
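
For example, to skip checks of user database objects and ignore the lower-priority findings, parameters can be supplied at execution (parameter names as documented in the kit):

sp_blitz @CheckUserDatabaseObjects = 0, @IgnorePrioritiesAbove = 50

go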

My own SQL Server Health Check script (https://wordpress.com/post/dbajonblog.wordpress.com/178), which is also free, may not have the maturity of some other tools, but it does not require any objects to be installed and provides a good overall picture of a database environment. It is a single script that can be executed the same as a standard SQL script.

 

Backups triggered automatically

Database availability is the number one goal of the database administrator.  Backup and recovery should be the second goal.  An internet search on establishing database backup and recovery processes will produce many examples.  I won’t go into my personal preferences, just emphasize the points I think are important:

  • Backup and recovery processes MUST be documented for reference by personnel responsible for maintenance and execution. I find that the lack of backup and recovery documentation is fairly common out in the world. No DBA likes to try to do an undocumented recovery at 3AM.
  • Backup processes that do not include a check for successful completion are INVALID. If the backup has been completing with an error for a month, then guess what: your last valid backup was a month ago, and if you had to do a recovery today, you just lost a month’s worth of data. One way to check for this is sketched just after this list.
  • Recovery processes that have not been tested by performing a practice recovery or a clone are INVALID. You may have a proven backup method, but it has not been proven until you do a recovery in your own environment.
  • If there is available time and resources, add an export to your backup process. Exports are good for granular recovery of metadata and objects such as indexes and PL/SQL code.
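
As a minimal sketch of the completion check from the second point above, assuming RMAN is the backup tool, the status of recent backup jobs can be queried directly from the catalog views:

select start_time, end_time, input_type, status
from v$rman_backup_job_details
where start_time > sysdate - 7
order by start_time;

Anything other than a COMPLETED status here should be investigated before the backup is trusted.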

That said, even solid backup processes can get blind-sided by unexpected activity in the database that produces vastly more redo activity than is normally seen.  It only takes one instance of the dreaded “ORA-00257: archiver error” to send personnel from support through management looking for or demanding answers and remediation steps.

The workaround for these situations is usually increasing either the amount of space in the archived log destination or the frequency of backups.  While either of these is fairly easy to implement, each has areas of concern that should be considered prior to implementation.

Increasing the amount of space in the archived log location could be costly depending on the amount and type of storage used.  Also, if the surge of redo is not a common occurrence, all the additional space will sit unused, essentially wasted.  Increasing the backup frequency may not cost in terms of additional storage, but it does cost in database resources needed to perform the additional backups, especially when they occur during normal business hours.

When a database is configured to use the fast recovery area (FRA) for backups and flashback, Oracle automatically monitors and alerts on its space usage.  Oracle 11.2 and 12.2 documentation both have the same information on these alerts:

The database issues a warning alert when reclaimable space is 
less than 15% and a critical alert when reclaimable space is 
less than 3%. To warn the DBA of this condition, an entry is 
added to the alert log and to the DBA_OUTSTANDING_ALERTS table 
(used by Enterprise Manager). Nevertheless, the database 
continues to consume space in the fast recovery area until 
there is no reclaimable space left.

When the recovery area is completely full, the error displayed 
is as follows, where nnnnn is the number of bytes required and 
mmmmm is the disk quota:

ORA-19809: limit exceeded for recovery files ORA-19804: cannot 
reclaim nnnnn bytes disk space from mmmmm limit

From <https://docs.oracle.com/cd/E11882_01/backup.112/e10642/rcmmaint.htm>
From <https://docs.oracle.com/en/database/oracle/oracle-database/12.2/bradv/maintaining-rman-backups.html>
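
Rather than waiting for the alert, the same space figures the warning is based on can be checked directly against the FRA views at any time:

select name,
       round(space_limit/1024/1024/1024,2) as size_gb,
       round(space_used/1024/1024/1024,2) as used_gb,
       round(space_reclaimable/1024/1024/1024,2) as reclaimable_gb
from v$recovery_file_dest;

select file_type, percent_space_used, percent_space_reclaimable
from v$recovery_area_usage;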

A practical database administrator should already be monitoring the alert log for errors and warnings.  Why not configure these monitoring processes to automatically trigger backups if this error is encountered?  One of my favorite alert log monitor scripts sends an email when an ORA- error is found in the alert log and sends a page or text message to designated parties if the error involves some type of corruption:

#!/bin/sh
#Checks the instance alert log for ORA- errors or corruption messages.
#Emails the error detail, and pages the on-call group if corruption is found.
#Arguments: 1 = ORACLE_SID, 2 = alert log directory

ERRMSG="Missing command line parameters"

if [ "$1" ]; then
        export ORACLE_SID=$1
else
        echo $ERRMSG
        exit 1
fi

if [ "$2" ]; then
        export ALERT_LOG_DIR=$2
else
        echo $ERRMSG
        exit 1
fi

            LOGFILE="${ALERT_LOG_DIR}/alert_${ORACLE_SID}.log"
            ERRFILE="${ALERT_LOG_DIR}/alert_error.log"
            CHECK_TIMEB="`date +%m/%d/%y-%H:%M`"
            SAVFILE="${ALERT_LOG_DIR}/alert_${ORACLE_SID}_`date '+%y%m%d-%H%M'`.log"
            GROUPID="SAPADMIN1.BASIS_ON_CALL"
            RECIP="youremail@sendwordnow.com"
            echo "  "
            echo "**"
            echo "**** $ORACLE_SID"
            echo "**"
            echo " "
            if [ -f $LOGFILE ]; then
               if [ `egrep -i "ORA-|corrupt" "${LOGFILE}" | wc -l` -eq 0 ]; then
                  echo "No errors found in alert log "
               else
                  egrep -i "ORA-|corrupt" $LOGFILE > $ERRFILE
                  echo " " >> $ERRFILE
                  echo "  Found the above error(s) in the alert log for ${ORACLE_SID}" >> $ERRFILE
                  echo "  Alert Log with error(s) saved as ${SAVFILE}" >> $ERRFILE
                  echo " " >> $ERRFILE
                  echo " " >> $ERRFILE
                  mailx -s "ALERT-Errors found in ${ORACLE_SID} Alert Log" youremail@yourcompany.com < $ERRFILE
                  if [ `egrep -i "corrupt" "${ERRFILE}" | wc -l` -gt 0 ]; then
                    mailx -s "ALERT - CORRUPTION DETECTED IN ${ORACLE_SID} ALERT LOG" YourPager@yourcompany.com < $ERRFILE
                  fi
                  mv $LOGFILE $SAVFILE
               fi
            else
               echo "No alert log found for ${ORACLE_SID} "
            fi
exit

You could easily add a line to call a separate backup script from this script.  I usually set my alert log monitor to execute once an hour, but increasing the frequency of execution will help catch redo space problems and not cost in terms of resource or space usage.  Also, having an automatically-triggered backup will save time diagnosing the issue when an archiver ‘out of space’ problem occurs.
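
For example, inside the branch of the monitor script that finds errors, one additional line placed before the mailx call could launch the backup (the script name and path are hypothetical):

                  #hypothetical: kick off an archive log backup before alerting
                  /home/oracle/scripts/backup_archivelogs.sh ${ORACLE_SID} &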

The drawback here is that you must have FRA enabled to utilize the alert functionality.  What if usage of the FRA is not preferred?  You could use the alert log monitor script above to trigger a backup when an archiver error is seen, but that would mean the database is already in the process of halting activity when the backup is triggered.  It is better to be pre-emptive in these types of alerts by setting up monitoring to alert of a problem before an alert log message is generated.

For example, here is a snippet of code for a Linux/Unix shell script that determines how much space is available in GB for a certain filesystem.  If that amount of space falls below a certain value, a message is generated:

#!/bin/sh

space_avail="`df -k | grep /archive | awk '{print $4}'`"

if [ `expr $space_avail / 1048576` -lt 40  ]; then
echo Backup needed
fi

You could easily replace the echo command with a call to a backup script.  After testing, this script could be called at an interval of your choosing from the crontab.
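
For example, a crontab entry like this one (the paths are placeholders) would run the check every ten minutes:

*/10 * * * * /home/oracle/scripts/check_archive_space.sh >> /home/oracle/logs/check_archive_space.log 2>&1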

If you have your backup location in ASM, you can use this next piece of code to accomplish the same thing by querying the desired diskgroup:

#!/bin/sh

free_mb=`$CRS_HOME/bin/sqlplus -s "/ as sysasm"<<EOF
set heading off
select free_mb
from v\\$asm_diskgroup
where name = 'DATA'
order by name;
exit
EOF`

if [ `expr $free_mb / 1024` -lt 40  ]; then
echo Backup needed
fi

With a few simple steps, you can be proactive about excessive redo activity in your database.

To output in real time or not?

Just yesterday I encountered a wrong conclusion based on a misunderstanding about the use of the dbms_output.put_line function.  The writer of a SQL script that spooled its output to a log file used the function to generate ‘start’ and ‘end’ messages within a PL/SQL block and was confused when the ‘start’ message was not seen in the log file while the main section of the PL/SQL block was executing.  He was under the assumption that calls to this function were carried out immediately.

This is a common misconception regarding dbms_output.put_line: its output is buffered during execution and is only displayed once the block completes.  The following example, which mirrors the functionality they were attempting, demonstrates this behavior.

set serveroutput on

spool ./output.log

begin

  dbms_output.put_line('Start - '||to_char(sysdate,'MM/DD/YYYY HH24:MI:SS'));

  execute immediate 'select * from tab';

  dbms_lock.sleep(30);

  dbms_output.put_line('End - '||to_char(sysdate,'MM/DD/YYYY HH24:MI:SS'));
  
end;
/

spool off

In this example, the spooling of the output is started inside the PL/SQL block.  This code will execute for about 30 seconds because of the dbms_lock.sleep function call.  During execution, the output.log file will be empty until the code finishes, either successfully or unsuccessfully, and ‘spool off’ is executed.  You can verify this with a ‘tail -f’ command in Linux/Unix or the ‘type’ command in Windows.

What happens if you place the ‘start’ and ‘end’ messages outside the PL/SQL block?

set serveroutput on

spool ./output.log

select 'Start - '||to_char(sysdate,'MM/DD/YYYY HH24:MI:SS')
from dual;

begin

  execute immediate 'select * from tab';

  dbms_lock.sleep(30);
  
end;
/

select 'End - '||to_char(sysdate,'MM/DD/YYYY HH24:MI:SS')
from dual;

spool off

Even though you see the ‘start’ message at the onset in your SQL session, the log file will still not be populated until the PL/SQL block execution is finished and the ‘spool off’ command runs.

If you truly want to see the ‘start’ message before ANY code executes, then using OS commands to start spooling a log file and generate the ‘start’ message is probably your only alternative.  Here is an example for a Linux-based system:

echo "Start - $(date)" > ./output.log

This command, executed before the script is called, will create the log file and write a ‘start’ message to the file with the current date and time in the current directory.  This information will show up in the log file immediately.   Next, call the script to be executed:

$ORACLE_HOME/bin/sqlplus -S username/password @script.sql >> ./output.log

This command will start a SQL*Plus session in silent mode (-S), suppressing header and prompt output, call the script.sql script, and send the output to the log file opened earlier.  Using this method, you don’t need to use any ‘spool’ commands in the script.  Notice I used “>>” to redirect the output in this command and not “>” for the command when I opened the log file.  The “>>” redirect appends to an existing file, opening it if it does not exist.  If you use “>” for the redirect, an existing file will be overwritten and you will lose any information previously written to the file.

For a Windows-based system, the command is almost the same:

echo Start - %date% %time% > ./output.log

This command, executed before the script is called, will create the log file in the current directory and immediately write a ‘start’ message with the date and time to it.  The “>>” and “>” redirect options even work the same as in a Linux/Unix environment.

From here there are a few different ways to execute a SQL script from a Windows command prompt.  If this is going to be an ongoing process in Windows, then I recommend writing the scripts in Perl.  It will save time, especially if you have to port the code between Windows and Linux/Unix environments.

One last point.  You should not use this ‘start’ and ‘end’ message method in a production system.  While it is fine for time tracking while testing the script, it should not be relied on outside of a test environment.  Even when these messages are sent to a log file, that file needs to be parsed for error messages at the conclusion of the process.  A better solution is to send the messages to a table used for auditing and monitoring.  Just don’t forget to commit each time you send a message to the table and remember to purge the information from the table based on audit requirements.
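
One common way to implement that table-based approach is with an autonomous transaction, which lets the logging procedure commit its message without committing the caller’s in-flight work.  A minimal sketch, assuming a simple log table:

create table process_log (
  log_time date default sysdate,
  message  varchar2(200));

create or replace procedure log_message (p_message in varchar2) as
  pragma autonomous_transaction;
begin
  insert into process_log (message) values (p_message);
  commit;  --commits only this insert, not the caller's transaction
end;
/

Calling log_message('Start - '||to_char(sysdate,'MM/DD/YYYY HH24:MI:SS')) at the top of the block then makes the message visible to other sessions immediately.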

 

 

Get in the Penalty Box! (Redux version)

Note – The commands and processes listed have been tested and verified in an Oracle 11g release 2 (11.2.0.4) environment.

Sometimes, there exist database users with no regard for what type of queries they run or how long they run for.  These users typically kick off these ill-behaved queries and then go for coffee, lunch, or leave for the day.  Their query sits out there spinning and consuming resources that you know could be better used elsewhere.  This can be especially problematic in a production system if end-of-the-month aggregation processes or reports are being executed at the same time.

Maybe you have talked to them before about this behavior or even helped them tune a query or two, but they don’t seem to get the message.  In extreme cases, these users even terminate sessions when the query does not come back in a timely manner only to reconnect and retry it in a new session, leaving the old one (possibly) still executing.

It’s time to let these users know that their behavior is not acceptable in an environment where resources are finite and must be shared with other, higher-priority processes.  They need to spend some time in the penalty box.  For those unfamiliar with the term, in ice hockey, misbehaving players have to sit out of the game in a segregated area for a certain length of time, depending on the seriousness of their misdeeds.  The team with the misbehaving player is not allowed to send in a substitute during the penalty time, leaving them one player short and vulnerable to being scored against.

The Oracle Resource Manager is the keeper of the database penalty box.  It has the ability to enforce limits on resources such as CPU and I/O consumption, as well as execution time limits.  We are going to use this last feature to create our own penalty box.  I realize that most database managers don’t want to put constraints on users, especially developers, but in all honesty, if they are writing queries that take an hour or more to execute on a regular basis, then the managers need to put their foot down and the users or developers may need a remedial SQL tuning course.

A common misconception about resource plans is that they will limit all users in the database.  While it is true that only one resource plan can be active at any time, users can be segregated into consumer groups under the plan and each consumer group can be tailored with the use of directives to make sure that users or applications do not suffer due to the actions of a few users.

Here is a simple process to create a resource plan that will limit execution time to 60 seconds.  This limit is just for testing purposes as even I have been known to write queries that run for over a minute ;).  Administering the resource manager requires the ADMINISTER_RESOURCE_MANAGER privilege, go figure.  A SYSDBA user has this privilege already, but if you need to assign the privilege to a non-SYSDBA user, you can use this command (as a properly privileged user):

exec dbms_resource_manager_privs.grant_system_privilege( -

grantee_name => '<USER_NAME>', -

admin_option => true);

Check for a currently active resource plan:

SQL> show parameter resource_manager_plan

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
resource_manager_plan                string

Since no resource plan is active, we can continue with the rest of the penalty box steps to create one.  Later on, we will look at our options if there is an actual resource plan already in place.

Create a new pending area for plan, consumer group, and directives creation.  The pending area is where our resource plan work will be kept until it is validated and submitted.

begin

 dbms_resource_manager.create_pending_area();

end;

/

After the pending area is created, if your session abruptly ends or you make a mistake, simply run this next command to clear the pending area and start again.

begin

 dbms_resource_manager.clear_pending_area();

end;

/

Create the penalty box resource plan:

begin

 dbms_resource_manager.create_plan(

  plan => 'PENALTY_BOX_PLAN',

  comment => 'Resource plan to limit execution time');

end;

/

Create the penalty box resource consumer group:

begin

 dbms_resource_manager.create_consumer_group(

  consumer_group => 'PENALTY_BOX_GROUP',

  comment => 'Resource consumer group to limit execution time');

end;

/

 

Create the penalty box resource plan directives.  Every resource plan must contain a directive to OTHER_GROUPS.  This consumer group contains all sessions that have not been assigned to a consumer group.  If you try to submit the resource plan without the OTHER_GROUPS option, you will get this error message when attempting to validate the pending area:

begin

*

ERROR at line 1:

ORA-29377: consumer group OTHER_GROUPS is not part of top-plan PENALTY_BOX_PLAN

ORA-06512: at "SYS.DBMS_RMIN_SYS", line 3640

ORA-06512: at "SYS.DBMS_RMIN_SYS", line 3691

ORA-06512: at "SYS.DBMS_RMIN_SYS", line 3703

ORA-06512: at "SYS.DBMS_RESOURCE_MANAGER", line 746

ORA-06512: at line 2

 

As mentioned previously, we are setting an execution limit of 60 seconds.

begin
  dbms_resource_manager.create_plan_directive(
    plan                => 'PENALTY_BOX_PLAN',
    group_or_subplan    => 'PENALTY_BOX_GROUP',
    comment             => 'Kill statement after exceeding total execution time',
    switch_group        => 'CANCEL_SQL',
    switch_for_call     => true,
    switch_time_in_call => 60,
    switch_estimate     => false);
  dbms_resource_manager.create_plan_directive(
    plan                => 'PENALTY_BOX_PLAN',
    group_or_subplan    => 'OTHER_GROUPS',
    comment             => 'Leave others alone',
    cpu_p1              => 100);
end;
/

The CANCEL_SQL group, which sessions are switched to when the SWITCH_TIME_IN_CALL limit of 60 seconds is reached, is actually an internal consumer group introduced in 11g that simply cancels the running SQL statement.  The KILL_SESSION group can also be used, but terminating the entire session would be a bit extreme in this case.  A sketch of that variant follows.
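
For reference, here is a minimal sketch of the harsher alternative.  It would replace the CANCEL_SQL directive above rather than sit alongside it (a plan can only hold one directive per consumer group), and I have used the simpler SWITCH_TIME parameter here:

begin
  dbms_resource_manager.create_plan_directive(
    plan             => 'PENALTY_BOX_PLAN',
    group_or_subplan => 'PENALTY_BOX_GROUP',
    comment          => 'Kill the entire session after 60 seconds',
    switch_group     => 'KILL_SESSION',
    switch_time      => 60);
end;
/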

Validate the pending area for the new plan, consumer group, and directives.  If there is a problem with the plan, it will appear during validation:

begin
  dbms_resource_manager.validate_pending_area();
end;
/

Submit the pending area for the plan, consumer group, and directives:

begin
  dbms_resource_manager.submit_pending_area();
end;
/

Now we have to grant privileges in order to assign consumer groups to users.

Create a new pending area for the users we want to put in our newly-created penalty box:

begin
  dbms_resource_manager.create_pending_area();
end;
/

Grant the switch privilege for resource consumer groups to users or roles.  We need to execute this command for every user we want to be eligible to put into the penalty box.

begin
  dbms_resource_manager_privs.grant_switch_consumer_group(
    grantee_name   => 'SCOTT',
    consumer_group => 'PENALTY_BOX_GROUP',
    grant_option   => FALSE);
end;
/

Assign the desired users to the new resource consumer group:

begin
  dbms_resource_manager.set_initial_consumer_group(
    user           => 'SCOTT',
    consumer_group => 'PENALTY_BOX_GROUP');
end;
/

Validate the pending area before submitting it:

begin
  dbms_resource_manager.validate_pending_area();
end;
/

Submit the pending area:

begin
  dbms_resource_manager.submit_pending_area();
end;
/
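
To verify the assignment, the INITIAL_RSRC_CONSUMER_GROUP column of DBA_USERS shows each user’s initial group (for sessions that are already connected, V$SESSION has a RESOURCE_CONSUMER_GROUP column):

select username, initial_rsrc_consumer_group
from dba_users
where username = 'SCOTT';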

Finally, specify the new plan to be used by the instance:

SQL> ALTER SYSTEM SET RESOURCE_MANAGER_PLAN = PENALTY_BOX_PLAN;

System altered.

To clear the resource plan from the instance, use this command.  Note that the value is two single quotation marks (an empty string), not a double quote.  If you just want to switch to a different plan rather than disable resource plans altogether, supply the plan name between the quotation marks.

ALTER SYSTEM SET RESOURCE_MANAGER_PLAN = '';
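
For example, to fall back to the Oracle-supplied DEFAULT_PLAN instead of disabling Resource Manager entirely:

ALTER SYSTEM SET RESOURCE_MANAGER_PLAN = 'DEFAULT_PLAN';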

Once our user’s behavior has been adjusted, we can take them out of the penalty box and return them to their original consumer group:

begin
  dbms_resource_manager.set_initial_consumer_group(
    user           => 'SCOTT',
    consumer_group => 'DEFAULT_CONSUMER_GROUP');
end;
/

To place other users in the penalty box, follow these abbreviated steps:

begin
  dbms_resource_manager_privs.grant_switch_consumer_group(
    grantee_name   => 'USER',
    consumer_group => 'PENALTY_BOX_GROUP',
    grant_option   => FALSE);
end;
/

begin
  dbms_resource_manager.set_initial_consumer_group(
    user           => 'USER',
    consumer_group => 'PENALTY_BOX_GROUP');
end;
/

Now to test out our penalty box.  This query was executed as user SCOTT.

NOTE: This is my “hammer” query, which will run until it exhausts TEMP space:

set timing on

select distinct owner
from dba_constraints
where owner = (select owner
               from dba_constraints
               where constraint_type = 'C'
               and owner in (select owner
                             from dba_constraints
                             where constraint_type = 'P'
                             and owner in (select owner
                                           from dba_constraints
                                           where constraint_type = 'R'
                                           and owner in (select owner
                                                         from dba_constraints
                                                         where constraint_type = 'U'))))
/

If the penalty box is working properly, you should see this error once the query exceeds the time limit set in the resource plan directive.  The elapsed time is displayed because the ‘set timing on’ SQL*Plus option was in effect:

ERROR at line 14:
ORA-00040: active time limit exceeded - call aborted

Elapsed: 00:01:31.20

Notice that the query did not end EXACTLY when the sixty-second time limit was reached.  The time limit is an approximation: the database ends the query at the next convenient interrupt after the limit is exceeded, much like a killed session that gets ‘marked for kill’ and terminates at the next convenient interrupt instead of ending immediately.  As a side note, you can append the word ‘immediate’ to an ‘alter system kill session’ command to speed things along.
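
For example (the SID and SERIAL# values here are placeholders; look up the real ones in V$SESSION):

alter system kill session '123,45678' immediate;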

If you want to make changes to the resource plan, consumer group, or resource plan directives, you can use the following commands.  Please note that you will need to create a pending area, execute the desired command, and then submit the pending area for the command to take effect (a complete example follows the commands below).

To remove a consumer group:

begin
  dbms_resource_manager.delete_consumer_group('PENALTY_BOX_GROUP');
end;
/

To remove a resource plan:

begin
  dbms_resource_manager.delete_plan('PENALTY_BOX_PLAN');
end;
/

To remove a resource plan and all associated sub-plans and consumer groups:

begin
  dbms_resource_manager.delete_plan_cascade('PENALTY_BOX_PLAN');
end;
/

To delete plan directives:

begin
  dbms_resource_manager.delete_plan_directive(
    plan             => 'PENALTY_BOX_PLAN',
    group_or_subplan => 'PENALTY_BOX_GROUP');
end;
/
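
For example, deleting the plan directive with the required pending area wrapper looks like this:

begin
  dbms_resource_manager.create_pending_area();
  dbms_resource_manager.delete_plan_directive(
    plan             => 'PENALTY_BOX_PLAN',
    group_or_subplan => 'PENALTY_BOX_GROUP');
  dbms_resource_manager.validate_pending_area();
  dbms_resource_manager.submit_pending_area();
end;
/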

In the event that there is an active resource plan already in place, there are two options to consider: either create a new consumer group and assign the offending user to it, or add a plan directive to the offending user’s currently assigned consumer group.

The DBA_RSRC_CONSUMER_GROUP_PRIVS view displays the consumer groups granted to users or roles. Specifically, it displays the groups to which a user or role is allowed to belong or be switched.  In the example shown below, user SCOTT always starts in the SALES consumer group (INITIAL_GROUP is YES) and can switch to the MARKETING group through a specific grant, and can also switch to the DEFAULT_CONSUMER_GROUP (OTHER_GROUPS) and LOW_GROUP groups because they are granted to PUBLIC.  SCOTT also can grant the SALES group but not the MARKETING group to other users (GRANT_OPTION is YES for SALES, NO for MARKETING).

SELECT * FROM dba_rsrc_consumer_group_privs;

GRANTEE            GRANTED_GROUP                  GRANT_OPTION INITIAL_GROUP
------------------ ------------------------------ ------------ -------------
PUBLIC             DEFAULT_CONSUMER_GROUP         YES          YES
PUBLIC             LOW_GROUP                      NO           NO
SCOTT              MARKETING                      NO           NO
SCOTT              SALES                          YES          YES
SYSTEM             SYS_GROUP                      NO           YES

Using this information, let’s look at our database after the PENALTY_BOX_PLAN has been created and enabled:

select *
from dba_rsrc_consumer_group_privs
order by grantee;

GRANTEE                        GRANTED_GROUP                  GRA INI
------------------------------ ------------------------------ --- ---
PUBLIC                         DEFAULT_CONSUMER_GROUP         YES YES
PUBLIC                         LOW_GROUP                      NO  NO
SCOTT                          PENALTY_BOX_GROUP              NO  YES
SYSTEM                         SYS_GROUP                      NO  YES

Of the two options I mentioned earlier, it may be safer to create a new consumer group for the particular user with all the directives of their currently assigned initial group.  The script in My Oracle Support (MOS) note 1388634.1 can reverse-engineer a complete resource plan, including all of its consumer groups.

With this information, you can create a new consumer group that adds the execution time limiting directive mentioned earlier and then assign it to the user as their initial consumer group, as sketched below.
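
Here is a minimal sketch of that approach.  The plan name DAYTIME_PLAN and the group name SCOTT_LIMITED are hypothetical; in practice, you would first recreate the directives reverse-engineered from the user’s current group before adding the time-limiting one:

begin
  dbms_resource_manager.create_pending_area();
  dbms_resource_manager.create_consumer_group(
    consumer_group => 'SCOTT_LIMITED',
    comment        => 'Original directives plus an execution time limit');
  -- Recreate the directives of the user's current group here, then add:
  dbms_resource_manager.create_plan_directive(
    plan                => 'DAYTIME_PLAN',  -- hypothetical active plan
    group_or_subplan    => 'SCOTT_LIMITED',
    comment             => 'Cancel statement after 60 seconds',
    switch_group        => 'CANCEL_SQL',
    switch_for_call     => true,
    switch_time_in_call => 60,
    switch_estimate     => false);
  dbms_resource_manager.validate_pending_area();
  dbms_resource_manager.submit_pending_area();
end;
/

From there, grant the switch privilege and set the user’s initial consumer group exactly as we did for PENALTY_BOX_GROUP earlier.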

The second option, adding the execution time limiting directive to the user’s initial consumer group, is riskier because the directive would affect ALL users assigned to that consumer group.

This post has been a small sampling of what the Oracle Resource Manager can do.  Over the years, I have seen very few shops where this feature was actively used.  It can be beneficial in non-production environments, where resources are usually more limited, and I have also seen production environments that allow ad-hoc queries where it would have prevented some problems.

Enjoy!