Working with Oracle trace files

Being a contractor sometimes means working with clients in situations where you can’t access their databases and servers directly. Fortunately, Oracle has utilities such as AWR and scripts like SQLTXPLAIN to help diagnose the system in your absence. If you are very fortunate, you may even get them to execute eDB360 for you.

However, if you really want to know where the database is spending all of its time, nothing beats a good old-fashioned trace. Trace (.trc) files are the ever-present files in the Oracle diagnostic destination that can tell you what processes are doing and how long they are doing it for. Most ignore these files and even delete them when they build up on the database server, but you can’t beat them for their wealth of information.  If I can’t touch a client’s system, the first thing I ask for is a trace of the session executing the problem code.   If they are unfamiliar with the tracing process, I offer to run it for them.

Creating a trace file

For a proper trace complete with timing information, the statistics_level parameter needs to be set to ALL. Fortunately, if it is not set at the system level, it can be set at the session level.

alter session set statistics_level=ALL;

To make identifying the trace file easier, you can set a trace file identifier at the session level. This identifier will become part of the name(s) of the trace file(s) generated:

alter session set tracefile_identifier = 'my_trace_id';

Tracing a session is simple, even if that session is not yours. To trace your own session, you can use any of these commands:

alter session set sql_trace=TRUE;
exec dbms_monitor.session_trace_enable;
exec dbms_support.start_trace;
alter session set events = '10046 TRACE NAME CONTEXT FOREVER, LEVEL 12';

The last command provides the most detailed tracing, with the maximum level setting of 12. Lower levels are also available (level 4 adds bind variable values to the base trace, level 8 adds wait events), but I would not recommend anything lower than level 8.
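For reference, here is a quick sketch of the same alter session command at each commonly used level:

alter session set events = '10046 TRACE NAME CONTEXT FOREVER, LEVEL 4';  -- SQL plus bind variable values
alter session set events = '10046 TRACE NAME CONTEXT FOREVER, LEVEL 8';  -- SQL plus wait events
alter session set events = '10046 TRACE NAME CONTEXT FOREVER, LEVEL 12'; -- SQL plus binds and waits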

If you need to trace a session other than your own, you just need the SID and serial number of that session. Start tracing on the other session with this command:

execute dbms_monitor.session_trace_enable(sid, serial#, TRUE, TRUE);

(The last two arguments enable reporting on wait events and bind variables.)
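If you do not already know the SID and serial number, a quick query against v$session will provide them. Filtering on the username, as in the SCOTT example used below, is just one way to narrow it down:

select sid, serial#, username, status, program
from v$session
where username = 'SCOTT'
order by sid;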

If this is a client/server environment where several sessions are connected with the same database account and you are not sure which one will handle the code execution, you can generate dynamic statements to trace them all:

set lines 200 pages 999
select 'execute dbms_monitor.session_trace_enable('||sid||','||serial#||',TRUE,TRUE);'
from v$session
where username = 'SCOTT'
order by sid;

When the process or code you are tracing has completed, stop the tracing in the current session with any of these commands:

alter session set sql_trace = false;
exec dbms_monitor.session_trace_disable;
exec dbms_support.stop_trace;
alter session set events = '10046 TRACE NAME CONTEXT OFF';

Or you can use this command if the session is not the current one (based on the earlier sid and serial number values):

exec dbms_monitor.session_trace_disable(sid, serial#);

If you were tracing multiple sessions, use this code to generate the stop tracing commands:

set lines 200 pages 999
select 'execute dbms_monitor.session_trace_disable('||sid||','||serial#||');'
from v$session
where username = 'SCOTT'
order by sid;

Locating and processing a trace file

The trace file location on the database server is designated by the diagnostic_dest parameter (11g and above). From there, go to the diag/rdbms/<db_name>/<instance_name>/trace directory to see the actual trace files. In a pre-11g system, go to the location designated by the user_dump_dest parameter. Use a sorted listing such as 'ls -lart' to see the latest trace files, which end in the .trc extension. The one(s) you are interested in should be the latest and probably largest of the trace files, and you can usually watch them grow as the code executes. Of course, if you set a trace file identifier, the file will be even easier to find.
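If you have a session in the database, you can also ask it directly where trace files are written. A minimal sketch, assuming 11g or later where v$diag_info is available:

select name, value
from v$diag_info
where name in ('Diag Trace', 'Default Trace File');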

Processing a trace file

Looking at a raw trace file is a bit confusing and takes some experience to decipher. You can use the TKPROF utility to process it into something a little more readable. I recommend using the EXEELA and FCHELA sort options so that statements are ordered by execute and fetch elapsed time, descending. That way, the SQL taking the longest time will be at the top for easy access. Here is an example of the command to create this output:

tkprof <raw_trace_file.trc> <processed_file.log> sort=EXEELA,FCHELA

You can use any extension for the output file, but it is best to use one that is not foreign to the editor you are going to use to open the resulting file.

Reviewing a trace file

After opening the file, go straight to the bottom section that starts with 'OVERALL TOTALS'. If you are unfamiliar with TKPROF output, spend a minute reviewing the legend at the top of the file:

Trace file: dummy_ora_1369.trc
Sort options: exeela fchela
********************************************************************************
count = number of times OCI procedure was executed
cpu = cpu time in seconds executing
elapsed = elapsed time in seconds executing
disk = number of physical reads of buffers from disk
query = number of buffers gotten for consistent read
current = number of buffers gotten in current mode (usually for update)
rows = number of rows processed by the fetch or execute call
********************************************************************************

Then proceed to the bottom:

********************************************************************************

OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS

call     count       cpu    elapsed       disk      query    current        rows

------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        0      0.00       0.00          0          0          0           0
Execute 209486    822.98     823.79       2526      33961    1607238      209486
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total   209486    822.98     823.79       2526      33961    1607238      209486

Misses in library cache during parse: 0

Elapsed times include waiting on following events:
Event waited on                             Times   Max. Wait  Total Waited
----------------------------------------   Waited  ----------  ------------
log file sync                              154081        6.41      19315.04
SQL*Net message to client                  314407        0.00          1.51
SQL*Net message from client                314407        0.02         59.12
Disk file operations I/O                       22        0.00          0.00
db file sequential read                      2526        0.02          0.37
library cache: mutex X                        154        0.00          0.00
log buffer space                                8        0.17          0.96
buffer busy waits                               2        0.10          0.10
latch: In memory undo latch                     5        0.00          0.00
latch: shared pool                              3        0.00          0.00
undo segment extension                          1        0.00          0.00


OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS

call     count       cpu    elapsed       disk      query    current        rows

------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse   628459     13.06      13.03          0          0          0           0
Execute 1256917    598.32     611.29       1889     425754     653042      418972
Fetch   418973     10.35      10.45          1          4          0      418973
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total   2304349    621.73     634.78       1890     425758     653042      837945

Misses in library cache during parse: 0

Elapsed times include waiting on following events:
Event waited on                             Times   Max. Wait  Total Waited
----------------------------------------   Waited  ----------  ------------
Disk file operations I/O                       23        0.00          0.00
db file sequential read                      1890        0.04          0.30
log buffer space                               22        0.63          3.65
log file switch (private strand flush incomplete)      12        1.56          7.50
cursor: pin S                                   1        0.00          0.00
buffer busy waits                               1        0.00          0.00
latch: In memory undo latch                     2        0.00          0.00
latch free                                      3        0.00          0.00
resmgr:cpu quantum                             36        0.06          0.29
log file switch completion                      1        1.57          1.57

418978  user  SQL statements in session.
2  internal SQL statements in session.
418980  SQL statements in session.
6  statements EXPLAINed in this session.
********************************************************************************
Trace file: dummy_ora_1369.trc
Trace file compatibility: 11.1.0.7
Sort options: exeela  fchela
1  session in tracefile.
418978  user  SQL statements in trace file.
2  internal SQL statements in trace file.
418980  SQL statements in trace file.
23  unique SQL statements in trace file.
6  SQL statements EXPLAINed using schema:
APPS.prof$plan_table
Default table was used.
Table was created.
Table was dropped.
36460282  lines in trace file.
20970  elapsed seconds in trace file.

There are two sections for overall totals – one for non-recursive statements (the SQL issued directly by the user or application) and one for recursive statements (the SQL Oracle issues internally, including statements executed from within PL/SQL). At the very bottom is the number of elapsed seconds covered by the trace file; in this case, it is 20970 seconds, or about 5.8 hours. Each of these sections has total elapsed time and wait event information. The first thing to notice is that the total for the non-recursive statements is 823.79 seconds and the total for the recursive statements is 634.78 seconds. The sum of these two values is drastically short of the 20,970-second grand total. Where did the other time go? Now start looking at the wait events under each section and stop when you get to the event with a total time waited of 19315.04. To put this in perspective, a total of 1458.57 seconds (823.79 + 634.78), or about 24 minutes, was spent on actual database work during the trace, while 19315.04 seconds, or 5.3 hours, was spent waiting on log file sync. Now we can start looking for the sources of all these log file sync waits. Since this output file is already sorted by elapsed time descending, we just start at the top:

SQL ID: 9dtjfluf6rcct Plan Hash: 2730624675

INSERT INTO "BAD"."SIMP_STG" (REC_ID,INSTANCE_ID,
ORIGINAL_ID,REPROCESS_ID,TRANSACTION_ID,REPROCESS_CNT)
VALUES
(SIMP_STG_S.nextval, :INSTANCE_ID, :ORIGINAL_BPEL_ID,
:REPROCESS_BPEL_ID, :TRANSACTION_ID, :REPROCESS_COUNTER)

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        0      0.00       0.00          0          0          0           0
Execute 104743    462.81     463.34       1414      18029     752169      104743
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total   104743    462.81     463.34       1414      18029     752169      104743

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 44  (APPS)
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
0          0          0  LOAD TABLE CONVENTIONAL  (cr=2 pr=0 pw=0 time=5657 us)
1          1          1   SEQUENCE  SIMP_STG_S (cr=2 pr=0 pw=0 time=2488 us)

Rows     Execution Plan
-------  ---------------------------------------------------
0  INSERT STATEMENT   MODE: ALL_ROWS
0   LOAD TABLE CONVENTIONAL OF 'SIMP_STG'
1    SEQUENCE OF 'SIMP_STG_S' (SEQUENCE)

Elapsed times include waiting on following events:
Event waited on                             Times   Max. Wait  Total Waited
----------------------------------------   Waited  ----------  ------------
log file sync                               49337        6.41       9105.43
SQL*Net message to client                  104743        0.00          0.51
SQL*Net message from client                104743        0.02         17.95
db file sequential read                      1414        0.01          0.17
library cache: mutex X                         61        0.00          0.00
Disk file operations I/O                       16        0.00          0.00
log buffer space                                1        0.17          0.17
buffer busy waits                               2        0.10          0.10
latch: In memory undo latch                     2        0.00          0.00
latch: shared pool                              1        0.00          0.00
undo segment extension                          1        0.00          0.00
********************************************************************************

This first statement looks like it accounts for about half of the log file sync waits. The insert uses a sequence to generate a unique ID for every one of the 104,743 rows inserted, and a review of that sequence revealed it had the default cache setting of 20. Any sequence called thousands of times during the course of one process needs a cache value large enough to drastically reduce the number of times Oracle has to allocate new values. A cache value of 10000 or more is appropriate here: when the cache is populated, the next 10000 sequence numbers are reserved in memory, so nextval calls are satisfied from memory and the sequence is only updated in the data dictionary again once those cached values are exhausted.
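A quick sketch of checking and correcting the cache setting (assuming a DBA-level account and that the sequence lives in the same BAD schema as the table in the insert statement above):

select sequence_owner, sequence_name, cache_size
from dba_sequences
where sequence_name = 'SIMP_STG_S';

alter sequence BAD.SIMP_STG_S cache 10000;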

Caching the sequence removed about 9000 seconds of log file sync waits, but a review of the rest of the output revealed no more log file sync waits in any of the other statements. Where did the other 10000 seconds go? One of the problems with TKPROF is that it does not report on the commits and rollbacks recorded in the trace file, and it just so happens that one common source of log file sync waits is excessive commits. The process generating these insert statements was a SQL*Loader job with options(rows=1) in the control file, so it was performing a commit for every row inserted. Removing this clause and switching from conventional path to direct path loading removed the remaining log file sync waits.

Sometimes the cause of a problem is not readily apparent. Take this statement, for example:

SQL ID: 1b8xbxk8jc712 Plan Hash: 2504779452
SELECT L.JON
FROM
SIMP_HDR_STG H, SIMP_LINES_STG L WHERE
L.RECORD_HEADER_ID = H.RECORD_HEADER_ID AND H.INSTANCE_ID = :B1 AND ROWNUM = 1

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute     12      0.00       0.00          0          0          0           0
Fetch       12     81.56      81.56          0   26025612          0          12
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total       25     81.56      81.56          0   26025612          0          12

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 44     (recursive depth: 2)
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
1          1          1  COUNT STOPKEY (cr=2168801 pr=0 pw=0 time=6878671 us)
1          1          1   NESTED LOOPS  (cr=2168801 pr=0 pw=0 time=6878653 us cost=9 size=126 card=7)
1072923    1072923    1072923    NESTED LOOPS  (cr=1095878 pr=0 pw=0 time=4041943 us cost=9 size=126 card=7)
1072923    1072923    1072923     TABLE ACCESS FULL SIMP_LINES_STG (cr=23852 pr=0 pw=0 time=535161 us cost=2 size=49 card=7)
1072923    1072923    1072923     INDEX UNIQUE SCAN SIMP_HDR_STG_PK (cr=1072026 pr=0 pw=0 time=2585580 us cost=0 size=0 card=1)(object id 3379354)
1          1          1    TABLE ACCESS BY INDEX ROWID SIMP_HDR_STG (cr=1072923 pr=0 pw=0 time=2309258 us cost=1 size=11 card=1)

You need to dig a little with this one. Elapsed time is 81.56 seconds for 12 executions of this statement, which is not too bad at about 6.8 seconds per execution. However, 26 million consistent reads in total, or roughly 2.2 million per execution, to return a single row per execution seems excessive. Looking at the row source operations, the nested loop probes the primary key index of the header table once for every one of the 1,072,923 rows returned by the full scan of the lines table. If you assume a one-to-many relationship between the header and lines tables, then the header table should be doing less work than the lines table, not more. This may be a simple case of having the wrong table driving the query.
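One way to test that theory is to force the header table to drive the join, for example with hints. This is only a sketch; the hints and the assumption that a usable index exists on SIMP_LINES_STG.RECORD_HEADER_ID would need to be verified against the real schema:

SELECT /*+ LEADING(H) USE_NL(L) */ L.JON
FROM SIMP_HDR_STG H, SIMP_LINES_STG L
WHERE L.RECORD_HEADER_ID = H.RECORD_HEADER_ID
AND H.INSTANCE_ID = :B1
AND ROWNUM = 1;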

Finally, there are times when you see things in the TKPROF output that just make you shake your head:

********************************************************************************

OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        0      0.00       0.00          0          0          0           0
Execute      0      0.00       0.00          0          0          0           0
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        0      0.00       0.00          0          0          0           0

Misses in library cache during parse: 0

Elapsed times include waiting on following events:
Event waited on                             Times   Max. Wait  Total Waited
----------------------------------------   Waited  ----------  ------------
db file sequential read                         1        0.00          0.00
utl_file I/O                                61671        0.00          1.12
PL/SQL lock timer                            2373       10.09      23731.10

OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse   314346      6.04       6.46          0          0         10           0
Execute 588073    667.85     678.04      35191    4498924     293728       50209
Fetch   496358    102.61     105.00       1768    5489750       6015      492324
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total   1398777    776.51     789.51      36959    9988674     299753      542533

Misses in library cache during parse: 346
Misses in library cache during execute: 1331

Elapsed times include waiting on following events:
Event waited on                             Times   Max. Wait  Total Waited
----------------------------------------   Waited  ----------  ------------
db file sequential read                      7713        0.06         10.85
Disk file operations I/O                       93        0.00          0.00
db file scattered read                       2067        0.07          2.26
enq: TM - contention                            9        0.17          0.96
latch: cache buffers chains                     1        0.00          0.00
latch: In memory undo latch                     1        0.00          0.00

24039  user  SQL statements in session.
138  internal SQL statements in session.
24177  SQL statements in session.
********************************************************************************
Trace file: DUMMY_ora_22860.trc
Trace file compatibility: 11.1.0.7
Sort options: exeela  fchela
1  session in tracefile.
24039  user  SQL statements in trace file.
138  internal SQL statements in trace file.
24177  SQL statements in trace file.
357  unique SQL statements in trace file.
14449283  lines in trace file.
29448  elapsed seconds in trace file.
 

This is only the bottom portion of a TKPROF output file, but what is interesting here is the presence of the PL/SQL lock timer waits under the non-recursive section. The only cause of these waits is a call to the dbms_lock.sleep procedure. To summarize, there are 29448 total seconds in this TKPROF output, and 23731 seconds, or 80% of the total time, is taken up by code forcing the database to do nothing but sleep.
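Even a trivial block like this one (runnable as-is, given execute privilege on dbms_lock) accumulates time under the PL/SQL lock timer event; the production code was presumably a polling loop doing the same thing between checks for new work:

begin
  for i in 1 .. 3 loop
    dbms_lock.sleep(5);  -- each call adds 5 seconds of PL/SQL lock timer wait
  end loop;
end;
/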

I hope you find this as useful as I did when it was first revealed to me. There is a wealth of information that the Oracle database provides automatically. Sometimes we have to put on our Indiana Jones hats to dig it up.

Update 1/10/2017

I recently had a situation where I was given only a raw trace file with no means to process it through TKPROF.  In this situation, I had only a restricted-access virtual machine (VM) with which to investigate its contents.

Knowing that I could install basic tools on this VM, I installed Cygwin64 Terminal (cygwin.com), a set of Unix-like tools for Windows environments.  After installation, I was able to use the following commands to help me pull the wait events out of the trace file.

(Tested on Oracle version 11.2.0.4.0)

Use this command to group waits in a raw trace file:

cat PROD1_ora_24787.trc | grep WAIT | awk -F"ela=" '{a[$3];}END{for (i in a)print i;}'

cat PROD1_ora_24787.trc | grep WAIT | awk '{a[$3" "$4" "$5];}END{for (i in a)print i;}'

Note – Since the words of a wait event name are separated by spaces, it may be necessary to grab multiple fields ($3, $4, $5) to capture the entire wait event name.

To count the number of waits per category:
cat PROD1_ora_24787.trc | grep WAIT | awk '{a[$3]++;}END{for (i in a)print i, a[i];}'

To summarize elapsed microseconds for a particular wait:
cat PROD1_ora_24787.trc | grep WAIT | grep 'utl_file I/O' | awk '{sum += $6}END{print sum}'

The above command will summarize all of the 'utl_file I/O' waits in the example trace file.
Note that the value is given in microseconds and should be multiplied by .000001 to
produce the summarized value in seconds.

Surviving after-hours support


I like to joke that DBA stands for 'Doing Business After-hours', not Database Administrator. If you have been a DBA long enough, you have found yourself in the unenviable position of having to deploy production code after business hours, or even having to respond to frantic zero-dark-thirty calls for tier 1 environment issues ranging from performance problems to full-blown disaster recovery.

Surviving and even thriving in these types of situations is all a matter of planning and common sense.  I will explain using two examples.

Performing planned after-hours work:

In these situations, practicing the tasks is the best way to prevent surprises.  Since this work is planned, you should have the opportunity to practice in a non-production environment.  Even if a non-production environment is not available, creating a test environment using Oracle VirtualBox is an easy way to build a similar environment for testing.  Document the process you are going to follow, especially if you have to wake up to perform these changes.  Having the practiced steps laid out helps when you're half asleep.

Middle of the night panic calls:

The first thing to remember in a panic situation is: don't panic.  Good decisions are seldom made in a panicked state of mind.  I know it is easy to get swept up in a panic situation, so start by asking questions, which will hopefully get others thinking about the situation more clearly.  Getting as much information as possible up front helps the decision process.  Another useful tool is to have your environments documented ahead of time so that you can refer to this information easily.  In the event of a disaster recovery, you will be thankful if you have a tested and documented recovery process.

Additional Tips

In the event of a middle of the night panic or assistance call, don’t immediately reach for the caffeine.  I reach for Gatorade first to get me hydrated while my thought processes ramp up.  Having coffee or a power drink immediately may keep you from getting back to sleep in the event that the assistance requested turns out to be brief.  If it looks like it is going to be a long night, then reach for the power drink or coffee.

Make sure you can connect to your environments remotely and that your connection is reliable and quick.  Ensure you can access connection account and password information readily.

Keep informed about any after-hours work in your environments, even if it does not involve you.  Problems can arise when bad code is deployed or unexpected results are seen.

 

SQL Server Health Check v1


As a consultant, I frequently find myself in places where the on-boarding information for a new environment is incomplete or non-existent.  For Oracle environments, there are many tools available, such as edb360 or OraChk, to obtain an initial review of a database server.  For SQL Server, there are probably many tools as well, but I wanted something lightweight and non-intrusive – something that would give me a quick overview and maybe point out some trouble spots.

This was my plan when I wrote the SQL Server Health Check script.  This script allows someone new to the environment to get a quick snapshot of any existing SQL Server databases and their characteristics.  There is emphasis placed on reviewing existing or absent backups as this should be the first area of concern for any incoming DBA.   The script also briefly reviews waits in the environment, but does not go into more detail on particular performance characteristics.  As this script evolves, a more in-depth review of performance will likely be included.  The plan to keep this script non-intrusive may limit the scope of this review as well as others, though.

Special thanks to Sam Sridharagopal from Avenade for his help in developing and testing this script.

The script works on SQL Server versions 2005 through 2016 (CTP 2) and is best executed through SSMS.  If you find the script useful or have suggestions for improvement, please comment.

----------------------------------------------------------Start of script

--sql_server_health_check_v1.sql
--v1
--This is a non-intrusive script written to give an overall informational view of a
--SQL Server 2005-2016 (CTP 2) server instance for the purpose of first-time review.
--Select 'Results To' text when running from SSMS

use master
GO

--Node name

SELECT SERVERPROPERTY('ComputerNamePhysicalNetBIOS') AS [CurrentNodeName];

--Version

select @@version as [VERSION];

--Instance parameters

print N'Instance Parameter';

SELECT  *
FROM    sys.configurations
ORDER BY name ;

--Database listing

print N'Database list with Status and Recovery Model';

SELECT substring(name,1,40) AS name,  substring(state_desc,1,20) AS STATE,
substring(recovery_model_desc,1,20) AS RECOVERY_MODEL
FROM sys.databases
order by name;

--Size and growth

print N'Size and Growth';

select substring(b.name,1,40) AS DB_Name, substring(a.name,1,40) AS Logical_name,
substring(a.filename,1,100) AS File_Name,
cast((a.size * 8.00) / 1024 as numeric(12,2)) as DB_Size_in_MB,
case when a.growth > 100 then 'In MB' else 'In Percentage' end File_Growth,
cast(case when a.growth > 100 then (a.growth * 8.00) / 1024
else (((a.size * a.growth) / 100) * 8.00) / 1024
end as numeric(12,2)) File_Growth_Size_in_MB,
case when ( maxsize = -1 or maxsize=268435456 ) then 'AutoGrowth Not Restricted' else 'AutoGrowth Restricted' end AutoGrowth_Status
from sysaltfiles a
join sysdatabases b on a.dbid = b.dbid
where DATABASEPROPERTYEX(b.name, 'status') = 'ONLINE'
order by b.name;

--Is Cluster Node?

SELECT 'Clustered', case when SERVERPROPERTY('IsClustered') = 0 then 'No'
else 'Yes' end;

--Nodes in Cluster

print N'Cluster Nodes';

SELECT * FROM fn_virtualservernodes();

--Is AlwaysOn?

SELECT 'AlwaysOn', case when SERVERPROPERTY('IsHadrEnabled') = 0 then 'No'
when SERVERPROPERTY('IsHadrEnabled') = 1 then 'Yes'
else SERVERPROPERTY('IsHadrEnabled') end;

--AlwaysOn status

declare @c int;
declare @rd nvarchar(60);
declare @osd nvarchar(60);
declare @rhd nvarchar(60);
declare @shd nvarchar(60);
declare @csd nvarchar(60);
select @c = COUNT(name)
from sys.all_objects
where name = 'dm_hadr_availability_replica_states';
if @c = 0
print N'No AlwaysOn Status';
else
select @rd = role_desc, @osd= case when operational_state_desc is null then 'Replica is not local'
else operational_state_desc end,
@rhd = recovery_health_desc, @shd = synchronization_health_desc,
@csd = connected_state_desc
from sys.dm_hadr_availability_replica_states;
print @rd
print @osd
print @rhd
print @shd
print @csd

--Memory usage per database

print N'Memory Usage per User Database';

SELECT
substring(DB_NAME(database_id),1,40) AS [Database Name]
,COUNT(*) * 8/1024.0 AS [Cached Size (MB)]
FROM
sys.dm_os_buffer_descriptors
WHERE
database_id > 4
AND database_id <> 32767
AND db_name(database_id) <> 'SSISDB'
GROUP BY DB_NAME(database_id)
ORDER BY [Cached Size (MB)] DESC OPTION (RECOMPILE);

--Last backup time

SELECT substring(sdb.Name,1,40) AS DatabaseName,
COALESCE(CONVERT(VARCHAR(12), MAX(bus.backup_finish_date), 101),'-') AS LastBackUpTime
FROM sys.sysdatabases sdb
LEFT OUTER JOIN msdb.dbo.backupset bus ON bus.database_name = sdb.name
where sdb.Name <> 'tempdb'
GROUP BY sdb.Name;

--No log backups for FULL or BULK_LOGGED recovery model databases

print N'Databases with FULL or BULK_LOGGED recovery model and no log backups in last 30 days';

SELECT name AS at_risk_database
FROM sys.databases
where recovery_model_desc in('FULL','BULK_LOGGED')
and name not in(
SELECT
msdb.dbo.backupset.database_name AS DBName
FROM msdb.dbo.backupmediafamily
INNER JOIN msdb.dbo.backupset ON msdb.dbo.backupmediafamily.media_set_id = msdb.dbo.backupset.media_set_id
WHERE (CONVERT(datetime, msdb.dbo.backupset.backup_start_date, 102) >= GETDATE() - 30)
and msdb..backupset.type = 'L'
group by msdb.dbo.backupset.database_name
);

--Databases with no backups in the last 30 days

print N'Databases with NO backups in last 30 days';

SELECT name AS at_risk_database
FROM sys.databases
where name <> 'tempdb'
and name not in(
SELECT
substring(msdb.dbo.backupset.database_name,1,40) AS DBName
FROM msdb.dbo.backupmediafamily
INNER JOIN msdb.dbo.backupset ON msdb.dbo.backupmediafamily.media_set_id = msdb.dbo.backupset.media_set_id
WHERE (CONVERT(datetime, msdb.dbo.backupset.backup_start_date, 102) >= GETDATE() - 30)
group by msdb.dbo.backupset.database_name
);

--Backups for the previous week

print N'All backups for previous week';

SELECT
CONVERT(CHAR(40), SERVERPROPERTY('Servername')) AS Server,
substring(msdb.dbo.backupset.database_name,1,40) AS DBName,
msdb.dbo.backupset.backup_start_date,
msdb.dbo.backupset.backup_finish_date,
msdb.dbo.backupset.expiration_date,
CASE msdb..backupset.type
WHEN 'D' THEN 'Database'
WHEN 'L' THEN 'Log'
WHEN 'F' THEN 'File'
WHEN 'P' THEN 'Partial'
WHEN 'I' THEN 'Differential database'
WHEN 'G' THEN 'Differential file'
WHEN 'Q' THEN 'Differential partial'
WHEN NULL THEN msdb..backupset.type
END AS backup_type,
msdb.dbo.backupset.backup_size,
substring(msdb.dbo.backupmediafamily.logical_device_name,1,50) AS logical_device_name,
substring(msdb.dbo.backupmediafamily.physical_device_name,1,50) AS physical_device_name,
substring(msdb.dbo.backupset.name,1,50) AS backupset_name,
substring(msdb.dbo.backupset.description,1,50) AS description
FROM msdb.dbo.backupmediafamily
INNER JOIN msdb.dbo.backupset ON msdb.dbo.backupmediafamily.media_set_id = msdb.dbo.backupset.media_set_id
WHERE (CONVERT(datetime, msdb.dbo.backupset.backup_start_date, 102) >= GETDATE() - 7)
ORDER BY
msdb.dbo.backupset.database_name,
msdb.dbo.backupset.backup_finish_date;

--Jobs that failed in the last 24 hours

print N'Jobs Failing in last 24 hours';

-- Variable Declarations
DECLARE @PreviousDate datetime
DECLARE @Year VARCHAR(4)
DECLARE @Month VARCHAR(2)
DECLARE @MonthPre VARCHAR(2)
DECLARE @Day VARCHAR(2)
DECLARE @DayPre VARCHAR(2)
DECLARE @FinalDate INT
-- Initialize Variables
SET @PreviousDate = DATEADD(dd, -1, GETDATE()) -- Last 1 day
SET @Year = DATEPART(yyyy, @PreviousDate)
SELECT @MonthPre = CONVERT(VARCHAR(2), DATEPART(mm, @PreviousDate))
SELECT @Month = RIGHT(CONVERT(VARCHAR, (@MonthPre + 1000000000)),2)
SELECT @DayPre = CONVERT(VARCHAR(2), DATEPART(dd, @PreviousDate))
SELECT @Day = RIGHT(CONVERT(VARCHAR, (@DayPre + 1000000000)),2)
SET @FinalDate = CAST(@Year + @Month + @Day AS INT)
-- Final Logic
SELECT substring(j.[name],1,40) AS JOB,
substring(s.step_name,1,40) AS Step,
h.step_id,
substring(h.step_name,1,40) AS Step,
h.run_date,
h.run_time,
h.sql_severity,
substring(h.message,1,100) AS Message,
h.server
FROM msdb.dbo.sysjobhistory h
INNER JOIN msdb.dbo.sysjobs j
ON h.job_id = j.job_id
INNER JOIN msdb.dbo.sysjobsteps s
ON j.job_id = s.job_id AND h.step_id = s.step_id
WHERE h.run_status = 0 -- Failure
AND h.run_date > @FinalDate
ORDER BY h.instance_id DESC;

--Missing indexes

print N'Missing Indexes';

SELECT substring(so.name,1,40) AS Name
, (avg_total_user_cost * avg_user_impact) * (user_seeks + user_scans) as Impact
, ddmid.equality_columns
, ddmid.inequality_columns
, ddmid.included_columns
FROM sys.dm_db_missing_index_group_stats AS ddmigs
INNER JOIN sys.dm_db_missing_index_groups AS ddmig
ON ddmigs.group_handle = ddmig.index_group_handle
INNER JOIN sys.dm_db_missing_index_details AS ddmid
ON ddmig.index_handle = ddmid.index_handle
INNER JOIN sys.objects so WITH (nolock)
ON ddmid.object_id = so.object_id
WHERE ddmigs.group_handle IN (
SELECT TOP (5000) group_handle
FROM sys.dm_db_missing_index_group_stats WITH (nolock)
ORDER BY (avg_total_user_cost * avg_user_impact)*(user_seeks+user_scans)DESC);

--Duplicate indexes

print N'Duplicate Indexes';

DECLARE @SCHEMANAME VARCHAR(30);
DECLARE @TABLENAME VARCHAR(127);
WITH ind_list AS(
select o.schema_id, i.object_id, i.index_id,
i.name, i.type_desc,
i.is_unique, i.is_primary_key,
STUFF( (SELECT ‘,’ + tc.name
FROM sys.index_columns ic
JOIN sys.columns tc
ON tc.column_id = ic.column_id AND
tc.object_id = ic.object_id
WHERE ic.object_id = i.object_id AND
ic.index_id = i.index_id
AND ic.is_included_column = 0
ORDER BY ic.index_column_id
FOR XML PATH (”) ),1,1,” ) index_columns,
STUFF( (SELECT ‘,’ + tc.name
FROM sys.index_columns ic
JOIN sys.columns tc
ON tc.column_id = ic.column_id AND
tc.object_id = ic.object_id
WHERE ic.object_id = i.object_id AND
ic.index_id = i.index_id
AND ic.is_included_column = 1
ORDER BY ic.index_column_id
FOR XML PATH (”) ),1,1,” ) include_columns
FROM sys.indexes i
JOIN sys.objects o ON o.object_id = i.object_id
WHERE i.index_id > 0 AND i.type_desc <> ‘XML’
AND object_name(i.object_id) LIKE @TABLENAME
AND i.is_disabled = 0
AND schema_name(o.schema_id) LIKE @SCHEMANAME )
SELECT substring(schema_name(included_indexes.schema_id),1,30) AS owner,
object_name(included_indexes.object_id) table_name,
(SELECT SUM(st.row_count) FROM sys.dm_db_partition_stats st
WHERE st.object_id = included_indexes.object_id
AND st.index_id < 2 ) num_rows,
included_indexes.name included_index_name,
included_indexes.index_columns included_index_columns,
included_indexes.include_columns
included_index_include_columns,
included_indexes.type_desc included_index_type,
included_indexes.is_unique included_index_uniqueness,
included_indexes.is_primary_key included_index_PK,
(SELECT SUM(a.total_pages) * 8 FROM sys.allocation_units a
JOIN sys.partitions p ON a.container_id = p.partition_id
WHERE p.object_id = included_indexes.object_id AND
p.index_id = included_indexes.index_id
) included_index_size_kb,
including_indexes.name including_index_name,
including_indexes.index_columns including_index_columns,
including_indexes.include_columns
including_index_include_columns,
including_indexes.type_desc including_index_type,
including_indexes.is_unique including_index_uniqueness,
including_indexes.is_primary_key including_index_PK,
(SELECT SUM(a.total_pages) * 8 FROM sys.allocation_units a
JOIN sys.partitions p ON a.container_id = p.partition_id
WHERE p.object_id = including_indexes.object_id AND
p.index_id = including_indexes.index_id
) including_index_size_kb
FROM ind_list included_indexes
JOIN ind_list including_indexes
ON including_indexes.object_id = included_indexes.object_id
JOIN sys.partitions ing_p
ON ing_p.object_id = including_indexes.object_id AND
ing_p.index_id = including_indexes.index_id
JOIN sys.allocation_units ing_a
ON ing_a.container_id = ing_p.partition_id
WHERE including_indexes.index_id <> included_indexes.index_id
AND LEN(included_indexes.index_columns) <=
LEN(including_indexes.index_columns)
AND included_indexes.index_columns + ‘,’ =
SUBSTRING(including_indexes.index_columns,1,
LEN(included_indexes.index_columns + ‘,’))
ORDER BY 2 DESC;

--Index fragmentation check

print N'Index with HIGH Fragmentation';

EXEC sp_MSforeachdb '
USE [?]
SELECT ''?'' AS DB_NAME,
QUOTENAME(sysind.name) AS [index_name],
indstat.*
FROM sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL, NULL, ''LIMITED'')
AS indstat
INNER JOIN sys.indexes sysind ON indstat.object_id = sysind.object_id AND
indstat.index_id = sysind.index_id
where avg_fragmentation_in_percent >= 30
ORDER BY avg_fragmentation_in_percent DESC;';

use master
GO

 

--Wait stats

print N'Wait Stats';

SELECT *
FROM sys.dm_os_wait_stats
where wait_time_ms > 10000
ORDER BY wait_time_ms DESC;

--Users and roles

print N'Users and Roles';

WITH Roles_CTE(Role_Name, Username)
AS
(
SELECT
User_Name(sm.[groupuid]) as [Role_Name],
user_name(sm.[memberuid]) as [Username]
FROM [sys].[sysmembers] sm
)

SELECT
Roles_CTE.Role_Name,
[DatabaseUserName] = princ.[name],
[UserType] = CASE princ.[type]
WHEN 'S' THEN 'SQL User'
WHEN 'U' THEN 'Windows User'
WHEN 'G' THEN 'Windows Group'
WHEN 'A' THEN 'Application Role'
WHEN 'R' THEN 'Database Role'
WHEN 'C' THEN 'User mapped to a certificate'
WHEN 'K' THEN 'User mapped to an asymmetric key'
END
FROM
sys.database_principals princ
JOIN Roles_CTE on Username = princ.name
where princ.type in ('S', 'U', 'G', 'A', 'R', 'C', 'K')
ORDER BY princ.name;

--Job listing (see Sam query)

print N'Job Information';

SELECT     [JobName] = [jobs].[name]
,[Category] = [categories].[name]
,[Owner] = SUSER_SNAME([jobs].[owner_sid])
,[Enabled] = CASE [jobs].[enabled] WHEN 1 THEN ‘Yes’ ELSE ‘No’ END
,[Scheduled] = CASE [schedule].[enabled] WHEN 1 THEN ‘Yes’ ELSE ‘No’ END
,[Description] = [jobs].[description]
,[Occurs] =
CASE [schedule].[freq_type]
WHEN   1 THEN ‘Once’
WHEN   4 THEN ‘Daily’
WHEN   8 THEN ‘Weekly’
WHEN  16 THEN ‘Monthly’
WHEN  32 THEN ‘Monthly relative’
WHEN  64 THEN ‘When SQL Server Agent starts’
WHEN 128 THEN ‘Start whenever the CPU(s) become idle’
ELSE ”
END
,[Occurs_detail] =
CASE [schedule].[freq_type]
WHEN   1 THEN ‘O’
WHEN   4 THEN ‘Every ‘ + CONVERT(VARCHAR, [schedule].[freq_interval]) + ‘ day(s)’
WHEN   8 THEN ‘Every ‘ + CONVERT(VARCHAR, [schedule].[freq_recurrence_factor]) + ‘ weeks(s) on ‘ +
LEFT(
CASE WHEN [schedule].[freq_interval] &  1 =  1 THEN ‘Sunday, ‘    ELSE ” END +
CASE WHEN [schedule].[freq_interval] &  2 =  2 THEN ‘Monday, ‘    ELSE ” END +
CASE WHEN [schedule].[freq_interval] &  4 =  4 THEN ‘Tuesday, ‘   ELSE ” END +
CASE WHEN [schedule].[freq_interval] &  8 =  8 THEN ‘Wednesday, ‘ ELSE ” END +
CASE WHEN [schedule].[freq_interval] & 16 = 16 THEN ‘Thursday, ‘  ELSE ” END +
CASE WHEN [schedule].[freq_interval] & 32 = 32 THEN ‘Friday, ‘    ELSE ” END +
CASE WHEN [schedule].[freq_interval] & 64 = 64 THEN ‘Saturday, ‘  ELSE ” END ,
LEN(
CASE WHEN [schedule].[freq_interval] &  1 =  1 THEN ‘Sunday, ‘    ELSE ” END +
CASE WHEN [schedule].[freq_interval] &  2 =  2 THEN ‘Monday, ‘    ELSE ” END +
CASE WHEN [schedule].[freq_interval] &  4 =  4 THEN ‘Tuesday, ‘   ELSE ” END +
CASE WHEN [schedule].[freq_interval] &  8 =  8 THEN ‘Wednesday, ‘ ELSE ” END +
CASE WHEN [schedule].[freq_interval] & 16 = 16 THEN ‘Thursday, ‘  ELSE ” END +
CASE WHEN [schedule].[freq_interval] & 32 = 32 THEN ‘Friday, ‘    ELSE ” END +
CASE WHEN [schedule].[freq_interval] & 64 = 64 THEN ‘Saturday, ‘  ELSE ” END
) – 1
)
WHEN  16 THEN ‘Day ‘ + CONVERT(VARCHAR, [schedule].[freq_interval]) + ‘ of every ‘ + CONVERT(VARCHAR, [schedule].[freq_recurrence_factor]) + ‘ month(s)’
WHEN  32 THEN ‘The ‘ +
CASE [schedule].[freq_relative_interval]
WHEN  1 THEN ‘First’
WHEN  2 THEN ‘Second’
WHEN  4 THEN ‘Third’
WHEN  8 THEN ‘Fourth’
WHEN 16 THEN ‘Last’
END +
CASE [schedule].[freq_interval]
WHEN  1 THEN ‘ Sunday’
WHEN  2 THEN ‘ Monday’
WHEN  3 THEN ‘ Tuesday’
WHEN  4 THEN ‘ Wednesday’
WHEN  5 THEN ‘ Thursday’
WHEN  6 THEN ‘ Friday’
WHEN  7 THEN ‘ Saturday’
WHEN  8 THEN ‘ Day’
WHEN  9 THEN ‘ Weekday’
WHEN 10 THEN ‘ Weekend Day’
END + ‘ of every ‘ + CONVERT(VARCHAR, [schedule].[freq_recurrence_factor]) + ‘ month(s)’
ELSE ”
END
,[Frequency] =
CASE [schedule].[freq_subday_type]
WHEN 1 THEN ‘Occurs once at ‘ +
STUFF(STUFF(RIGHT(‘000000’ + CONVERT(VARCHAR(8), [schedule].[active_start_time]), 6), 5, 0, ‘:’), 3, 0, ‘:’)
WHEN 2 THEN ‘Occurs every ‘ +
CONVERT(VARCHAR, [schedule].[freq_subday_interval]) + ‘ Seconds(s) between ‘ +
STUFF(STUFF(RIGHT(‘000000’ + CONVERT(VARCHAR(8), [schedule].[active_start_time]), 6), 5, 0, ‘:’), 3, 0, ‘:’) + ‘ and ‘ +
STUFF(STUFF(RIGHT(‘000000’ + CONVERT(VARCHAR(8), [schedule].[active_end_time]), 6), 5, 0, ‘:’), 3, 0, ‘:’)
WHEN 4 THEN ‘Occurs every ‘ +
CONVERT(VARCHAR, [schedule].[freq_subday_interval]) + ‘ Minute(s) between ‘ +
STUFF(STUFF(RIGHT(‘000000’ + CONVERT(VARCHAR(8), [schedule].[active_start_time]), 6), 5, 0, ‘:’), 3, 0, ‘:’) + ‘ and ‘ +
STUFF(STUFF(RIGHT(‘000000’ + CONVERT(VARCHAR(8), [schedule].[active_end_time]), 6), 5, 0, ‘:’), 3, 0, ‘:’)
WHEN 8 THEN ‘Occurs every ‘ +
CONVERT(VARCHAR, [schedule].[freq_subday_interval]) + ‘ Hour(s) between ‘ +
STUFF(STUFF(RIGHT(‘000000’ + CONVERT(VARCHAR(8), [schedule].[active_start_time]), 6), 5, 0, ‘:’), 3, 0, ‘:’) + ‘ and ‘ +
STUFF(STUFF(RIGHT(‘000000’ + CONVERT(VARCHAR(8), [schedule].[active_end_time]), 6), 5, 0, ‘:’), 3, 0, ‘:’)
ELSE ”
END
,[AvgDurationInSec] = CONVERT(DECIMAL(10, 2), [jobhistory].[AvgDuration])
,[Next_Run_Date] =
CASE [jobschedule].[next_run_date]
WHEN 0 THEN CONVERT(DATETIME, ‘1900/1/1’)
ELSE CONVERT(DATETIME, CONVERT(CHAR(8), [jobschedule].[next_run_date], 112) + ‘ ‘ +
STUFF(STUFF(RIGHT(‘000000’ + CONVERT(VARCHAR(8), [jobschedule].[next_run_time]), 6), 5, 0, ‘:’), 3, 0, ‘:’))
END
FROM     [msdb].[dbo].[sysjobs] AS [jobs] WITh(NOLOCK)
LEFT OUTER JOIN [msdb].[dbo].[sysjobschedules] AS [jobschedule] WITh(NOLOCK)
ON [jobs].[job_id] = [jobschedule].[job_id]
LEFT OUTER JOIN [msdb].[dbo].[sysschedules] AS [schedule] WITh(NOLOCK)
ON [jobschedule].[schedule_id] = [schedule].[schedule_id]
INNER JOIN [msdb].[dbo].[syscategories] [categories] WITh(NOLOCK)
ON [jobs].[category_id] = [categories].[category_id]
LEFT OUTER JOIN
(    SELECT     [job_id], [AvgDuration] = (SUM((([run_duration] / 10000 * 3600) +
(([run_duration] % 10000) / 100 * 60) +
([run_duration] % 10000) % 100)) * 1.0) / COUNT([job_id])
FROM     [msdb].[dbo].[sysjobhistory] WITh(NOLOCK)
WHERE     [step_id] = 0
GROUP BY [job_id]
) AS [jobhistory]
ON [jobhistory].[job_id] = [jobs].[job_id];

--Existing linked server listing

print N'Linked Server Information';

declare @x int;
select @x = COUNT(name)
from sys.all_objects
where name = 'Servers';
if @x <> 0
SELECT *
--c.name, provider, data_source, is_remote_login_enabled, b.modify_date
FROM sys.Servers a
LEFT OUTER JOIN sys.linked_logins b ON b.server_id = a.server_id
LEFT OUTER JOIN sys.server_principals c ON c.principal_id = b.local_principal_id
where a.server_id <> 0;
else
exec sp_linkedservers;

 

----------------------------------------------------------End of script

SQL Server startup options for troubleshooting

When something goes wrong with your SQL Server database and you cannot connect, it may be necessary to place it in a mode better suited for troubleshooting.  Two of these modes are single-user and minimal-configuration.

Single-user mode (-m)

This mode is used to change server configuration options or recover a damaged master database or other system database.

NOTE – Since only one user can connect to the SQL Server instance in single-user mode (hence the name), it is important to stop and/or temporarily disable the SQL Server Agent service by going to Start -> Administrative Tools -> Services and right-clicking the service to change its status.  If left running, this service will automatically occupy the only available connection, preventing further connections.  In a clustered environment, the cluster service DLL will likewise occupy the only available connection, so the following workaround is necessary there:

  1. Remove the -m startup parameter from the SQL Server advanced Properties (more on this process later under "Setting or removing a special startup mode in SQL Server").
  2. Take the SQL Server service offline.
  3. From the current owner node of this group, issue the following command from the command prompt: net start MSSQLSERVER /m.
  4. Verify from the cluster administrator or failover cluster management console that the SQL Server resource is still offline.
  5. Connect to the SQL Server now using the following command and do the necessary operation: SQLCMD -E -S<servername>.
  6. Once the operation is complete, close the command prompt and bring back the SQL and other resources online through cluster administrator.

Minimal-configuration mode (-f)

This option starts an instance of SQL Server with minimal configuration. This is useful if the setting of a configuration value (for example, over-allocating memory) has prevented the server from starting. Starting SQL Server in minimal configuration mode places SQL Server in single-user mode.

Setting or removing a special startup mode parameter in SQL Server

  • Log into the server hosting the SQL Server database or log into one of the nodes hosting the SQL Server cluster.
  • Start the SQL Server Configuration Manager.  If you do not see it under the SQL Server application group in the Start menu, execute "SQLSERVERMANAGER10.msc" from the Run command window for SQL Server 2008 or "SQLSERVERMANAGER11.msc" from the Run command window for SQL Server 2012.
  • Select "SQL Server Services" in the left window pane and find the service called "SQL Server (instance)" in the right window pane, where instance is the name given to the named instance.
  • Right-click on the service name and select Properties.
  • In SQL Server 2012, the process is easy.  Simply go to the Startup Parameters tab, type the desired parameter in the "Specify a startup parameter" box, and click Add.  To remove a parameter, select it in the "Existing parameters" window under this tab and click Remove.
  • In SQL Server 2008, go to the Advanced tab and scroll down to Startup Parameters.  Click on the values on the right and use the drop-down arrow to open a small window to add or remove parameters.
  • Restart the service to enable or disable the startup mode.

 

Contrasting the different SQL Server recovery models

Those of us just getting our feet wet in the SQL Server pool should learn the different recovery models available as one of the first pieces of knowledge to set in stone.  Since I am approaching this from the perspective of someone experienced with the Oracle database, I will compare the SQL Server recovery models to their Oracle analogs where applicable.  Here we go!

Simple Recovery Model – This model is like the Oracle noarchivelog mode.  Transactions performed between backups are lost in the event of a failure and HA features such as mirroring or AlwaysOn cannot be used on a database with this recovery model.  On the plus side, the log file is kept small and tidy.

Full Recovery Model – Like the Oracle archivelog mode, this model will allow you to recover the database to the point of failure.  However, it requires log backups, like archivelog backups in Oracle.  Log backups should be executed in between full backups at an interval based on the amount of activity in the database and the rate of transaction log growth.

Bulk-Logged Recovery Model – This recovery model is like the Full recovery model except that it allows minimal transaction logging during bulk-load operations to improve their performance.  Log backups are also required with this recovery model.
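Here is a minimal sketch of checking the current recovery model, changing it, and taking the log backups that Full and Bulk-Logged require (the database name and backup paths are placeholders):

SELECT name, recovery_model_desc FROM sys.databases;

ALTER DATABASE SalesDB SET RECOVERY FULL;

BACKUP DATABASE SalesDB TO DISK = 'D:\Backups\SalesDB_full.bak';
BACKUP LOG SalesDB TO DISK = 'D:\Backups\SalesDB_log.trn';

Remember that a log backup is only possible after at least one full backup has been taken.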

More details on the three recovery models can be found here – https://msdn.microsoft.com/en-us/library/ms189275.aspx

 

 

Nested views

I recently blogged on the use of ‘select distinct’ as a lazy coding practice to remove duplicate rows from a result set.  I solicited opinions from my knowledgeable coworkers and was able to modify my thinking a bit on the subject.  However, nested views are a problem that I think we can all agree on.

Just last week I received a report concerning a query that had been running at an acceptable 39 seconds the day before, but now was taking over two minutes.  A cursory examination of the explain plan displayed multiple nested loops on the same handful of tables.  I relayed this information to the developer reporting the issue, but was told over and over that it ran fine yesterday.  However, I did not think that 39 seconds was all that great.  I was not able to focus on it in greater depth at the time due to other performance issues I was working, all of which were caused by an underlying storage tier issue.  Once that was resolved, all involved systems returned to normal.

The next day, I circled back around on the report query and discovered the reason for the multiple nested loops.  This is the response I sent to the developer with redacted table names for security reasons:

“After yesterday’s crisis had passed, I was able to further review the report query and have enclosed the object mapping notes with the following observations.  Nested views are used in this query.  Nested views refer to views that include other views in their makeup.  The problem with these types of views is that tables in one or more of the child views can sometimes be found in the parent views.  This causes the same tables to be revisited over and over for similar data.  Even if the tables were accessed for different data on subsequent revisits, the same data could have been retrieved during previous visits.  Referring to the enclosed notes, some of the tables being visited numerous times include TAB1, TAB2, TAB3, and TAB4.  Care should be taken when using views that contain other views so that table revisits or nested loops can be avoided.  Even though the query’s performance is acceptable at this time, increased volume of information may not allow this query to continue to perform at acceptable levels.  Also,  two of the subsequent views use one or more UNION statements on the same objects.  The use of UNION more than once may indicate another condition of unnecessary table revisits which could also negatively impact query performance.  Thanks.”

The problem with nested views lies not only with their use, but with their creation as well.  In this regard, the responsible DBA should resist the creation of views composed of other views unless it has been determined that the same tables are not in the makeup of more than one view.

In my opinion, the way to avoid nested-view problems is to select from tables first, then from views, and to avoid joining views unless it has been determined that they do not contain the same tables.
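As a hypothetical illustration (the view, table, and column names are made up), a nested view like the following forces TAB1 to be visited by both the parent and the child view, while the flattened query visits each table once:

create view child_v as
select t1.id, t1.status, t2.amount
from tab1 t1 join tab2 t2 on t2.id = t1.id;

create view parent_v as
select c.id, c.status, c.amount, t1.created_date
from child_v c join tab1 t1 on t1.id = c.id;   -- tab1 revisited

-- Flattened alternative: one visit to each table
select t1.id, t1.status, t2.amount, t1.created_date
from tab1 t1 join tab2 t2 on t2.id = t1.id;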

‘select distinct’ vs. ‘group by’ revisited

Seeing ‘select distinct’ in a query still makes me cringe.  I used to use ‘select distinct’ many years ago as a way to remove duplicate records from my result sets.  Time and experience taught me that this was not a good practice; what was really needed was a better understanding of grouping and a more detailed knowledge of the underlying data itself.

I still see ‘select distinct’ in code these days and wanted to go back to what made me think that its use was such a bad idea.  To that end, I put the question to my colleagues at the Accenture Enkitec Group, who happen to be some of the most talented people in the world working with Oracle products.  I hope to relay some of this wisdom in the following points.

CJ Date, the famous relational database theorist, advocates that SQL’s default ‘select’ behavior should be ‘distinct’. A ‘select’ statement that can return more than one copy of the same tuple is, by definition, not relational. Such behavior turns SQL into a bag processing language instead of a set processing language, which has lots of implications on program complexity and correctness. His workaround advice for Oracle and other SQL languages that behave this way is to always use ‘distinct’.

Some ad-hoc SQL generators, like IBM’s Cognos, default to including DISTINCT in every SQL statement they generate, likely because it guarantees duplicate-free results.  Developers love this since they can get lazy with join predicates.  If the SQL performs poorly, have the DBA figure out why, or get more hardware.  Based on this particular reasoning, I am starting to remember why I got on the no ‘select distinct’ band wagon.
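A hypothetical illustration of the lazy-predicate pattern (table and column names are made up): the first query produces a row per matching order and hides the duplicates with DISTINCT, while the second asks the actual question and needs no DISTINCT at all:

-- Duplicates masked with DISTINCT
select distinct c.customer_id, c.customer_name
from customers c
join orders o on o.customer_id = c.customer_id;

-- Customers that have orders, no DISTINCT required
select c.customer_id, c.customer_name
from customers c
where exists (select 1 from orders o where o.customer_id = c.customer_id);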

Using GROUP BY or DISTINCT as a shortcut to get a result set ordered was a widely used practice, even in products like Oracle EBS.  That is why EBS requires the use of the hidden parameter _gby_hash_aggregation_enabled to disable the use of HASH for GROUP BY and DISTINCT.  I guess that is cheaper than fixing all the code.  PeopleSoft also recommends disabling the more efficient HASH algorithms for GROUP BY and DISTINCT, since some of its code uses DISTINCT and GROUP BY expecting a particular order without an ORDER BY.

Oracle introduced a function called APPROX_COUNT_DISTINCT in version 12c (https://docs.oracle.com/database/121/SQLRF/functions013.htm#SQLRF56900).  This function quickly estimates the number of distinct values in a target column and can be used in place of a ‘select count(distinct col1)’ statement when an approximate answer is acceptable.  As always, test the results and performance before introducing this new function in place of every ‘count(distinct)’ expression.
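For example, on a 12c database (any table works; ALL_OBJECTS is used here only for illustration):

select count(distinct object_type) as exact_count from all_objects;

select approx_count_distinct(object_type) as approximate_count from all_objects;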

Still, seeing ‘select distinct’ should raise a red flag when reviewing code, as it can indicate bad coding practice.  There are few situations where it is genuinely required in every select statement, so in environments where it appears everywhere, you should be wary of the use of UNION and UNION ALL as well.

Based on these and many other opinions on both sides – pro and con – I will summarize by saying that I disagree with the universal use of ‘select distinct’ in all queries, but vow not to let the sight of this clause bias me into thinking that the code in question is poorly written and poorly performing.  In my role, it is best to evaluate each statement on its own merits.  Don’t get me started about nested views, though.  That is a subject for another blog.