Backups triggered automatically

Database availability is the number one goal of the database administrator.  Backup and recovery should be the second goal.  An internet search on establishing database backup and recovery processes will produce many examples.  I won’t go into my personal preferences, just emphasize the points I think are important:

  • Backup and recovery processes MUST be documented for reference by personnel responsible for maintenance and execution. I find that the lack of backup and recovery documentation is fairly common out in the world. No DBA likes to try to do an undocumented recovery at 3AM.
  • Backups processes that do not include a check for successful completion are INVALID. If the backup has been completing with an error for a month then guess what, you last valid backup was a month ago and if you had to do a recovery today, you just lost a month’s worth of data.
  • Recovery processes that have not been tested by performing a practice recovery or a clone are INVALID. You may have a proven backup method, but it has not been proven until you do a recovery in your own environment.
  • If there is available time and resources, add an export to your backup process. Exports are good for granular recovery of metadata and objects such as indexes and PL/SQL code.

That said, even solid backup processes can get blind-sided by unexpected activity in the database that produces vastly more redo activity then is normally seen.  It only takes one instance of the dreaded “ORA-00257: archiver error” to send personnel from support through management looking for or demanding answers and remediation steps.

The work around for these situations is usually increasing either the amount of space in the archived log destination or increasing the frequency of backups.  While either of these are fairly easy to implement, they each have areas of concern that should be considered prior to implementation.

Increasing the amount of space in the archived log location could be costly based on amount and type of storage used.  Also, if the surge of redo is not a common occurrence, all the additional space will be unused, essentially wasted.  Increasing the backup frequency may not cost in terms of additional storage, but in database resources needed to perform the additional backups, especially when they occur during normal business hours.

When a database is configured to use the fast recovery area (FRA) for backups and flashback, Oracle automatically monitors and alerts on its space usage.  Oracle 11.2 and 12.2 documentation both have the same information on these alerts:

The database issues a warning alert when reclaimable space is 
less than 15% and a critical alert when reclaimable space is 
less than 3%. To warn the DBA of this condition, an entry is 
added to the alert log and to the DBA_OUTSTANDING_ALERTS table 
(used by Enterprise Manager). Nevertheless, the database 
continues to consume space in the fast recovery area until 
there is no reclaimable space left.

When the recovery area is completely full, the error displayed 
is as follows, where nnnnn is the number of bytes required and 
mmmmm is the disk quota:

ORA-19809: limit exceeded for recovery files ORA-19804: cannot 
reclaim nnnnn bytes disk space from mmmmm limit

From <https://docs.oracle.com/cd/E11882_01/backup.112/e10642/rcmmaint.htm>
The database issues a warning alert when reclaimable space is 
less than 15% and a critical alert when reclaimable space is 
less than 3%. To warn the DBA of this condition, an entry is 
added to the alert log and to the DBA_OUTSTANDING_ALERTS table 
(used by Enterprise Manager). Nevertheless, the database 
continues to consume space in the fast recovery area until 
there is no reclaimable space left.

When the recovery area is completely full, the error displayed 
is as follows, where nnnnn is the number of bytes required and 
mmmmm is the disk quota:

ORA-19809: limit exceeded for recovery files ORA-19804: cannot 
reclaim nnnnn bytes disk space from mmmmm limit

From <https://docs.oracle.com/en/database/oracle/oracle-database/12.2/bradv/maintaining-rman-backups.html>

A practical database administrator should already be monitoring the alert log for errors and warnings.  Why not configure these monitoring processes to automatically trigger backups if this error is encountered?  One of my favorite alert log monitor scripts sends an email when an ORA- error is found in the alert log and sends a page or text message to designated parties if the error involves some type of corruption:

#

ERRMSG="Missing command line parameters"

if [ "$1" ]; then
        export ORACLE_SID=$1
else
        echo $ERRMSG
        exit 1
fi

if [ "$2" ]; then
        export ALERT_LOG_DIR=$2
else
        echo $ERRMSG
        exit 1
fi

            LOGFILE="${ALERT_LOG_DIR}/alert_${ORACLE_SID}.log"
            ERRFILE="${ALERT_LOG_DIR}/alert_error.log"
            CHECK_TIMEB="`date +%m/%d/%y-%H:%M`"
            SAVFILE="${ALERT_LOG_DIR}/alert_${ORACLE_SID}_`date '+%y%m%d-%H%M'`.log"
            GROUPID="SAPADMIN1.BASIS_ON_CALL"
            RECIP="youremail@sendwordnow.com"
            echo "  "
            echo "**"
            echo "**** $ORACLE_SID"
            echo "**"
            echo " "
            if [ -f $LOGFILE ]; then
               if [ `egrep -i "ORA-|corrupt" "${LOGFILE}" | wc -l` -eq 0 ]; then
                  echo "No errors found in alert log "
               else
                  egrep -i "ORA-|corrupt" $LOGFILE > $ERRFILE
                  echo " " >> $ERRFILE
                  echo "  Found the above error(s) in the alert log for ${ORACLE_SID}" >> $ERRFILE
                  echo "  Alert Log with error(s) saved as ${SAVFILE}" >> $ERRFILE
                  echo " " >> $ERRFILE
                  echo " " >> $ERRFILE
                  mailx -s "ALERT-Errors found in ${ORACLE_SID} Alert Log" youremail@yourcompany.com < $ERRFILE
                  if [ `egrep -i "corrupt" "${ERRFILE}" | wc -l` -gt 0 ]; then
                    mailx -s "ALERT - CORRUPTION DETECTED IN ${ORACLE_SID} ALERT LOG" YourPager@yourcompany.com < $ERRFILE
                  fi
                  mv $LOGFILE $SAVFILE
               fi
            else
               echo "No alert log found for ${ORACLE_SID} "
            fi
exit

You could easily add a line to call a separate backup script from this script.  I usually set my alert log monitor to execute once an hour, but increasing the frequency of execution will help catch redo space problems and not cost in terms of resource or space usage.  Also,  having an automatically-triggered backup will save time diagnosing the issue when an archiver ‘out of space’ problem occurs.

The drawback here is that you must have FRA enabled to utilize the alert functionality.  What if usage of the FRA is not preferred?  You could use the alert log monitor script above to trigger a backup when an archiver error is seen, but that would mean the database is already in the process of halting activity when the backup is triggered.  It is better to be pre-emptive in these types of alerts by setting up monitoring to alert of a problem before an alert log message is generated.

For example, here is a snippet of code for a Linux/Unix shell script that determines how much space is available in GB for a certain filesystem.  If that amount of space falls below a certain value, a message is generated:

#!/bin/sh

space_avail="`df -k | grep /archive | awk '{print $4}'`"

if [ `expr $space_avail / 1048576` -lt 40  ]; then
echo Backup needed
fi

You could easily replace the echo command with a call to a backup script.  After testing, this script could be called at an interval of your choosing from the crontab.

If you have your backup location in ASM, you can use this next piece of code to accomplish the same thing by querying the desired diskgroup:

#!/bin/sh

free_mb=`$CRS_HOME/bin/sqlplus -s "/ as sysasm"<<EOF
set heading off
select free_mb
from v\\$asm_diskgroup
where name = 'DATA'
order by name;
exit
EOF`

if [ `expr $free_mb / 1024` -lt 40  ]; then
echo Backup needed
fi

With a few simple steps, you can be proactive about excessive redo activity in your database.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

w

Connecting to %s