Working with Oracle external tables in a nutshell

It’s easy to create data sets using a text editor or a spreadsheet utility; with recent versions of these tools, you can build files of up to a million rows. Once the data is created, it needs to be loaded into one or more database tables so that it can be used. Oracle has a couple of utilities for loading data from flat files: SQL*Loader and external tables. SQL*Loader can be a bit complicated to work with, so I prefer external tables for quickly loading data from flat files. Here’s a quick method I recently used.

I created a .csv file with 1048576 rows and 25 columns. Here is the first comma-delimited record. You can copy and paste it into a spreadsheet or a text file over and over, creating as many records as you like:

123456,This is a test,3.14,14-Dec-14,http://www.msn.com/en-us/autos/classic-cars/the-derelict-1952-chrysler-town-and-country-beauty-in-the-beast/ar-BBt2Y1e,123456,This is a test,3.14,14-Dec-14,http://www.msn.com/en-us/autos/classic-cars/the-derelict-1952-chrysler-town-and-country-beauty-in-the-beast/ar-BBt2Y1e,123456,This is a test,3.14,14-Dec-14,http://www.msn.com/en-us/autos/classic-cars/the-derelict-1952-chrysler-town-and-country-beauty-in-the-beast/ar-BBt2Y1e,123456,This is a test,3.14,14-Dec-14,http://www.msn.com/en-us/autos/classic-cars/the-derelict-1952-chrysler-town-and-country-beauty-in-the-beast/ar-BBt2Y1e

This is the table I created to hold the data in the Oracle 11.2.0.1 database (please excuse the lack of creative column names):

create table jontest.jontab (
  ID1    number,
  LABEL1 varchar2(15),
  PI1    number,
  DATE1  date,
  URL1   varchar2(500),
  ID2    number,
  LABEL2 varchar2(15),
  PI2    number,
  DATE2  date,
  URL2   varchar2(500),
  ID3    number,
  LABEL3 varchar2(15),
  PI3    number,
  DATE3  date,
  URL3   varchar2(500),
  ID4    number,
  LABEL4 varchar2(15),
  PI4    number,
  DATE4  date,
  URL4   varchar2(500),
  ID5    number,
  LABEL5 varchar2(15),
  PI5    number,
  DATE5  date,
  URL5   varchar2(500)
);

Now that the target table exists, we need to set up the system for the external table process. First, create a directory object in the database pointing to a directory where the oracle software account has read and write access:

create directory impdir as '/home/oracle';

Make sure the table owner has read and write permission on this directory object:

grant read, write on directory impdir to jontest;

Place the input .csv file (unimaginatively named input.csv) in the directory object location and ensure the file’s ownership and permissions allow the oracle software account to read it.

Next, create the external table to access the input file.  An external table is not actually a physical table in the database, but more like a view that allows access to the data in the file.  Note that this table was created as the ‘jontest’ user.

CREATE TABLE extload (
  ID1    char(6),
  LABEL1 char(15),
  PI1    char(4),
  DATE1  date,
  URL1   char(500),
  ID2    char(6),
  LABEL2 char(15),
  PI2    char(4),
  DATE2  date,
  URL2   char(500),
  ID3    char(6),
  LABEL3 char(15),
  PI3    char(4),
  DATE3  date,
  URL3   char(500),
  ID4    char(6),
  LABEL4 char(15),
  PI4    char(4),
  DATE4  date,
  URL4   char(500),
  ID5    char(6),
  LABEL5 char(15),
  PI5    char(4),
  DATE5  date,
  URL5   char(500)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY impdir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
    MISSING FIELD VALUES ARE NULL
    (
      ID1    char(6),
      LABEL1 char(15),
      PI1    char(4),
      DATE1  char(10) date_format DATE mask "DD-MON-YY",
      URL1   char(500),
      ID2    char(6),
      LABEL2 char(15),
      PI2    char(4),
      DATE2  char(10) date_format DATE mask "DD-MON-YY",
      URL2   char(500),
      ID3    char(6),
      LABEL3 char(15),
      PI3    char(4),
      DATE3  char(10) date_format DATE mask "DD-MON-YY",
      URL3   char(500),
      ID4    char(6),
      LABEL4 char(15),
      PI4    char(4),
      DATE4  char(10) date_format DATE mask "DD-MON-YY",
      URL4   char(500),
      ID5    char(6),
      LABEL5 char(15),
      PI5    char(4),
      DATE5  char(10) date_format DATE mask "DD-MON-YY",
      URL5   char(500)
    )
  )
  LOCATION ('input.csv')
)
REJECT LIMIT UNLIMITED;

I used the same column names as my target table, again lacking any creativity in this area. Notice that the column data types are either char or date. This keeps things simple; the data can be transformed while loading into the target table, as I will demonstrate. You can also see, in the access-parameters field list, that the date columns are given an explicit format mask.

If the creation is successful, you should be able to count the records in the table or select one of the column values. If the count or select returns no records or raises errors, check the directory object location for .bad or .log files whose names begin with the table name. These files are created by default even though we did not specify them in the external table creation statement.
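A couple of quick sanity checks along these lines (my own sketch, using the table and column names from the examples above):

```sql
-- Verify the external table can read the whole file
select count(*) from jontest.extload;

-- Spot-check a few values, including the date parsing
select id1, date1, url1
from jontest.extload
where rownum <= 5;
```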

Once you can select from the external table, you can treat it like any other read-only table.  We can now perform any data conversion while we load our target table.

insert into jontest.jontab
select
  to_number(ID1, '999999'),
  LABEL1,
  to_number(PI1, '9.99'),
  to_date(DATE1, 'DD-MON-YY'),
  URL1,
  to_number(ID2, '999999'),
  LABEL2,
  to_number(PI2, '9.99'),
  to_date(DATE2, 'DD-MON-YY'),
  URL2,
  to_number(ID3, '999999'),
  LABEL3,
  to_number(PI3, '9.99'),
  to_date(DATE3, 'DD-MON-YY'),
  URL3,
  to_number(ID4, '999999'),
  LABEL4,
  to_number(PI4, '9.99'),
  to_date(DATE4, 'DD-MON-YY'),
  URL4,
  to_number(ID5, '999999'),
  LABEL5,
  to_number(PI5, '9.99'),
  to_date(DATE5, 'DD-MON-YY'),
  URL5
from jontest.extload;
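Once the insert completes, a quick verification (my own sketch; the count should match the row count of the external table and the source file):

```sql
commit;

-- Row count should equal the number of records in input.csv
select count(*) from jontest.jontab;
```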

PL/SQL function result caching for better performance

Result caching has been around since Oracle Database 11g release 1, but only recently was I introduced to its full benefits in PL/SQL functions. I was at a client site last month looking at a very poorly written query that included millions of calls to PL/SQL functions, one for every row queried. The query had a ton of optimizer hints and still took hours to execute.

A colleague of mine at the Accenture Enkitec Group, an Oracle developer by trade, suggested that enabling result caching in the PL/SQL functions being called could make a big difference in query performance without changing the query itself. I had heard of result caching for table queries, but had never used it, nor its PL/SQL function variation. After enabling result caching in the functions called by the query, performance improved dramatically. I am now a believer, and I want to make you one as well.

I set up a simple demonstration on an 11g release 1 database. Here is my test table:

SQL> desc jontab
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 ID1                                                NUMBER
 LABEL1                                             VARCHAR2(15)
 PI1                                                NUMBER
 DATE1                                              DATE
 URL1                                               VARCHAR2(500)
 ID2                                                NUMBER
 LABEL2                                             VARCHAR2(15)
 PI2                                                NUMBER
 DATE2                                              DATE
 URL2                                               VARCHAR2(500)
 ID3                                                NUMBER
 LABEL3                                             VARCHAR2(15)
 PI3                                                NUMBER
 DATE3                                              DATE
 URL3                                               VARCHAR2(500)
 ID4                                                NUMBER
 LABEL4                                             VARCHAR2(15)
 PI4                                                NUMBER
 DATE4                                              DATE
 URL4                                               VARCHAR2(500)
 ID5                                                NUMBER
 LABEL5                                             VARCHAR2(15)
 PI5                                                NUMBER
 DATE5                                              DATE
 URL5                                               VARCHAR2(500)
SQL> select count(*)
 2  from jontab;

  COUNT(*)
----------
   1048576

Now a simple PL/SQL function for testing:

create or replace FUNCTION jonfunc (employee_id_in IN number)
RETURN number
AS
  v_empno number;
BEGIN
  SELECT empno
    INTO v_empno
    FROM scott.emp
   WHERE empno = employee_id_in;
  RETURN v_empno;
EXCEPTION
  WHEN NO_DATA_FOUND
  THEN
    /* No matching employee; return zero. */
    RETURN 0;
END jonfunc;
/

I compared the data in the scott.emp table with the column data I planned to use in my test table; the function will never return a value other than zero.
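That comparison can be expressed as a query (my own sketch, not from the original test):

```sql
-- Confirm that none of the test IDs exist in scott.emp;
-- a count of 0 means every call takes the NO_DATA_FOUND path
select count(*) as matches
from scott.emp e
where e.empno in (select distinct id1 from jontest.jontab);
```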

Now I need to run a test query and trace it. I use level 12 of the 10046 event in order to get the most complete information I can. The tracefile_identifier setting just makes the trace file easier to identify:

alter session set statistics_level=ALL;
alter session set tracefile_identifier = 'jon_trace';
alter session set events = '10046 TRACE NAME CONTEXT FOREVER, LEVEL 12';

select id1, jonfunc(id1)
from jontab;

alter session set events = '10046 TRACE NAME CONTEXT OFF';

I then processed the trace file using tkprof, sorting it by query execution time in descending order:

tkprof jon11g_ora_3308_jon_trace.trc output.log sort=EXEELA,FCHELA

Here is the section of the tkprof output covering the calls to our test function:

SQL ID: 5r2ba62345bqa
 Plan Hash: 56244932
 SELECT EMPNO
 FROM
 SCOTT.EMP WHERE EMPNO = :B1
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        0      0.00       0.00          0          0          0           0
Execute 1048572     56.97      72.48          0          0          0           0
Fetch   1048572      4.26       6.56          0    1048572          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total   2097144     61.23      79.05          0    1048572          0           0

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 92     (recursive depth: 1)

Elapsed times include waiting on following events:
Event waited on                             Times   Max. Wait  Total Waited
----------------------------------------   Waited  ----------  ------------
SQL*Net message to client                       1        0.00          0.00
SQL*Net message from client                     1       14.47         14.47
latch: shared pool                             51        0.00          0.03
********************************************************************************

There were 1048572 calls to this function taking a total elapsed time of 79 seconds.  Let’s enable result caching in the test function:

create or replace FUNCTION jonfunc (employee_id_in IN number)
RETURN number
RESULT_CACHE  --enabling result caching
AS
  v_empno number;
BEGIN
  SELECT empno
    INTO v_empno
    FROM scott.emp
   WHERE empno = employee_id_in;
  RETURN v_empno;
EXCEPTION
  WHEN NO_DATA_FOUND
  THEN
    /* No matching employee; return zero. */
    RETURN 0;
END jonfunc;
/

After rerunning the test query and processing the resulting trace file:

SQL ID: 5r2ba62345bqa
Plan Hash: 56244932
SELECT EMPNO
FROM
SCOTT.EMP WHERE EMPNO = :B1

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1      0.00       0.00          0          1          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3      0.00       0.00          0          1          0           0

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 92     (recursive depth: 1)

Rows     Row Source Operation
-------  ---------------------------------------------------
0  INDEX UNIQUE SCAN PK_EMP (cr=1 pr=0 pw=0 time=0 us cost=0 size=4 card=1)(object id 73202)

********************************************************************************

There was now only one execution of this function, and the query achieved sub-second performance, eliminating virtually all of the 79 seconds from the previous run. After the first execution, every call was satisfied from the result cache, since every row passes the same input value (recall that the test data repeats a single record).
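If you want to confirm the cache is actually being used, Oracle exposes result-cache dictionary views (available since 11g); a quick sketch:

```sql
-- Cache-wide statistics: creates, finds (hits), invalidations
select name, value
from v$result_cache_statistics;

-- Cached objects; the function's results should appear here
select type, status, name
from v$result_cache_objects
where name like '%JONFUNC%';
```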

There is also an option of the result-cache clause that invalidates the cache whenever a change occurs in the underlying table. After the result cache is invalidated (cleared), the next call executes the function body again and rebuilds the cache. The option could have been used in the test function with this modified line (note that from 11g release 2 onward, Oracle tracks these dependencies automatically and the RELIES_ON clause is deprecated):

RESULT_CACHE RELIES_ON (emp)

I hope that this simple demonstration has convinced you of the usefulness of this option.

Update:

In Oracle version 11g, both release 1 and 2, you cannot enable result caching in a function with invoker rights (i.e. the AUTHID CURRENT_USER clause). If you do, you will receive the following error during compilation:

PLS-00999: implementation restriction (may be temporary)
RESULT_CACHE is disallowed on subprograms in Invoker-Rights modules

This limitation is fixed in Oracle version 12c. As a workaround, you can place the invoker-rights logic in a wrapper package or procedure that calls a definer-rights function with result caching enabled.
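A sketch of that workaround (the names jonfunc_cached and jonfunc_ir are my own, illustrative ones): keep RESULT_CACHE on a definer-rights function and delegate to it from an invoker-rights wrapper:

```sql
-- Definer-rights function keeps the result cache
create or replace FUNCTION jonfunc_cached (employee_id_in IN number)
RETURN number
RESULT_CACHE
AS
  v_empno number;
BEGIN
  SELECT empno
    INTO v_empno
    FROM scott.emp
   WHERE empno = employee_id_in;
  RETURN v_empno;
EXCEPTION
  WHEN NO_DATA_FOUND THEN RETURN 0;
END jonfunc_cached;
/

-- Invoker-rights wrapper simply delegates to the cached function
create or replace FUNCTION jonfunc_ir (employee_id_in IN number)
RETURN number
AUTHID CURRENT_USER
AS
BEGIN
  RETURN jonfunc_cached(employee_id_in);
END jonfunc_ir;
/
```

Note that the cached function still resolves scott.emp with its definer’s rights, so this approach only fits cases where that behavior is acceptable.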