Generating test data sets with Swingbench

Reading about new features is great, but finding data to test them on is sometimes troublesome.  As DBAs, we’ve created test databases countless times, but the majority of the time they are empty.

Oracle does provide sample schemas that can be installed automatically by the Database Configuration Assistant (DBCA) when creating a database, or manually by executing scripts under the Oracle software home directory.  Consult the documentation for your particular version for details.  While these sample schemas provide a framework that mimics a common layout, they lack the volume of data that typical production systems have.  If your employer allows “sandboxes” to be created to test new versions and features, then you can stop reading here because you are one of the fortunate ones.  If you do not have that opportunity but would still like to create a sizeable data set to work with, there is a utility called Swingbench that fills this need very well.

Swingbench was created by Dominic Giles (http://dominicgiles.com/index.html) as an Oracle database load generator, but has since become a very useful suite of tools including Data Generator, Trace Analyzer, and several monitoring tools.  Best of all, these tools are free to use.  I would recommend that you provide a small donation if you find them useful, though.

Quoting the creator, “Data Generator is a utility to populate, create, and load tables with semi random data. Users can specify sequences, random text, random numbers, database columns, and files as sources for tables.”  This sounds pretty good.

I recently downloaded Swingbench version 2.5, which includes the Data Generator utility, and worked with it on an Oracle VirtualBox virtual machine running Oracle Enterprise Linux (OEL) 6 Update 6 with an Oracle 12.1.0.2 non-container database.  Here are the steps I followed to create my test data set:

  • Swingbench requires Java 8 JDK.  Download the jdk-8u111-linux-x64.rpm and install it (as the root user) with the command: “rpm -Uvh jdk-8u111-linux-x64.rpm”. Note that this file is for the 64-bit version of OEL.
  • Download the swingbench25971.zip file for version 2.5 from the website mentioned previously and place it in a location where the oracle user account has read, write, and execute privileges.
  • Unzip the swingbench25971.zip file.  It creates a “swingbench” directory in this location.
  • As the oracle user, set the following environment variables if they are not already set (actual values depend on your environment):
    • JAVA_HOME=<JDK installation directory> (for this RPM, typically /usr/java/jdk1.8.0_111)
    • PATH=$JAVA_HOME/bin:$PATH
  • Go to the swingbench/bin directory.  You have the option to create the sales history schema (shwizard), the calling center schema (ccwizard), or the order entry schema (oewizard).  (The setup steps above are condensed into a command sketch after this list.)
  • Once you execute the desired wizard file, the interface appears.
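
Condensed into commands, the setup and launch look roughly like this.  This is only a sketch: the JDK path and the /home/oracle staging directory are assumptions from my environment, so adjust them to yours.

# as root: install the Java 8 JDK
rpm -Uvh jdk-8u111-linux-x64.rpm

# as oracle: unpack Swingbench and set the environment
unzip swingbench25971.zip -d /home/oracle
export JAVA_HOME=/usr/java/jdk1.8.0_111    # wherever the JDK RPM landed
export PATH=$JAVA_HOME/bin:$PATH

# launch the wizard for the schema you want (order entry shown here)
cd /home/oracle/swingbench/bin
./oewizard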

[Screenshot: initial wizard screen]

  • After clicking Next, you have the option to create or drop your desired schema.

[Screenshot: create or drop schema selection]

  • For the connection string, use //<servername>/<database_sid>.  Make sure you can ping the host by the <servername> used and that there is a listener providing service for the <database_sid> (a quick connectivity check is sketched below).  Leave the connection type at the default (JDBC thin).  Leave the DBA username as ‘sys as sysdba’ and make sure the password is correct.  Press Next to connect.

[Screenshot: database connection details]
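
For example, with a hypothetical host name dbhost01 and SID orcl, the connection string would be //dbhost01/orcl, and a quick check from the database server might look like this:

# does the host name resolve and respond?
ping -c 3 dbhost01

# does the listener know about the database?
lsnrctl status | grep -i orcl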

  • If the connection is successful, you will see the next screen where you can choose the username, password, schema tablespace, and tablespace datafile of your data set schema.  I like to keep my data separated for easier management, so I elect to create a separate tablespace and datafile.

[Screenshot: schema username, password, and tablespace details]

  • The next screen allows you to choose database options such as partitioning (range or none), compression (yes or no), tablespace type (bigfile or normal), and indexing (all, primary key, or none).  Be careful here: do not select options that you are not licensed for on systems that are eligible for license audits (a quick check of installed options is sketched below).

[Screenshot: database options]
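
If you are unsure which options are even installed in the database, V$OPTION will tell you; whether you are licensed to use an installed option is a separate question for your license agreement.  A minimal sketch:

select parameter, value
from   v$option
where  parameter in ('Partitioning', 'Advanced Compression');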

  • Sizing options come next.  I don’t suppose I need to tell you not to size your schema any bigger than the space you have available (a quick space check is sketched below).  What is good here is that two counters at the bottom show the schema tablespace size and the temp tablespace size required based on your selection.  Click Next when you are done.

[Screenshot: schema sizing options]
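
Before committing to a size, it is worth confirming that the filesystem that will hold the new datafile actually has the room.  The path below is just an example from a typical layout; substitute your own:

# free space on the filesystem that will hold the new datafile
df -h /u01/app/oracle/oradata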

  • Finally, select the level of parallelism and click Finish to create your schema.

[Screenshot: level of parallelism]

  • The wizard log popup appears and the main wizard window updates to show the progress of the creation process.

[Screenshot: schema creation progress]

  • Once the build is over, you should see a “successful” message popup window.  Click OK to acknowledge the message and close the window.  You then have the option to review the wizard log in the wizard log window or save it to a file (a quick check of what was built is sketched below).  Close both the wizard log and main wizard windows when you are done.

[Screenshot: build completed successfully]
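
As a quick sanity check of what the wizard built, you can look at the segments it created.  This assumes you kept SOE as the order entry schema name; substitute whatever username you chose:

select segment_name, segment_type, round(sum(bytes)/1024/1024) as size_mb
from   dba_segments
where  owner = 'SOE'
group  by segment_name, segment_type
order  by size_mb desc;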

At this point, you have the option to generate other data sets if you want to.  Enjoy.

 


Sequences are not gap-free!

[Image: “Mind the gap”]

Yesterday I encountered a trace file from a “poorly-performing” process with some interesting waits.  The trace was incomplete because tracing was only started after the process had been identified as “poorly-performing”.  Even so, it was worth a look.

OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        0      0.00       0.00          0          0          0           0
Execute      0      0.00       0.00          0          0          0           0
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        0      0.00       0.00          0          0          0           0

OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        0      0.00       0.00          0          0          0           0
Execute     78      0.03       0.05          0        106        106          53
Fetch       25      0.01       0.03          0         75          0          25
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      103      0.05       0.08          0        181        106          78


********************************************************************************
Trace file: PROD2_ora_2406.trc
Trace file compatibility: 11.1.0.7
Sort options: exeela  fchela  prsela  
       1  session in tracefile.
       0  user  SQL statements in trace file.
       2  internal SQL statements in trace file.
       2  SQL statements in trace file.
       2  unique SQL statements in trace file.
  384089  lines in trace file.
    1507  elapsed seconds in trace file.

 

Ok, we have 1507 seconds of tracing, or slightly over 25 minutes.  Of that, only 0.08 seconds are accounted for between the recursive (system) and non-recursive (user) statements.  Where did the time go?  There were no wait events listed under the non-recursive statements section, but the wait events for the recursive statements were very telling.  Note that this is not a complete listing; I filtered out any waits that did not total more than 1 second.

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  enq: SV -  contention                      289471        0.48       1210.91
  gc current grant 2-way                      22811        0.13         17.21
  db file sequential read                     57139        0.53         83.05
  gc current block 2-way                       2094        0.02          1.94

We now see that about 1211 seconds, or roughly 20 minutes, of the trace was spent on “enq: SV – contention”.  But what is that?  This event is sequence value (SV) contention, and it appears when there is high demand for sequence values across multiple instances in a RAC environment.  In non-RAC environments, I typically see “log file sync” waits pointing to sequence contention issues instead.
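
For reference, this kind of evidence is straightforward to gather yourself: enable SQL trace with wait events on the suspect session, let it run, then stop tracing.  The session id and serial number below are placeholders, not values from this incident:

-- from a DBA session in SQL*Plus: trace the suspect session, including wait events
exec dbms_monitor.session_trace_enable(session_id => 123, serial_num => 45, waits => TRUE, binds => FALSE);

-- ...let the poorly-performing work run, then turn tracing off
exec dbms_monitor.session_trace_disable(session_id => 123, serial_num => 45);

The resulting trace file can then be summarized from the OS shell with tkprof, using the same sort options shown in the header above:

tkprof PROD2_ora_2406.trc PROD2_ora_2406.txt sort=exeela,fchela,prsela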

When a sequence is created, it has a default cache value of 20.  This means the instance keeps the next 20 values in memory for quick access and only goes back to the data dictionary when that batch is exhausted.  This value needs to be much higher when the sequence is used to generate values for inserts of hundreds of thousands or millions of records in one process or routine (a quick way to spot small caches is sketched below).
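
A quick way to find sequences still at or near that default is to check DBA_SEQUENCES; the cutoff of 100 below is just an arbitrary example:

select sequence_owner, sequence_name, cache_size, last_number
from   dba_sequences
where  cache_size <= 100
order  by sequence_owner, sequence_name;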

I always recommend increasing the sequence cache value when I see this type of situation, but I am often met with one of two very weak arguments.  One is that the client does not want to have to recreate the sequences to increase the cache value; the other is that the client wants to keep the cache small in order to avoid gaps in sequence values.  Let’s take these arguments one at a time.

First, altering the sequence cache value dynamically has been available since Oracle Database 10g Release 1.  In fact, there is a quote in the Oracle 10g Release 1 SQL Reference which states, “Oracle recommends using the CACHE setting to enhance performance if you are using sequences in a Real Application Clusters environment.”  The rest of this argument is disproved by creating a test sequence and changing the cache value with a simple “alter sequence” command, as sketched below.
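
A minimal sketch (the sequence name is just illustrative); no drop and recreate is required:

create sequence jontest_seq;                 -- defaults to CACHE 20
alter sequence jontest_seq cache 1000;       -- one statement, no recreation needed
select cache_size from user_sequences where sequence_name = 'JONTEST_SEQ';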

The second argument is the more exasperating of the two for me because the client usually won’t budge from their position.  Although the client is correct that a smaller cache size reduces the risk of sequence value gaps, they are mistaken when they assume that a reduced cache ensures no sequence gaps at all.

If you are relying on sequences to generate primary keys for newly-inserted records, there is NO guarantee that there will be no sequence value gaps.  Even Tom Kyte is in agreement here (https://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:29059659958024).

The closest you can get to no gaps using a sequence in this fashion is creating the sequence with the “INCREMENT BY 1”, “NOCACHE”, and “ORDER” options together.  There will be a MAJOR performance cost to using a sequence this way for generating values during large insert activity.  Even with these options set, there is still the possibility of sequence value gaps.  Here is a version 11.2.0.4 demonstration of that fact:

SQL> create table jontab2 (jonkey number, jonstring varchar2(20));

Table created.

SQL> create sequence jonseq start with 1 increment by 1 nomaxvalue nocycle order nocache;

Sequence created.

SQL> insert into jontab2 values(jonseq.nextval, 'String');


1 row created.

SQL> insert into jontab2 values(jonseq.nextval, 'String');

1 row created.

SQL> commit;

Commit complete.

SQL> insert into jontab2 values(jonseq.nextval, 'String');

1 row created.

SQL> insert into jontab2 values(jonseq.nextval, 'String');

1 row created.

SQL> insert into jontab2 values(jonseq.nextval, 'String');

1 row created.

SQL> insert into jontab2 values(jonseq.nextval, 'String');

1 row created.

SQL> insert into jontab2 values(jonseq.nextval, 'String');

1 row created.

SQL> insert into jontab2 values(jonseq.nextval, 'String');

1 row created.

<I crash the instance here by killing the PMON process>

SQL> insert into jontab2 values(jonseq.nextval, 'String');
insert into jontab2 values(jonseq.nextval, 'String')
*
ERROR at line 1:
ORA-03135: connection lost contact
Process ID: 3657
Session ID: 125 Serial number: 7

SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options


<I restart the instance and login>

[oracle@rac1 ~]$ sqlplus jontest/semperf1

SQL*Plus: Release 11.2.0.4.0 Production on Fri Aug 4 09:10:01 2017

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> insert into jontab2 values(jonseq.nextval, 'String');

1 row created.

SQL> commit;

Commit complete.

SQL> select *
  2  from jontab2
  3  order by jonkey;

    JONKEY JONSTRING
---------- --------------------
         1 String
         2 String
         9 String                        --------GAAAAAAP!!!!!!!

SQL> select jonseq.nextval
  2  from dual;

   NEXTVAL
----------
        10

 

Because I did not commit some of the newly-inserted records before the instance crashed, they were never written to the table, even though the sequence had already incremented, and that left a gap in the key values of the table.  Even if you have a batch process that commits after every record inserted, you can’t account for instance crashes.

If you STILL intend to use a sequence alone to generate a “no gap” primary key, what happens when you have to delete records from the table?  Guess what: you now have gaps in your primary key values (a quick way to find them is sketched below).  What are you going to do, renumber all the remaining records to remove the gaps?  I doubt it.  The best bet is to use the sequence and one or more other values in the table, perhaps even a timestamp, as the basis for a primary key value.  This way, you can increase the sequence cache and still have solid primary keys.  You just need to get over the “no gap sequence values” idea.
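
If you do inherit a key populated this way and want to see where the gaps are, an analytic query along these lines will list them (table and column names match the demonstration above):

select jonkey + 1 as gap_start, next_key - 1 as gap_end
from  (select jonkey, lead(jonkey) over (order by jonkey) as next_key
       from   jontab2)
where  next_key > jonkey + 1;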