I needed to insert tens of millions of records into a table the other day for a future blog I'm putting together on hashing. For the life of me I couldn't work out how to achieve this without using CONNECT BY ROWNUM. The issue being, I ran out of memory before the script completed when generating records over a specific threshold. For the machine I was working on, this was 2,392,762 records.
I finally cracked it, so thought I would share.
set timing on
set autotrace on
When trying to generate rows above a certain threshold, the CONNECT BY ROWNUM option throws an error because an excessive amount of UGA/PGA memory is consumed.
SQL> select count(*) from (select rownum from dual connect by rownum <= 10000000)
select count(*) from (select rownum from dual connect by rownum <= 10000000)
*
ERROR at line 1:
ORA-30009: Not enough memory for CONNECT BY operation
Let’s take a look at the explain plan, where we will see a single recursive CONNECT BY operator.
Plan hash value: 2688747630
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 2 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | VIEW | | 1 | 2 (0)| 00:00:01 |
| 3 | COUNT | | | | |
|* 4 | CONNECT BY WITHOUT FILTERING| | | | |
| 5 | FAST DUAL | | 1 | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - filter(ROWNUM<=10000000)
There are other methods for generating large datasets, such as pipelined table functions and the MODEL clause introduced in 10g, which are explained very nicely in Adrian Billington's article: http://www.oracle-developer.net/display.php?id=408. However, I still wanted to see if it was possible to use CONNECT BY ROWNUM with a really large threshold.
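For context, the pipelined function approach looks roughly like this. This is only a minimal sketch, not taken from Billington's article, and the type and function names are my own illustrative ones:

create or replace type num_tab as table of number;
/

create or replace function row_gen (p_rows in number)
  return num_tab pipelined
as
begin
  for i in 1 .. p_rows loop
    pipe row (i);  -- emit one row at a time, so memory stays flat
  end loop;
  return;
end;
/

-- select count(*) from table(row_gen(10000000));

Because rows are piped out one at a time rather than materialised, memory usage stays flat regardless of the row count.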
This is what I came up with.
SQL> select count(*)
from ( select rownum
from (select rownum from dual connect by rownum <= 1000) a
, (select rownum from dual connect by rownum <= 1000) b
, (select rownum from dual connect by rownum <= 1000) c
where rownum <= 10000000
);
COUNT(*)
----------
10000000
1 row selected.
Elapsed: 00:00:04.73
By splitting the single large CONNECT BY into three much shallower ones and letting the Cartesian merge joins do the record multiplication (1,000 × 1,000 × 1,000 gives up to a billion rows), the UGA/PGA never grows beyond a few megabytes.
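Applied to my original problem of populating a table, the same generator can feed a direct-path insert. A sketch along these lines, where the table name and hint usage are my own illustrative choices rather than the script from the hashing post:

-- Hypothetical target table
create table hash_test (id number);

-- Direct-path insert of 10 million generated rows; each inline
-- view recurses only 1,000 deep, so memory stays low while the
-- Cartesian joins multiply the row count.
insert /*+ append */ into hash_test (id)
select rownum
from   (select rownum from dual connect by rownum <= 1000) a
,      (select rownum from dual connect by rownum <= 1000) b
,      (select rownum from dual connect by rownum <= 1000) c
where  rownum <= 10000000;

commit;

The same pattern scales further by adding more inline views, since each extra view multiplies the available row count by another 1,000.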
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer Mode=ALL_ROWS (Cost=6 Card=1)
1 0 SORT AGGREGATE (Card=1)
2 1 VIEW (Cost=6 Card=1)
3 2 COUNT STOPKEY
4 3 MERGE JOIN CARTESIAN (Cost=6 Card=1)
5 4 MERGE JOIN CARTESIAN (Cost=4 Card=1)
6 5 VIEW (Cost=2 Card=1)
7 6 COUNT
8 7 CONNECT BY WITHOUT FILTERING
9 8 FAST DUAL (Cost=2 Card=1)
10 5 BUFFER SORT (Cost=4 Card=1)
11 10 VIEW (Cost=2 Card=1)
12 11 COUNT
13 12 CONNECT BY WITHOUT FILTERING
14 13 FAST DUAL (Cost=2 Card=1)
15 4 BUFFER SORT (Cost=4 Card=1)
16 15 VIEW (Cost=2 Card=1)
17 16 COUNT
18 17 CONNECT BY WITHOUT FILTERING
19 18 FAST DUAL (Cost=2 Card=1)
Statistics
----------------------------------------------------------
5 user calls
0 spare statistic 2
0 spare statistic 6
0 spare statistic 11
0 physical reads retry corrupt
0 java call heap live size
0 java call heap live size max
0 java call heap object count
0 java session heap used size max
0 java session heap live size
1 rows processed