I needed to insert tens of millions of records into a table the other day for a future blog I'm putting together on hashing. For the life of me I couldn't work out how to achieve this without using CONNECT BY ROWNUM. The issue being, I ran out of memory before the script completed when generating records over a specific threshold. For the machine I was working on, this was 2,392,762 records.
I finally cracked it, so thought I would share.
set timing on
set autotrace on
When trying to generate rows above a certain threshold, the CONNECT BY ROWNUM option throws an error because an excessive amount of UGA/PGA memory is consumed.
SQL> select count(*) from (select rownum from dual connect by rownum <= 10000000)
select count(*) from (select rownum from dual connect by rownum <= 10000000)
*
ERROR at line 1:
ORA-30009: Not enough memory for CONNECT BY operation
Let’s take a look at the explain plan, where we will see a single recursive CONNECT BY operator.
Plan hash value: 2688747630
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 2 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | VIEW | | 1 | 2 (0)| 00:00:01 |
| 3 | COUNT | | | | |
|* 4 | CONNECT BY WITHOUT FILTERING| | | | |
| 5 | FAST DUAL | | 1 | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - filter(ROWNUM<=10000000)
There are other methods for generating large datasets, such as pipelined table functions and the MODEL clause introduced in 10g, which are explained very nicely in Adrian Billington's article: http://www.oracle-developer.net/display.php?id=408. However, I still wanted to see if it was possible to use CONNECT BY ROWNUM with a really large threshold.
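For context, the pipelined function approach looks roughly like this. This is only a minimal sketch, not taken from Billington's article, and the type and function names are my own illustrative ones:

create or replace type num_tab as table of number;
/

create or replace function row_gen (p_rows in number)
  return num_tab pipelined
as
begin
  for i in 1 .. p_rows loop
    pipe row (i);  -- emit one row at a time, so memory stays flat
  end loop;
  return;
end;
/

-- select count(*) from table(row_gen(10000000));

Because rows are piped out one at a time rather than materialised, memory usage stays flat regardless of the row count.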
This is what I came up with.
SQL> select count(*)
from ( select rownum
from (select rownum from dual connect by rownum <= 1000) a
, (select rownum from dual connect by rownum <= 1000) b
, (select rownum from dual connect by rownum <= 1000) c
where rownum <= 10000000
);
COUNT(*)
----------
10000000
1 row selected.
Elapsed: 00:00:04.73
By splitting the single large CONNECT BY into three much shallower ones and letting the Cartesian merge joins do the record multiplication (1,000 × 1,000 × 1,000 gives up to a billion rows), the UGA/PGA never grows beyond a few megabytes.
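Applied to my original problem of populating a table, the same generator can feed a direct-path insert. A sketch along these lines, where the table name and hint usage are my own illustrative choices rather than the script from the hashing post:

-- Hypothetical target table
create table hash_test (id number);

-- Direct-path insert of 10 million generated rows; each inline
-- view recurses only 1,000 deep, so memory stays low while the
-- Cartesian joins multiply the row count.
insert /*+ append */ into hash_test (id)
select rownum
from   (select rownum from dual connect by rownum <= 1000) a
,      (select rownum from dual connect by rownum <= 1000) b
,      (select rownum from dual connect by rownum <= 1000) c
where  rownum <= 10000000;

commit;

The same pattern scales further by adding more inline views, since each extra view multiplies the available row count by another 1,000.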
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer Mode=ALL_ROWS (Cost=6 Card=1)
1 0 SORT AGGREGATE (Card=1)
2 1 VIEW (Cost=6 Card=1)
3 2 COUNT STOPKEY
4 3 MERGE JOIN CARTESIAN (Cost=6 Card=1)
5 4 MERGE JOIN CARTESIAN (Cost=4 Card=1)
6 5 VIEW (Cost=2 Card=1)
7 6 COUNT
8 7 CONNECT BY WITHOUT FILTERING
9 8 FAST DUAL (Cost=2 Card=1)
10 5 BUFFER SORT (Cost=4 Card=1)
11 10 VIEW (Cost=2 Card=1)
12 11 COUNT
13 12 CONNECT BY WITHOUT FILTERING
14 13 FAST DUAL (Cost=2 Card=1)
15 4 BUFFER SORT (Cost=4 Card=1)
16 15 VIEW (Cost=2 Card=1)
17 16 COUNT
18 17 CONNECT BY WITHOUT FILTERING
19 18 FAST DUAL (Cost=2 Card=1)
Statistics
----------------------------------------------------------
5 user calls
0 spare statistic 2
0 spare statistic 6
0 spare statistic 11
0 physical reads retry corrupt
0 java call heap live size
0 java call heap live size max
0 java call heap object count
0 java session heap used size max
0 java session heap live size
1 rows processed