A follow on from my previous blog entitled "The Hashed Speed Test & How to Avoid Collisions", I'm comparing DBMS_CRYPTO with Oracle's new 12c hashing function STANDARD_HASH on speed and reliability.
STANDARD_HASH computes a hash value for a given expression using one of several hash algorithms as defined and standardised by the National Institute of Standards and Technology. These are;
RAW datatype of 128 bytes
Output character length = 32
RAW datatype of 160 bytes
Output character length = 40
RAW datatype of 256 bytes
Output character length = 64
RAW datatype of 384 bytes
Output character length = 96
RAW datatype of 512 bytes
Output character length = 128
The function accepts two arguments;
EXPR - This is the data that you want to hash. There are no restrictions on the length of data. It cannot be a LONG or LOB type, nor can it be a user-defined object type.
METHOD - Lets you specify the name of the hash algorithm to be used as detailed above. If this value is omitted, the default value of SHA1 will be used.
Let’s start by preparing the Oracle 12cR2 database, sitting on a VirtualBox Unix instance with 2 processors and 2048mb of base memory.
I initially create a table called "A" with a single column called HSH which has a datatype of VARCHAR2. I'll start of with a length to this column of 32 as this is the output (or digest) length of the MD5 hashing method. I'll increase the size of this column for each method I test.
SQL> set echo on
SQL> create table a (hsh varchar2(32));
Table A created.
SQL> create index hsh_ix on a (hsh);
Index HSH_IX created.
SQL> analyze table a compute statistics;
Table A analyzed.
SQL> set timing on
To successfully compare the two methods, I re-ran my previous tests for DBMS_CRYPTO against my Oracle 12cR2 instance. I inserted 10 million records for each cryptographic hash algorithm as before. I ran each test three times to calculate an average load speed, giving me a stable base for comparison. The results were;
Output character length = 32
Elapsed time 00:24:31.563 (24.5 mins)
Collisions = zero
Output character length = 32
Elapsed time 00:25:50.487 (26 mins)
Collisions = zero
Output character length = 40
Elapsed time 00:27:46.611 (28 mins)
Collisions = zero
We'll start by exploring the cryptographic hash algorithms that are shared between both hashing functions. These are MD5 and SHA-1. Then we will move onto the new hashing algorithms of SHA-256, SHA-384 and SHA-512.
Starting with MD5.
SQL> insert /*+ APPEND */ into a
2 select standard_hash(x,'MD5')
3 from ( select dbms_random.string('X',10)||lpad(rownum,10,'0') as x
4 from (select rownum from dual connect by rownum <= 1000) a
5 , (select rownum from dual connect by rownum <= 1000) b
6 , (select rownum from dual connect by rownum <= 1000) c
7 where rownum <= 10000000
8 );
10,000,000 rows inserted.
Elapsed: 00:21:00.527
SQL> commit;
Commit complete.
It ran roughly 3.5 minutes faster than the DBMS_CRYPTO equivalent method. What about collisions?
SQL> analyze table a compute statistics;
Table A analyzed.
SQL> with collision as
2 (select hsh, count(*)
3 from a
4 group by
5 hsh having count(*) > 1
6 )
7 select count(hsh) collisions, round((count(hsh)/10000000)*100,2) as pct_collisions
8 from collision;
COLLISIONS PCT_COLLISIONS
---------- --------------
0 0
As expected zero collisions from 10 million hashes.
Next up SHA-1.
SQL> truncate table a;
Table A truncated.
truncate table a;
SQL> alter table a modify hsh varchar2(40);
Table A altered.
SQL> analyze table a compute statistics;
Table A analyzed.
SQL> insert /*+ APPEND */ into a
2 select standard_hash(x,'SHA1')
3 from ( select dbms_random.string('X',10)||lpad(rownum,10,'0') as x
4 from (select rownum from dual connect by rownum <= 1000) a
5 , (select rownum from dual connect by rownum <= 1000) b
6 , (select rownum from dual connect by rownum <= 1000) c
7 where rownum <= 10000000
8 );
10,000,000 rows inserted.
Elapsed: 00:24:15.068
SQL> commit;
Commit complete.
A slight improvement was gained. STANDARD_HASH was about 1 minute faster than the DBMS_CRYPTO equivalent method.
SQL> analyze table a compute statistics;
Table A analyzed.
SQL> with collision as
2 (select hsh, count(*)
3 from a
4 group by
5 hsh having count(*) > 1
6 )
7 select count(hsh) collisions, round((count(hsh)/10000000)*100,2) as pct_collisions
8 from collision;
COLLISIONS PCT_COLLISIONS
---------- --------------
0 0
And again zero collisions as we would expect. As with DBMS_CRYPTO, I ran both of these tests three times to obtain an average load speed. The results can be seen in the table below.
OK, so lets move onto the new cryptographic hash algorithms that are not supported in DBMS_CRYPTO.
SQL> truncate table a;
Table A truncated.
SQL> analyze table a compute statistics;
Table A analyzed.
I ran the remaining hashing methods in the same way as I ran MD5 and SHA-1 for STANDARD_HASH. The results are displayed below.
These new hashing methods are considerably slower, taking between 35 minutes and and 1 half hours on average to process the 10 million records. This may in part be due to the datatype column on table "A" being a VARCHAR2 rather than a RAW datatype. Oracle is needing to convert the RAW hash string to a character string on insert. This could be more noticeable due to the extended length of the hashes generated by STANDARD_HASH. I'll be testing this theory in a future blog post.
Let’s tidy up.
SQL> drop table a;
Table A dropped.
STANDARD_HASH offers performance improvements over DBMS_CRYPTO, but these are negligible. More importantly the new 12c function offers a simplified approach to generating hash values with support for up to 512 bytes cryptographic algorithms.
The newly supported cryptographic methods deliver almost fall proof collision avoidance but are slower to run due to the complexity in the algorithms required.
As an example, to find a collision in SHA-256, you would probably have to execute the algorithm some 2^128 times (or 340,282,366,920,938,463,463,374,607,431,768,211,456). In contrast it is estimated SHA-1 would only need 2^80 (or 1,208,925,819,614,629,174,706,176) executions to obtain a collision.