Tuesday, June 03, 2008

Cube

Today, a question on the Oracle forum for a alternate query (here)

Coming from a bad design, find the sum of two columns, grouping by two others columns.
The first thing coming in mind is a subquery with a UNION ALL :
select dt, ccy, sum(amt) amt
from (select dt, ccy1 as ccy, amt1 as amt from a
union all
select dt, ccy2 as ccy, amt2 as amt from a)
group by dt,ccy;


Then, trying to avoid it :
select dt, decode(t,1,ccy1,2,ccy2) ccy, sum(decode(t,1,amt1,2,amt2)) amt
from a , (select 1 t from dual union select 2 from dual)
group by dt, decode(t,1,ccy1,2,ccy2);


As we can see in the thread, the UNION ALL (which implies 2 table scan) is more performant than 1 table scan (implies a cartesian product and DECODE usage).

Well, all was said, until the nice demo of CUBE usage by DimaCit, the query is now the following :
with X as (
select dt, ccy1, sum(amt1) amt1, ccy2, sum(amt2) amt2
from a
group by dt, cube(ccy1, ccy2)
)
select x1.dt, x1.ccy1, x1.amt1 + x2.amt2
from X x1, X x2
where x1.dt = x2.dt
and x1.ccy1 is not null and x1.ccy2 is null
and x2.ccy2 is not null and x2.ccy1 is null
and x1.ccy1 = x2.ccy2;


I have to say I never thought about CUBE() on those cases.
So, very nice query, a little bit tricky, and a very good point for the best performance !
A nice demo of CUBE usage. Thank to DimaCit

Addendum : the solution doesn't work if one of the currency is not in the other column, then the CUBE solution is no more a solution. Thanks to Dominic Brooksfor comment it out this point.

3 comments:

DomBrooks said...

Nicolas,

I've just posted a comment on the forum thread.

This is a good example of what cube could do - I would never think of it in this circumstance either.

However, from assumptions (!) about the nature of the data - currency transactions - I would believe that you would not always have a currency on both side, although this is not part of the data supplied by the OP.

Cheers,
Dominic

Nicolas Gasparotto said...

Yeah, that's right. That was mainly for the fun of the CUBE solution, which is not as you explained it as well.

Laurent Schneider said...

not as performant as cube but maybe more readable and still fast :

select ccy,sum(amt) from a
unpivot ((amt,ccy) for (x,y) in ((amt1,ccy1),(amt2,ccy2)))
group by ccy;