Merge的具体功能,就不细说了,在实际的ETL开发过程中,碰到了一个很有意思的问题,
原加工存储片段:
merge into temp_data.tmp_indpty_prod_stat_h_1 p
using ( select indpty_agmt_rela.conform_indparty_id,
'S05009' || trim(f.operation_typ_cd) as product_id,
sum(thisincomeamt - thispayoutamt) as inv_income,
sum(lastincomeamt - lastpayoutamt + thisincomeamt -
thispayoutamt) as sum_inv_income
from tcs_indpty_agmt_rela_h indpty_agmt_rela
left join raw_data.f_acctquotient f on f.customerid =
indpty_agmt_rela.agmt_num
and indpty_agmt_rela.province_cd = 96
and indpty_agmt_rela.agmt_modifier_num = 5
and indpty_agmt_rela.indpty_agmt_rela_cd = '10'
and p_txdate between
indpty_agmt_rela.start_dt and
indpty_agmt_rela.end_dt - 1
group by indpty_agmt_rela.conform_indparty_id,
'S05009' || trim(f.operation_typ_cd)) s
on (p.conform_indparty_id = s.conform_indparty_id and p.product_id = s.product_id)
when matched then
update
set p.inv_income = s.inv_income, p.sum_inv_income = s.sum_inv_income;
这就是一个简单的Merge语句,但这个加工存储执行了7,8个小时,数据量为70多万左右,速度实在是不敢恭维,
经过尝试,将Megrge中的生成源数据的SQL语句,做张临时表,并将加工存储,改成如下:
INSERT INTO temp_data.tmp_indpty_prod_stat_h_3
select indpty_agmt_rela.conform_indparty_id,
'S05009' || trim(f.operation_typ_cd) as product_id,
sum(thisincomeamt - thispayoutamt) as inv_income,
sum(lastincomeamt - lastpayoutamt + thisincomeamt -
thispayoutamt) as sum_inv_income
from tcs_indpty_agmt_rela_h indpty_agmt_rela
left join raw_data.f_acctquotient f on f.customerid =
indpty_agmt_rela.agmt_num
and indpty_agmt_rela.province_cd = 96
and indpty_agmt_rela.agmt_modifier_num = 5
and indpty_agmt_rela.indpty_agmt_rela_cd = '10'
and p_txdate between
indpty_agmt_rela.start_dt and
indpty_agmt_rela.end_dt - 1
group by indpty_agmt_rela.conform_indparty_id,
'S05009' || trim(f.operation_typ_cd);
merge into temp_data.tmp_indpty_prod_stat_h_1 p
using (SELECT CONFORM_INDPARTY_ID, PRODUCT_ID, INV_INCOME, SUM_INV_INCOME
from temp_data.tmp_indpty_prod_stat_h_3) s
on (p.conform_indparty_id = s.conform_indparty_id and p.product_id = s.product_id)
when matched then
update
set p.inv_income = s.inv_income, p.sum_inv_income = s.sum_inv_income;
这样改完之后,发现该存储运行完成,只要几分钟,性能大大提升。
很有意思,后来,又在其他几个加工存储中做了验证,
发现Merge中,避免生成数据源的sql 过于复杂,如果无法避免,提前做张临时表,
然Merge直接去读,这样性能很大程度上有所提升。