本例以hive为例,只考虑到实现,与性能无关
首先创建表
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 | CREATE TABLE orders (
orderid INT ,
createtime STRING,
modifiedtime STRING,
status STRING
) row format delimited fields terminated by '\t'
CREATE TABLE ods_orders_inc (
orderid INT ,
createtime STRING,
modifiedtime STRING,
status STRING
) PARTITIONED BY ( day STRING)
row format delimited fields terminated by '\t'
CREATE TABLE dw_orders_his (
orderid INT ,
createtime STRING,
modifiedtime STRING,
status STRING,
dw_start_date STRING,
dw_end_date STRING
) row format delimited fields terminated by '\t' ;
|
首先全量更新,我们先到2016-08-20为止的数据。
初始化,先把2016-08-20的数据初始化进去
1 2 3 4 | INSERT overwrite TABLE ods_orders_inc PARTITION ( day = '2016-08-20' )
SELECT orderid,createtime,modifiedtime,status
FROM orders
WHERE createtime < '2016-08-21' and modifiedtime < '2016-08-21' ;
|
刷到dw中
1 2 3 4 5 6 | INSERT overwrite TABLE dw_orders_his
SELECT orderid,createtime,modifiedtime,status,
createtime AS dw_start_date,
'9999-12-31' AS dw_end_date
FROM ods_orders_inc
WHERE day = '2016-08-20' ;
|
如下结果
1 2 3 4 5 | select * from dw_orders_his;
OK
1 2016-08-20 2016-08-20 创建 2016-08-20 9999-12-31
2 2016-08-20 2016-08-20 创建 2016-08-20 9999-12-31
3 2016-08-20 2016-08-20 创建 2016-08-20 9999-12-31
|