接前文
Session重叠问题学习(二),这是问题和需求的描述,执行时间90秒
http://blog.itpub.net/29254281/viewspace-2150229/
Session重叠问题学习(三)--优化,一次优化后,执行时间25秒
http://blog.itpub.net/29254281/viewspace-2150259/
Session重叠问题学习(四)--再优化,二次优化后,执行时间10秒
http://blog.itpub.net/29254281/viewspace-2150297/
Session重叠问题学习(五)--最优化,三次优化后,执行时间1.6秒
http://blog.itpub.net/29254281/viewspace-2150339/
Session重叠问题学习(六)--极致优化,四次优化后,执行时间1250-1300毫秒
http://blog.itpub.net/29254281/viewspace-2150364/
这是对这个问题的算法总结和最后一次优化.
经过这次优化,在我的电脑上(SSD硬盘,机械硬盘还是没有这么快),运行时间是980毫秒左右.真正意义上的秒出.并且我确实觉得是优无可优了。
之所以能从10秒的版本,跳跃优化到1.6s,1.3s的版本.是因为采用了小花狸Session合并算法。
假如用户间首尾时间段没有重复的情况下,满足如下的规律

但是该规律仅仅存在于用户首尾时间段不重合的情况.
比如A用户上线时间是 10点至11点整,而用户B上线时间是11点整到12点.
因为11点整这个时刻,用户A和B重合了,所以这个算法就不能生效了.
所以当时取巧,如果重合了,就增加或者减去一个很小的时间
s+interval startnum/1000000 second s,
e-interval endnum/1000000 second e
-
insert into t2 (roomid,s,e)
-
select roomid,
-
s+interval startnum/1000000 second s,
-
e-interval endnum/1000000 second e
-
from (
-
select
-
roomid,
-
s,e,
-
startnum,
-
when @eflag=eflag then @rn:=@rn+1 when @eflag:=eflag then @rn else @rn end endnum
-
from (
-
select * from (
-
select when @sflag=sflag then @rn:=@rn+1 when @sflag:=sflag then @rn else @rn end startnum,roomid,s,e,sflag,eflag from
-
(
-
select * from
-
(
-
select t1.*,concat('[',roomid,'],',s) sflag,concat('[',roomid,'],',e) eflag from t1 order by roomid ,sflag
-
)a,(select @sflag:='',@rn:=0,@eflag:='') vars
-
) b
-
) bb order by roomid,eflag
-
) c
-
) d ;
但是这样引入一个问题,就是有误差.误差只能缩小却不能消除.
这也是为什么1.3s和1.6s版本使用timestamp(6)的原因,就是为了缩小误差.
但是经过反复的思考,之前发现的规律其实是一个特例.
小花狸Session合并的通用情况其实是下图所示

先给每个点标号.
每个点,如果是开始点则+1,结束点则-1.
以图中左侧第二点为例,该点是四个点的重合,其中有一个结束点(-1)三个开始点(+3),所以该点标号是2.
然后从左到右计算,
最左点是0加上该点标号,作为右侧点的最大在线人数.
其他点的最大在线人数 都是左侧点的最大在线人数加上标号.
最后查询最大在线人数大于等于2的时间段,汇总即可。
改进的过程如下:
-
DELIMITER $$
-
-
CREATE DEFINER=`root`@`localhost` PROCEDURE `p`()
-
BEGIN
-
drop table if exists t1;
-
drop table if exists t2;
-
drop table if exists tmp_time_point;
-
drop table if exists tmp_min_range;
-
drop table if exists tmp_s;
-
CREATE temporary TABLE `t1` (
-
`roomid` int(11) NOT NULL DEFAULT '0',
-
`userid` bigint(20) NOT NULL DEFAULT '0',
-
`s` timestamp,
-
`e` timestamp,
-
primary key(roomid,userid,s,e)
-
) ENGINE=memory;
-
-
CREATE temporary TABLE `t2` (
-
`roomid` int(11) NOT NULL DEFAULT '0',
-
`timepoint` timestamp,
-
c int,
-
key(roomid,timepoint)
-
) ENGINE=memory;
-
-
CREATE temporary TABLE `tmp_min_range` (
-
`roomid` int(11) NOT NULL DEFAULT '0',
-
`s` timestamp,
-
`e` timestamp,
-
primary key(roomid,s,e),
-
key(roomid,e)
-
) ENGINE=memory;
-
-
create temporary table tmp_time_point(
-
roomid bigint,
-
timepoint timestamp,
-
type smallint,
-
key(roomid,timepoint)
-
) engine=memory;
-
-
create temporary table tmp_s(
-
roomid bigint,
-
userid bigint,
-
s timestamp,
-
e timestamp,
-
i int
-
) engine=memory;
-
-
SET @A=0;
-
SET @B=0;
-
-
insert into tmp_s
-
SELECT x.roomid,x.userid,s,e,datediff(e,s)+1 i
-
FROM
-
(
-
(
-
SELECT @B:=@B+1 AS id,roomid,userid,s
-
FROM (
-
SELECT DISTINCT roomid, userid, roomstart AS s
-
FROM u_room_log a
-
WHERE NOT EXISTS (SELECT *
-
FROM u_room_log b
-
WHERE a.roomid = b.roomid
-
AND a.userid = b.userid
-
AND a.roomstart > b.roomstart
-
AND a.roomstart <= b.roomend)
-
) AS p
-
) AS x,
-
(
-
SELECT @A:=@A+1 AS id,roomid,userid,e
-
FROM
-
(
-
SELECT DISTINCT roomid, userid, roomend AS e
-
FROM u_room_log a
-
WHERE NOT EXISTS (SELECT *
-
FROM u_room_log b
-
WHERE a.roomid = b.roomid
-
AND a.userid = b.userid
-
AND a.roomend >= b.roomstart
-
AND a.roomend < b.roomend)
-
) AS o
-
) AS y
-
)
-
WHERE x.id = y.id AND x.roomid = y.roomid AND x.userid = y.userid ;
-
-
select max(i) into @c from tmp_s;
-
-
insert ignore into t1(roomid,userid,s,e)
-
select
-
roomid, userid,
-
if(date(s)!=date(e) and id>1,date(s+interval id-1 date(s+interval id-1 date(e) ,e,date_format(s+interval id-1 '%Y-%m-%d 23:59:59')) e
-
from tmp_s t1 STRAIGHT_JOIN
-
nums on(nums.id<=t1.i)
-
where nums.id<=@c
-
-
;
-
-
-- 开始点+1,结束点-1
-
insert into tmp_time_point(roomid,timepoint,type) select roomid,s,1 from t1;
-
insert into tmp_time_point(roomid,timepoint,type) select roomid,e,-1 from t1;
-
-
-- 计算每个点的标号
-
insert into t2(roomid,timepoint,c)
-
select roomid,timepoint,from tmp_time_point group by roomid,timepoint;
-
-
-- 计算最小范围
-
insert ignore into tmp_min_range(roomid,s,e)
-
select roomid,starttime starttime, endtime endtime from (
-
select
-
if(@roomid=roomid,@d,'') as starttime,@d:=str_to_date(timepoint,'%Y-%m-%d %H:%i:%s.%f'),@roomid:=roomid,p.roomid,str_to_date(timepoint,'%Y-%m-%d %H:%i:%s.%f') endtime
-
from tmp_time_point p,(select @d:='',@roomid:=-1) vars
-
order by roomid,timepoint
-
) v4 where starttime!='' and date(starttime)=date(endtime);
-
-
-
select roomid,date(s) dt,round(second,date_format(s,'%Y-%m-%d %H:%i:%s'),date_format(e,'%Y-%m-%d %H:%i:%s')))/60) ts,max(num) c from
-
(
-
select a.roomid,num,a.s,a.e from (
-
select when @roomid=roomid and date(@timepoint)=date(timepoint) then @num:=@num+prevC when @roomid:=roomid then @num:=0 end num,@timepoint:=timepoint ,a.* from (
-
select when @roomid=roomid then @prevC when @roomid:=roomid then @prevC:=0 end prevC,@prevC:=c,b.* from (
-
select * from t2 ,(select @roomid:=-1,@timepoint:='',@num:=0,@prevC:=-1) vars
-
) b order by roomid,timepoint
-
) a order by roomid,timepoint
-
) c
-
inner join
-
tmp_min_range a on( c.timepoint=a.e and c.roomid=a.roomid)
-
where num>=2
-
) d group by roomid,date(s);
-
-
END
耗时分析:
填充tmp_s,合并同一房间同一用户的重叠部分,耗时639毫秒
填充t1,拆分跨天的用户数据,耗时62毫秒
填充tmp_time_point,写入开始节点和结束节点的信息和权重,耗时忽略不计
填充t2,计算每个点的标号.耗时78毫秒
填充tmp_min_range,计算最小间隔范围,耗时125毫秒
小花狸合并算法并展示,耗时266毫秒.
总计时长983毫秒.
好了,这个问题不能再玩了..over,over
|