分析函數使用筆記匯總
?
??? 把之前零散記錄的分析函數用法筆記整理了一下,發現有限不常用的還是沒記住,-_-||| 。在這里匯總留存,不定期添加中。
???
?
一、over函數(基本分析函數):
?
code example 1:
?
SQL> select * from t1;
?
date? sale
----- ------
??? 1 ??? 20
??? 2???? 15
??? 3???? 14
??? 4???? 18
??? 5???? 30
?
使用分析函數逐條匯總sale:
?
SQL> select date,sum(sale)over(order by date) sum from t1;
?
date? sale? sum
----- ----- -----
??? 1??? 20??? 20? <----1天??????????
??? 2??? 15??? 35? <----1天+2天??????????
??? 3??? 14??? 49? <----1天+2天+3天??????????
??? 4??? 18??? 67?? ……
??? 5??? 30??? 97?? ……
?
另:使用count、avg、min、max等均與此類同。
?
code example 2:
?
SQL> select * from t1;
?
A?? B?? C
---?--- ---
m?? a?? ? 2?????????????????????
n?? a? ? 3?????????????????????
m?? a?? ? 2?????????????????????
n?? b?? ? 2?????????????????????
n?? b?? ? 1?????????????????????
x?? b?? ? 3?????????????????????
x?? b?? ? 2?????????????????????
x?? b?? ? 4?????????????????????
h?? b?? ? 3
?
使用分析函數分項匯總C:
?
SQL>
select a,c,sum(c)over(partition by a) sum from t2;
?
A?? B?? C?? SUM
--- --- --- -----
h?? b?? ? 3???? 3???????????????????????
m?? a?? ? 2???? 4???????????????????????
m?? a?? ? 2???? 4???????????????????????
n?? a?? ? 3???? 6???????????????????????
n?? b?? ? 2???? 6???????????????????????
n?? b?? ? 1???? 6???????????????????????
x?? b?? ? 3???? 9???????????????????????
x?? b?? ? 2???? 9???????????????????????
x?? b?? ? 4???? 9?
?
?
若無須分項匯總,則使用partition by null
?
SQL>
select a,c,sum(c)over(partition by null) sum from t2;
?
A?? B?? C?? SUM
--- --- --- -----
h?? b?? ? 3??? 22???????????????????????
m?? a?? ? 2??? 22???????????????????????
m?? a?? ? 2??? 22???????????????????????
n?? a?? ? 3??? 22???????????????????????
n?? b?? ? 2??? 22???????????????????????
n?? b?? ? 1??? 22???????????????????????
x?? b?? ? 3??? 22???????????????????????
x?? b?? ? 2??? 22???????????????????????
x?? b?? ? 4??? 22???????????????????????
?
?
code example 3:
?
SQL> select * from t1;
?
NAME? CLASS? S
----- ------ ------
fda??? ??? 1???? 80????????????????????
ffd??? ??? 1???? 78????????????????????
dss??? ??? 1???? 95????????????????????
cfe?? ??? 2???? 74????????????????????
gds?? ??? 2???? 92????????????????????
gf???? ??? 3???? 99????????????????????
ddd?? ??? 3???? 99????????????????????
adf?? ??? 3???? 45????????????????????
asdf? ??? 3???? 55????????????????????
3dd?? ??? 3???? 78?????????
?
?
統計每個班的第一名成績:
?
SQL>
select * from??????????????????????????????????????????????????????????????????????
? 2?
(select name,class,s,rank()over(partition by class order by s desc) mm
? 3? from t1)???????????????????????????????????????????????????????????????????????????
? 4? where mm=1;
?
NAME?? CLASS? S???? MM?
------ ------ ----- -----
dss???? ?? 1??? 95???? 1
gds????? ?? 2??? 92???? 1
gf?????? ?? 3??? 99???? 1
ddd????? ?? 3??? 99???? 1
?
?
code example 4:★★★★
?
SQL> select * from t1;
?
ID???? NUM
------ -------
?? 1?????? 0
?? 2?????? 2
?? 3?????? 2
?? 4?????? 0
?? 5?????? 0
?? 6?????? 2???
?? 7?????? 2
?? 8?????? 0
?? 9?????? 1
?? 10????? 1
?
使用開窗函數進行匯總的研究:
?
SQL>
select id,sum(num)over(order by id range between 3 preceding and 3 following) from t1;
?
ID???? SUM(NUM)OVER(ORDERBYIDRANGEBET
------ ------------------------------
??? 1???????????? 4
??? 2???????????? 4
??? 3???????????? 6
??? 4???????????? 8
?? 5???????????? 8
??? 6???????????? 7
??? 7???????????? 6
?? 8???????????? 6
?? 9???????????? 6
?? 10??????????? 4
?
說明:range between代表了該id前后數值區間(注意是數值,不是行數)的數字進行匯總。
?
注1:若要通過行開窗,則將range改為rows
?
SQL>
select id,sum(num)over(order by id rows between 3 preceding and 3 following) from t1;
?
此例中的運行結果與上面相同(因為行號和數值相同)
?
注2:若需要某一邊不設限,則使用關鍵字unbounded
?
SQL>
select id,sum(num)over(order by id rows between unbounded preceding and 2 following)
? 2? from t1;
?
?
?
2、排序函數的研究:
?
一般的3種排序函數,注意排序的區別:
select col,value,rank() over([partition by col] order by value) from tmp1;
--排名相同則重復,之后跳過
?
select col,value,dense_rank() over([partition by col] order by value) from tmp1;
--排名相同則重復,之后不跳過
?
select col,value,row_number() over([partition by col] order by value) from tmp1;
--排名相同也不重復
?
select col,value,cume_dist() over([partition by col] order by value) from tmp1;
--0-1之間比率,最大1,rank形式
?
select col,value,percent_rank() over([partition by col] order by value) from tmp1;
--0-1之間比率,不到1,decse_rank形式
?
?
另一種有趣的排序用法,確定某數在序列中的排位:
?
select col,rank(121) within group(order by value desc) from tmp1 group by col
--121在各個col中的value中的排位(排名相同則重復,之后跳過)
?
select col,dense_rank(121) within group(order by value desc) from tmp1 group by col;
--121在各個col中的value中的排位(排名相同則重復,之后不跳過)
?
select col,cume_dist(121) within group(order by value desc) from tmp1 group by col;
--121在各個col中的value中的位置占比(結果均在0-1之間,用rank的規則)
?
select col,percent_rank(121) within group(order by value desc) from tmp1 group by col;
--121在各個col中的value中的位置占比(結果均在0-1之間,用decse_rank的規則)
?
?
附:KEEP(DENSE_RANK FIRST ORDER BY ) 的研究
------------------------------------------------
?
有表test數據如下:
?
SQL> select * from test;
?
ID? MC?? SL
--- ----- ---
? 1?? 111?? 1
? 1?? 222?? 1
? 1?? 333?? 2
? 1?? 555?? 3
? 1?? 666?? 3
? 2?? 111?? 1
? 2?? 222?? 1
? 2?? 333?? 2
? 2?? 555?? 2
?
SQL>
select id,mc,sl,
? 2? min(mc) keep (DENSE_RANK first ORDER BY sl) over(partition by id) A,
? 3? max(mc) keep (DENSE_RANK first ORDER BY sl) over(partition by id) B,
? 4? min(mc) keep (DENSE_RANK last? ORDER BY sl) over(partition by id) C,
? 5? max(mc) keep (DENSE_RANK last? ORDER BY sl) over(partition by id) D
? 6? from test;
?
ID?? MC? SL A?? B?? C?? D
---- --- -- --- --- --- ---
?? 1 111? 1 111 222 555 666
?? 1 222? 1 111 222 555 666
?? 1 333? 2 111 222 555 666
?? 1 555? 3 111 222 555 666
?? 1 666? 3 111 222 555 666
?? 2 111? 1 111 222 333 555
?? 2 222? 1 111 222 333 555
?? 2 333? 2 111 222 333 555
?? 2 555? 2 111 222 333 555
?
具體邏輯如下:以 min(mc) keep (DENSE_RANK first ORDER BY sl) over(partition by id)? 為例
1、確認over(partition by id),選取ID=1的5行
2、通過(DENSE_RANK ORDER BY sl),對ID=1的5行中的sl進行排序
3、通過keep (DENSE_RANK first ORDER BY sl),選出其中的第一個sl=1
4、用min(mc)選取id=1 & sl=1時的mc最小值
?
?
?
3、幾個層級函數的使用:
?
① 首先是grouping sets的使用方法:
?
select a,b,c,sum(d) from t
group by grouping sets (a,b,c);
?
上面這個語句等同于一下語句:
select * from
(select a,null,null,sum( d ) from t group by a
?union all
select null,b,null,sum( d ) from t group by b
?union all
select null,null,c,sum( d ) from t group by c);
② rollup函數:
?
select a,b,c,sum(d) from t
group by rollup(a, b, c);
?
等同于:
?
select * from
(select a,b,c,sum(d) from t group by a, b, c
?union all
select a,b,null,sum(d) from t group by a, b
?union all
select a,null,null,sum(d) from t group by a
?union all
select null,null,null,sum(d) from t);
?
------------------------------------------
所以說:
?
select a,b,c,sum(d) from t
group by grouping sets ((a,b,c),(a,b),a,());
?
等同與
?
select a,b,c,sum(d) from t
group by rollup(a, b, c);
?
③ cube函數:
?
select a,b,c,sum(d) from t
group by cube(a, b, c);
?
等同于:
?
select * from
(select a,b,c,sum(d) from t group by a, b, c
union all
select a,b,null,sum(d) from t group by a, b
union all
select a,null,c,sum(d) from t group by a, c
union all
select null,b,c,sum(d) from t group by b, c
union all
select a,null,null,sum(d) from t group by a
union all
select null,b,null,sum(d) from t group by b
union all
select null,null,c,sum(d) from t group by c
union all
select null,null,null,sum(d) from t);
?
注:即以所有可以組合的種類匯總!
?
④ Grouping() 函數的用法
以上的grouping sets、rollup、Cube函數在匯總時均會涉及到null值
如何對這些null進行自定義呢?
需要使用Grouping()函數
select
decode(grouping(a),1,'all a',a) a,
?????? decode(grouping(b),1,'all b',b) b,
?????? decode(grouping(c),1,'all c',c) c,
?????? sum(d)
? from t
?group by cube(a,b,c);
?
則:所有a列為null處顯示all a,其他列類推
?
我的綜合應用:
-----------------------------------------
select oid,decode(grouping(oid),1,'合計',min(oname)) oname,
?????? sum(decode(to_char(sdate,'yyyymm'),'200801',prem,0))/10000 一月,
?????? sum(decode(to_char(sdate,'yyyymm'),'200802',prem,0))/10000 二月,
?????? sum(decode(to_char(sdate,'yyyymm'),'200803',prem,0))/10000 三月,
?????? sum(decode(to_char(sdate,'yyyymm'),'200804',prem,0))/10000 四月,
?????? sum(decode(to_char(sdate,'yyyymm'),'200805',prem,0))/10000 五月,
?????? sum(decode(to_char(sdate,'yyyymm'),'200806',prem,0))/10000 六月,
?????? sum(decode(to_char(sdate,'yyyymm'),'200807',prem,0))/10000 七月,
?????? sum(decode(to_char(sdate,'yyyy'),'2008',prem,0))/10000? 合計
? from test
?where btype='1'
?group by rollup(organ_id)
?order by 1;
?
4、lag和lead函數的使用技巧:
?
SQL> select rownum from t1;
ROWNUM
-------
????? 1
????? 2
???? 3
????? 4
????? 5
???? 6
???? 7
???? 8
???? 9
????? 10
SQL> select rownum,lag(rownum,3,8) over(order by rownum) LAG,
? 2? lead(rownum,3,8) over(order by rownum) LEAD from t1;
?
ROWNUM?? LAG?? LEAD
-------- ----- -----
?????? 1??? 8???? 4
?????? 2???? 8???? 5
?????? 3???? 8???? 6
?????? 4???? 1???? 7
?????? 5???? 2???? 8
?????? 6???? 3???? 9
?????? 7???? 4??? 10
?????? 8???? 5???? 8
????? 9???? 6???? 8
????? 10??? 7???? 8
?
說明:
LAG:列出某字段的前N行的值;
LEAD:列出某字段的后N行的值;
LAG(arg1,arg2,arg3):
arg1是從其他行返回的表達式
arg2是希望檢索的當前行分區的偏移量。是一個正的偏移量(不能為負),是一個往回檢索以前的行的數目。
arg3是在arg2表示的數目超出了分組的范圍時返回的值(必須為數字或空)
?
?
?
?
?
?
附一篇分析函數應用的博文:
*****************************************************************************************
?
---------------------------------------------------------------
Oracle層次查詢和分析函數在號段選取中的應用
?
?
Oracle層次查詢和分析函數
在號段選取中的應用
?
?
摘要
?
一組連續的數,去掉中間一些數,如何求出剩下的數的區間(即號段)?知道號段的起止,如何求出該號段內所有的數?知道一個大的號段范圍和已經取過的號段,如何求出可用的號段?利用Oracle提供的強大的查詢功能以及分析函數,我們可以很輕松的解決上述問題。
?
關鍵詞:
?
號段選取、連續數、斷點、層次查詢、分析函數、connect by、rownum、level、lead、lag
?
?
1. 問題的提出
在實際工作中,我們常常會碰到號段選取的問題,例如:
?
* 一組連續的數,去掉中間一些數,要求出剩下的數的區間(即號段)例如:一串數字為1,2,3,4,7,9,10,則號段為1-4,7-7,9-10
* 知道號段的起止,要求出該號段內所有的數例如:號段為1-3,15-15,則號段內所有的數為1,2,3,15
* 一組數,中間可能有斷點,要求出缺失的數例如:一串數字為1,2,3,4,7,9,10,則缺失的數為5,6,8
* 已知大號段范圍及已用號段范圍,求可用號段范圍例如:大號段范圍0-999,已用號段范圍0-200,399-599,則可用號段范圍為201-398,600-999
?
2. 基礎知識
先做下熱身運動,回顧一下層次查詢和lead/lag函數的運用。
?
2.1 偽列rownum和level
偽列就是并非在表中真正存在的列。已有很多資料介紹rownum和level這兩個偽列。這里只想強調一點,偽列是只針對結果集的。
?
2.2 利用層次查詢構造連續的數
* 產生5~8這4個連續的數
?
select * from (select rownum+4 from dual connect by rownum<5);
select * from (select level+4 from dual connect by level<5);
?
* 以8月為界,例如2005年8月1日,之前的在校學生入學年份為2001~2004,之后的為2002~2005。求當前日期下的在校學生入學年份:
?
select * from (select to_char(add_months(sysdate, 4), 'yyyy') - rownum from dual connect by rownum<5);
?
2.3 用分析函數Lead和Lag獲得相鄰行的字段值
select rn, lag(rn)over(order by rn) previos, lead(rn)over(order by rn) next from (select rownum+4 rn from dual connect by rownum<5);
??????? RN??? PREVIOS?????? NEXT
---------- ---------- ----------
???????? 5???????????????????? 6
???????? 6????????? 5????????? 7
???????? 7????????? 6????????? 8
???????? 8????????? 7
?
簡單的說,在這里,Lag是獲得前一行的內容,而Lead是獲得后一行的內容。
?
select rn, lag(rn,2,-1)over(order by rn) previos, lead(rn,2,-1) over(order by rn) next from (select rownum+4 rn from dual connect by rownum<5);
??????? RN??? PREVIOS?????? NEXT
---------- ---------- ----------
???????? 5???????? -1????????? 7
???????? 6???????? -1????????? 8
???????? 7????????? 5???????? -1
???????? 8????????? 6???????? -1
?
這里,通過指定offset參數來獲得兩行前的內容和兩行后的內容,如果offset超出范圍并且未設定默認值-1,那么系統會自動將其值設為NULL。
?
3. 問題的解決
有了基礎知識的積累,我們就可以解決前面提到的問題。
?
3.1 已知號碼求號段
3.1.1 題例
我有一個表結構,
?
fphm,kshm
2014,00000001
2014,00000002
2014,00000003
2014,00000004
2014,00000005
2014,00000007
2014,00000008
2014,00000009
2013,00000120
2013,00000121
2013,00000122
2013,00000124
2013,00000125
(第二個字段內可能是連續的數據,可能存在斷點。)
?
怎樣能查詢出來這樣的結果,查詢出連續的記錄來。
?
就像下面的這樣?
2014,00000001,00000005
2014,00000009,00000007
2013,00000120,00000122
2013,00000124,00000125
3.1.2 解答
思路:利用lag取得前一行的kshm,然后和本行的kshm想比,如果差值為1,說明這一行和上一行是連續的。由于首尾的特殊性,故而需要先用max和min來獲得首尾點。
select fphm, nvl(lag(e)over(partition by fphm order by s),minn) ST, nvl(S,maxn) EN from
(select fphm, lag(kshm,1) over(partition by fphm order by kshm) S, kshm E, min(kshm)over(partition by fphm) minn, max(kshm) over(partition by fphm) maxn from t)
where nvl(E-S-1,1)<>0;
FPHM? ST???????? EN
----- ---------- ----------
2013? 00000120?? 00000122
2013? 00000124?? 00000125
2014? 00000001?? 00000005
2014? 00000007?? 00000009
?
?
3.2 根據號段求出包含的數
3.2.1 題例
有表及測試數據如下:
?
CREATE TABLE T20
(
ID NUMBER(2),
S NUMBER(5),
E NUMBER(5)
);
INSERT INTO T20 ( ID, S, E ) VALUES ( 1, 10, 11);
INSERT INTO T20 ( ID, S, E ) VALUES ( 2, 1, 5);
INSERT INTO T20 ( ID, S, E ) VALUES ( 3, 88, 92);
COMMIT;
S為號段起點,E為號段終點,求出起點和終點之間的數(包括起點和終點)
?
3.2.2 解答
很明顯,這需要構造序列來解決問題
?
select a.id, a.s, a.e,b.dis, a.S+b.dis-1 h from
t20 a,
(select rownum dis from
(select max(e-s)+1 gap from t20)
connect by rownum<=gap) b
where a.e>=a.s+b.dis-1
order by a.id, 4
運行結果:
?
ID????? S????? E??????? DIS????????? H
--- ------ ------ ---------- ----------
? 1???? 10???? 11????????? 1???????? 10
? 1???? 10???? 11????????? 2???????? 11
? 2????? 1????? 5????????? 1????????? 1
? 2????? 1????? 5????????? 2????????? 2
? 2????? 1????? 5????????? 3????????? 3
? 2????? 1????? 5????????? 4????????? 4
? 2????? 1????? 5????????? 5????????? 5
? 3???? 88???? 92????????? 1???????? 88
? 3???? 88???? 92????????? 2???????? 89
? 3???? 88???? 92????????? 3???????? 90
? 3???? 88???? 92????????? 4???????? 91
? 3???? 88???? 92????????? 5???????? 92
?
?
我們再看下面這種做法:
?
select a.id, a.s, a.e,rownum, a.S+rownum-1 h from
t20 a ,
(select id, e-s+1 gap from t20 where id=2) b
where a.id=b.id
connect by rownum<=gap
?
ID????? S????? E???? ROWNUM????????? H
--- ------ ------ ---------- ----------
? 2????? 1????? 5????????? 1????????? 1
? 2????? 1????? 5????????? 2????????? 2
? 2????? 1????? 5????????? 3????????? 3
? 2????? 1????? 5????????? 4????????? 4
? 2????? 1????? 5????????? 5????????? 5
?
?
嗯,得到的結果也是正確的,若我們把粗斜體字部分去掉后,看看結果是什么樣:
?
select a.id, a.s, a.e,rownum, a.S+rownum-1 h from
t20 a ,
(select id, e-s+1 gap from t20) b
where a.id=b.id
connect by rownum<=gap
?
ID????? S????? E???? ROWNUM????????? H
--- ------ ------ ---------- ----------
? 1???? 10???? 11????????? 1???????? 10
? 1???? 10???? 11????????? 2???????? 11
? 2????? 1????? 5????????? 3????????? 3
? 2????? 1????? 5????????? 4????????? 4
? 2????? 1????? 5????????? 5????????? 5
? 2????? 1????? 5????????? 6????????? 6
? 3???? 88???? 92????????? 7???????? 94
?
?
這樣的結果,顯然不是我們需要的,更何況,這是錯誤的。由此更能深入理解,偽列是只針對結果集的。
?
3.3 求缺失的號
3.3.1 題例
table T,列:serial_no
?
我想能夠查詢一下serial_no這個字段的不連續的值。
?
例如:
?
serial_no
------------
1
2
3
4
6
8
9
10
?
我想一個sql語句查出來缺失的號碼,
?
顯示結果為:
?
5
7
?
3.3.2 解答
思路:找出數B和它前面的數A進行比較(數按從大到小進行排序),如果B-A=1,則說明是連續的,中間沒有斷點。
?
select distinct s+level-1 rlt from (select lag(serial_no,1) over(order by serial_no)+1 S, serial_no-1 E from t) where E-S<>0 connect by level<=e-s
?
3.4 求尚未使用的號段
3.4.1 題例
表A結構:
?
bill_type_id varchar2(1),
bill_start number,
bill_end number,
office_level varchar2(4)
?
數據如下:
?
A 0 ? 999 1
A 0 ? 199 2
A 300 499 2
A 700 799 2
?
sql目的是取出包含在level1級別里的,還沒有錄入level2級別的號段。
?
3.4.2 解答
這個好像是3.1和3.3這兩個問題的逆問題
?
創建表及測試數據:
?
CREATE TABLE T8
(
A NUMBER(4),
?B NUMBER(4),
?C NUMBER(4),
?Q VARCHAR2(1 BYTE)
);
?
Insert into T8(A, B, C, Q)Values(555, 666, 2, 'A');
Insert into T8(A, B, C, Q)Values(100, 199, 2, 'A');
Insert into T8(A, B, C, Q)Values(0, 999, 1, 'A');
Insert into T8(A, B, C, Q)Values(300, 499, 2, 'A');
COMMIT;
思路:將大號段的邊界與小號段的邊界相比,從大號段中將小號段“挖”掉,這樣剩下的就是可用號段了。
select S,E from
(
SELECT NVL2(LAG(A)OVER(PARTITION BY Q ORDER BY A), B+1, MIN(A)OVER(PARTITION BY Q)) S,
NVL(LEAD(A)OVER(PARTITION BY Q ORDER BY A)-1, MAX(B)OVER(PARTITION BY Q)) E
from t8 START WITH C=1 CONNECT BY C-1 = PRIOR C AND Q= PRIOR Q
)
where s<=e
?
運行結果:
?
???????? S????? E
---------- ------
???????? 0???? 99
?????? 200??? 299
?????? 500??? 554
?????? 667??? 999
?
?
?
?
參考資料
?
* 參考資料
* Oracle 8i SQL Reference
* 論壇相關討論帖子
?
作者簡介
?
*:本文為本人參加ITPUB 2007香山年會的論文。
?
lastwinner 發表于:2007.02.05 20:13 ::分類: ( Oracle , ) ::閱讀:(1519次) :: 評論 (7)
?
*****************************************************************************************
?
?