在線索引重建被取消導(dǎo)致大量ORA-600(kghstack_underflow_internal_3)錯誤
客戶在比較繁忙的時刻執(zhí)行了索引的REBUILD操作,導(dǎo)致大量會話被鎖,最終對ONLINE REBUILD操作執(zhí)行了取消操作,引發(fā)了一系列更嚴(yán)重的ORA-600錯誤。
登錄數(shù)據(jù)庫后,發(fā)現(xiàn)兩個節(jié)點上ORACLE_BASE所在目錄都已經(jīng)100%占用。數(shù)據(jù)庫無法正常通過/ AS SYSDBA方式登錄。
查詢告警日志發(fā)現(xiàn)大量的ORA-600和ORA-7445錯誤:
Tue May 08 21:20:26 EAT 2012
Errors in file /oracle/app/admin/orcl/udump/orcl2_ora_1555.trc:
ORA-07445: exception encountered: core dump [0000000000000000] [SIGSEGV] [Invalid permissions for mapped object] [0x000000000] [] []
Tue May 08 21:20:27 EAT 2012
Errors in file /oracle/app/admin/orcl/udump/orcl2_ora_3891.trc:
ORA-00600: internal error code, arguments: [kghstack_underflow_internal_3], [0x60000000003002F0], [keybuf], [], [], [], [], []
Tue May 08 21:20:27 EAT 2012
Errors in file /oracle/app/admin/orcl/udump/orcl2_ora_26190.trc:
ORA-07445: exception encountered: core dump [0000000000000000] [SIGSEGV] [Invalid permissions for mapped object] [0x000000000] [] []
Tue May 08 21:20:27 EAT 2012
Errors in file /oracle/app/admin/orcl/udump/orcl2_ora_2873.trc:
ORA-07445: exception encountered: core dump [0000000000000000] [SIGSEGV] [Invalid permissions for mapped object] [0x000000000] [] []
Tue May 08 21:20:27 EAT 2012
Errors in file /oracle/app/admin/orcl/udump/orcl2_ora_4518.trc:
ORA-00600: internal error code, arguments: [kghstack_underflow_internal_3], [0x60000000003002F0], [keybuf], [], [], [], [], []
Tue May 08 21:20:27 EAT 2012
Errors in file /oracle/app/admin/orcl/udump/orcl2_ora_22469.trc:
ORA-00600: internal error code, arguments: [kghstack_underflow_internal_3], [0x60000000003002F0], [keybuf], [], [], [], [], []
Tue May 08 21:20:28 EAT 2012
Errors in file /oracle/app/admin/orcl/udump/orcl2_ora_26440.trc:
ORA-07445: exception encountered: core dump [0000000000000000] [SIGSEGV] [Invalid permissions for mapped object] [0x000000000] [] []
Tue May 08 21:20:29 EAT 2012
Errors in file /oracle/app/admin/orcl/udump/orcl2_ora_762.trc:
ORA-07445: exception encountered: core dump [0000000000000000] [SIGSEGV] [Invalid permissions for mapped object] [0x000000000] [] []
Tue May 08 21:20:29 EAT 2012
Errors in file /oracle/app/admin/orcl/udump/orcl2_ora_26106.trc:
ORA-07445: exception encountered: core dump [0000000000000000] [SIGSEGV] [Invalid permissions for mapped object] [0x000000000] [] []
Tue May 08 21:20:30 EAT 2012
Errors in file /oracle/app/admin/orcl/udump/orcl2_ora_1597.trc:
ORA-07445:出現(xiàn)異常錯誤:核心轉(zhuǎn)儲[0000000000000000] [SIGSEGV] [Invalid permissions for mapped object] [0x000000000] [] []
Tue May 08 21:20:30 EAT 2012
Errors in file /oracle/app/admin/orcl/udump/orcl2_ora_856.trc:
ORA-07445: exception encountered: core dump [0000000000000000] [SIGSEGV] [Invalid permissions for mapped object] [0x000000000] [] []
.
.
.
檢查對應(yīng)的TRACE文件,發(fā)現(xiàn)導(dǎo)致錯誤語句執(zhí)行的是TABLE_A表的插入:
ksedmp: internal or fatal error tb
ORA-00600:內(nèi)部錯誤代碼,參數(shù): [kghstack_underflow_internal_3], [0x60000000003002F0], [keybuf], [], [], [], [], []
Current SQL statement for this session:
INSERT INTO TABLE_A (O_ID, P_ID, S_ID, F_ITEM, F_NAME, T_MON, D_MON, I_MON, P_STATE, P_TIME, R1, R2) VALUES (:B10 , SUBSTR(:B10 , LENGTH(:B10 ) - 1, 2), :B9 , :B8 , :B7 , :B6 , :B5 , :B4 , :B3 , NULL, :B2 , :B1 )
----- PL/SQL Call Stack -----
object line object
handle number name
c00000203c2cc550 119 package body U1.P_O_I
c00000203b788200 288 procedure U1.U_B_O_I
c00000203719b8d0 1 anonymous block
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst()+64 call ksedst1() 000000000 ? 000000001 ?
ksedmp()+2176 call ksedst() 000000000 ?
C000000000000D20 ?
4000000004037940 ?
000000000 ? 000000000 ?
000000000 ?
ksfdmp()+112 call ksedmp() 000000003 ?
9FFFFFFFFFFF1230 ?
60000000000BA290 ?
9FFFFFFFFFFF1800 ?
C000000000000999 ?
400000000407F9B0 ?
kgerinv()+304 call ksfdmp() 9FFFFFFFFFFF1D90 ?
000000003 ?
9FFFFFFFFFFF1810 ?
60000000000BA290 ?
C000000000000612 ?
40000000098C38B0 ?
kgeasnmierr()+144 call kgerinv() 60000000000318D0 ?
4000000001AD98A0 ?
6000000000032988 ?
4000000001AD98A0 ?
9FFFFFFFFFFF1DD0 ?
$cold_kghstack_unde call kgeasnmierr() 60000000000318D0 ?
rflow_internal()+36 9FFFFFFFBF3B1168 ?
8 9FFFFFFFBF3B1178 ?
6000000000032D00 ?
000000002 ?
60000000003002F0 ?
000000001 ? 000000006 ?
kghstack_free()+208 call $cold_kghstack_unde 60000000000318D0 ?
rflow_internal() 60000000003002F0 ?
60000000000BA290 ?
C000000000000B1D ?
4000000002F7A510 ?
00002C87B ?
6000000000031A10 ?
ksmfrs()+48 call kghstack_free() 60000000000318D0 ?
60000000003002F0 ?
rpiswu2()+1312 call ksmfrs() 60000000003002F0 ?
C000000000001026 ?
4000000002F78960 ?
000000000 ? 000000000 ?
000000000 ?
rpidrv()+2352 call rpiswu2() 9FFFFFFFFFFF2AF0 ?
4000000002F7AE60 ?
00002F833 ?
60000000000A7D20 ?
9FFFFFFFFFFF1E20 ?
C000000000001ABD ?
4000000000F14558 ?
60000000000C6CF0 ?
psddr0()+864 call rpidrv() 000000018 ? 000000066 ?
9FFFFFFFFFFF3700 ?
000000038 ?
9FFFFFFFFFFF2B20 ?
60000000000BA290 ?
psdnal()+736 call psddr0() 000000018 ? 000000066 ?
9FFFFFFFFFFF3700 ?
000000030 ?
pevm_EXECC()+832 call psdnal() 9FFFFFFFFFFF54D0 ?
C00000203489A9F8 ?
C0000000000011AA ?
9FFFFFFFBE832840 ?
C00000203C2CC550 ?
400000000313C770 ?
000026035 ?
pfrinstr_EXECC()+16 call pevm_EXECC() 9FFFFFFFBE83D1D0 ?
0 9FFFFFFFBE8328B0 ?
000000020 ?
pfrrun_no_tool()+19 call pfrinstr_EXECC() 9FFFFFFFBE83D1D0 ?
2 C000001DA198D61C ?
9FFFFFFFBE83D238 ?
pfrrun()+1376 call pfrrun_no_tool() 9FFFFFFFBE83D1D0 ?
000002001 ?
9FFFFFFFBE83D238 ?
60000000000BA290 ?
C000000000000A1C ?
4000000003163040 ?
9FFFFFFFBE83D620 ?
9FFFFFFFBE83D298 ?
plsql_run()+1328 call pfrrun() 9FFFFFFFFFFF3788 ?
9FFFFFFFFFFF3770 ?
60000000000BA290 ?
9FFFFFFFFFFF4370 ?
9FFFFFFFFFFF4370 ?
C000000000000E23 ?
4000000002C34D70 ?
peicnt()+560 call plsql_run() 9FFFFFFFFFFF4380 ?
000000001 ? 000000000 ?
9FFFFFFFFFFF4380 ?
60000000000BA290 ?
9FFFFFFFFFFF4E90 ?
kkxexe()+1008 call peicnt() 9FFFFFFFFFFF54D0 ?
9FFFFFFFBE83D1D0 ?
9FFFFFFFFFFF4EA0 ?
60000000000BA290 ?
9FFFFFFFFFFF5420 ?
C000000000000A1C ?
000000020 ?
9FFFFFFFFFFF4F00 ?
opiexe()+8016 call kkxexe() 0000051F0 ?
.
.
.
這個SQL語句是正常的業(yè)務(wù)操作,而這種簡單的INSERT都會導(dǎo)致ORA-600和ORA-7445錯誤,說明插入的表或索引本身存在問題。
檢查表和索引的狀態(tài),發(fā)現(xiàn)索引部分分區(qū)狀態(tài)異常:
CREATE INDEX "U1"."IDX_F_2"
ON "U1"."TABLE_A" ("S_ID")
PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING
STORAGE(
BUFFER_POOL DEFAULT) LOCAL
(PARTITION "P1"
PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "TBS7" ,
PARTITION "P2"
PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "TBS8" ,
PARTITION "P3"
PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "TBS9" ,
PARTITION "P4"
PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "TBS10" ,
PARTITION "P5"
PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "TBS1" ,
PARTITION "P6"
PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "TBS2" ,
PARTITION "P7"
PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "TBS3" ,
PARTITION "P8"
PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "TBS4" ,
PARTITION "P9"
PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "TBS5" ,
PARTITION "P10"
PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "TBS6" )
ALTER INDEX "U1"."IDX_F_2" MODIFY PARTITION "P6" UNUSABLE
ALTER INDEX "U1"."IDX_F_2" MODIFY PARTITION "P7" UNUSABLE
和客戶溝通后,確認(rèn)今天問題發(fā)生之前有人對索引進(jìn)行維護操作,操作語句包括:
alter index u1.IDX_F_2 rebuild partition p6 parallel
alter index U1.IDX_T_2 rebuild partition P9 nologging parallel online
alter index U1.IDX_T_1 rebuild partition P3 nologging parallel online
alter index U1.IDX_F_2 rebuild partition P1 nologging parallel online
alter index U1.IDX_T_4 rebuild partition P1 nologging parallel online
alter index U1.IDX_F_4 rebuild partition P5 nologging parallel online
在對索引進(jìn)行維護之后,數(shù)據(jù)庫中出現(xiàn)了大量的enq: TM – contention等待,TM鎖出現(xiàn)的原和是索引IDX_F_2的PARTITION P6重建語句沒有添加ONLINE參數(shù)有直接關(guān)系。隨后大量會話被阻塞,而整個索引重建操作被人為中止,其中包括正在運行的ONLINE REBUILD操作,而ONLINE REBUILD操作被中止是十分危險的,很容易導(dǎo)致bug的產(chǎn)生,而當(dāng)前就導(dǎo)致了ORA-600和ORA-7445的產(chǎn)生。
由于客戶著急解決問題,因此對數(shù)據(jù)庫進(jìn)行了重啟。重啟后,ORA-600錯誤信息不再出現(xiàn)。但是前臺應(yīng)用報錯,索引失效。
SELECT 'ALTER INDEX U1.' || INDEX_NAME || ' REBUILD PARTITION ' || PARTITION_NAME || ';'
FROM DBA_IND_PARTITIONS
WHERE INDEX_OWNER = 'U1'
AND STATUS != 'USABLE';
獲取所有失效的索引重建語句,將索引重新編譯后,數(shù)據(jù)庫恢復(fù)正常。
在Oracle的MOS上沒有找到與之最相關(guān)的bug信息,反而是找到了一個相關(guān)性很高的補丁信息,而對應(yīng)的bug信息并沒有公開:補丁程序13737888: ONLINE DDL:ORA-600[KGHSTACK_UNDERFLOW_INTERNAL_3], [0X2B7F4E1E7678], [KEYBUF]。