#
最近發現10g的RMAN備份保留REDUNDANCY策略和9i相比發生了改變。
在Oracle9i中,備份保留策略的REDUNDANCY的值,指的是備份冗余的個數。也就是說,如果REDUNDANCY設置為1,那么Oracle會保留2個備份。
但是在10g以后,REDUNDANCY的值,就是最終備份保留的值,手頭沒有10g的環境,用11g的rman做了一個例子:
solaris*orcl-/home/oracle$ rman target /
Recovery Manager: Release 11.2.0.3.0 - Production on Sun Jul 8 19:04:43 2012
Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.
connected to target database: ORCL (DBID=1299676637)
RMAN> show retention policy;
using target database control file instead of recovery catalog
RMAN configuration parameters for database with db_unique_name ORCL are:
CONFIGURE RETENTION POLICY TO REDUNDANCY 1; # default
RMAN> backup tablespace ts_32k;
Starting backup at 08-JUL-12
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=180 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datbafile(s) in backup set
input datafile file number=00005 name=/u01/app/oracle/oradata/ORCL/datafile/o1_mf_ts_32k_7w1w3zmb_.dbf
channel ORA_DISK_1: starting piece 1 at 08-JUL-12
channel ORA_DISK_1: finished piece 1 at 08-JUL-12
piece handle=/u01/app/oracle/flash_recovery_area/ORCL/backupset/2012_07_08/o1_mf_nnndf_TAG20120708T190559_7zltdqxy_.bkp tag=TAG20120708T190559 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 08-JUL-12
RMAN> backup tablespace ts_32k;
Starting backup at 08-JUL-12
using channel ORA_DISK_1
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datbafile(s) in backup set
input datafile file number=00005 name=/u01/app/oracle/oradata/ORCL/datafile/o1_mf_ts_32k_7w1w3zmb_.dbf
channel ORA_DISK_1: starting piece 1 at 08-JUL-12
channel ORA_DISK_1: finished piece 1 at 08-JUL-12
piece handle=/u01/app/oracle/flash_recovery_area/ORCL/backupset/2012_07_08/o1_mf_nnndf_TAG20120708T190609_7zltf22b_.bkp tag=TAG20120708T190609 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:02
Finished backup at 08-JUL-12
RMAN> list backup of tablespace ts_32k;
List of Backup Sets
===================
BS Key Type LV Size Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ ---------------
20 Full 2.69M DISK 00:00:01 08-JUL-12
BP Key: 20 Status: AVAILABLE Compressed: NO Tag: TAG20120708T190559
Piece Name: /u01/app/oracle/flash_recovery_area/ORCL/backupset/2012_07_08/o1_mf_nnndf_TAG20120708T190559_7zltdqxy_.bkp
List of Datafiles in backup set 20
File LV Type Ckp SCN Ckp Time Name
---- -- ---- ---------- --------- ----
5 Full 28932281 08-JUL-12 /u01/app/oracle/oradata/ORCL/datafile/o1_mf_ts_32k_7w1w3zmb_.dbf
BS Key Type LV Size Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ ---------------
21 Full 2.69M DISK 00:00:01 08-JUL-12
BP Key: 21 Status: AVAILABLE Compressed: NO Tag: TAG20120708T190609
Piece Name: /u01/app/oracle/flash_recovery_area/ORCL/backupset/2012_07_08/o1_mf_nnndf_TAG20120708T190609_7zltf22b_.bkp
List of Datafiles in backup set 21
File LV Type Ckp SCN Ckp Time Name
---- -- ---- ---------- --------- ----
5 Full 28932300 08-JUL-12 /u01/app/oracle/oradata/ORCL/datafile/o1_mf_ts_32k_7w1w3zmb_.dbf
RMAN> delete obsolete;
RMAN retention policy will be applied to the command
RMAN retention policy is set to redundancy 1
using channel ORA_DISK_1
Deleting the following obsolete backups and copies:
Type Key Completion Time Filename/Handle
-------------------- ------ ------------------ --------------------
Backup Set 20 08-JUL-12
Backup Piece 20 08-JUL-12 /u01/app/oracle/flash_recovery_area/ORCL/backupset/2012_07_08/o1_mf_nnndf_TAG20120708T190559_7zltdqxy_.bkp
Do you really want to delete the above objects (enter YES or NO)? yes
deleted backup piece
backup piece handle=/u01/app/oracle/flash_recovery_area/ORCL/backupset/2012_07_08/o1_mf_nnndf_TAG20120708T190559_7zltdqxy_.bkp RECID=20 STAMP=788123159
Deleted 1 objects
可以看到,從10g開始設置的REDUNDANCY的值,就是最終備份保留的個數。為了確認這個問題,特意查詢了一下9i和10g的官方文檔。
9i的說法是:
The REDUNDANCY parameter specifies that any number of backups or copies beyond a specified number need not be retained.
而10g的文檔中,該參數的描述變為:
A redundancy-based backup retention policy determines whether a backup is obsolete based on how many backups of a file are currently on disk.
Oracle改變功能的實現很常見,但是沒有想到,對于這種細節的定義也會調整。對于熟悉9i備份策略的DBA需要留神,在設置10g以后的RMAN備份保留策略時,需要在9i的基礎上增加1。
客戶數據庫版本為9208 RAC FOR AIX,客戶反應系統緩慢,檢查告警日志,發現大量Library cache lock和Library cache load lock等待。
由于客戶的原因,這個問題只是遠程協助的方式幫忙檢查了一下,因此沒有留下任何的操作記錄,這里只是簡單描述一下問題。
客戶反應數據庫操作響應變慢,平時一個執行很快的基于主鍵的UPDATE操作也變得異常緩慢,且執行計劃本身并未發生改變。
登錄數據庫后檢查兩個節點上的告警日志,并未發現任何異常報錯。分別檢查兩個實例的等待信息,發現除了上面提到的大量Library cache lock和Library cache load lock以外,還有明顯的gc等待。
但是隨后發現,查詢V$SESSION和GV$SESSION的結果居然沒有區別,接著查詢GV$INSTANTBCE視圖,發現只有當前的實例存在,而此時恰好連接另一個節點的工具出現了斷連,以至于我一度以為另外一個節點上的實例已經DOWN掉,但是隨后重新登錄到該節點上,發現數據庫實例仍然存在,而且登錄到數據庫實例中也可以進行任何正常的操作。不過發現在當前節點所有的GV$視圖都只會返回當前實例的信息,這與另外一個節點的情況完全一樣。顯然兩個節點間的通信出現了問題,當前節點已經不清楚另外一個節點的狀態的。
現在再去分析那些等待信息已經沒有太多的意義了,因為整個數據庫已經處于不正常的狀態。不難推斷,當前數據庫的異常是由于節點間的通信異常導致。由于9i使用的操作系統的CLUSTER,還沒有Oracle的clusterware,剩下只能由操作系統或硬件維護人員去進一步跟蹤了。
最終數據庫和系統在夜間閑時進行了重啟操作,重啟后數據庫恢復正常,GV$視圖的結果也恢復了正常。
正常情況下,DBMS_OUTPUT包無法直接輸出一個空行。
以前還真沒有注意這個問題,前兩天想在輸出結果的時候進行一下簡單的格式化,發現了這個問題:
SQL> set serverout on
SQL> begin
2 dbms_output.put_line('a');
3 dbms_output.put_line(' ');
4 dbms_output.put_line('b');
5 dbms_output.new_line;
6 dbms_output.put_line('c');
7 end;
8 /
a
b
c
PL/SQL procedure successfully completed.
導致問題的原因在于,如果使用DBMS_OUTPUTB包輸出的一行都是不可見字符,那么這行內容被DBMS_OUTPUT包忽略掉。
雖然DBMS_OUTPUT包本身并沒有提供開關來屏蔽這個屬性,不過這個問題依然很容易解決,最簡單的方法莫過于直接把回車包含在字符串中:
SQL> begin
2 dbms_output.put_line('a
3
4 b');
5 dbms_output.put_line('
6 c');
7 end;
8 /
a
b
c
PL/SQL procedure successfully completed.
當然這種方法有可能導致PL/SQL代碼的可讀性變差,也容易影響代碼的縮進格式,此外還有一種方式:
SQL> begin
2 dbms_output.put_line('a' || chr(10) || chr(13));
3 dbms_output.put_line('b');
4 dbms_output.put_line(chr(10) || chr(13) || 'c');
5 end;
6 /
a
b
c
PL/SQL procedure successfully completed.
客戶10.2.0.4環境告警日志出現ORA-27468錯誤。
詳細錯誤信息為:
Errors in file /u01/app/oracle/admin/orcl/bdump/orcl1_j000_18724.trc:
ORA-12012: error on auto execute of job 42791
ORA-27468: "EXFSYS.RLM$EVTCLEANUP" is locked by another process
導致這個錯誤的原因在于升級時沒有執行catupgrd.sql,而是執行了建庫的部分腳本如catalog.sql和catproc.sql,這導致只有CATALOG視圖和系統的PACKAGE以及TYPE的版本更新到10.2.0.4,而其他數據庫中組件的版本并沒有升級,仍然是10.2.0.1。
在MOS文檔ORA-12012 ORA-27468: "SYS.PURGE_LOG" is Locked by Another Process [ID 751884.1]中介紹了這個錯誤,這個問題可能發生在10.2.0.2到10.2.0.5之間,解決問題的方法很簡單,在閑時執行catbupgrd.sql,完成升級組件的后續操作既可。
客戶的10.2.0.4 RAC for Hp-un環境碰到了這個錯誤。
錯誤信息為:
Wed Feb 29 19:42:05 2012
Errors in file /opt/app/oracle/admin/orcl/udump/orcl1_ora_11261.trc:
ORA-00600: internal error code, arguments: [qersqCloseRem-2], [Invalid Handle], [], [], [], [], [], []
ORA-02068: following severe error from WEBDB.COM
ORA-03113: end-of-file on communication channel
Wed Feb 29 19:42:05 2012
Errors in file /opt/app/oracle/admin/orcl/udump/orcl1_ora_32036.trc:
ORA-00600: internal error code, arguments: [qersqCloseRem-2], [Invalid Handle], [], [], [], [], [], []
ORA-02068: following severe error from WEBDB.COM
ORA-03113: end-of-file on communication channel
Wed Feb 29 19:42:05 2012
Errors in file /opt/app/oracle/admin/orcl/udump/orcl1_ora_5935.trc:
ORA-00600: internal error code, arguments: [qersqCloseRem-2], [Invalid Handle], [], [], [], [], [], []
ORA-02068: following severe error from WEBDB.COM
ORA-03113: end-of-file on communication channel
Wed Feb 29 19:42:05 2012
Errors in file /opt/app/oracle/admin/orcl/udump/orcl1_ora_5026.trc:
ORA-00600: internal error code, arguments: [qersqCloseRem-2], [Invalid Handle], [], [], [], [], [], []
ORA-02068: following severe error from WEBDB.COM
ORA-03113: end-of-file on communication channel
Wed Feb 29 19:42:05 2012
Errors in file /opt/app/oracle/admin/orcl/udump/orcl1_ora_7620.trc:
ORA-00600: internal error code, arguments: [qersqCloseRem-2], [Invalid Handle], [], [], [], [], [], []
ORA-02068: following severe error from WEBDB.COM
ORA-03113: end-of-file on communication channel
Wed Feb 29 19:42:08 2012
Trace dumping is performing id=[cdmp_20120229194207]
Wed Feb 29 19:42:17 2012
Trace dumping is performing id=[cdmp_20120229194217]
這個ORA-600[qersqCloseRem-2]錯誤非常罕見,在MOS上居然沒有任何記載。不過從tb錯誤信息進行進一步的分析,這個錯誤發生在遠端數據庫的訪問異常。
檢查進一步的詳細信息:
*** 2012-02-29 19:42:05.564
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [qersqCloseRem-2], [Invalid Handle], [], [], [], [], [], []
ORA-02068: following severe error from WEBDB.COM
ORA-03113: end-of-file on communication channel
Current SQL statement for this session:
SELECT ACCESS_LOG_SEQUENCE.NEXTVAL@WEBDB.COM FROM DUAL
----- PL/SQL Call Stack -----
object line object
handle number name
0x39b5c3720 5 ECOMMERCE.P_USER_AT
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst()+31 call ksedst1() 000000000 ? 000000001 ?
7FBFFF4370 ? 7FBFFF43D0 ?
7FBFFF4310 ? 000000000 ?
ksedmp()+610 call ksedst() 000000000 ? 000000001 ?
7FBFFF4370 ? 7FBFFF43D0 ?
7FBFFF4310 ? 000000000 ?
ksfdmp()+21 call ksedmp() 000000003 ? 000000001 ?
7FBFFF4370 ? 7FBFFF43D0 ?
7FBFFF4310 ? 000000000 ?
.
.
.
0059DF200 ? 683F6E400000001 ?
main()+116 call opimai_real() 000000002 ? 7FBFFFF4E0 ?
000000004 ? 7FBFFFF478 ?
0059DF200 ? 683F6E400000001 ?
__libc_start_main() call main() 000000002 ? 7FBFFFF4E0 ?
+219 000000004 ? 7FBFFFF478 ?
0059DF200 ? 683F6E400000001 ?
_start()+42 call __libc_start_main() 0007139F8 ? 000000002 ?
7FBFFFF628 ? 0052B4BD0 ?
000000000 ? 000000002 ?
--------------------- Binary Stack Dump ---------------------
從詳細TRACE分析,在問題發生時刻,正在通過數據庫鏈讀取遠端序列的值。而此時出現的ORA-3113通信錯誤,多半與遠端數據庫狀態異常有關。
檢查遠端數據庫的告警日志,果然發現在問題出現時刻,數據庫狀態異常并最終導致了實例重啟:
Wed Feb 29 19:39:29 2012
WARNING: inbound connection timed out (ORA-3136)
Wed Feb 29 19:39:30 2012
WARNING: inbound connection timed out (ORA-3136)
Wed Feb 29 19:40:01 2012
WARNING: inbound connection timed out (ORA-3136)
Wed Feb 29 19:40:01 2012
WARNING: inbound connection timed out (ORA-3136)
Wed Feb 29 19:40:01 2012
WARNING: inbound connection timed out (ORA-3136)
Wed Feb 29 19:40:01 2012
WARNING: inbound connection timed out (ORA-3136)
Wed Feb 29 19:40:01 2012
WARNING: inbound connection timed out (ORA-3136)
Wed Feb 29 19:40:01 2012
WARNING: inbound connection timed out (ORA-3136)
.
.
.
Wed Feb 29 19:43:28 2012
WARNING: inbound connection timed out (ORA-3136)
Wed Feb 29 19:43:28 2012
WARNING: inbound connection timed out (ORA-3136)
Wed Feb 29 19:43:28 2012
WARNING: inbound connection timed out (ORA-3136)
Wed Feb 29 19:43:28 2012
WARNING: inbound connection timed out (ORA-3136)
Wed Feb 29 19:43:29 2012
WARNING: inbound connection timed out (ORA-3136)
Wed Feb 29 19:43:29 2012
WARNING: inbound connection timed out (ORA-3136)
Wed Feb 29 19:43:29 2012
WARNING: inbound connection timed out (ORA-3136)
Wed Feb 29 19:43:29 2012
WARNING: inbound connection timed out (ORA-3136)
Wed Feb 29 19:43:30 2012
WARNING: inbound connection timed out (ORA-3136)
Wed Feb 29 19:45:26 2012
PMON failed to acquire latch, see PMON dump
Wed Feb 29 19:46:32 2012
WARNING: inbound connection timed out (ORA-3136)
Wed Feb 29 19:46:33 2012
WARNING: inbound connection timed out (ORA-3136)
Wed Feb 29 19:46:34 2012
PMON failed to acquire latch, see PMON dump
Wed Feb 29 19:46:40 2012
WARNING: inbound connection timed out (ORA-3136)
Wed Feb 29 19:46:43 2012
WARNING: inbound connection timed out (ORA-3136)
Wed Feb 29 19:46:44 2012
Errors in file /opt/app/oracle/admin/orcl/bdump/orcl1_asmb_14614.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Wed Feb 29 19:46:44 2012
ASMB: terminating instance due to error 15064
Wed Feb 29 19:46:44 2012
System state dump is made for local instance
System State dumped to trace file /opt/app/oracle/admin/orcl/bdump/orcl1_diag_14555.trc
Wed Feb 29 19:46:47 2012
Shutting down instance (abort)
License high water mark = 1623
Wed Feb 29 19:46:49 2012
Instance terminated by ASMB, pid = 14614
Wed Feb 29 19:46:52 2012
Instance terminated by USER, pid = 3684
顯然遠端數據庫狀態異常是這個ORA-600錯誤的直接原因。
簡單描述一下Oracle VM Server的安裝過程。
需要注意,VM 3.0以上版本才支持升級操作,因此在VM 2.2沒有辦法升級到當前版本,安裝VM 3.0將會刪除服務器上所有的數據。
將VM Server的光盤放入,并從光盤啟動服務器。
在啟動界面直輸入Enter開始安裝過程:
Oracle會提示是否監測截至,這里可以直接SKIP跳過;
鍵盤選擇:選擇us;
然后是版權聲明,選擇Accept后,開始正式的安裝步驟;
如果服務器上沒有系統,那么會直接進入后面的分區階段,否則會提示重裝系統還是在原有系統上升級;
選擇ReInstall后,會顯示當前系統磁盤分區信息,首先選擇準備進行系統安裝的分區,然后選擇Remove all partitions and create a new defaultb partition layout,Oracle在格式化分區之前會要求再次確認,并詢問是否預覽分區空間詳細配置,可以完全按照默認推薦值安裝,因此這里可以跳過,也可以進入到分區空間修改頁面進行自定義的修改;
隨后選擇Boot Loader配置,選擇Master Boot Record;
然后選擇一個管理網絡接口,手工輸入IP和掩碼,在下一個頁面輸入網關、DNS信息,接著是主機名信息;
配置服務器所在時區,配置中找不到北京,可以tb設置Asia/Shanghai代替;
分別輸入Agent密碼和root密碼后,安裝操作完成,這是會提示整個安裝的日志文件的位置。
在重啟界面選擇REBOOT,完成整個安裝過程。
啟動后,進入Oracle VM Server 3.0控制臺界面,可以通過Alt + F2進入linux的登錄界面。至此VM Server安裝完成。
在VM Manager中搜索VM Server時出現這個錯誤。
按照VM Server以及VM Manager后,通過指定IP地址,讓VM Manager自動尋找VM Server,結果JOB運行報錯,詳細的錯誤信息為:
Job Construction Phase
----------------------
begin()
Appended operation 'Discover Manager Server Discover' to object 'OVM Foundry : Discover Manager'.
commit()
Completed Step: COMMIT
Objects and Operations
----------------------
Object (IN_USE): [Server] 35:38:33:39:31:34:43:4e:47:31:33:30:53:37:33:42 (server2.zihexin.com)
Object (IN_USE): [DiscoverManager] OVM Foundry : Discover Manager
Operation: Discover Manager Server Discover
Job Running Phase at 18:05 on Fri, Nov 25, 2011
----------------------------------------------
Job Participants: []
Actioner
--------
Starting operation 'Discover Manager Server Discover' on object 'OVM Foundry : Discover Manager'
Setting Context to model only in job with id=1322215534120
Job Internal Error (Operation)com.oracle.ovm.mgr.api.exception.FailedOperationException: OVMAPI_4010E Attempt to send command: discover_server to server: 10.0.10.171 failed. OVMAPI_4004E Server Failed Command: discover_server, Status:
Fri Nov 25 18:05:34 CST 2011
Fri Nov 25 18:05:34 CST 2011
at com.oracle.ovm.mgr.action.ActionEngine.sendCommandToServer(ActionEngine.java:474)
at com.oracle.ovm.mgr.action.ActionEngine.sendDiscoverCommand(ActionEngine.java:283)
at com.oracle.ovm.mgr.action.ServerAction.getServerInfo(ServerAction.java:95)
at com.oracle.ovm.mgr.discover.ovm.ServerBasicDiscoverHandler.query(ServerBasicDiscoverHandler.java:131)
at com.oracle.ovm.mgr.discover.ovm.ServerBasicDiscoverHandler.query(ServerBasicDiscoverHandler.java:61)
at com.oracle.ovm.mgr.discover.ovm.DiscoverHandler.execute(DiscoverHandler.java:50)
at com.oracle.ovm.mgr.discover.DiscoverEngine.handleDiscover(DiscoverEngine.java:435)
at com.oracle.ovm.mgr.discover.DiscoverEngine.discoverNewServer(DiscoverEngine.java:345)
at com.oracle.ovm.mgr.discover.DiscoverEngine.discoverServer(DiscoverEngine.java:265)
at com.oracle.ovm.mgr.op.manager.DiscoverManagerServerDiscover.action(DiscoverManagerServerDiscover.java:48)
at com.oracle.ovm.mgr.api.job.JobEngine.operationActioner(JobEngine.java:191)
at com.oracle.ovm.mgr.api.job.JobEngine.objectActioner(JobEngine.java:257)
at com.oracle.ovm.mgr.api.job.InternalJobDbImpl.objectCommitter(InternalJobDbImpl.java:1019)
at com.oracle.odof.core.AbstractVessel.invokeMethod(AbstractVessel.java:223)
at com.oracle.odof.core.BasicWork.invokeMethod(BasicWork.java:136)
at com.oracle.odof.command.InvokeMethodCommand.process(InvokeMethodCommand.java:100)
at com.oracle.odof.core.BasicWork.processCommand(BasicWork.java:81)
at com.oracle.odof.core.TransactionManager.processCommand(TransactionManager.java:751)
at com.oracle.odof.core.WorkflowManager.processCommand(WorkflowManager.java:395)
at com.oracle.odof.core.WorkflowManager.processWork(WorkflowManager.java:453)
at com.oracle.odof.io.AbstractClient.run(AbstractClient.java:42)
at java.lang.Thread.run(Thread.java:662)
Caused by: com.oracle.ovm.mgr.api.exception.IllegalOperationException: OVMAPI_4004E Server Failed Command: discover_server, Status:
Fri Nov 25 18:05:34 CST 2011
at com.oracle.ovm.mgr.action.ActionEngine.sendAction(ActionEngine.java:752)
at com.oracle.ovm.mgr.action.ActionEngine.sendCommandToServer(ActionEngine.java:470)
... 24 more
FailedOperationCleanup
----------
Starting failed operation 'Discover Manager Server Discover' cleanup on object 'OVM Foundry : Discover Manager'
Complete rollback operation 'Discover Manager Server Discover' completed with direction=OVM Foundry : Discover Manager
Rollbacker
----------
Objects To Be Rolled Back
-------------------------
Object (IN_USE): [Server] 35:38:33:39:31:34:43:4e:47:31:33:30:53:37:33:42 (server2.zihexin.com)
Object (IN_USE): [DiscoverManager] OVM Foundry : Discover Manager
Completed Step: ROLLBACK
Job failed commit (internal) due to OVMAPI_4010E Attempt to send command: discover_server to server: 10.0.10.171 failed. OVMAPI_4004E Server Failed Command: discover_server, Status:
Fri Nov 25 18:05:34 CST 2011
Fri Nov 25 18:05:34 CST 2011
com.oracle.ovm.mgr.api.exception.FailedOperationException: OVMAPI_4010E Atbtempt to send command: discover_server to server: 10.0.10.171 failed. OVMAPI_4004E Server Failed Command: discover_server, Status:
Fri Nov 25 18:05:34 CST 2011
Fri Nov 25 18:05:34 CST 2011
at com.oracle.ovm.mgr.action.ActionEngine.sendCommandToServer(ActionEngine.java:474)
at com.oracle.ovm.mgr.action.ActionEngine.sendDiscoverCommand(ActionEngine.java:283)
at com.oracle.ovm.mgr.action.ServerAction.getServerInfo(ServerAction.java:95)
at com.oracle.ovm.mgr.discover.ovm.ServerBasicDiscoverHandler.query(ServerBasicDiscoverHandler.java:131)
at com.oracle.ovm.mgr.discover.ovm.ServerBasicDiscoverHandler.query(ServerBasicDiscoverHandler.java:61)
at com.oracle.ovm.mgr.discover.ovm.DiscoverHandler.execute(DiscoverHandler.java:50)
at com.oracle.ovm.mgr.discover.DiscoverEngine.handleDiscover(DiscoverEngine.java:435)
at com.oracle.ovm.mgr.discover.DiscoverEngine.discoverNewServer(DiscoverEngine.java:345)
at com.oracle.ovm.mgr.discover.DiscoverEngine.discoverServer(DiscoverEngine.java:265)
at com.oracle.ovm.mgr.op.manager.DiscoverManagerServerDiscover.action(DiscoverManagerServerDiscover.java:48)
at com.oracle.ovm.mgr.api.job.JobEngine.operationActioner(JobEngine.java:191)
at com.oracle.ovm.mgr.api.job.JobEngine.objectActioner(JobEngine.java:257)
at com.oracle.ovm.mgr.api.job.InternalJobDbImpl.objectCommitter(InternalJobDbImpl.java:1019)
at sun.reflect.GeneratedMethodAccessor1001.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at com.oracle.odof.core.AbstractVessel.invokeMethod(AbstractVessel.java:223)
at com.oracle.odof.core.BasicWork.invokeMethod(BasicWork.java:136)
at com.oracle.odof.command.InvokeMethodCommand.process(InvokeMethodCommand.java:100)
at com.oracle.odof.core.BasicWork.processCommand(BasicWork.java:81)
at com.oracle.odof.core.TransactionManager.processCommand(TransactionManager.java:751)
at com.oracle.odof.core.WorkflowManager.processCommand(WorkflowManager.java:395)
at com.oracle.odof.core.WorkflowManager.processWork(WorkflowManager.java:453)
at com.oracle.odof.io.AbstractClient.run(AbstractClient.java:42)
at java.lang.Thread.run(Thread.java:662)
Caused by: com.oracle.ovm.mgr.api.exception.IllegalOperationException: OVMAPI_4004E Server Failed Command: discover_server, Status:
Fri Nov 25 18:05:34 CST 2011
at com.oracle.ovm.mgr.action.ActionEngine.sendAction(ActionEngine.java:752)
at com.oracle.ovm.mgr.action.ActionEngine.sendCommandToServer(ActbionEngine.java:470)
... 24 more
----------
End of Job
----------
由于關鍵性信息確實,所以無法判斷導致錯誤的原因。即使是在metalink或GOOGLE中查詢,也得不到任何有價值的信息。
雖然在VM Manager中得不到有意義的信息,但是在VM Server上,卻可以得到更詳細的信息,通過檢查var/log/ovs-agent.log文件,獲取到下面的信息:
[2011-04-16 13:21:46 25970] ERROR (OVSAgentServer:108) Unauthorized access attempt from ('10.0.10.173', 59424)!
Traceback (most recent call last):
File "/opt/ovs-agent-3.0/OVSAgentServer.py", line 103, in do_POST
auth(username, password)
File "/opt/ovs-agent-3.0/OVSAgentServer.py", line 42, in auth
raise Exception('Authorization failed: user does not exist or password error.')
Exception: Authorization failed: user does not exist or password error.
[2011-04-16 13:21:46 25970] INFO (OVSAgentServer:169) code 403, message Unauthorized access attempt from ('10.0.10.173', 59424)!
這次信息就明確多了,顯然是由于VM Manager中配置的密碼不正確所致,在VM Server上修改oracle用戶密碼:
[root@server2 ~]# ovs-agent-passwd oracle
Password:
Again:
在搜索VM Server時使用這里修改的密碼,VM Manager成功的發現了VM Server信息。
一個客戶碰到的具體需求,分區表中有些分區所在的表空間被OFFLINE,tb導致在刪除統計信息時報錯。
下面通過例子來說明這個問題:
SQL> create table t_part_read (id number)
2 partition by range (id)
3 (partition p1 values less than (10) tablespace ts1,
4 partition p2 values less than (20) tablespace ts2,
5 partition pmax values less than (maxvalue) tablespace users);
Table created.
SQL> insert into t_part_read select rownum from tab;
54 rows created.
SQL> commit;
Commit complete.
SQL> exec dbms_stats.gather_table_stats(user, 'T_PART_READ')
PL/SQL procedure successfully completed.
SQL> alter tablespace ts1 read only;
Tablespace altered.
SQL> exec dbms_stats.gather_table_stats(user, 'T_PART_READ')
PL/SQL procedure successfully completed.
SQL> alter tablespace ts1 offline;
Tablespace altered.
SQL> exec dbms_stats.gather_table_stats(user, 'T_PART_READ')
BEGIN dbms_stats.gather_table_stats(user, 'T_PART_READ'); END;
*
ERROR at line 1:
ORA-00376: file 6 cannot be read at this time
ORA-01110: data file 6: '/u01/app/oracle/oradata/ORCL/datafile/o1_mf_ts1_7w8l5fz1_.dbf'
ORA-06512: at "SYS.DBMS_STATS", line 23829
ORA-06512: at "SYS.DBMS_STATS", line 23880
ORA-06512: at line 1
如果將表空間只讀,并不會影響到表空間上的表或分區的統計信息的收集,因為收集過程只是讀取,而收集的結果信息是寫到SYSTEM表空間的。
但是如果分區所在的表空間處于OFFLINE狀態,那么在統計信息收集的過程中就會報錯。
有一個很簡單的方法可以解決這個問題,就是將被OFFLINE影響的分區的統計信息鎖定,這樣Oracle在收集統計信息時就會跳過鎖定的分區,通過這個辦法就可以避免統計信息收集過程中的報錯:
SQL> exec dbms_stats.lock_partition_stats(user, 'T_PART_READ', 'P1')
PL/SQL procedure successfully completed.
SQL> exec dbms_stats.gather_table_stats(user, 'T_PART_READ')
BEGIN dbms_stats.gather_table_stats(user, 'T_PART_READ'); END;
*
ERROR at line 1:
ORA-00376: file 6 cannot be read at this time
ORA-01110: data file 6: '/u01/app/oracle/oradata/ORCL/datafile/o1_mf_ts1_7w8l5fz1_.dbf'
ORA-06512: at "SYS.DBMS_STATS", line 23829
ORA-06512: at "SYS.DBMS_STATS", line 23880
ORA-06512: at line 1
SQL> exec dbms_stats.gather_table_stats(user, 'T_PART_READ', granularity => 'PARTITION')
PL/SQL procedure successfully completed.
即使鎖定分區后,嘗試收集統計信息仍然報錯,這是因為Oracle默認除了要收集分區上的統計信息以外,還要收集表級的統計信息,而這就會造成被OFFLINE影響的分區也要被讀取。
解決方法就是在收集統計信息的時候指定收集的粒度是分區,不收集表上的GLOBAL信息。
當用戶密碼即將過期時,在登錄時Oracle會提示ORA-28002錯誤,但是并不會影響正常的登錄。
本來認為這個信息并沒有太大的影響,但是沒想到這個tb錯誤會導致SET AUTOTRACE功能失效:
solaris*orcl-/home/oracle$ sqlplus test/test
SQL*Plus: Release 11.2.0.3.0 Production on Fri Jul 13 11:27:28 2012
Copyright (c) 1982, 2011, Oracle. All rights reserved.
ERROR:
ORA-28002: the password will expire within 1 days
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Oracle Label Security and Real Application Testing options
SQL> set autot on
ERROR:
ORA-28002: the password will expire within 1 days
SP2-0619: Error while connecting
SP2-0611: Error enabling STATISTICS report
SQL> alter user test identified by test;
User altered.
SQL> set autot on
ERROR:
ORA-24315: illegal attribute type
SP2-0619: Error while connecting
SP2-0611: Error enabling STATISTICS report
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Oracle Label Security and Real Application Testing options
solaris*orcl-/home/oracle$ sqlplus test/test
SQL*Plus: Release 11.2.0.3.0 Production on Fri Jul 13 11:27:52 2012
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Oracle Label Security and Real Application Testing options
SQL> set autot on
SQL> insert into t values (1, 'a');
1 row created.
Execution Plan
----------------------------------------------------------
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | INSERT STATEMENT | | 1 | 12 | 1 (0)| 00:00:01 |
| 1 | LOAD TABLE CONVENTIONAL | T | | | | |
---------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
71 recursive calls
9 db block gets
75 consistent gets
10 physical reads
1284 redo size
829 bytes sent via SQL*Net to client
785 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
9 sorts (memory)
0 sorts (disk)
1 rows processed
很顯然,由于ORA-28002錯誤導致了SET AUTOTRACE ON功能啟用時碰到了錯誤。當修改了當前的用戶密碼,則ORA-28002不再出現后,SET AUTOTRACE ON的功能恢復正常。
根據上面的信息其實可以判斷,在啟用SET AUTOTRACE ON功能時,sqlplus會自動創建一個新的會話來記錄當前會話的統計信息。
而啟用的新的會話會使用當前會話登錄時保存的密碼來進行登錄,因此可以通過下面的例子來驗證這個推論:
solaris*orcl-/home/oracle$ sqlplus test/test
SQL*Plus: Release 11.2.0.3.0 Production on Sun Jul 15 01:28:38 2012
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Oracle Label Security and Real Application Testing options
SQL> set autot on
SQL> set autot off
SQL> alter user test identified by test1;
User altered.
SQL> set autot on
ERROR:
ORA-01017: invalid username/password; logon denied
SP2-0619: Error while connecting
SP2-0611: Error enabling STATISTICS report
果然,在用戶登錄后,如果當前的密碼被修改,是會導致SET AUTOTRACE ON啟用時報錯不正確的用戶名密碼錯誤的。
Oracle10g的DBA_TAB_STATISTICS視圖的STATTYPE_LOCKED列沒有tb正確的顯示結果。
看一個簡單的例子:
SQL> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bi
PL/SQL Release 10.2.0.5.0 - Production
CORE 10.2.0.5.0 Production
TNS for Linux: Version 10.2.0.5.0 - Production
NLSRTL Version 10.2.0.5.0 - Production
SQL> create table t_part (id number, name varchar2(30))
2 partition by range (id)
3 (partition p1 values less than (10),
4 partition p2 values less than (20),
5 partition pmax values less than (maxvalue));
Table created.
SQL> select table_name, partition_name, stattype_locked from all_tab_statistics where wner = user and table_name = 'T_PART';
TABLE_NAME PARTITION_NAME STATT
------------------------------ ------------------------------ -----
T_PART
T_PART P1
T_PART P2
T_PART PMAX
SQL> exec dbms_stats.lock_partition_stats(user, 'T_PART', 'P1')
PL/SQL procedure successfully completed.
SQL> select table_name, partition_name, stattype_locked from all_tab_statistics where wner = user and table_name = 'T_PART';
TABLE_NAME PARTITION_NAME STATT
------------------------------ ------------------------------ -----
T_PART
T_PART P1
T_PART P2
T_PART PMAX
SQL> exec dbms_stats.gather_table_stats(user, 'T_PART')
PL/SQL procedure successfully completed.
SQL> select table_name, partition_name, last_analyzed, stattype_locked from all_tab_statistics where wner = user and table_name = 'T_PART';
TABLE_NAME PARTITION_NAME LAST_ANAL STATT
------------------------------ ------------------------------ --------- -----
T_PART 16-JUL-12
T_PART P1
T_PART P2 16-JUL-12
T_PART PMAX 16-JUL-12
可以看到在10.2環境中,LOCK_PARTITION_STATS過程是正常工作的,但是DBA_TAB_STATISTICS視圖的STATTYPE_LOCKED列并沒有正確的顯示分區被鎖定的結果。
而對于表來說,LOCK_TABLE_STATS過程執行后,STATTYPE_LOCKED的結果顯示是正常的:
SQL> exec dbms_stats.lock_table_stats(user, 'T_PART')
PL/SQL procedure successfully completed.
SQL> select table_name, partition_name, last_analyzed, stattype_locked from all_tab_statistics where wner = user and table_name = 'T_PART';
TABLE_NAME PARTITION_NAME LAST_ANAL STATT
------------------------------ ------------------------------ --------- -----
T_PART 16-JUL-12 ALL
T_PART P1 ALL
T_PART P2 16-JUL-12 ALL
T_PART PMAX 16-JUL-12 ALL
這說明在10.2中,Oracle對于分區列的鎖定的支持是存在問題的。查詢了一下MOS,Oracle將這個問題確認為內部BUG:7240460,這個問題在11.1.0.7中被FIXED。
而在11.2中,這個問題以及不存在了:
SQL> select * from v$version;
BANNER
----------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE 11.2.0.3.0 Production
TNS for Solaris: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production
SQL> select owner, table_name, partition_name, stattype_locked
2 from dba_tab_statistics
3 where wner = 'TEST'
4 and table_name = 'T_PART';
OWNER TABLE_NAME PARTITION_NAME STATT
---------- ------------ --------------- -----
TEST T_PART
TEST T_PART P2
TEST T_PART P3
TEST T_PART P4
TEST T_PART P5
TEST T_PART PMAX
6 rows selected.
SQL> exec dbms_stats.lock_partition_stats('TEST', 'T_PART', 'P2')
PL/SQL procedure successfully completed.
SQL> select owner, table_name, partition_name, stattype_locked
2 from dba_tab_statistics
3 where wner = 'TEST'
4 and table_name = 'T_PART';
OWNER TABLE_NAME PARTITION_NAME STATT
---------- ------------ --------------- -----
TEST T_PART
TEST T_PART P2 ALL
TEST T_PART P3
TEST T_PART P4
TEST T_PART P5
TEST T_PART PMAX
6 rows selected.