在報表數據庫的後台alert文件中發現了這個錯誤簡單記錄一下問題的診斷和解決過程數據庫版本 for Solaris sparc 錯誤信息如下
Errors in file /u/oracle/admin/repdb/bdump/repdb_j_trc:
ORA: error on auto execute of job
ORA: out of process memory when trying to allocate bytes (hash join subhkllcqas:kllsltba)
ORA: at JSGOV_OLDINSERT_HOS_INFO line
ORA: at JSGOV_OLDP_GEN_STAT line
ORA: at line
這個JOB是昨天才添加到數據庫中的而運行這個JOB的用戶是從其他數據庫遷移到當前數據庫中的
產生問題的情況有很多種有可能是本地配置和遠端配置的區別造成的;也可能是由於源數據庫是而當前數據庫是版本的差異造成了執行計劃改變;還有可能是遷移過程中出現了錯誤從而引起了問題
從錯誤本身觀察是由於無法為HASH JOIN分配M的內存所導致的觀察數據庫的PGA內存設置
SQL> SHOW PARAMETER PGA
NAME TYPE VALUE
pga_aggregate_target big integer
對於一個報表系統來說這個設置確實小了一點但是考慮這個數據庫在處理很多比當前數據量大得多的情況都未出現這個問題基本上可以確定不是系統參數設置造成的
如果排除第一種情況那麼無論是遷移出現了問題還是版本差異的問題最大的可能性都是執行計劃發生了變化那麼現在就需要找到出現問題的SQL語句檢查執行計劃
根據錯誤信息給出的存儲過程名稱和位置提示可以輕易的找到出現問題的SQL語句不過由於SQL語句太長而且和問題的關系並不太大這裡將SQL語句省略只列出這個SQL語句對應的執行計劃
SQL> SELECT * FROM TABLE(DBMS_XPLANDISPLAY);
PLAN_TABLE_OUTPUT
| Id | Operation | Name | Rows | Bytes | Cost |
| | SELECT STATEMENT | | | | |
| | NESTED LOOPS | | | | |
| | HASH JOIN | | | | |
| | HASH JOIN | | | | |
| | MERGE JOIN CARTESIAN | | | | |
| | VIEW | | | | |
| | SORT GROUP BY | | | | |
| | VIEW | | | | |
| | SORT GROUP BY | | | | |
| | TABLE ACCESS FULL | ORD_HIT_COMM | | | |
| | BUFFER SORT | | | | |
| | VIEW | | | | |
| | SORT GROUP BY | | | | |
| | VIEW | | | | |
| | SORT GROUP BY | | | | |
| | TABLE ACCESS FULL | ORD_HIT_COMM | | | |
| | VIEW | | | | |
| | SORT GROUP BY | | | | |
| | TABLE ACCESS FULL | ORD_HIT_COMM | | | |
| | VIEW | | | | |
| | SORT GROUP BY | | | | |
| | NESTED LOOPS | | | | |
| | INLIST ITERATOR | | | | |
| | TABLE ACCESS BY INDEX ROWID| SW_PLAT_CAT_ORG | | | |
| | INDEX RANGE SCAN | IDX_SW_PLAT_CAT_ORG_ENABLE | | | |
| | INLIST ITERATOR | | | | |
| | TABLE ACCESS BY INDEX ROWID| SW_PLAT_CAT_BUYER | | | |
| | INDEX UNIQUE SCAN | PK_SW_PLAT_CAT_BUYER | | | |
| | INLIST ITERATOR | | | | |
| | TABLE ACCESS BY INDEX ROWID | PLT_PLAT | | | |
| | INDEX UNIQUE SCAN | PK_PLT_PLAT | | | |
Note: cpu costing is off PLAN_TABLE is old version
rows selected
雖然SQL本身寫的有缺點但是絕對不應該產生這種包含笛卡兒積的執行計劃檢查SQL並沒有發現缺少關聯條件的情況即問題和SQL本身並不大雖然SQL有很多可以優化的地方但是這並不是產生笛卡兒積的關鍵因素
觀察執行計劃本身除了笛卡兒積之外另人比較疑惑的一點就是返回記錄數Oracle認為全表掃描ORD_HIT_COMM僅僅返回一條記錄這顯然是有問題的
SQL> SELECT COUNT(*) FROM ORD_HIT_COMM;
COUNT(*)
從這一點上判斷可以很容易的斷定是統計信息出現了問題檢查ORD_HIT_COMM的統計信息
SQL> SELECT TABLE_NAME NUM_ROWS FROM USER_TABLES
WHERE TABLE_NAME = ORD_HIT_COMM;
TABLE_NAME NUM_ROWS
ORD_HIT_COMM
本以為不存在統計信息或者得到一個很小的值沒想到統計信息基本上是准確的那麼是哪裡出現的問題呢
觀察SQL語句發現對ORD_HIT_COMM表唯一的限制條件是ENABLE_FLAG=而這個限制條件其實對過濾數據來說沒有多大的意義不過檢查執行計劃
SQL> EXPLAIN PLAN FOR SELECT * FROM ORD_HIT_COMM
WHERE ENABLE_FLAG = ;
Explained
SQL> SELECT * FROM TABLE(DBMS_XPLANDISPLAY);
PLAN_TABLE_OUTPUT
| Id | Operation | Name | Rows | Bytes | Cost |
| | SELECT STATEMENT | | | | |
| | TABLE ACCESS FULL | ORD_HIT_COMM | | | |
Note: cpu costing is off PLAN_TABLE is old version
rows selected
看來問題多半出現在ENABLE_FLAG列的統計信息上
SQL> SELECT COLUMN_NAME NUM_DISTINCT NUM_NULLS DENSITY NUM_BUCKETS
FROM USER_TAB_COLUMNS
WHERE TABLE_NAME = ORD_HIT_COMM
AND COLUMN_NAME = ENABLE_FLAG;
COLUMN_NAME NUM_DISTINCT NUM_NULLS DENSITY NUM_BUCKETS
ENABLE_FLAG E
在i的環境Oracle根據DENSITY來確定返回的記錄數因此得到條記錄的結果是很正常的
SQL> SELECT * E FROM DUAL;
*E
顯然這時需要刪除錯誤的統計信息並重新收集統計信息
SQL> EXEC DBMS_STATSDELETE_TABLE_STATS(USER ORD_HIT_COMM)
PL/SQL procedure successfully completed
SQL> EXEC DBMS_STATSGATHER_TABLE_STATS(USER ORD_HIT_COMM)
PL/SQL procedure successfully completed
檢查統計信息中的DENSITY值
SQL> SELECT COLUMN_NAME NUM_DISTINCT NUM_NULLS DENSITY NUM_BUCKETS
FROM USER_TAB_COLUMNS
WHERE TABLE_NAME = ORD_HIT_COMM
AND COLUMN_NAME = ENABLE_FLAG;
COLUMN_NAME NUM_DISTINCT NUM_NULLS DENSITY NUM_BUCKETS
ENABLE_FLAG
下面檢查訪問ORD_HIT_COMM的執行計劃檢查優化器認為的返回記錄數
SQL> EXPLAIN PLAN FOR
SELECT * FROM ORD_HIT_COMM WHERE ENABLE_FLAG = ;
Explained
SQL> SELECT * FROM TABLE(DBMS_XPLANDISPLAY);
PLAN_TABLE_OUTPUT
| Id | Operation | Name | Rows | Bytes | Cost |
| | SELECT STATEMENT | | K| M| |
| | TABLE ACCESS FULL | ORD_HIT_COMM | K| M| |
Note: cpu costing is off PLAN_TABLE is old version
rows selected
現在統計信息已經恢復正常檢查一下出現問題的SQL語句執行計劃是否正常
SQL> SELECT * FROM TABLE(DBMS_XPLANDISPLAY);
PLAN_TABLE_OUTPUT
| Id | Operation |Name |Rows| Bytes |TempSpc| Cost |
| | SELECT STATEMENT | | | | | |
| | HASH JOIN | | | | | |
| | HASH JOIN | | | | | |
| | HASH JOIN | | | | | |
| | MERGE JOIN CARTESIAN | | | | | |
| | VIEW | | | | | |
| | SORT GROUP BY | | | | | |
| | NESTED LOOPS | | | | | |
| | INLIST ITERATOR | | | | | |
| | TABLE ACCESS BY INDEX ROWID|SW_PLAT_CAT_ORG | | | | |
| | INDEX RANGE SCAN |IDX_SW_PLAT_CAT_ORG_ENABLE| | | | |
| | INLIST ITERATOR | | | | | |
| | TABLE ACCESS BY INDEX ROWID|SW_PLAT_CAT_BUYER | | | | |
| | INDEX UNIQUE SCAN |PK_SW_PLAT_CAT_BUYER | | | | |
| | BUFFER SORT | | | | | |
| | INLIST ITERATOR | | | | | |
| | TABLE ACCESS BY INDEX ROWID |PLT_PLAT | | | | |
| | INDEX RANGE SCAN |PK_PLT_PLAT | | | | |
| | VIEW | || K| | |
| | SORT GROUP BY | || K| | |
| | TABLE ACCESS FULL |ORD_HIT_COMM |K| M| | |
| | VIEW | || K| | |
| | SORT GROUP BY | || K| | |
| | VIEW | |K| M| | |
| | SORT GROUP BY | |K| M| M| |
| | TABLE ACCESS FULL |ORD_HIT_COMM |K| M| | |
| | VIEW | || K| | |
| | SORT GROUP BY | || K| | |
| | VIEW | |K| M| | |
| | SORT GROUP BY | |K| M| M| |
| | TABLE ACCESS FULL |ORD_HIT_COMM |K| M| | |
Note: cpu costing is off PLAN_TABLE is old version
rows selected
這個執行計劃至少已經不那麼離譜了下面只需要重新運行JOB就可以了
From:http://tw.wingwit.com/Article/program/Oracle/201311/16912.html