它將冗余數據選擇到一個游標中並根據(LastNameFirstName)來分組(在我們這個方案中)然後打開游標然後循環地取出每一行然後用與先前的取出的鍵值進行比較如果這是第一次取出這個值或者這個值不是冗余鍵那麼跳過這個記錄然後取下一個不然的話這就是這個組中的冗余記錄所以刪掉它.
讓我們運行一下這個存儲過程
BEGIN
DeleteDuplicates;
END;
/
SELECT LastName FirstName COUNT(*)
FROM Customers
GROUP BY LastName FirstName
HAVING COUNT(*) > ;
最後一個查詢語句沒有返回值所以冗余數據沒有了從表中取冗余數據的過程完全是由定義在csr_Duplicates 這個游標中的SQL語句來實現的PL/SQl只是用來實現刪除冗余數那麼能不能完全用SQL語句來實現呢?
二.SQL解決方案使用RANK()刪除冗余數據Oraclei分析函數RANK()來枚舉每一個組中的元素在我們的方案中 我們應用這個方案我們使用這個函數動態的把冗余數據連續的排列起來加上編號組由Partintion by 這個語句來分開然後用Order by 進行分組SELECT ID LastName FirstName RANK() OVER (PARTITION BY LastName
FirstName ORDER BY ID) SeqNumber
FROM Customers
ORDER BY LastName FirstName;
SQL
Listing Output of single SQL statement that uses RANK()
顯示的是根據記錄的條數的個數來顯示尤其對於冗余數據
ID LASTNAME FIRSTNAME SEQNUMBER
Blake Becky
Blue Don
Bradley Tom
Chang Jim
Griffith David
Hill Larry
King Chuck
Krieger Jeff
Krieger Jeff
Krieger Jeff
Loney Julie
Lord Don
Mason Paul
Monroe John
Simon Michael
Simon Michael
Stone Tony
Stone Tony
Stone Tony
Stone Tony
Stone Tony
我們可以看一到SeqNumber這一列中的數值冗余數據是根據ID號由小到大進行的排序所有的冗余數據的SqlNumber都大於一所有的非冗余數據都等於一所以我們取自己所需刪除那麼沒用的SELECT ID LastName FirstName
FROM
(SELECT ID LastName FirstName RANK() OVER (PARTITION BY LastName
FirstName ORDER BY ID) AS SeqNumber
FROM Customers)
WHERE SeqNumber > ;
SQL
Listing 冗余鍵的鍵值
有七行必須被刪除
ID LASTNAME FIRSTNAME
Krieger Jeff
Krieger Jeff
Simon Michael
Stone Tony
Stone Tony
Stone Tony
Stone Tony
rows selected這顯示有七行需要刪除還是用上一個表我測試了一下這個代碼它用了77秒種就刪除了所有的數據准備好了用Sql語句來刪除冗余數據版本一它執行了秒
DELETE
FROM CUSTOMERS
WHERE ID IN
(SELECT ID
FROM
(SELECT ID LastName FirstName RANK() OVER (PARTITION BY LastName
FirstName ORDER BY ID) AS SeqNumber
FROM Customers)
WHERE SeqNumber > );
我們可以看到最後的兩行語句對表中的數據進行了排序這不是有效的所以我們來優化一下最後一個查詢語句把Rank()函數應用到只含有冗余數據的組而不是所有的列下面這個語句是比較有效率的雖然它不像上一個查詢那樣精簡SELECT ID LastName FirstName
FROM
(SELECT ID LastName FirstName RANK() OVER (PARTITION BY LastName
FirstName ORDER BY ID) AS SeqNumber
FROM
(SELECT ID LastName FirstName
FROM Customers
WHERE (LastName FirstName) IN (SELECT LastName FirstName
FROM Customers
GROUP BY LastName FirstName
HAVING COUNT(*) > )))
WHERE SeqNumber > ;
選擇冗余數據只用了26秒鐘這樣就提高了%的性能這樣就提高了將這個作為子查詢的刪除查詢的效率
DELETE
FROM Customers
WHERE ID IN
(SELECT ID
FROM
(SELECT ID LastName FirstName RANK() OVER (PARTITION BY LastName
FirstName ORDER BY ID) AS SeqNumber
FROM
(SELECT ID LastName FirstName
FROM Customers
WHERE (LastName FirstName) IN (SELECT LastName FirstName
FROM Customers
GROUP BY LastName FirstName
HAVING COUNT(*) > )))
WHERE SeqNumber > );
現在只用了秒鐘的就完成的上面的任務比起上一個秒這是一個很大的進步相比之下存儲過程用了秒這樣存儲過程有些慢了使用PL/SQL語句我們和我們以上的代碼會得到更好的更精確的代碼和提高你代碼的執行效率雖然對於從數據庫中枚舉數據PL/SQL對於Sql兩者沒有什麼差別但是對於數據的比較上PL/SQL就比SQL要快很多但是如果冗余數據量比較小的話我們盡量使用SQL而不使用PL/SQL如果你的數據表沒有主鍵的話那麼你可以參考其它技術
Rank()其它的方法
使用Rank()函數你可以對選擇你所保留的數據(或者是小ID的或者是大ID 的就由RECDate這個列來決定這種情況下你可以把REcdate加入到(Orderby )子句中倒序或者正序
這是一種保留最大Id的一種解決方案
DELETE
FROM Customers
WHERE ID IN
(SELECT ID
FROM
(SELECT ID LastName FirstName RANK() OVER (PARTITION BY LastName FirstName ORDER BY RecDate DESC ID) AS SeqNumber
FROM
(SELECT ID LastName FirstName RecDate
FROM Customers
WHERE (LastName FirstName) IN (SELECT LastName FirstName
FROM Customers
GROUP BY LastName FirstName
HAVING COUNT(*) > )))
WHERE SeqNumber > );
這種技術保證了你可以控制每一個表中的保留的組假設你有一個數據庫有一個促銷或者有一個折扣信息比如一個團體可以使用這種促銷5次或者個人可以使用這個折扣三次為了指出要保留的組的個數你可以在where 和having子句中進行設置那麼你將刪除所有大於你
設置有數的冗余組
DELETE
FROM Customers
WHERE ID IN
(SELECT ID
FROM
(SELECT ID LastName FirstName RANK() OVER (PARTITION BY LastName
FirstName ORDER BY ID) AS SeqNumber
FROM
(SELECT ID LastName FirstName
FROM Customers
WHERE (LastName FirstName) IN (SELECT LastName FirstName
FROM Customers
GROUP BY LastName FirstName
HAVING COUNT(*) > )))
WHERE SeqNumber > );
As you can see using the RANK() function allows you to eliminate duplicates in a
single SQL statement and gives you more capabilities by extending the power of
your
queries
正如你所見使用Rank()可以消除冗余數據而且能給你很大的可伸展性
From:http://tw.wingwit.com/Article/program/Oracle/201311/18979.html