SQL Server中LIKE %search_string% 索引查找（Index Seek）淺析

知識 10-18

在SQL Server的SQL優化過程中，如果遇到WHERE條件中包含LIKE "%search_string%"是一件非常頭痛的事情。這種情況下，一般要修改業務邏輯或改寫SQL才能解決SQL執行計划走索引掃描或全表掃描的問題。最近在優化SQL語句的時候，遇到了一個很有意思的問題。某些使用LIKE "%" + @search_string + "%"(或者 LIKE @search_string)這樣寫法的SQL語句的執行計劃居然走索引查找（Index Seek）。下面這篇文章來分析一下這個奇怪的現象。

首先，我們來看看WHERE查詢條件中使用LIKE的幾種情況,這些是我們對LIKE的一些常規認識：

1： LIKE "condition%"

執行計劃會走索引查找（Index Seek or Clustered Index Seek）。

2: LIKE "%condition"

執行計劃會走索引掃描（Index Scan or Clustered Index Scan）或全表掃描（Table Scan）

3: LIKE "%condition%"

執行計劃會走索引掃描（Index Scan or Clustered Index Scan）或全表掃描（Table Scan）

4： LIKE "condition1%condition%";

執行計劃會走索引查找（Index Seek）

下面我們以AdventureWorks2014示例資料庫為測試環境（測試環境為SQL Server 2014 SP2），測試上面四種情況，如下所示：

SQL Server中LIKE %search_string% 索引查找（Index Seek）淺析

其實複雜的情況下，LIKE "search_string%"也有走索引掃描（Index Scan）的情況，上面情況並不是唯一、絕對的。如下所示

在表Person.Person的 rowguid欄位上創建有唯一索引AK_Person_rowguid

SQL Server中LIKE %search_string% 索引查找（Index Seek）淺析

那麼我們來看看上面所說的這個特殊案例（這裡使用一個現成的案例，懶得構造案例了），如何讓LIKE %search_string%走索引查找（Index Seek），這個技巧就是使用變數，如下SQL對比所示：

如下所示，表[dbo].[GEN_CUSTOMER]在欄位CUSTOMER_CD有聚集索引。

SQL Server中LIKE %search_string% 索引查找（Index Seek）淺析

可以看到CUSTOMER_CD LIKE "%" + @CUSTOMER_CD + "%"這樣的SQL寫法（或者CUSTOMER_CD LIKE @CUSTOMER_CD也可以），執行計劃就走聚集索引查找（Clustered Index Seek）了，而條件中直接使用CUSTOMER_CD LIKE "%00630%" 反而走聚集索引掃描（Clustered Index Scan），另外可以看到實際執行的Cost開銷比為4% VS 96% ，初一看，還真的以為第一個執行計劃比第二個執行的代價要小很多。但是從IO開銷，以及CPU time、elapsed time對比來看，兩者幾乎沒有什麼差異。在這個案例中，並不是走索引查找（Index Seek）就真的開銷代價小很多。

SQL Server中LIKE %search_string% 索引查找（Index Seek）淺析

考慮到這裡數據量較小，我使用網上的一個腳本，在AdventureWorks2014資料庫構造了一個10000000的大表，然後順便做了一些測試對比

CREATE TABLE dbo.TestLIKESearches
(
ID1 INT
,ID2 INT
,AString VARCHAR(100)
,Value INT
,PRIMARY KEY (ID1, ID2)
);

WITH Tally (n) AS
(
SELECT TOP 10000000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns a CROSS JOIN sys.all_columns b
)
INSERT INTO dbo.TestLIKESearches
(ID1, ID2, AString, Value)
SELECT 1+n/500, n%500
,CASE WHEN n%500 > 299 THEN
SUBSTRING("abcdefghijklmnopqrstuvwxyz", 1+ABS(CHECKSUM(NEWID()))%26, 1) +
SUBSTRING("abcdefghijklmnopqrstuvwxyz", 1+ABS(CHECKSUM(NEWID()))%26, 1) +
SUBSTRING("abcdefghijklmnopqrstuvwxyz", 1+ABS(CHECKSUM(NEWID()))%26, 1) +
RIGHT(1000+n%1000, 3) +
SUBSTRING("abcdefghijklmnopqrstuvwxyz", 1+ABS(CHECKSUM(NEWID()))%26, 1) +
SUBSTRING("abcdefghijklmnopqrstuvwxyz", 1+ABS(CHECKSUM(NEWID()))%26, 1) +
SUBSTRING("abcdefghijklmnopqrstuvwxyz", 1+ABS(CHECKSUM(NEWID()))%26, 1)
END
,1+ABS(CHECKSUM(NEWID()))%100
FROM Tally;

CREATE INDEX IX_TestLIKESearches_N1 ON dbo.TestLIKESearches(AString);

如下測試所示，在一個大表上面，LIKE @search_string這種SQL寫法，IO開銷確實要小一些，CPU Time也要小一些。個人多次測試都是這種結果。也就是說對於數據量較大的表，這種SQL寫法性能確實要好一些。

SQL Server中LIKE %search_string% 索引查找（Index Seek）淺析

現在回到最開始那個SQL語句，個人對執行計劃有些疑惑，查看執行計劃，你會看到優化器對CUSTOMER_CD LIKE "%" + @CUSTOMER_CD + "%" 進行了轉換。如下截圖或通過執行計劃的XML，你會發現上面轉換為使用三個內部函數LikeRangeStart, LikeRangeEnd, LikeRangeInfo.

SQL Server中LIKE %search_string% 索引查找（Index Seek）淺析

另外，你會發現Nested Loops & Compute Scalar 等步驟的Cost都為0.後面在「Dynamic Seeks and Hidden Implicit Conversions」這篇博客裡面看到了一個新名詞「Dynamic Seeks」。文字提到因為成本估算為0，所以，你看到的執行計劃的Cost又是「不準確」的，具體描述如下：

The plan now contains an extra Constant Scan, a Compute Scalar and a Nested Loops Join. These operators are interesting because they have zero cost estimates: no CPU, no I/O, nothing. That』s because they are purely architectural: a workaround for the fact that SQL Server cannot currently perform a dynamic seek within the Index Seek operator itself. To avoid affecting plan choices, this extra machinery is costed at zero.

The Constant Scan produces a single in-memory row with no columns. The Compute Scalar defines expressions to describe the covering seek range (using the runtime value of the @Like variable). Finally, the Nested Loops Join drives the seek using the computed range information as correlated values.

The upper tooltip shows that the Compute Scalar uses three internal functions, LikeRangeStart, LikeRangeEnd, andLikeRangeInfo. The first two functions describe the range as an open interval. The third function returns a set of flags encoded in an integer, that are used internally to define certain seek properties for the Storage Engine. The lower tooltip shows the seek on the open interval described by the result of LikeRangeStart and LikeRangeEnd, and the application of the residual predicate 『LIKE @Like』.

不管你返回的記錄有多少，執行計劃Nested Loops & Compute Scalar 等步驟的Cost都為0，如下測試所示，返回1000條記錄，它的成本估算依然為0 ，顯然這樣是不夠精確的。深層次的原因就不太清楚了。執行計劃Cost不可靠的案例很多。

SET STATISTICS IO ON;

SET STATISTICS TIME ON;

DECLARE @CUSTOMER_CD NVARCHAR(10);

SET @CUSTOMER_CD=N"%44%"

SELECT * FROM [dbo].[GEN_CUSTOMER] WHERE CUSTOMER_CD LIKE @CUSTOMER_CD

另外，其實還一點沒有搞清楚的時候在什麼條件下出現Index Seek的情況。有些情況下，使用變數的方式，依然是索引掃描

SQL Server中LIKE %search_string% 索引查找（Index Seek）淺析