利用 SQL Server 过滤索引提高查询语句的性能分析_Sql Server

利用 SQL Server 过滤索引提高查询语句的性能分析

大家好，我是只谈技术不剪发的 tony 老师。

microsoft sql server 过滤索引（筛选索引）是指基于满足特定条件的数据行进行索引。与全表索引（默认创建）相比，设计良好的筛选索引可以提高查询性能、减少索引维护开销并可降低索引存储开销。本文就给大家介绍一下 microsoft sql server 中的过滤索引功能。

在创建过滤索引之前，我们需要了解它的适用场景。

在某个字段中只有少量相关值需要查询时，可以针对值的子集创建过滤索引。例如，当字段中的值大部分为 null 并且查询只从非 null 值中进行选择时，可以为非 null 数据行创建筛选索引。由此得到的索引与对相同字段定义的全表非聚集索引相比，前者更小且维护开销更低。
表中含有分类数据行时，可以为一种或多种类别的数据创建筛选索引。通过将查询范围缩小为表的特定区域，这可以提高针对这些数据行的查询性能。此外，由此得到的索引与全表非聚集索引相比，前者更小且维护开销更低。

我们在创建索引时可以通过一个 where 子句指定需要索引的数据行，从而创建一个过滤索引。例如，对于以下订单表 orders：

				?

									create table orders (

									  id integer primary key,

									  customer_id integer,

									  status varchar(10)

									);

									begin  

									  declare @counter int = 1

									  while @counter <= 1000000

									  begin

									    insert into orders

									    select @counter, (rand() * 100000),

									          case

									            when (rand() * 100)<1 then 'pending'

									            when (rand() * 100)>99 then 'shipped'

									            else 'completed'

									          end

									    set @counter = @counter + 1

									  end

									end;

订单表中总共有 100 万个订单，通常绝大部分的订单都处于完成状态。一般情况下，我们只需要针对某个用户未完成的订单进行查询跟踪，因此可以创建一个基于用户编号和状态的部分索引：

				?

									create index full_idx on orders (customer_id, status);

然后我们查看以下查询语句的执行计划：

				?

									set statistics profile on

									select * 

									from orders

									where customer_id = 5043

									and status != 'completed';

									id    |customer_id|status |

									------+-----------+-------+

									743436|       5043|pending|

									947848|       5043|shipped|

									rows    executes    stmttext    stmtid  nodeid  parent  physicalop  logicalop   argument    definedvalues   estimaterows    estimateio  estimatecpu avgrowsize  totalsubtreecost    outputlist  warnings    type    parallel    estimateexecutions

									2   1   select * from [orders] where [customer_id]=@1 and [status]<>@2    1   1   0   null    null    null    null    1.405213    null    null    null    0.003283546 null    null    select  0   null

									2   1     |--index seek(object:([hrdb].[dbo].[orders].[full_idx]), seek:([hrdb].[dbo].[orders].[customer_id]=(5043) and [hrdb].[dbo].[orders].[status] < 'completed' or [hrdb].[dbo].[orders].[customer_id]=(5043) and [hrdb].[dbo].[orders].[status] > 'completed') ordered forward) 1   2   1   index seek  index seek  object:([hrdb].[dbo].[orders].[full_idx]), seek:([hrdb].[dbo].[orders].[customer_id]=(5043) and [hrdb].[dbo].[orders].[status] < 'completed' or [hrdb].[dbo].[orders].[customer_id]=(5043) and [hrdb].[dbo].[orders].[status] > 'completed') ordered forward  [hrdb].[dbo].[orders].[id], [hrdb].[dbo].[orders].[customer_id], [hrdb].[dbo].[orders].[status] 1.405213    0.003125    0.0001585457    27  0.003283546 [hrdb].[dbo].[orders].[id], [hrdb].[dbo].[orders].[customer_id], [hrdb].[dbo].[orders].[status] null    plan_row    0   1

输出结果显示查询利用索引 full_idx 扫描查找所需的数据。

我们可以查看一下索引 full_idx 占用的空间大小：

				?

									select ix.name as "index name",

									sum(sz.used_page_count) * 8/1024.0 as "index size (mb)"

									from sys.dm_db_partition_stats as sz

									inner join sys.indexes as ix on sz.object_id = ix.object_id

									and sz.index_id = ix.index_id

									inner join sys.tables tn on tn.object_id = ix.object_id

									where tn.name = 'orders'

									group by ix.name;

									index name                  |index size (mb)|

									----------------------------+---------------+

									full_idx                    |      26.171875|

									pk__orders__3213e83f1e3b8a3b|      29.062500|

接下来我们再创建一个部分索引，只包含未完成的订单数据，从而减少索引的数据量：

				?

									create index partial_idx on orders (customer_id)

									where status != 'completed';

索引 partial_idx 中只有 customer_id 字段，不需要 status 字段。同样可以查看一下索引 partial_idx 占用的空间大小：

				?

									select ix.name as "index name",

									sum(sz.used_page_count) * 8/1024.0 as "index size (mb)"

									from sys.dm_db_partition_stats as sz

									inner join sys.indexes as ix on sz.object_id = ix.object_id

									and sz.index_id = ix.index_id

									inner join sys.tables tn on tn.object_id = ix.object_id

									where tn.name = 'orders'

									group by ix.name;

									index name                  |index size (mb)|

									----------------------------+---------------+

									full_idx                    |      26.171875|

									partial_idx                 |       0.289062|

									pk__orders__3213e83f1e3b8a3b|      29.062500|

索引只有 0.29 mb，而不是 26 mb，因为绝大多数订单都处于完成状态。

以下查询显式了适用过滤索引时的执行计划：

				?

									select * 

									from orders with ( index ( partial_idx ) )

									where customer_id = 5043

									and status != 'completed';

									rows    executes    stmttext    stmtid  nodeid  parent  physicalop  logicalop   argument    definedvalues   estimaterows    estimateio  estimatecpu avgrowsize  totalsubtreecost    outputlist  warnings    type    parallel    estimateexecutions

									2   1   select *   from orders with ( index ( partial_idx ) )  where customer_id = 5043  and status != 'completed'  1   1   0   null    null    null    null    1.124088    null    null    null    0.03279812  null    null    select  0   null

									2   1     |--nested loops(inner join, outer references:([hrdb].[dbo].[orders].[id]))    1   2   1   nested loops    inner join  outer references:([hrdb].[dbo].[orders].[id])   null    1.124088    0   4.15295e-05 24  0.03279812  [hrdb].[dbo].[orders].[id], [hrdb].[dbo].[orders].[customer_id], [hrdb].[dbo].[orders].[status] null    plan_row    0   1

									2   1          |--index seek(object:([hrdb].[dbo].[orders].[partial_idx]), seek:([hrdb].[dbo].[orders].[customer_id]=(5043)) ordered forward)   1   3   2   index seek  index seek  object:([hrdb].[dbo].[orders].[partial_idx]), seek:([hrdb].[dbo].[orders].[customer_id]=(5043)) ordered forward, forcedindex    [hrdb].[dbo].[orders].[id], [hrdb].[dbo].[orders].[customer_id] 9.935287    0.003125    0.0001679288    15  0.003292929 [hrdb].[dbo].[orders].[id], [hrdb].[dbo].[orders].[customer_id] null    plan_row    0   1

									2   2          |--clustered index seek(object:([hrdb].[dbo].[orders].[pk__orders__3213e83f1e3b8a3b]), seek:([hrdb].[dbo].[orders].[id]=[hrdb].[dbo].[orders].[id]) lookup ordered forward)  1   5   2   clustered index seek    clustered index seek    object:([hrdb].[dbo].[orders].[pk__orders__3213e83f1e3b8a3b]), seek:([hrdb].[dbo].[orders].[id]=[hrdb].[dbo].[orders].[id]) lookup ordered forward, forcedindex [hrdb].[dbo].[orders].[status]  1   0.003125    0.0001581   16  0.02946366  [hrdb].[dbo].[orders].[status]  null    plan_row    0   9.935287

我们比较通过 full_idx 和 partial_idx 执行以下查询的时间：

				?

									-- 300 ms

									select count(*)

									from orders with ( index ( full_idx ) )

									where status != 'completed';

									-- 10 ms

									select count(*) 

									from orders with ( index ( partial_idx ) )

									where status != 'completed';

另外，过滤索引还可以用于实现其他的功能。例如，我们可以将索引 partial_idx 定义为唯一索引，从而实现每个用户只能存在一个未完成订单的约束。

				?

									drop index partial_idx on orders;

									truncate table orders;

									create unique index partial_idx on orders (customer_id)

									where status != 'completed';

									insert into orders(id, customer_id, status) values (1, 1, 'pending');

									insert into orders(id, customer_id, status) values (2, 1, 'pending');

									sql 错误 [2601] [23000]: 不能在具有唯一索引“partial_idx”的对象“dbo.orders”中插入重复键的行。重复键值为 (1)。