Cardinality Estimator - Part 2- Out of Date Statistics

Continuing from our series on Cardinality Estimator 2014, Let us look at the first difference between 2 CE models. The way estimates are made when the statistics are out of date or stale is the first difference we will be exploring. Couple of prerequisites for this demo

  1. Download SQL Server 2012 Adventureworks database from here
  2. Restore the downloaded database into a 2014 instance and switch the compatibility level to 120 ( SQL 2014 )  

Let us check the row count of the table [Production].[TransactionHistory]

EXEC sp_spaceused '[Production].[TransactionHistory]'
GO


Contains 113,443 rows. Lets insert 10,000 ( little less than 10%) rows so that it doesn't cause the statistic to be automatically updated and statistic remains out of date. Note that 20% of table needs to change for statistics to be updated automatically.

The plan is to insert rows, run queries against both the Cardinality Estimator models and compare the estimates.



Step 1: Insert 10,000 rows.

Insert into [Production].[TransactionHistory]([ProductID]
      ,[ReferenceOrderID]
      ,[ReferenceOrderLineID]
      ,[TransactionDate]
      ,[TransactionType]
      ,[Quantity]
      ,[ActualCost]
      ,[ModifiedDate])
SELECT Top 10000
       [ProductID]
      ,[ReferenceOrderID]
      ,[ReferenceOrderLineID]
      ,[TransactionDate]
      ,[TransactionType]
      ,[Quantity]
      ,[ActualCost]
      ,[ModifiedDate]
  FROM [Production].[TransactionHistory]
  WHERE [ModifiedDate] BETWEEN '20070101' and '20080101'

 
Step 2: Run query against both the models


  SELECT *
  FROM [Production].[TransactionHistory]  WHERE ProductID = 714  OPTION (QUERYTRACEON 9481)



  SELECT *
  FROM [Production].[TransactionHistory]
  WHERE ProductID = 714



Note that the hint "OPTION (QUERYTRACEON 9481)" will force the query to use old CE 70 in SQL Server 2014.Observe the estimates for both.




CE 70:





CE 120:




They are the same. Let us observer the histogram below.




Notice that the histogram contains value "714" as a range hi key and it estimates 645 rows. Estimate on the query plan is slightly higher. As SQL Server knows that approximately 10% of the rows have changed, it makes an adjustment ( 645 + 10% of 645 ) on the both the Cardinality Estimator models.


Step 3 :


Now let us observe for another value


  SELECT *
  FROM [Production].[TransactionHistory]   WHERE ProductID = 746  OPTION (QUERYTRACEON 9481)



  SELECT *
  FROM [Production].[TransactionHistory]
  WHERE ProductID = 746


Let us compare the estimates now


CE 70:



CE 120:




Now we see a difference in the estimates. Let us check the histogram for the value.




The Value doesn't exist as a Range Hi Key and hence "Avg Range Rows" is used for the estimate. CE 70 uses exactly the value estimated by the histogram but CE 120 (new CE) makes the adjustment to the estimate from Histogram, even if the estimate is a approximate guess via Average Range rows. So, CE 70 estimates the value to be 206 as shown above, while CE 120 estimates it to be slightly higher, at 224.15


So, the takeaway from the post is Cardinality Estimator 120 will adjust the estimate obtained from histogram when statistics are out of date and even when the value has to be guessed via "Avg Range rows".