Sometimes I like to cite “Remember remember the fifth of November” in case, when I trip the mine that I teach others to avoid.

This time it happened, when reviewing some database structure and related code. So, here it goes.

The database had a table, which can be scripted as shown below. As you can probably note, we are talking about a table that holds queue of jobs that have to be performed.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
CREATE TABLE [dbo].[Jobs]
  ( 
    [Id] [int] IDENTITY(1,1) NOT NULL, 
    [Priority] [tinyint] NOT NULL, 
    [DateInserted] [datetime] NOT NULL, 
    [DateAllocated] [datetime] NULL, 
    [DateFetched] [datetime] NULL, 
    [Status] [tinyint] NOT NULL, 
    [SerializedData] [xml] NULL, 
    [ParentId] [int] NULL, 
    [ReferenceID] [bigint] NULL, 
    CONSTRAINT [PK_Jobs] PRIMARY KEY CLUSTERED ( [Id] ASC )
  ) GO

ALTER TABLE [dbo].[Jobs] ADD  CONSTRAINT [DF_Jobs_DateInserted]  DEFAULT (getdate()) FOR [DateInserted]
GO

ALTER TABLE [dbo].[Jobs] ADD  CONSTRAINT [DF_Jobs_Status]  DEFAULT (1) FOR [Status]
GO

The use case for the table was simple:

  • One part of the application inserts new jobs based on customer activities and by this sets the default date and status (1)
  • Then service fetches jobs, by taking Id’s of specified amount of new (status=1) jobs and setting appropriate date on DateFetched and updating status to (2)
  • Later at some point XML data will be retrieved using the Id’s retrieved previously

Initially, the code was LINQ 2 SQL in the program itself and was something like this:

1
2
3
4
var q = ctx.Jobs
            .Where(j => j.Status == 1)
            .Select(j => new { j.Id, j.Priority, j.DateInserted, j.Status })
            .Take(count);

For the testing purposes, I’ve loaded the table with 100k rows, out of which only 401 had status=1.

1
2
SELECT TOP (100) Id, [Priority], DateInserted FROM Jobs
WHERE Status=1

Running the query without any additional indexes, you will get the results, but there will be definitely a table scan with a little bit more of logical reads than you could expect.

00-InitialQueryPlan

(100 row(s) affected)Table 'Jobs'. Scan count 1, logical reads 648

Now, because the second step is looking for the new records (status=1), which normally should be small number of records comparing to the whole table size, so I immediately thought about creating non-clustered index for the status column, filter on the status=1 and include appropriate values (to have a small covering index). So I created one:

1
2
CREATE NONCLUSTERED INDEX IX_Jobs_NewJobs 
ON Jobs ([Status]) INCLUDE ([Priority], [DateInserted]) WHERE ([Status]=1)

This of course had a nice effect on the previous query:

01-AfterIndexCreated

(100 row(s) affected)Table 'Jobs'. Scan count 1, logical reads 2

For a moment I thought, that the rest should be fine, because query and LINQ statement were fairly straight forward with not much magic in them, but somehow, I’ve decided to see what query LINQ does produce.

The query was almost the same with one exception – parameterization, where the parameter value was of course equal to (1):

1
2
3
SELECT TOP (100) [t0].[Id], [t0].[Priority], [t0].[DateInserted], [t0].[Status]
FROM [dbo].[Jobs] AS [t0]
WHERE [t0].[Status] = @p0

However the shocking discovery for me at that point was the query plan:

00-InitialQueryPlan

Yes, SQL server was NOT using the non-clustered index I’ve created moments ago. WTF!
I’ve also checked the query actually shown by the SQL Server Profiler:

1
2
3
4
exec sp_executesql N'SELECT TOP (5) [t0].[Id], 
[t0].[Priority], [t0].[DateInserted], [t0].[Status]
FROM [dbo].[Jobs] AS [t0]
WHERE [t0].[Status] = @p0',N'@p0 int',@p0=1

Again, result was obviously the same.

Ok, it was obviously related to the parameterization of the query rather than some LINQ issue. It just happened that LINQ actually parameterizes (and for good) queries.

Quick Bing check resulted in http://msdn.microsoft.com/en-us/library/cc280372.aspx, which has clear statement on the subject: “In some cases, a parameterized query does not contain enough information at compile time for the query optimizer to choose a filtered index. It might be possible to rewrite the query to provide the missing information

And this was the moment, when I cited the famous phrase. Was I sleeping during my own sessions, where I was talking about query plans, their caching, etc., like described here?

The problem becomes obvious, when you think that there are other possible values for which query plan using the non-clustered index would become invalid.

Solution

In my case, because of this and some other issues, I’ve decided to use stored procedure, which basically does all the job (retrieving the data and updating the state in one nice transaction).

1
2
3
UPDATE TOP (@BatchSize) Jobs 
SET Status = 2, DateFetched = GETUTCDATE() OUTPUT inserted.Id, inserted.[Priority], inserted.DateInserted
WHERE Status = 1

Works like a charm Smile