harePoint 2010 finally saw the recommended item limit for lists and libraries essentially lifted (Microsoft says lists can contain up to 30 million items!), but what happens when we need to query items in these large lists?
I’m not just being dramatic. We recently encountered a scenario in which a CAML query, executed by the server object model against a list containing a measly 15,000 items, failed immediately with the following exception:
Microsoft.SharePoint.SPQueryThrottledException: The attempted operation is prohibited because it exceeds the list view threshold enforced by the administrator.
The default value of the List View Threshold property for a web application is 5000 items. You can find this setting in Central Administration –> Manage Web Applications, by selecting a web application, clicking the down-arrow beneath the “General Settings” ribbon button, and then selecting “Resource Throttling”:
The “List View Threshold” web application setting in Central Administration
Given that 15,000 is relatively far from 30,000,000 (by a factor of 2000, but who’s counting?), I just assumed that this “list view threshold” was set arbitrarily low. Knowing our document library would probably max out at about 120,000 items, I considered simply bumping the threshold and calling it a day.
But researching the List View Threshold further yields some interesting information, notably the following:
- The 5,000 item limit isn’t arbitrary; it’s set carefully by Microsoft.
- It really shouldn’t be changed.
- The threshold isn’t the problem, but neither is the query. (Anecdotally: so what is?)
Microsoft’s documentation on querying large lists is nothing if not expansive, but it’s also confusing. The whitepaper offers this fun little nugget of knowledge:
Some operations, such as non-indexed queries or adding a column to a list, take time and resources that are proportional to the number of items in the list. […] As the list size increases, these operations take longer and use more resources. Rather than let these operations run unbounded, the list view threshold blocks them.
SharePoint blocks queries that return more than 5,000 items because it takes a long time to perform the query. It would follow that bumping the threshold, as I had planned, is a bad idea:
Even changing the limit from the default of 5,000 to 10,000 has a significant effect on performance. Rather than raising or lowering the list view threshold to improve performance, we recommend that you do not change the default list view threshold and focus instead on making sure queries perform well.
But here’s the catch: my query can’t “perform well” if it isn’t allowed to execute. Later on, Microsoft gets right to the point:
List view threshold exceptions can be common, especially immediately after upgrade. It might seem simpler to resolve these issues by changing the list view threshold. We strongly recommend that you do not change the list view threshold.
Microsoft offers some advice regarding how to improve a query such that it is allowed to execute. We can always override the QueryThrottleMode property of the SPQuery object to ignore the list view threshold as follows:
While similar to bumping the web application’s list view threshold, this option is slightly more sandboxed. In the same “Resource Throttling” menu in Central Administration as the standard List View Threshold setting, there is a second setting, called “List View Threshold for Auditors and Administrators”:
This threshold is different from the standard threshold setting due to the fact that it can only be used when the query is run programmatically, under a service account with appropriate permissions, and explicitly overridden:
Allowing the object model override specifies whether service accounts can use the list view threshold for auditors and administrators. A farm administrator must enable the object model override and programmatically specify that a list is an exception. Then, programmers with appropriate permission can programmatically request that their query or list use the higher list view threshold size for auditors and administrators to take advantage of it.
But this accomplishes the same end as bumping the list view threshold in Central Administration, which is to simply allow the query to execute without regard for the inevitable performance degradation.
The same goes for setting a time window within which such queries can run (and because our use case requires all users to be able to execute this query at any time, this was out of the question). For more information on the “daily time window” override, see the section in the aforementioned whitepaper.
We could get a little wilder by permanently disabling the list view threshold for the list itself (which can only be accomplished via the object model and, you guessed it, is also discouraged):
You can presume why that is a bad idea. So we are left with a few options for improving our query:
Execute the query against a limited view
SPQuery CAML queries are executed against the default view of a list unless otherwise specified; one way to avoid the list view threshold exception is to query against a view that returns fewer items.
You can specify the view against which an SPQuery object executes by using the second constructor when initializing your SPQuery object:
The query will then be executed against the view you specify instead of the default set of all list items.
This option was off the table for us (our use case required us to query against all items anyway), but querying against a view that preemptively limits the base number of items against which an SPQuery selects (to a number beneath the list view threshold) will cause the SPQueryThrottledException to disappear.
Index columns in the list
SharePoint 2010 introduced the ability to index specific columns in a list. On a bit of a whim, we decided to see if indexing the integer column on which we were querying would allow the query to run– and it worked.
It turns out that Microsoft has an entire MSDN article dedicated to Query Throttling and Indexing, which explains that adding a column index allows queries on large lists to run un-throttled. This article more succinctly states the operation of the list view threshold in relation to queries:
The list view threshold does not apply simply to the number of results returned by your query. Instead, it restricts the numbers of database rows that can be accessed in order to complete execution of the query at the row level in the content database.
The article proposes a scenario similar to ours, in which a query is executed against a list of 10,000 items :
If you were to build a query that returns the first 100 items [in a list of 10,000] sorted by a non-indexed Title field, the query would have to scan all 10,000 rows in the content database in order to determine the sort order by title before returning the first 100 items.
So what happens when we add an index to the Title field?
This would enable SharePoint to determine the top 100 items sorted by title from the index without scanning all 10,000 list items in the database. The same concepts that apply to sort operations also apply to where clauses and join predicates in list queries.
To index a column in a list, navigate to the list settings and click the “Indexed columns” link in the Columns section:
Click the “Indexed columns” link at the bottom of the “Columns” section on the list settings page.
On the next page, click “Create a new index.” Select the column you want to index; you also have the option of index a secondary column (think: dual-column primary keys in SQL):
You can index on any column, but it should be the main field in the “Where” clause that you use to select list items in your SPQuery.
After indexing the column “My Integer Column” as above, the following query will be permitted to execute against a large list:
To wrap things up, here’s the deal with querying large lists:
- SharePoint allows farm administrators to override the list view thresholds such that time-consuming, resource-hogging queries are permitted to execute, but doing so will not improve query performance and is generally discouraged.
- Rather than bumping the List View Threshold or the List View Threshold for Auditors and Administrators, you should modify your query so that it runs faster. The best ways to do this are:
- Execute the query against a limited view rather than the default list view containing all items by specifying the view name in the new SPQuery object constructor.
- If you can’t execute your query against a limited view, create a column index for each column contained in the <Where> clause of your SPQuery.
As always, my colleagues at PSC Group deserve a big thank-you for working with me on this fun little problem.
May your future SPQuery-against-large-lists endeavors be slightly less bad!