Data Warehouse Performance – Indexes

We are now dealing with index fragmentation on our data warehouse. After loading, updating, deleting, inserting, etc. hundreds of MB worth of data over the past month, the indexes that we initially created for the DW have become severely fragmented. This is one of the causes of the performance issues in the Data Warehouse.

Here is a snippet of code that will give you a (fairly) quick idea of how fragmented your indexes are. This code ran for 10 minutes (off-hours) on a production system with about 80 indexes on the database:

SELECT * FROM sys.dm_db_index_physical_stats
(DB_ID(N'db_DataWarehouse'), NULL, NULL, NULL , NULL);

If you want to look more specifically at an individual table’s indexes, pass:
OBJECT_ID(N'MyTable')
as the 2nd parameter above (instead of the NULL).

The main column here that you are interested in is: avg_fragmentation_in_percent. The avg of our averages is 72% for our entire DW, which will warrant some fixing. (This is mainly because we have neglected to do routine index maintenance, which I will discuss in a later post.)

If you can afford the CPU-time against the DW, I would also suggest passing an argument to the last parameter of the query above. The last parameter to that DMV (Dynamic Management View – search it on BOL for more info), determines the Scanning Mode. The default is LIMITED, which will not return data for columns such as avg_page_space_used_in_percent (which is very important!). This is because getting this statistic requires more time, and SQL Server will make you force it to get that kind of info. The other 2 modes are SAMPLED and DETAILED. Let’s run this query again in DETAILED mode, and only showing a few select columns:

SELECT
i.[Name],
index_type_desc,
avg_fragmentation_in_percent,
avg_page_space_used_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(N'db_DataWarehouse'), NULL, NULL, NULL , 'DETAILED') dmv
INNER JOIN sys.indexes i ON i.index_id = dmv.index_id and i.[object_id] = dmv.[object_id]

With the results of this query you want to pay attention to non-clustered indexes with the page_space_used below 80%. In a typical Data Warehouse environment, you will not want to pad your indexes. Instead, you want your index pages to be 100% (or as close as possible) full. This will result in an overall lower amount of pages to read from, which will make your queries faster. The flip side of this is your INSERTs will be slower because new leafs will have to be created often. This will also lead to index re-balancing (which will slow down your INSERTs even more), but a typical DW processes all of this information during a maintenance window anyways. I will discuss in a later post how to improve Data Warehouse loads, but for now let’s just assume you do mass loading directly into the tables.

So you’ve found an index (or two, three, four, etc.) that have fragmentation. You have 2 choices: #1 is to rebuild the index which will drop the index and rebuild it from scratch; #2 is to reorganize the index. I would suggest performing a index re-orgs once a week, and index rebuilds monthly.

Option #1 (Rebuilding) is the most effective at fixing fragmentation issues, but is also the most resource (and time) intensive.

Option #2 (Reorganizing) takes the existing index structure and organizes the index pages in order. By doing this, index scans are improved because the data is contiguous. Reorganizations also compacts pages, and empty pages are released and free up disk space.

Here’s how to reorganize an index:

ALTER INDEX [indexname] ON [TableName] REORGANIZE

And to rebuild an index:

ALTER INDEX [indexname] ON [TableName] REBUILD;

I hope that helps,

- Reagan

Edit: SQL Server 2005 BOL (Books Online) suggests that you rebuild when your fragmentation level is > 30%, and reorganize when it is < 30%.

VN:F [1.9.1_1087]
Rating: 0.0/10 (0 votes cast)
VN:F [1.9.1_1087]
Rating: 0 (from 0 votes)

Leave a Reply

You must be logged in to post a comment.