I will be speaking at the Greater Fort Worth SQL Server Users group this Wednesday. The talk will be on Troubleshooting with the DMV’s. If you are a DFW SQL’er come on out. Directions and more info can be found here.
If you don’t come, I am sending my boy after you with blow torch and a pair of pliers.
In part 1, I talk about what I consider feeble attempts at implementing a reporting server through log shipping, mirroring\snapshots and, to a lesser extent, replication. Unless you invest in a real ETL solution, I argued that it is better to run a mixed workload. I talked about the architectural advantages of running mixed workloads in part 2. In a nutshell, doubling the hardware and cutting the data in half.
In this final post, we will talk about new features in SQL Server 2008 and some features that have been around a while that can help with mixed workloads.There are also some bad practices that could be the right answer that we won’t talk about but let us mention triggers, table valued functions, 20 table outer joins, some correlated subqueries and table variables. These are options but usually not good ones. In the right circumstances, they could be right like an end of year report.
Here are the main tools in your arsenal:
Resource Governor
Chances are you do not want to limit throughput of your OLTP queries. The resource governor does not do a good job with these queries anyway because their duration is usually so short. However, let’s say you have reports that run by executives. You can put them in a workload group that gives them as much resources as possible without affecting OLTP traffic. You may also have a less important group of reports from the marketing or sales teams that you can limit further. One caveat to the resource governor is it cannot limit, disk IO so if that is your bottleneck, this will not help much.
Covering Filtered indexes
Filtered indexes are a great new feature in SQL Server 2008. When optimizing for reporting queries on your OLTP system, you are probably going to be touching a lot of rows so covering the query is important. For example, the order fulfillment team works off a report of unfulfilled orders that pulls in order data, customer data, shipping data etc. In this case, you would add covering filtering indexes on each of those tables. The filtered indexes reduce write overhead on your OLTP writes and reduces read overhead of your reporting.
Indexed Views
Indexed views take filtered indexed view a step further. You can create indexes on multiple tables. Think of it as denormalization alongside your OLTP optimized schema. In the previous order fulfillment example, we can basically persist that report and have it updated in real time. There is more overhead to your OLTP transactions so weigh the pro’s and con’s. Test if possible.Unfortunately, you cannot defer changes to your indexed views but I believe there is a feature request for this on Connect and I will tell you about a workaround shortly.
Partitioning and Compression
This is the dynamic duo when mixing workloads. Unfortunately, the nitty gritty details would require their own post. For example, one mixed workload may benefit from compression on the hottest partition while the older data should be uncompressed. However, another workload may benefit from the opposite. The key here is really understanding your workload, data and hardware limitations. Most importantly, plan then TEST, TEST, TEST! Once you partition, you loose online operations so if you do it wrong, you are stuck.
Persisted Computed Columns
This is an easy one. It is a simple trade off. Writes take a little more CPU and space in exchange for reduced CPU time when you report. Take your orders table, for example. You could calculate and save shipping costs when you insert the rows. If it adds a few milliseconds to the insert but shaves seconds off the hourly open orders report that the execs are looking at, it may be an easy decision.
Archival
This might not always be possible depending on your data. It may not be necessary if you have finely tuned indexes. However, it could make a night and day difference. If you need the data, UNION ALL’ing the production table with the archive table has little overhead. I do suggest you keep the archive database on the same server unless it will rarely be accessed. Trying to do this with linked servers is bad.
After hours denormalization
This is basically precreating reports during off hours. Think of it as indexed view with deferred updates. You can UNION with the OLTP tables if you need realtime data in your report. In an ideal world, touching less rows in the OLTP table and then UNIONing with the denormalized data will result in the best of both worlds if you need real time data.
The final word
As the concurrency and size of data scales, both a pseudo reporting database and a mixed work load scenario will not meet business requirements. A business requirement of real time data may dictate a mixed workload. There may be plenty of workloads where scaling out and scaling up both meet performance demands. I just wanted to play devil’s advocate and let you know there is another option when planning reporting.
Gail talks about bookmark lookups…. err.. key lookups in this post. So are they good or bad? Well, like many things in SQL, it depends. The main factor is the number of rows returned. A few rows are fine but the cost rises sharply with larger result sets. There comes a point where the threshold is crossed and a scan is more efficient. This is because a scan leverages sequential IO while a lookup does random IO.
Here are the results of the code at the end of the post.
As you can see, at 250 rows, we have crossed the threshold and it is cheaper do a scan. If you are passing in a literal, the optimizer can detect this and switch to a scan. If it is a stored proc or parameterized SQL, a plan is cached the first time it is run. Problems happen when the result size greatly varies depending on the parameter. There are ways around this all with their pro’s and con’s. Here are some:
Here is the sample code that can repro these numbers on SQL Server 2008.
CREATE TABLE #temp( id INT IDENTITY(1,1) PRIMARY KEY CLUSTERED, c1 CHAR(1000) DEFAULT( 'blah'), c2 INT) INSERT INTO #temp(c2) VALUES(1) GO INSERT INTO #temp(c2) VALUES(5) GO 5INSERT INTO #temp(c2) VALUES(10) GO 10INSERT INTO #temp(c2) VALUES(25) GO 25INSERT INTO #temp(c2) VALUES(50) GO 50INSERT INTO #temp(c2) VALUES(100) GO 100INSERT INTO #temp(c2) VALUES(250) GO 250INSERT INTO #temp(c2) VALUES(500) GO 500INSERT INTO #temp(c2) VALUES(1000) GO 1000INSERT INTO #temp(c2) VALUES(1000) GO 1000CREATE INDEX ix ON #temp(c2) --The baselineSET STATISTICS io ONSELECT *FROM #temp WITH (INDEX=1) WHERE c2 = 1--1 row returnedSELECT *FROM #tempWHERE c2 = 1--5 rows returnedSELECT *FROM #tempWHERE c2 = 5--10 rows returnedSELECT *FROM #tempWHERE c2 = 10--25 rows returnedSELECT *FROM #tempWHERE c2 = 25--50 rows returnedSELECT *FROM #tempWHERE c2 = 50--100 rows returnedSELECT *FROM #tempWHERE c2 = 100--250 rows returned--Must begin using hints because the optimizer can tell that a scan is betterSELECT *FROM #temp WITH (INDEX=ix, forceseek) WHERE c2 = 250--1000 rows returned--Must begin using hints because the optimizer can tell that a scan is betterSELECT *FROM #temp WITH (INDEX=ix, forceseek) WHERE c2 = 1000
It has often been said that even the most write intensive OLTP databases have an average of 80% reads and 20% writes. I am thinking interactivity has increased in applications because technology has allowed it to get more write intensive but I am just guessing.
Run this on your OLTP Crown Jewels and post the results. While not very scientific, it will give us some anecdotal data. The query could also be useful when sizing new hardware.
I came up with 97/3, 74/26 and 60/40 on three high volume databases.
--This query has minimal impact. --Looks at index metadata to determine read\write ratio since the last restart of the instance.SELECT CAST(SUM(user_seeks+user_scans+user_lookups) AS decimal)/CAST(SUM(user_updates)+SUM(user_seeks+user_scans+user_lookups) AS decimal) AS ReadPercent, CAST(SUM(user_updates) AS decimal)/CAST(SUM(user_updates) + SUM(user_seeks + user_scans + user_lookups) AS decimal) AS WriteRatioFROM sys.dm_db_index_usage_statsPost your results!email it! | | | Permalink 15 CommentsUpdated Unused Index Query Posted by Jason Massie on Tuesday, January 13, 2009 at 5:41 PM to SQL Server 2008, SQL Server 2005, SQL performance tuning, tsql, Indexes 2580 Views | 4 Comments | I have cleaned up and updated my unused index query. It returns a list of indexes ordered by ascending reads. If the instance has been up for a long time, it is probably safe to drop indexes with zero or close to zero reads. If the reads are low and the writes are high, this may help improve your transactions per second count. Otherwise, you are just cleaning up unused space. Use caution though. A missing index is worse than an unused index. SELECT objectname=OBJECT_NAME(s.OBJECT_ID) , indexname=i.name, i.index_id , reads=user_seeks + user_scans + user_lookups , writes = user_updates , p.rowsFROM sys.dm_db_index_usage_stats s JOIN sys.indexes i ON i.index_id = s.index_id AND s.OBJECT_ID = i.OBJECT_ID JOIN sys.partitions p ON p.index_id = s.index_id AND s.OBJECT_ID = p.OBJECT_IDWHERE OBJECTPROPERTY(s.OBJECT_ID,'IsUserTable') = 1 AND s.database_id = DB_ID() AND i.type_desc = 'nonclustered'AND i.is_primary_key = 0AND i.is_unique_constraint = 0AND p.rows > 10000ORDER BY reads, rows DESCI will also be putting this query up at the SSP WIKI.email it! | | | Permalink 4 CommentsFinding Index Scans due to Implicit Conversions Posted by Jason Massie on Monday, January 12, 2009 at 7:00 PM to SQL Server 2008, SQL Server 2005, query optimizer, SQL performance tuning, tsql, Indexes, Procedure Cache, DMVs 2451 Views | 8 Comments | It is fairly widely known that implicit conversions of data types can cause scans. However, you may not be able to control what goes in your database, you may have inherited an app and we just are not perfect.On top of that some can be tricky. For example, did you know that scope_identity and @@identity both return numeric data types. An AVG of a SMALLINT column returns an INT.Here is a simple repro which the same query does an index scan instead of an index seek CREATE TABLE t1( id INT IDENTITY(1,1) PRIMARY KEY CLUSTERED, c1 VARCHAR(100) ) GOINSERT INTO t1(c1) SELECT 1GO 10000INSERT INTO t1(c1) SELECT 0GO 10CREATE INDEX ix ON t1 (c1) CREATE PROC p1@blah NVARCHAR(100) ASSELECT id FROM t1WHERE c1 = @blahEXEC p1 N'0'CREATE PROC p2 @blah VARCHAR(100) ASSELECT id FROM t1WHERE c1 = @blahEXEC p2 '0'DROP TABLE t1DROP PROC p1DROP PROC p2 I found a sample query from Umachandar Jayachandran and made some mod’s to it. It basically scans your procedure cache for index scans containing implicit conversions.I have posted it at the SQLServerPedia Wiki here. It is the 2nd query listed.Note: Implicit conversions may not always result in a scan.email it! | | | Permalink 8 CommentsNever Index a BIT? Posted by Jason Massie on Tuesday, December 09, 2008 at 8:03 AM to SQL Server 2008, SQL performance tuning, Indexes 1994 Views | 4 Comments | Never say never, right? For example, you have an orders table. It has an IsShipped bit column which is what marks an order as done to the business. You might have a more complex version of this example so employees can see the outstanding order count in the app:CREATE TABLE#temp ( c1INT IDENTITY(1,1) PRIMARY KEY CLUSTERED, c2 VARCHAR(10) DEFAULT 'bob', c3 bit ) INSERT INTO#temp(c3) VALUES (1) GO 5000000 INSERT INTO#temp(c3) VALUES (0) GO 500 --13,000 IO'sSET STATISTICS io ON SELECTCOUNT(*) FROM #temp WHEREc3 = 0 CREATE INDEXix ON#temp(c3) include (c2) --6 IO's SET STATISTICS io ON SELECTCOUNT(*) FROM #temp WHEREc3 = 0 In SQL 2008, we can get a little better with filtered indexes but really all we are saving is space.CREATE INDEXix2 ON#temp(c3) include (c2) WHEREc3 = 0 --4 IO'sSET STATISTICS ioON SELECT COUNT(*) FROM #temp WHEREc3 = 0email it! | | | Permalink 4 Comments5 Quick Tips for the Query using the “Wrong” Index Posted by Jason Massie on Thursday, October 30, 2008 at 10:59 PM to SQL Server 2008, SQL Server 2005, query optimizer, Indexes 1537 Views | 1 Comments | I quote “wrong” because with the info the optimizer has, it thinks it has the right index. :)1. Look for cardinality problems. You can do this by looking at the actual rows versus estimated rows in the query plan. If they are off, chances are you are not following the best practices listed here.2. Conversions in the WHERE clause can cause issues. Be sure to look at query plan for implicit conversions3. Complicated logic in the WHERE clause can also cause problems. Consider indexed persisted computed columns or indexed views.4. Constraints give the optimizer more info when choosing a plan with the “right” index. It can take a PK or unique constraint into consideration but it does not know about similar logic in a trigger.5. Large datasets are being returned but the index is not covering. An index with just the columns in the WHERE clause is ok if just a few rows are returned but it does not take much for an index scan to be more efficient.#1 is the most important one because the link gives 5 or 7 more reason why the “wrong” index is chosen.P.S. If all else fails, update stats and if that does not work, make the index covering.email it! | | | Permalink 1 CommentsIndexed Views vs. Filtered Indexes Part 2 Posted by Jason Massie on Monday, August 25, 2008 at 9:21 AM to SQL Server 2008, SQL performance tuning, tsql, Indexes 1274 Views | 1 Comments | In the first part, we looked at the a very simple example with a single table SELECT using covering indexes. This one is a little more complex. We are still covering but we are joining two tables, ordering and grouping. Filtered indexes do not support joins so we have to create two indexes. In this test, the indexed view wins but filtered indexes come a respectable 2nd place compared to normal indexes. The DDL(2.5M in tblitem and 25M in tblitemdetails) CREATE TABLE [dbo].[tblItem]( [Itemid] [int] IDENTITY(1,1) NOT NULL, [ItemName] [varchar](100) NULL, [Datecreated] [datetime] NULL, [Closed] [bit] NULL, PRIMARY KEY CLUSTERED ( [Itemid] ASC ) ON [PRIMARY] ) ON [PRIMARY] GO CREATE TABLE [dbo].[tblItemDetails]( [ItemDetailsID] [int] IDENTITY(1,1) NOT NULL, [ItemID] [int] NULL, [Color] [varchar](20) NULL, [Size] [varchar](20) NULL, [Flavor] [varchar](20) NULL,PRIMARY KEY CLUSTERED ( [ItemDetailsID] ASC) ON [PRIMARY]) ON [PRIMARY] GO SET ANSI_PADDING OFFGO ALTER TABLE [dbo].[tblItemDetails] WITH CHECK ADDCONSTRAINT [fk01] FOREIGN KEY([ItemID])REFERENCES [dbo].[tblItem] ([Itemid])GO ALTER TABLE [dbo].[tblItemDetails] CHECK CONSTRAINT [fk01]GO --Create normal and filtered indexescreate index ix01 on tblitem(Datecreated, ItemName) create index ix02 on tblitem(Datecreated, ItemName) where Datecreated > '2007-08-22' and itemid > 0 and itemid < 50 create index ix01 on tblitemdetails(itemid, color) create index ix02 on tblitemdetails(itemid, color) where itemid > 0 and itemid < 50 --Create indexed viewsalter view ivw01 with schemabinding as select a.ItemID, b.Color, COUNT_big(*) as cntfrom dbo.tblItem a join dbo.tblItemDetails b on a.Itemid= b.ItemID where Datecreated > convert(datetime, '2007-08-22', 101) and a.itemid > 0 and a.itemid < 50group by a.ItemID, b.Color create unique clustered index ix01 on ivw01(itemid)create index ix02 on ivw01(itemid, color) Test Queriesset statistics io on --Using a normal covering index select a.ItemID, b.Color, COUNT_big(*) as cntfrom dbo.tblItem a with (index=ix01) join dbo.tblItemDetails b with (index=ix01) on a.Itemid= b.ItemID where Datecreated > convert(datetime, '2007-08-22', 101) and a.itemid > 0 and a.itemid < 50group by a.ItemID, b.Colororder by cnt --using a filtered index select a.ItemID, b.Color, COUNT_big(*) as cntfrom dbo.tblItem a with (index=ix02) join dbo.tblItemDetails b with (index=ix02) on a.Itemid= b.ItemID where Datecreated > convert(datetime, '2007-08-22', 101) and a.itemid > 0 and a.itemid < 50group by a.ItemID, b.Colororder by cnt --using an indexed view select * from ivw01order by cntIO numbers:Normal covering indexes(8 row(s) affected)Table 'tblItemDetails'. Scan count 8, logical reads 24, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.Table 'tblItem'. Scan count 1, logical reads 991, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.Filtered covering indexes(8 row(s) affected)Table 'tblItem'. Scan count 1, logical reads 99, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.Table 'tblItemDetails'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.Indexed View(8 row(s) affected)Table 'ivw01'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.Note: By increasing the result set 100 fold(itemid > 0 and itemid < 5000), we see very similar results except with merge joins. Query Plans:The query plans are different. Most notably being the scans for the filtered index and indexed view vs. the normal index. This is fine though. They as just a fraction of the size of the normal index. ConclusionThe mere fact that the indexed view runs with 50 times less IO overhead does not make it the automatic choice. What if you are on standard edition? Filtered indexes work on standard edition. Also we filtered on a date column. What if you need a real time rolling total? You have to use a normal index. Know you tools and then test, test and test some more. email it! | | | Permalink 1 CommentsFiltered Indexes VS. Indexed Views Posted by Jason Massie on Thursday, August 21, 2008 at 10:29 PM to SQL Server 2008, Indexes 1517 Views | 1 Comments | Note: This is a simple example and YMMV. I may test queries with lookups, hashes, aggregations etc later. Let me know if you have an interesting test case. I decided to run a quick test comparing covering filtered indexes vs. covering indexed views vs. a normal covering index. Read performance-wise, it is a statistical draw. The DDL and queries USE [demodb]GO CREATE TABLE [dbo].[tblItem]( [Itemid] [int] IDENTITY(1,1) NOT NULL, [ItemName] [varchar](100) NULL, [Datecreated] [datetime] NULL, [Closed] [bit] NULL,PRIMARY KEY CLUSTERED ( [Itemid] ASC) ON [PRIMARY]) ON [PRIMARY] GO create index ix01 on tblitem(Datecreated, ItemName) create index ix02 on tblitem(Datecreated, ItemName) where Datecreated > '2007-08-22' go create view ivw01with schemabindingas select ItemName, Itemid from dbo.tblItemwhere Datecreated > convert(datetime, '2007-08-22', 101) create unique clustered index ix01 on ivw01(itemid)create index ix02 on ivw01(ItemName) set statistics io on --Using a normal covering indexselect ItemName, Itemid from tblItem with (index=ix01)where Datecreated > convert(datetime, '2007-08-22', 101) --using a filtered indexselect ItemName, Itemid from tblItem with (index=ix02)where Datecreated > convert(datetime, '2007-08-22', 101) --using an indexed viewselect * from ivw01The graphical query plansThere are no surprises in the query plan. It does a scan on the filtered index but that doesn't matter since it is ...err... filtered.Statistics IO OutputNormal covering index:(346129 row(s) affected)Table 'tblItem'. Scan count 1, logical reads 991, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.Filtered Index:(346129 row(s) affected)Table 'tblItem'. Scan count 1, logical reads 990, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.Indexed view:(346129 row(s) affected)Table 'tblItem'. Scan count 1, logical reads 991, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.Other ConsiderationsSome other things to consider is the fact that the filtered index is a fraction of the size of the normal index. The indexed views are an enterprise feature but filtered indexes are not. The optimizer may have a hard time choosing a filtered index but these problems are not present with indexed views. So the right choice may vary with different apps even if the schema is the same.email it! | | | Permalink 1 CommentsNTSSUG June Presentation Posted by Jason Massie on Thursday, July 17, 2008 at 2:37 PM to query optimizer, SQL performance tuning, Indexes, PASS, Data collection 1196 Views | 0 Comments | Here are the slides and scripts from my June presentation at the North Texas SQL Server Users Group. The presentation was on the Data Collector in SQL Server 2008 but it was short so I also touched on filtered indexes, query hashes and troubleshooting. Enjoy and let me know if you have any questions. Also Peter Debetta is presenting TONIGHT at 6:30 on the following topics: "We'll be discussing the various ways you can secure your SQL Server data via encryption, including the new Transparent Data Encryption feature of SQL Server 2008." Stop by.email it! | | | Permalink 0 CommentsStupid XML SSMS Trick Posted by Jason Massie on Tuesday, July 15, 2008 at 5:40 PM to SQL Server 2008, SQL Server 2005, tsql, Indexes 1377 Views | 2 Comments | This might be HAWTALTA for some of you. I just thought of doing this today even though I have been using xml showplan since I was knee high to a junebug. You can make query results clickable just by casting them as XML. For example: select cast('select 1' as xml) By clicking the resulting "select 1", a new query editor window opens with "select 1" ready to execute. This would be useful for scripts that generate scripts. Here is a practical example. (Forgive the formatting. I cannot get too wide in the post.) SELECT sys.objects.name, (avg_total_user_cost * avg_user_impact) * (user_seeks + user_scans) as Impact, cast('CREATE NONCLUSTERED INDEX ~NewNameHere~ ON ' + sys.objects.name + ' ( ' + mid.equality_columns + CASE WHEN mid.inequality_columns IS NULL THEN '' ELSE CASE WHEN mid.equality_columns IS NULL THEN '' ELSE ',' END + mid.inequality_columns END + ' ) ' + CASE WHEN mid.included_columns IS NULL THEN '' ELSE 'INCLUDE (' + mid.included_columns + ')' END + ';' as xml) AS CreateIndexStatement, mid.equality_columns, mid.inequality_columns, mid.included_columnsFROM sys.dm_db_missing_index_group_stats AS migs INNER JOIN sys.dm_db_missing_index_groups AS mig ON migs.group_handle = mig.index_group_handle INNER JOIN sys.dm_db_missing_index_details AS mid ON mig.index_handle = mid.index_handle INNER JOIN sys.objects WITH (nolock) ON mid.object_id = sys.objects.object_idWHERE (migs.group_handle IN(SELECT TOP (500) group_handleFROM sys.dm_db_missing_index_group_stats WITH (nolock)ORDER BY (avg_total_user_cost * avg_user_impact) * (user_seeks + user_scans) DESC))ORDER BY 2 DESC Here are what some of the results look like. Update : You also have to associate the XML file extension with the SQL editor to be executable. email it! | | | Permalink 2 CommentsReminder: SSWUG Virtual Conference Starts Tomorrow Posted by Jason Massie on Monday, June 23, 2008 at 2:35 PM to SQL Server 2008, SQL performance tuning, Indexes, SSRS 878 Views | 0 Comments | Just a quick reminder that the SSWUG virtual conference starts tomorrow. I hope you can attend. They have given me a discount code to distribute: VIPJM2008DIS I do not get any extra cha-ching from you using it. I just state that so this post doesn't sound spammy. Actually hearing my recorded voice makes me cringe so this will definitely be interesting for me. :) Here are the sessions I am doing: Filtered Indexes and Statistics in SQL 2008 SQL Reporting Services for the DBA SQL Profiler: Configuration, Analysis, and SQL Server 2008 Enhancements http://vconferenceonline.com/sswug/demo.aspemail it! | | | Permalink 0 CommentsAttn DFW SQL'ers Posted by Jason Massie on Wednesday, June 18, 2008 at 2:58 PM to SQL Server 2008, SQL performance tuning, Indexes, PASS, Data collection, RC0 1215 Views | 0 Comments | Reminder: You can catch me at the North Texas SQL Server User's Group tomorrow night. The Data Collection presentation that I put together may be short so I have a few backup slides and demo's on filtered indexes, queries hashes and the 2 second SQL Server 2008 upgrade. Directions and details can be found at http://northtexas.sqlpass.org Hope to see you there.email it! | | | Permalink 0 CommentsSSWUG Virtual Conference Discount Code Posted by Jason Massie on Thursday, June 12, 2008 at 6:49 PM to SQL Server 2008, Indexes, SSRS, Professional Development 866 Views | 0 Comments | I just got a discount code for the SSWUG conference. The code is: VIPJM2008DIS . It is good for 10% off. I am not a sales person so watch the commercial. Hope to "see" you there. email it! | | | Permalink 0 CommentsSQL Server goes GREEN. Posted by Jason Massie on Thursday, April 10, 2008 at 10:42 AM to Indexes, Offtopic 628 Views | 0 Comments | Greg Linwood filed a connect with MSFT to remove nonclustered indexes (NCI) from the backups of SQL Server. The idea being less backup overhead and less power. That is a good idea. :) Check out his blog and then vote at the connect site.email it! | | | Permalink 0 CommentsRE: Filtered indexes not compatible plan reuse Posted by Jason Massie on Friday, March 21, 2008 at 11:40 PM to SQL Server 2008, query optimizer, Indexes, CTP6, Procedure Cache 1732 Views | 1 Comments | Before the original post, I had submitted a connect item for something similar.It turns out that filtered indexes may not be used when auto-parameterization(or for stored proc) occurs without a recompile hint. This is sort of understandable but I can think of workarounds and I suspect additional logic will be added in the future as mentioned below. Here is the feedback from a connect item I submitted."Thanks for your feedback. The query in question, select lastname from Person.Contact where ContactID <=(100-20) is auto-parameterized by SQL Server into the following form, using the standard auto-parameterization rules: select lastname from Person.Contact where ContactID <=(@1-@2) These rules are designed to minimize compile time cost for simple queries like this, possibly at the expense of additional optimizations such as using a filtered index. Admittedly, it is a difficult tradeoff. In a future release, we'll consider extending the design to make a better decision in a cost-based way. For now, this behavior is by design."alt head: You can't have your cake and eat it too.Technorati Tags: SQL Server 2008,CTP6,filtered index,bugfix,IwantMyCakeemail it! | | | Permalink 1 CommentsSQL shorts Posted by Jason Massie on Friday, March 21, 2008 at 1:27 PM to SQL Server 2008, SQL Shorts, SQL Server 2005, Windows Server 2008, tsql, Indexes, Offtopic, Virtualization, CTP6 2395 Views | 1 Comments | We regret to inform you of a passing in the SQL community. Read more here.Rick Heiges has a post on my favorite new feature in SQL Server 2008, filtered indexes. So does Decipherinfosys along with a good description on the difference of indexes and statistics if you need some background.SQLBlogcasts has gotten an upgrade and Tony posts some great stats. Congrats!Not SQL per say but the Hyper-V release candidate has been released. Speaking of Hyper-V, Sriram posts his slide decks on virtualizing SQL. Part 1 and Part 2.This is a great starting point for SQL Server 2008 as is this. These come by way of the MSDN\Technet update blog.The SQL Server 2005 sp3 debate continues.The SQL ISV team posts a performance improving cursor rewrite sample. However, it is not ANSI compliant :) which is odd since most ISV tsql code needs to be portable.Paul Nielson will be releasing a DVD.While we are at it, check out the new SQL Server social network.alt head: Got ADD?Technorati Tags: SQL Server,SQL Server 2008,SQL Server 2005,SQL Server 2000,tsql,cursors,virtualization,filtered indexes,social networking email it! | | | Permalink 1 CommentsFiltered indexes not compatible plan reuse?? Posted by on Sunday, March 16, 2008 at 5:31 AM to SQL Server 2008, query optimizer, Indexes, CTP6, Procedure Cache 2437 Views | 0 Comments | So I was messing around with filtered indexes tonight. Yes, beta software is my idea of a wild Saturday night. Hey, I played some Smash Bros first. ;) I was trying to figure out what happens if a plan is created using a covering but filtered index and try to reuse it with a parameter that is covered vertically but not horizontally. Let's look at this example: create database dbtest01;--Force parameterizationalter database dbtest01 set parameterization forced; use dbtest01--create test table 1create table t1(id int identity(1,1),c1 int,c2 int) --insert dummy datadeclare @ctr int = 2000000while @ctr > 0begininsert into t1(c1, c2) values (@ctr, @ctr)select @ctr=@ctr-1end --create clustered indexescreate clustered index CI on t1(id) --Let create a covering indexes that is pretty extreme.create index ix01 on t1(c1) where c1 = 1990001 --Test queries--ctrl + mdbcc freeproccacheset statistics io on --Does a CI scan.select c1 from t1 where c1 = 1990001 --Nope, index hints returns an error--Query processor could not produce a query plan because of the hints defined in this query. Resubmit the query without specifying any hints and without using SET FORCEPLAN.select c1 from t1 with (index=ix01)where c1 = 1990001 --Ahh, recompile works. Now it uses the filtered indexselect c1 from t1 where c1 = 1990001option(recompile) --What about simple parameterization and a stored proc?alter database dbtest01 set parameterization simple; create proc proc1@p1 intasselect c1 from t1 where c1 = @p1 --It does the CI scanexec proc1 1990001 alter proc proc1@p1 intasselect c1 from t1 where c1 = @p1option(recompile) --It uses the filtered index :)exec proc1 1990001email it! | | | Permalink 0 CommentsSQL Server 2008 filtered indexes in 5 minutes Posted by Jason Massie on Saturday, February 23, 2008 at 11:11 PM to SQL Server 2008, SQL performance tuning, tsql, Indexes, CTP6, In 5 minutes 3679 Views | 1 Comments | My jaw literally dropped when I saw it. If this works as advertised, it has the potential of changing everything. The days of over indexing will be over. Dynamic indexing off of the missing and unused index DMV's could be possible especially with added support. I also think this will better accomplish what people tried to do with partitioning for performance reasons in SQL Server 2005.It will be really interesting to see how it gets applied in production and where this leads in the next versions. Maybe an autoindex checkbox will replace the DBA :)This is a small and quick example. We will have to see how it scales to larger environments. use adventureworks--Let's nullify a random 400update Production.WorkOrderset EndDate = nullwhere WorkOrderID in (select top 400 WorkOrderID from Production.WorkOrder where enddate is not null) --Let's give a random 200 a recent date to mimic prod dataupdate top (200) Production.WorkOrderset duedate = getdate()-10where enddate is null --Base query --This is query type that should be simple yet common --Let's get open WO's that will be due soonset statistics io onselect p.Name, wo.OrderQty, wo.DueDatefrom Production.WorkOrder wo JOIN Production.Product p on wo.ProductID=p.ProductIDwhere wo.EndDate is null and wo.DueDate >= '2007-11-24' --CI Scan with loop join--Table 'WorkOrder'. Scan count 1, logical reads 530, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.--Now let's create a covering indexcreate index ix01 on Production.WorkOrder(EndDate,DueDate, ProductID) include (OrderQty)--Run the base queryset statistics io on select p.Name, wo.OrderQty, wo.DueDate from Production.WorkOrder wo JOIN Production.Product p on wo.ProductID=p.ProductID where wo.EndDate is null and wo.DueDate >= '2007-11-24'--Does NCI seek--Table 'WorkOrder'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. --Clean updrop index [Production].[WorkOrder].[ix01] --Create Filter indexcreate index ix01 on Production.WorkOrder(EndDate,DueDate, ProductID) include (OrderQty) where EndDate is null and DueDate >= '2007-11-24'--Run base queryset statistics io on select p.Name, wo.OrderQty, wo.DueDatefrom Production.WorkOrder wo JOIN Production.Product p on wo.ProductID=p.ProductIDwhere wo.EndDate is null and wo.DueDate >= '2007-11-24' --It uses it. We have a real small sample but it performs better--Table 'WorkOrder'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. --Now let's look at size --Create unfiltered index for comparisoncreate index ix02 on Production.WorkOrder(EndDate,DueDate, ProductID) include (OrderQty)--note the new syntaxdeclare @dbid int = db_id()declare @objid int = object_id('[Production].[WorkOrder]')select * from sys.dm_db_index_physical_stats(@dbid, @objid, null, null, 'detailed') ps join sys.indexes ion ps.object_id=i.object_id and ps.index_id=i.index_idand i.name in ('ix01', 'ix02') and i.type_desc='NONCLUSTERED'--We are looking at 2 page vs. 301 pagesThat is a huge difference in size. The major point is not the fact that we retrieve the same while taking up so much less space on disk but so much less space in memory as well. We are going to get into this much more! Technorati Tags: SQL Server 2008,Filtered indexes,CTP6email it! | | | Permalink 1 CommentsThe problem with local variables Posted by Jason Massie on Friday, January 25, 2008 at 9:32 AM to SQL Server 2005, query optimizer, SQL performance tuning, Indexes 1235 Views | 0 Comments | Have you ever been writing a query and just cannot get it to use the right index. This could be one of the reasons why. Let's use this query with local variables as our example.declare @Start datetimedeclare @End datetimeselect @Start = '2004-08-01 00:00:00.000'select @End = '2004-07-28 00:00:00.000'select ProductID from sales.SalesOrderDetail where ModifiedDate >= @End and ModifiedDate <= @StartIt generates this plan:but we have an index on ModifiedDate. There are many reason why SQL would not use this index but, for this post, we will assume we have eliminated them. Finally, we hard code the dates and we get this plan.So why is it doing this? The reason is because the query optimizer cannot accurately use the statistics to estimate how many rows are returned with local variables. Let's look at how we can tell there is a problem with the cardinality estimates. In the query with the local variables, the optimizer thinks we are getting 10918.5 rows so we do the index scan. In the query with hard coded literals, the estimated rows and actual rows are the same and accurate.How can we fix this? There are several way. This is another situation that makes a case for stored procedures or parameterized queries.create proc pDemo01 @Start datetime, @End datetimeasselect ProductID from sales.SalesOrderDetailwhere ModifiedDate >= @End and ModifiedDate <= @StartThe stored proc generates the proper plan. However, you will run into the same problem if you modify the parameter within the stored proc like select @start = @start-90. In this case, to should use sp_executeSQL. What if you cannot use a stored proc because it is a 3rd party app or some other reason? A covering index is probably the solution. Once we create this index, it will always be used:create index ix01 on sales.SalesOrderDetail(ModifiedDate) include (ProductID)We could use a plan guide or an index hint with a forceseek(SQL 2008) but performance will be really bad when we really do need to get 10k rows. The same problem can happen with stored proc's but that is another post.To get deeper into this subject, check out this.email it! | | | Permalink 0 CommentsThe forceseek hint in SQL Server 2008 Posted by Jason Massie on Thursday, December 06, 2007 at 10:38 PM to Windows Server 2008, query optimizer, SQL performance tuning, tsql, Indexes 1284 Views | 0 Comments | Have you every had a query that should totally be using an existing index and doing a seek but it keeps doing a clustered index scan? In SQL Server 2008, you can force it to do a seek. That may or may not be a good thing. Let's take a look at an AdventureWorks example.--logical reads 1246select ProductID, OrderQty from sales.SalesOrderDetailwhere ProductID > 776This query generates this plan. Now let's add the hint.--logical reads 271264select ProductID, OrderQty from sales.SalesOrderDetail with (forceseek)where ProductID > 776 This query generates this plan. The second query plan contains the index seek but logical reads are 20 times more. This is because it has to do lookup for OrderQty. Obviously using the hint would be a bad thing. So when would be a good time to use this hint? Let's look at this query create proc p1 @p int as select c1, c2 from t1 where c3 = @p t1 has 1 million rows 99% of c3 =1 .5% of c3 = 2 .5% of c3 = 3 99% of the queries run with @p = 2 or 3 The problem is if the query compiles with c3 = 1. The optimizer will choose a CI scan. In SQL 2005, we could recompile each time at the statement level or we could use an optimize for hint. Now we have another tool in the bag of tricks.Use with caution.email it! | | | Permalink 0 CommentsTake it a step further with the unused and missing index DMV's Posted by Jason Massie on Sunday, November 25, 2007 at 10:33 AM to SQL Server 2005, query optimizer, SQL performance tuning, tsql, memory bottleneck, Indexes 3589 Views | 0 Comments | In SQL 2000 I wanted to find unused indexes but found it to be difficult. I had to capture a huge trace that fully represents the workload and “trust” that the ITW knows what it’s talking about. You could also use the scan started trace event filtered by dbid, objectid, indexid. This procedure was very tedious. In SS2005, a quick query of the dynamic management views lets you know which indexes are not being used. You can do this in a few minutes what had previously taken days or weeks. This alone is a very powerful feature. I am proposing that you take it a step further. We are going to use a little logic and the missing index DMV’s to combine indexes and remove indexes that are still used but redundant. The steps in this process would look like this:1. Remove unused indexes with the unused index script2. Get your list of tables to analyze.3. Remove redundant but used indexes.4. Revue missing index DMV’s for mistaken index drops.5. Combine indexes that where it is logical to do so.6. Revue missing index DMV’s for mistaken index drops.This methodology is most effective and viable when these conditions are met: · The server has been online and thus collecting stats for a long time. · The server is not pushing a hardware bottleneck. If so, this should be done during maint window. · The server is enterprise edition and the tables allow online operations (i.e. no LOB data or partitions).What are the benefits of removing unused indexes? · Reduced writes during updates · Reduced space usage · Reduced backup\restore space and time · Reduced index maintenance timeWhat are the benefits of removing unused indexes, used but redundant indexes and combining indexes? · Less memory footprint for the same amount of data · A different angle to index tuning · Indexes are more likely to be covering · Reduced writes during updates · Reduced space usage · Reduced backup\restore space and time · Reduced index maintenance timeNow, let’s get down to business… Remove Unused IndexesThis is pretty straightforward and relatively safe. You want to make sure that your server has been up long enough to get good index usage. This includes infrequent operations like month end reporting, etc.--Unused indexesdeclare @dbid int select @dbid = db_id() select object_name(s.object_id) as ObjName, i.name as IndName, i.index_id , user_seeks + user_scans + user_lookups as reads, user_updates as writes , sum(p.rows) as rowsfrom sys.dm_db_index_usage_stats s join sys.indexes i on s.object_id = i.object_id and i.index_id = s.index_id join sys.partitions p on s.object_id = p.object_id and p.index_id= s.index_id where objectproperty(s.object_id,'IsUserTable') = 1 and s.index_id> 0 and s.database_id = @dbid group by object_name(s.object_id), i.name, i.index_id, user_seeks+ user_scans + user_lookups, user_updatesorder by reads, writes desc You can most likely drop any indexes with zero or close to zero reads. The more rows, the more space you will reclaim. The more writes, the better write performance you get by removing them. If the index is not being read but writes are minimal and rows are low, there is little benefit of dropping it. Keep in mind that not having an index that you need is a lot worse than having an index you don’t need in most situations. I suggest starting off with a conservative approach.Now that we have gotten rid of the low hanging fruit, we can get deeper. I get a list of tables by size and work my way down.select object_name(object_id), max(rows)from sys.partitionsgroup by object_name(object_id)order by 2 descAfter running that query we see that tEvent is the largest table and should provide nice gains. This is where we start. Let’s use this schema for our example.create table tEvent(EventID int primary key clustered,EventType int,EventName varchar(100),EventDetailID int,CustomerID int,CompanyID int,DateOpen datetime,DateClose datetime) create index ind1 on tEvents(EventDetailID); create index ind2 on tEvents(CustomerID); create index ind3 on tEvents(EventDetailID, CustomerID, CompanyID); create index ind4 on tEvents(CompanyID, DateClose, EventType); create index ind5 on tEvents(CustomerID, DateClose, EventType); Removing used but redundant indexesIf we do a sp_helpindex, we can see that ind3 should satisfy queries currently using ind1 and ind2. The keyword is “should”. It’s a fairly safe bet so we drop ind1 and ind2. Now we should monitor the missing index DMV to see if there is any negative impact. We should also see the read count substantially increase by using the unused index query filtered by tEvent. Here is the missing index monitoring query:--Missing indexesSELECT sys.objects.name, (avg_total_user_cost * avg_user_impact) * (user_seeks + user_scans) as Impact, mid.equality_columns, mid.inequality_columns, mid.included_columnsFROM sys.dm_db_missing_index_group_stats AS migs INNER JOIN sys.dm_db_missing_index_groups AS mig ON migs.group_handle = mig.index_group_handle INNER JOIN sys.dm_db_missing_index_details AS mid ON mig.index_handle = mid.index_handle INNER JOIN sys.objects WITH (nolock) ON mid.object_id = sys.objects.object_idWHERE (migs.group_handleIN(SELECT TOP (5000) group_handleFROM sys.dm_db_missing_index_group_stats WITH (nolock)ORDER BY (avg_total_user_cost * avg_user_impact) * (user_seeks + user_scans) DESC)) and objectproperty(sys.objects.object_id, 'isusertable')=1and name = 'tEvent'ORDER BY 2 DESC , 3 descAs long as the indexes do not come back up in this report, we should be ok. Now we are servicing the same queries with 2 less indexes. Those are index pages that are no longer taking up buffer pool space! Combining indexesThe same concept can be applied to the following scenario but a little more “feel” and understanding of how the app accesses the data is needed. create index ind4 on tEvents(CompanyID, DateClose, EventType); create index ind5 on tEvents(CustomerID, DateClose, EventType);We know a former DBA added these. It is logical for us, based on our knowledge of the app and the cardinality of the data, to try to replace these indexes with this one.create index ind6 on tEvents(CustomerID, CompanyID, DateClose, EventType) with (online=on, maxdop=8);This index should satisfy all queries using both indexes. Again, we want go back to the missing index report to see if SQL thinks it needs one of those indexes.Now work your way down your list of tables by row count. I was able to reduce the size of a 200GB database by 15% in addition to removing the unused indexes. That’s a big gain! The only problem I ran into was changing the name of an index that had a hint. There were a few indexes that I had to add back but since my approach was conservative it was nothing drastic and completely online.Let me know if you have any other tips or questions regarding this topic.email it! | | | Permalink 0 Comments
--This query has minimal impact.
--Looks at index metadata to determine read\write ratio since the last restart of the instance.SELECT CAST(SUM(user_seeks+user_scans+user_lookups) AS decimal)/CAST(SUM(user_updates)+SUM(user_seeks+user_scans+user_lookups) AS decimal) AS ReadPercent, CAST(SUM(user_updates) AS decimal)/CAST(SUM(user_updates) + SUM(user_seeks + user_scans + user_lookups) AS decimal) AS WriteRatioFROM sys.dm_db_index_usage_stats
Post your results!
I have cleaned up and updated my unused index query. It returns a list of indexes ordered by ascending reads. If the instance has been up for a long time, it is probably safe to drop indexes with zero or close to zero reads. If the reads are low and the writes are high, this may help improve your transactions per second count. Otherwise, you are just cleaning up unused space. Use caution though. A missing index is worse than an unused index.
SELECT objectname=OBJECT_NAME(s.OBJECT_ID) , indexname=i.name, i.index_id , reads=user_seeks + user_scans + user_lookups , writes = user_updates , p.rowsFROM sys.dm_db_index_usage_stats s JOIN sys.indexes i ON i.index_id = s.index_id AND s.OBJECT_ID = i.OBJECT_ID JOIN sys.partitions p ON p.index_id = s.index_id AND s.OBJECT_ID = p.OBJECT_IDWHERE OBJECTPROPERTY(s.OBJECT_ID,'IsUserTable') = 1 AND s.database_id = DB_ID() AND i.type_desc = 'nonclustered'AND i.is_primary_key = 0AND i.is_unique_constraint = 0AND p.rows > 10000ORDER BY reads, rows DESC
I will also be putting this query up at the SSP WIKI.
It is fairly widely known that implicit conversions of data types can cause scans. However, you may not be able to control what goes in your database, you may have inherited an app and we just are not perfect.
On top of that some can be tricky. For example, did you know that scope_identity and @@identity both return numeric data types. An AVG of a SMALLINT column returns an INT.
Here is a simple repro which the same query does an index scan instead of an index seek
CREATE TABLE t1( id INT IDENTITY(1,1) PRIMARY KEY CLUSTERED, c1 VARCHAR(100) ) GOINSERT INTO t1(c1) SELECT 1GO 10000INSERT INTO t1(c1) SELECT 0GO 10CREATE INDEX ix ON t1 (c1) CREATE PROC p1@blah NVARCHAR(100) ASSELECT id FROM t1WHERE c1 = @blahEXEC p1 N'0'CREATE PROC p2 @blah VARCHAR(100) ASSELECT id FROM t1WHERE c1 = @blahEXEC p2 '0'DROP TABLE t1DROP PROC p1DROP PROC p2
I found a sample query from Umachandar Jayachandran and made some mod’s to it. It basically scans your procedure cache for index scans containing implicit conversions.
I have posted it at the SQLServerPedia Wiki here. It is the 2nd query listed.
Note: Implicit conversions may not always result in a scan.
Never say never, right? For example, you have an orders table. It has an IsShipped bit column which is what marks an order as done to the business. You might have a more complex version of this example so employees can see the outstanding order count in the app:
CREATE TABLE#temp ( c1INT IDENTITY(1,1) PRIMARY KEY CLUSTERED, c2 VARCHAR(10) DEFAULT 'bob', c3 bit )
INSERT INTO#temp(c3) VALUES (1) GO 5000000
INSERT INTO#temp(c3) VALUES (0) GO 500
--13,000 IO'sSET STATISTICS io ON SELECTCOUNT(*) FROM #temp WHEREc3 = 0
COUNT(*)
CREATE INDEXix ON#temp(c3) include (c2)
--6 IO's SET STATISTICS io ON SELECTCOUNT(*) FROM #temp WHEREc3 = 0
In SQL 2008, we can get a little better with filtered indexes but really all we are saving is space.
CREATE INDEXix2 ON#temp(c3) include (c2) WHEREc3 = 0
--4 IO's
SET STATISTICS ioON SELECT COUNT(*) FROM #temp WHEREc3 = 0
I quote “wrong” because with the info the optimizer has, it thinks it has the right index. :)
1. Look for cardinality problems. You can do this by looking at the actual rows versus estimated rows in the query plan. If they are off, chances are you are not following the best practices listed here.
2. Conversions in the WHERE clause can cause issues. Be sure to look at query plan for implicit conversions
3. Complicated logic in the WHERE clause can also cause problems. Consider indexed persisted computed columns or indexed views.
4. Constraints give the optimizer more info when choosing a plan with the “right” index. It can take a PK or unique constraint into consideration but it does not know about similar logic in a trigger.
5. Large datasets are being returned but the index is not covering. An index with just the columns in the WHERE clause is ok if just a few rows are returned but it does not take much for an index scan to be more efficient.
#1 is the most important one because the link gives 5 or 7 more reason why the “wrong” index is chosen.
P.S. If all else fails, update stats and if that does not work, make the index covering.
In the first part, we looked at the a very simple example with a single table SELECT using covering indexes. This one is a little more complex. We are still covering but we are joining two tables, ordering and grouping. Filtered indexes do not support joins so we have to create two indexes. In this test, the indexed view wins but filtered indexes come a respectable 2nd place compared to normal indexes.
The DDL(2.5M in tblitem and 25M in tblitemdetails)
CREATE TABLE [dbo].[tblItem](
[Itemid] [int] IDENTITY(1,1) NOT NULL,
[ItemName] [varchar](100) NULL,
[Datecreated] [datetime] NULL,
[Closed] [bit] NULL,
PRIMARY KEY CLUSTERED
(
[Itemid] ASC
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[tblItemDetails](
[ItemDetailsID] [int] IDENTITY(1,1) NOT NULL,
[ItemID] [int] NULL,
[Color] [varchar](20) NULL,
[Size] [varchar](20) NULL,
[Flavor] [varchar](20) NULL,
[ItemDetailsID] ASC
SET ANSI_PADDING OFF
ALTER TABLE [dbo].[tblItemDetails] WITH CHECK ADDCONSTRAINT [fk01] FOREIGN KEY([ItemID])
REFERENCES [dbo].[tblItem] ([Itemid])
ALTER TABLE [dbo].[tblItemDetails] CHECK CONSTRAINT [fk01]
--Create normal and filtered indexes
create index ix01 on tblitem(Datecreated, ItemName)
create index ix02 on tblitem(Datecreated, ItemName) where Datecreated > '2007-08-22' and itemid > 0 and itemid < 50
create index ix01 on tblitemdetails(itemid, color)
create index ix02 on tblitemdetails(itemid, color) where itemid > 0 and itemid < 50
--Create indexed views
alter view ivw01
with schemabinding
as
select a.ItemID, b.Color, COUNT_big(*) as cnt
from dbo.tblItem a join dbo.tblItemDetails b on a.Itemid= b.ItemID
where Datecreated > convert(datetime, '2007-08-22', 101) and a.itemid > 0 and a.itemid < 50
group by a.ItemID, b.Color
create unique clustered index ix01 on ivw01(itemid)
create index ix02 on ivw01(itemid, color)
Test Queries
set statistics io on
--Using a normal covering index
from dbo.tblItem a with (index=ix01) join dbo.tblItemDetails b with (index=ix01) on a.Itemid= b.ItemID
order by cnt
--using a filtered index
from dbo.tblItem a with (index=ix02) join dbo.tblItemDetails b with (index=ix02) on a.Itemid= b.ItemID
--using an indexed view
select * from ivw01
IO numbers:
Normal covering indexes
(8 row(s) affected)Table 'tblItemDetails'. Scan count 8, logical reads 24, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.Table 'tblItem'. Scan count 1, logical reads 991, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Filtered covering indexes
(8 row(s) affected)Table 'tblItem'. Scan count 1, logical reads 99, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.Table 'tblItemDetails'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Indexed View
(8 row(s) affected)Table 'ivw01'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Note: By increasing the result set 100 fold(itemid > 0 and itemid < 5000), we see very similar results except with merge joins.
Query Plans:
The query plans are different. Most notably being the scans for the filtered index and indexed view vs. the normal index. This is fine though. They as just a fraction of the size of the normal index.
Conclusion
The mere fact that the indexed view runs with 50 times less IO overhead does not make it the automatic choice. What if you are on standard edition? Filtered indexes work on standard edition. Also we filtered on a date column. What if you need a real time rolling total? You have to use a normal index. Know you tools and then test, test and test some more.
Note: This is a simple example and YMMV. I may test queries with lookups, hashes, aggregations etc later. Let me know if you have an interesting test case.
I decided to run a quick test comparing covering filtered indexes vs. covering indexed views vs. a normal covering index. Read performance-wise, it is a statistical draw.
The DDL and queries
USE [demodb]
create index ix02 on tblitem(Datecreated, ItemName) where Datecreated > '2007-08-22'
go
create view ivw01
select ItemName, Itemid from dbo.tblItem
where Datecreated > convert(datetime, '2007-08-22', 101)
create index ix02 on ivw01(ItemName)
select ItemName, Itemid from tblItem with (index=ix01)
select ItemName, Itemid from tblItem with (index=ix02)
The graphical query plans
There are no surprises in the query plan. It does a scan on the filtered index but that doesn't matter since it is ...err... filtered.
Statistics IO Output
Normal covering index:
(346129 row(s) affected)Table 'tblItem'. Scan count 1, logical reads 991, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Filtered Index:
(346129 row(s) affected)Table 'tblItem'. Scan count 1, logical reads 990, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Indexed view:
Other Considerations
Some other things to consider is the fact that the filtered index is a fraction of the size of the normal index. The indexed views are an enterprise feature but filtered indexes are not. The optimizer may have a hard time choosing a filtered index but these problems are not present with indexed views. So the right choice may vary with different apps even if the schema is the same.
Here are the slides and scripts from my June presentation at the North Texas SQL Server Users Group. The presentation was on the Data Collector in SQL Server 2008 but it was short so I also touched on filtered indexes, query hashes and troubleshooting. Enjoy and let me know if you have any questions.
Also Peter Debetta is presenting TONIGHT at 6:30 on the following topics:
"We'll be discussing the various ways you can secure your SQL Server data via encryption, including the new Transparent Data Encryption feature of SQL Server 2008."
Stop by.
This might be HAWTALTA for some of you. I just thought of doing this today even though I have been using xml showplan since I was knee high to a junebug. You can make query results clickable just by casting them as XML. For example:
select cast('select 1' as xml)
By clicking the resulting "select 1", a new query editor window opens with "select 1" ready to execute. This would be useful for scripts that generate scripts. Here is a practical example. (Forgive the formatting. I cannot get too wide in the post.)
SELECT sys.objects.name
, (avg_total_user_cost * avg_user_impact) * (user_seeks + user_scans) as Impact
, cast('CREATE NONCLUSTERED INDEX ~NewNameHere~ ON '
+ sys.objects.name + ' ( ' + mid.equality_columns +
CASE WHEN mid.inequality_columns IS NULL THEN ''
ELSE CASE WHEN mid.equality_columns IS NULL
THEN '' ELSE ',' END + mid.inequality_columns END + ' ) '
+ CASE WHEN mid.included_columns IS NULL
THEN '' ELSE 'INCLUDE (' + mid.included_columns + ')'
END + ';' as xml) AS CreateIndexStatement
, mid.equality_columns
, mid.inequality_columns
, mid.included_columns
FROM sys.dm_db_missing_index_group_stats AS migs
INNER JOIN sys.dm_db_missing_index_groups AS mig
ON migs.group_handle = mig.index_group_handle
INNER JOIN sys.dm_db_missing_index_details AS mid
ON mig.index_handle = mid.index_handle
INNER JOIN sys.objects WITH (nolock)
ON mid.object_id = sys.objects.object_id
WHERE (migs.group_handle IN
(SELECT TOP (500) group_handle
FROM sys.dm_db_missing_index_group_stats WITH (nolock)
ORDER BY (avg_total_user_cost * avg_user_impact) * (user_seeks + user_scans) DESC))
ORDER BY 2 DESC
Here are what some of the results look like.
Update : You also have to associate the XML file extension with the SQL editor to be executable.
Just a quick reminder that the SSWUG virtual conference starts tomorrow. I hope you can attend. They have given me a discount code to distribute: VIPJM2008DIS I do not get any extra cha-ching from you using it. I just state that so this post doesn't sound spammy.
Actually hearing my recorded voice makes me cringe so this will definitely be interesting for me. :)
Here are the sessions I am doing:
Filtered Indexes and Statistics in SQL 2008
SQL Reporting Services for the DBA
SQL Profiler: Configuration, Analysis, and SQL Server 2008 Enhancements
http://vconferenceonline.com/sswug/demo.asp
Reminder: You can catch me at the North Texas SQL Server User's Group tomorrow night. The Data Collection presentation that I put together may be short so I have a few backup slides and demo's on filtered indexes, queries hashes and the 2 second SQL Server 2008 upgrade. Directions and details can be found at http://northtexas.sqlpass.org
Hope to see you there.
I just got a discount code for the SSWUG conference. The code is: VIPJM2008DIS . It is good for 10% off. I am not a sales person so watch the commercial. Hope to "see" you there.
Greg Linwood filed a connect with MSFT to remove nonclustered indexes (NCI) from the backups of SQL Server. The idea being less backup overhead and less power. That is a good idea. :) Check out his blog and then vote at the connect site.
Before the original post, I had submitted a connect item for something similar.
It turns out that filtered indexes may not be used when auto-parameterization(or for stored proc) occurs without a recompile hint. This is sort of understandable but I can think of workarounds and I suspect additional logic will be added in the future as mentioned below. Here is the feedback from a connect item I submitted.
"
Thanks for your feedback. The query in question, select lastname from Person.Contact where ContactID <=(100-20) is auto-parameterized by SQL Server into the following form, using the standard auto-parameterization rules: select lastname from Person.Contact where ContactID <=(@1-@2) These rules are designed to minimize compile time cost for simple queries like this, possibly at the expense of additional optimizations such as using a filtered index. Admittedly, it is a difficult tradeoff. In a future release, we'll consider extending the design to make a better decision in a cost-based way. For now, this behavior is by design.
alt head: You can't have your cake and eat it too.
We regret to inform you of a passing in the SQL community. Read more here.
Rick Heiges has a post on my favorite new feature in SQL Server 2008, filtered indexes.
So does Decipherinfosys along with a good description on the difference of indexes and statistics if you need some background.
SQLBlogcasts has gotten an upgrade and Tony posts some great stats. Congrats!
Not SQL per say but the Hyper-V release candidate has been released. Speaking of Hyper-V, Sriram posts his slide decks on virtualizing SQL. Part 1 and Part 2.
This is a great starting point for SQL Server 2008 as is this. These come by way of the MSDN\Technet update blog.
The SQL Server 2005 sp3 debate continues.
The SQL ISV team posts a performance improving cursor rewrite sample. However, it is not ANSI compliant :) which is odd since most ISV tsql code needs to be portable.
Paul Nielson will be releasing a DVD.
While we are at it, check out the new SQL Server social network.
alt head: Got ADD?
Technorati Tags: SQL Server,SQL Server 2008,SQL Server 2005,SQL Server 2000,tsql,cursors,virtualization,filtered indexes,social networking
So I was messing around with filtered indexes tonight. Yes, beta software is my idea of a wild Saturday night. Hey, I played some Smash Bros first. ;) I was trying to figure out what happens if a plan is created using a covering but filtered index and try to reuse it with a parameter that is covered vertically but not horizontally. Let's look at this example:
create database dbtest01;
--Force parameterization
alter database dbtest01 set parameterization forced;
use dbtest01
--create test table 1
create table t1
(id int identity(1,1),
c1 int,
c2 int)
--insert dummy data
declare @ctr int = 2000000
while @ctr > 0
begin
insert into t1(c1, c2) values (@ctr, @ctr)
select @ctr=@ctr-1
end
--create clustered indexes
create clustered index CI on t1(id)
--Let create a covering indexes that is pretty extreme.
create index ix01 on t1(c1) where c1 = 1990001
--Test queries
--ctrl + m
dbcc freeproccache
--Does a CI scan.
select c1 from t1
where c1 = 1990001
--Nope, index hints returns an error
--Query processor could not produce a query plan because of the hints defined in this query. Resubmit the query without specifying any hints and without using SET FORCEPLAN.
select c1 from t1 with (index=ix01)
--Ahh, recompile works. Now it uses the filtered index
option(recompile)
--What about simple parameterization and a stored proc?
alter database dbtest01 set parameterization simple;
create proc proc1
@p1 int
where c1 = @p1
--It does the CI scan
exec proc1 1990001
alter proc proc1
--It uses the filtered index :)
My jaw literally dropped when I saw it. If this works as advertised, it has the potential of changing everything. The days of over indexing will be over. Dynamic indexing off of the missing and unused index DMV's could be possible especially with added support. I also think this will better accomplish what people tried to do with partitioning for performance reasons in SQL Server 2005.
It will be really interesting to see how it gets applied in production and where this leads in the next versions. Maybe an autoindex checkbox will replace the DBA :)
This is a small and quick example. We will have to see how it scales to larger environments.
use adventureworks
--Let's nullify a random 400
update Production.WorkOrder
set EndDate = null
where WorkOrderID in (select top 400 WorkOrderID from Production.WorkOrder where enddate is not null)
--Let's give a random 200 a recent date to mimic prod data
update top (200) Production.WorkOrder
set duedate = getdate()-10
where enddate is null
--Base query --This is query type that should be simple yet common --Let's get open WO's that will be due soon
select p.Name, wo.OrderQty, wo.DueDate
from Production.WorkOrder wo JOIN Production.Product p on wo.ProductID=p.ProductID
where wo.EndDate is null and wo.DueDate >= '2007-11-24'
--CI Scan with loop join
--Table 'WorkOrder'. Scan count 1, logical reads 530, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.--Now let's create a covering index
create index ix01 on Production.WorkOrder(EndDate,DueDate, ProductID) include (OrderQty)
--Run the base query
set statistics io on select p.Name, wo.OrderQty, wo.DueDate from Production.WorkOrder wo JOIN Production.Product p on wo.ProductID=p.ProductID where wo.EndDate is null and wo.DueDate >= '2007-11-24'
--Does NCI seek
--Table 'WorkOrder'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
--Clean up
drop index [Production].[WorkOrder].[ix01]
--Create Filter index
create index ix01 on Production.WorkOrder(EndDate,DueDate, ProductID) include (OrderQty) where EndDate is null and DueDate >= '2007-11-24'
--Run base query
set statistics io on select p.Name, wo.OrderQty, wo.DueDate
--It uses it. We have a real small sample but it performs better
--Table 'WorkOrder'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. --Now let's look at size --Create unfiltered index for comparison
create index ix02 on Production.WorkOrder(EndDate,DueDate, ProductID) include (OrderQty)
--note the new syntax
declare @dbid int = db_id()
declare @objid int = object_id('[Production].[WorkOrder]')
select * from sys.dm_db_index_physical_stats(@dbid, @objid, null, null, 'detailed') ps join sys.indexes i
on ps.object_id=i.object_id and ps.index_id=i.index_id
and i.name in ('ix01', 'ix02') and i.type_desc='NONCLUSTERED'
--We are looking at 2 page vs. 301 pages
That is a huge difference in size. The major point is not the fact that we retrieve the same while taking up so much less space on disk but so much less space in memory as well. We are going to get into this much more!
Technorati Tags: SQL Server 2008,Filtered indexes,CTP6
Have you ever been writing a query and just cannot get it to use the right index. This could be one of the reasons why. Let's use this query with local variables as our example.
declare @Start datetime
declare @End datetime
select @Start = '2004-08-01 00:00:00.000'
select @End = '2004-07-28 00:00:00.000'
select ProductID from sales.SalesOrderDetail where ModifiedDate >= @End and ModifiedDate <= @Start
It generates this plan:
but we have an index on ModifiedDate. There are many reason why SQL would not use this index but, for this post, we will assume we have eliminated them. Finally, we hard code the dates and we get this plan.
So why is it doing this? The reason is because the query optimizer cannot accurately use the statistics to estimate how many rows are returned with local variables. Let's look at how we can tell there is a problem with the cardinality estimates. In the query with the local variables, the optimizer thinks we are getting 10918.5 rows so we do the index scan. In the query with hard coded literals, the estimated rows and actual rows are the same and accurate.
How can we fix this? There are several way. This is another situation that makes a case for stored procedures or parameterized queries.
create proc pDemo01 @Start datetime, @End datetime
select ProductID from sales.SalesOrderDetail
where ModifiedDate >= @End and ModifiedDate <= @Start
The stored proc generates the proper plan. However, you will run into the same problem if you modify the parameter within the stored proc like select @start = @start-90. In this case, to should use sp_executeSQL. What if you cannot use a stored proc because it is a 3rd party app or some other reason? A covering index is probably the solution. Once we create this index, it will always be used:
create index ix01 on sales.SalesOrderDetail(ModifiedDate) include (ProductID)
We could use a plan guide or an index hint with a forceseek(SQL 2008) but performance will be really bad when we really do need to get 10k rows. The same problem can happen with stored proc's but that is another post.
To get deeper into this subject, check out this.
Have you every had a query that should totally be using an existing index and doing a seek but it keeps doing a clustered index scan? In SQL Server 2008, you can force it to do a seek. That may or may not be a good thing. Let's take a look at an AdventureWorks example.
--logical reads 1246
select ProductID, OrderQty from sales.SalesOrderDetail
where ProductID > 776
This query generates this plan.
Now let's add the hint.
--logical reads 271264
select ProductID, OrderQty from sales.SalesOrderDetail with (forceseek)
This query generates this plan. The second query plan contains the index seek but logical reads are 20 times more. This is because it has to do lookup for OrderQty. Obviously using the hint would be a bad thing. So when would be a good time to use this hint? Let's look at this query create proc p1 @p int as select c1, c2 from t1 where c3 = @p t1 has 1 million rows 99% of c3 =1 .5% of c3 = 2 .5% of c3 = 3 99% of the queries run with @p = 2 or 3 The problem is if the query compiles with c3 = 1. The optimizer will choose a CI scan. In SQL 2005, we could recompile each time at the statement level or we could use an optimize for hint. Now we have another tool in the bag of tricks.Use with caution.
In SQL 2000 I wanted to find unused indexes but found it to be difficult. I had to capture a huge trace that fully represents the workload and “trust” that the ITW knows what it’s talking about. You could also use the scan started trace event filtered by dbid, objectid, indexid. This procedure was very tedious.
In SS2005, a quick query of the dynamic management views lets you know which indexes are not being used. You can do this in a few minutes what had previously taken days or weeks. This alone is a very powerful feature.
I am proposing that you take it a step further. We are going to use a little logic and the missing index DMV’s to combine indexes and remove indexes that are still used but redundant. The steps in this process would look like this:
1. Remove unused indexes with the unused index script
2. Get your list of tables to analyze.
3. Remove redundant but used indexes.
4. Revue missing index DMV’s for mistaken index drops.
5. Combine indexes that where it is logical to do so.
6. Revue missing index DMV’s for mistaken index drops.
This methodology is most effective and viable when these conditions are met:
· The server has been online and thus collecting stats for a long time.
· The server is not pushing a hardware bottleneck. If so, this should be done during maint window.
· The server is enterprise edition and the tables allow online operations (i.e. no LOB data or partitions).
What are the benefits of removing unused indexes?
· Reduced writes during updates
· Reduced space usage
· Reduced backup\restore space and time
· Reduced index maintenance time
What are the benefits of removing unused indexes, used but redundant indexes and combining indexes?
· Less memory footprint for the same amount of data
· A different angle to index tuning
· Indexes are more likely to be covering
Now, let’s get down to business…
Remove Unused Indexes
This is pretty straightforward and relatively safe. You want to make sure that your server has been up long enough to get good index usage. This includes infrequent operations like month end reporting, etc.
--Unused indexes
declare @dbid int
select @dbid = db_id()
select object_name(s.object_id) as ObjName
, i.name as IndName
, i.index_id
, user_seeks + user_scans + user_lookups as reads
, user_updates as writes
, sum(p.rows) as rows
from sys.dm_db_index_usage_stats s join sys.indexes i on s.object_id = i.object_id and i.index_id = s.index_id
join sys.partitions p on s.object_id = p.object_id and p.index_id= s.index_id
where objectproperty(s.object_id,'IsUserTable') = 1 and s.index_id> 0 and s.database_id = @dbid
group by object_name(s.object_id), i.name, i.index_id, user_seeks+ user_scans + user_lookups, user_updates
order by reads, writes desc
You can most likely drop any indexes with zero or close to zero reads. The more rows, the more space you will reclaim. The more writes, the better write performance you get by removing them. If the index is not being read but writes are minimal and rows are low, there is little benefit of dropping it. Keep in mind that not having an index that you need is a lot worse than having an index you don’t need in most situations. I suggest starting off with a conservative approach.
Now that we have gotten rid of the low hanging fruit, we can get deeper. I get a list of tables by size and work my way down.
select object_name(object_id), max(rows)
from sys.partitions
group by object_name(object_id)
order by 2 desc
After running that query we see that tEvent is the largest table and should provide nice gains. This is where we start. Let’s use this schema for our example.
create table tEvent
(EventID int primary key clustered,
EventType int,
EventName varchar(100),
EventDetailID int,
CustomerID int,
CompanyID int,
DateOpen datetime,
DateClose datetime)
create index ind1 on tEvents(EventDetailID);
create index ind2 on tEvents(CustomerID);
create index ind3 on tEvents(EventDetailID, CustomerID, CompanyID);
create index ind4 on tEvents(CompanyID, DateClose, EventType);
create index ind5 on tEvents(CustomerID, DateClose, EventType);
Removing used but redundant indexes
If we do a sp_helpindex, we can see that ind3 should satisfy queries currently using ind1 and ind2. The keyword is “should”. It’s a fairly safe bet so we drop ind1 and ind2. Now we should monitor the missing index DMV to see if there is any negative impact. We should also see the read count substantially increase by using the unused index query filtered by tEvent. Here is the missing index monitoring query:
--Missing indexes
INNER JOIN sys.dm_db_missing_index_groups AS mig ON migs.group_handle = mig.index_group_handle
INNER JOIN sys.dm_db_missing_index_details AS mid ON mig.index_handle = mid.index_handle
INNER JOIN sys.objects WITH (nolock) ON mid.object_id = sys.objects.object_id
WHERE (migs.group_handleIN
SELECT TOP (5000) group_handle
ORDER BY (avg_total_user_cost * avg_user_impact) * (user_seeks + user_scans) DESC)
)
and objectproperty(sys.objects.object_id, 'isusertable')=1and name = 'tEvent'
ORDER BY 2 DESC , 3 desc
As long as the indexes do not come back up in this report, we should be ok. Now we are servicing the same queries with 2 less indexes. Those are index pages that are no longer taking up buffer pool space!
Combining indexes
The same concept can be applied to the following scenario but a little more “feel” and understanding of how the app accesses the data is needed.
We know a former DBA added these. It is logical for us, based on our knowledge of the app and the cardinality of the data, to try to replace these indexes with this one.
create index ind6 on tEvents(CustomerID, CompanyID, DateClose, EventType) with (online=on, maxdop=8);
This index should satisfy all queries using both indexes. Again, we want go back to the missing index report to see if SQL thinks it needs one of those indexes.
Now work your way down your list of tables by row count. I was able to reduce the size of a 200GB database by 15% in addition to removing the unused indexes. That’s a big gain! The only problem I ran into was changing the name of an index that had a hint. There were a few indexes that I had to add back but since my approach was conservative it was nothing drastic and completely online.
Let me know if you have any other tips or questions regarding this topic.