This is the first of the topics from my presentation “ Gone in 60 nano seconds “
As a quick summary this is why additional columns can cause degraded performance.
Background
- I put this brief list of points together after having a carefully tuned query degraded by the addition of a column onto one of the tables and into the join.
- Performance tuning is more than just adding indexes, and the performance of an application can be affected by many factors, but above all you want to be able to give your users a happy experience.
- Queries which perform slowly can frustrate users as well as give your application ( e.g. database – e.g. SQL server ) a bad name, which we don’t want. Users constantly hitting refresh for slow running and/or heavy io queries compound the problem further.
- When I put these points one response was what to do if there was a valid business reason to add a column to a table, my view is that there never is a valid “business reason” because you the developer are in control of the schema and have the choice on how to handle data or more likely extra attribute(s).
- It’s never going to be clear cut, all I wanted was that before adding extra columns to a table some thought was applied to the possible adverse affects of the action.
1. Most important - if a query has been optimised with an index for a table and an additional column is added to that query for that table then the index becomes invalid.
There are two possible outcomes
a) A table scan
b) a key lookup which means at best two reads of the table for each row. This may not sound so bad but if the lookup will always be a clustered index seek/scan - any clustered index seek/scan always returns/reads the entire row to cache even if only one bit value is required. If a table is wide and has a large (?) number of rows the impact can be such that the entire query may take many times longer even to the extent of timing out.
This may seem extreme but consider this simple example, a company table only has 1.6 million rows and is fairly narrow with a max row size of 1076 bytes.
A bit field is added, the actual table size does not alter much.
Picking an index that has a max row size of 60 bytes, one page read ( one io ) will contain approx 140 rows.
If we don't use this index and need a clustered index seek ( best option ) 140 rows will actually require 20 reads ( 7 rows per page ) so instead of 8k of data being read we will have read 160k.
A bookmark will use a typical index and the clustered index, in this case 41 reads, however there's always a possibility of a scan 1.8GB of data ( 229,000 page reads ) - a long way from 1 io and 8k.
A client had this type of issue by adding a bit field to indicate a row being deleted , so queries were changed to include " and deleted = 0 " but indexes were not changed. This is what inspired my SQL Bits presentation “ Can I index a bit? “
( btw a not in the where will always produce a full scan )
2. The column may not produce a useful index
Generally columns added to an existing table will contain something to be checked or retrieved. Should this new column not backfill variable values to existing rows then say taking a Sales table as an example, with say 11 million rows. Should existing rows set a default value or worse still null then we will have 11 million rows with one value and slowly added new rows having values.
The optimiser will ignore an index on this column, generally a secondary index requires better than 90% selectivity for the optimiser to use it ( there's always exceptions of course )
3. The table grows so that there are less rows per page.
Rows fit within a 8kb page, therefore a row of approx 1000 bytes will fit 8 per page, 11 million rows will occupy 1,375,000 pages
If the row size changes so that we can only fit 7 rows per page we'll need 1,571,429 pages, an increase of 1.56GB ( 1 page = 8kb )
a select returning 1,000 rows will now need 143 io compared to 125 io previously, this may not seem much but sometimes it's the small things that hurt.
4. The column will probably be nullable
Indexes on nullable columns ( or including nullable columns ) are 3 bytes a row wider. On our example Sale table this means any index including the added column will be approx 32MB larger just for the null
This is pretty small stuff in database terms, when I presented on index analysis I was working with tables of 400 million rows and more, here a null on an indexed column made a size difference of 1.1GB - scalability is important.
5. The table just gets larger
Adding an integer will increase our example Sale table by approx 42Mb, a datetime 84MB, a guid 168MB ( you should never use guids as keys if there is any other option available )
Adding columns has the potential to increase the io on every query using that table.
6. May cause locking and blocking
It may also increase the number of locks taken and may cause escalation to table locking.
Every access of a table/row requires locks, every lock requires resource, there's only so much resource. At some point SQL Server may escalate any type of locking to optimise resource, if additional columns cause table and/or index growth ( as point 3 ) then the number of locks may kick over the escalation threshold and a table lock occurs, shared read locks stop updates and deletes.
Just a few things to consider if you're thinking of adding a column.
© www.grumpyolddba.co.uk September 2010