Filtering by joining is difficult to optimise

Joining data is one of the most complex parts of SQL optimisation. Especially when you are dealing with large tables and you are trying to visualise what set processing the query engine will do.

You might say you don't need to do all that, you can just use the Database Tuning Adviser. Well you're wrong. The database tuning advisor only helps out indexing a database. It doesn't help out when you have bad SQL. Do you have an app that fires a query for each line in an order, or calculates the order totals for all customers but only returns one of them

All these situations are about database querying design. The two examples above are quite simple to see and fix, but what happens when you just have a very large query that takes ages.You've got covering indexes, and key aligned indexes as much as possible but its still slow and you are only returning a few rows. Well what is most likely happening is that within the query you are processing more data than you need to. Trying to resolve this is very complex and oftens comes down to joining.

There are a number of ways to reduce the number of rows processed during a query,

1. use a where clause with a literal

2. join to another table

option 1 is very easy for the engine to process and can often mean the rows never make it out of the storage operator, index seek, index scan. The second is more difficult. If you have two tables with 1 million rows and when joined you only get 10 records. The Query processor has to likely process all rows in both tables.You maybe lucky and have key aligned indexes (indexes where the key columns are in the same order and match those of the join clause) which means you can do a merge join, if you haven;t got key aligned indexes then you will likely have a hash join. In both case you still have to read all the data. What happens when you add a 3rd table into the equation. that makes things more complex.

The summary is that even though you only need 10 rows if you are joining tables to filter it is likely you have to process all dat in both/all tables. If you can't do anything about the query/table design you need lots of memory and if your data is greater than your memory you need fast storage


Published 16 February 2010 19:59 by simonsabin

Comments

13 February 2010 00:46 by Simon Sabin UK SQL Consultant's Blog

# Getting your joins wrong can really impact performance

I've just posted about filtering with joins is difficult to optimize ( sqlblogcasts.com/.../Filtering-by-joining-is-difficult-to-optimise.aspx ). This post is along the same lines and is about the opposite, when

17 February 2010 03:20 by SqlServerKudos

# Filtering by joining is difficult to optimise

Kudos for a great Sql Server article - Trackback from SqlServerKudos

10 March 2010 13:26 by Simon Sabin UK SQL Consultant's Blog

# Getting your joins wrong can really impact performance

I've just posted about filtering with joins is difficult to optimize ( sqlblogcasts.com/.../Filtering-by-joining-is-difficult-to-optimise.aspx ). This post is along the same lines and is about the opposite, when