May 2006 - Posts

I've had so many referrals from google on this error because of my previous blog entry on the T-SQL Value assignment SET vs SELECT I've decided to expand into this specific error and in what situations you get it, what causes it and how to get round it.

Msg 512, Level 16, State 1, Line 12

Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.

Some test data...

create table #test (
   mypk int not
null,
   fullname varchar(100) not
null
)

insert #test ( mypk, fullname ) values( 1, 'Tony Rogerson' )
insert #test ( mypk, fullname ) values( 2, 'Tony Rogerson'
)
insert #test ( mypk, fullname ) values( 3, 'Simon Sabin'
)
insert #test ( mypk, fullname ) values( 4, 'Trevor Dwyer'
)

The error message is pretty self-explanatory once you remember what a subquery is. A subquery is a query within your main query, for example the query below reproduces the behaviour and gives the message...

select *
from
#test
where mypk = ( select
mypk
               from
#test
               where fullname = 'Tony Rogerson' )

The sub-query is the bit between the braces, the sub-query in itself is fine and valid, its because we are using an operator that expects comparison against a constant rather than a set. Our sub-query is returning 2 rows and because we are using '=' equals expects a single value (1 row, 1 column).

The correct thing to do depends on your problem, we can do a number of things we can change the single value operator to an operator that can deal with sets, so we change from '=' to 'IN' and now everything works fine.

select *
from
#test
where mypk IN ( select
mypk
                from
#test
                where fullname = 'Tony Rogerson' )

This might not be what you want, the above query has changed in behaviour because it can now return multiple mypk, so if you where expecting a look up on a single mypk then you are stuffed. We can get round that by using the TOP keyword, however, think about what you are doing - it may not be valid, what determines which mypk gets selected? As the query below demonstrates we can get a random mypk, its going to be either 1 or 2.

select *
from
#test
where mypk = ( select
TOP 1 mypk
               from
#test
               where fullname = 'Tony Rogerson' )

When using TOP in a sub-query you can also use ORDER BY, this is the only time you can use ORDER BY in a sub-query, to digress, people started to use an undocumented behaviour where you can order a view but SQL 2005 now breaks that behaviour so my message is clear never use ORDER BY and TOP in a view and expect the view to be ordered - it won't be!!

select *
from
#test
where mypk = ( select
TOP 1 mypk
               from
#test
               where fullname = 'Tony Rogerson'
               order by mypk DESC
)

Again using this solution depends on the business requirement you are trying to resolve.

Sub-queries can exist in many places, on the SELECT clause, in a CASE statement in many of the builtin functions like DATEDIFF.

I think my only advice if you get this is to check that your using keys correctly, I'm a very pro on using surrogate keys so make sure you never get into this situation in the first place.

Remember =, !=, <, <= , >, >= all expect single values (0 or 1 row, 1 column), EXISTS and IN are set operators and as such expect a set of data (0 or more rows).

Tony.

 

I've need to do this a lot of times when outputing a fixed length data export where for instance a numeric column is 9 characters long and you need to prefix with leading 0's. This type of data export dates back to PL/1, COBOL era.

For a two character string you can do this...

declare @number tinyint

set @number = 2

select case when len( @number ) = 1 then '0' else '' end + cast( @number as varchar(2) )

For a number that will be greater than two characters in length you can do this...

declare @number int
declare
@string varchar(10
)
declare @size_of_fixed_string
tinyint

set @size_of_fixed_string = 10
set @number = 40

print replicate( '0', @size_of_fixed_string )

set @string = left( replicate( '0', @size_of_fixed_string ), @size_of_fixed_string - len( @number ) ) + cast( @number as varchar(10) )

print @string

 

It's taking me about a day to build this but I've got a fully working Windows 2003 cluster running on my development machine here.

My development machine is a Dual Core AMD 64bit with 4GB RAM and 2 SATA disks.

I use Windows 2003 Server R2 on it and also have Virtual Server 2005 R2 on it.

This article explains the steps to go through: http://www.microsoft.com/technet/prodtechnol/virtualserver/deploy/cvs2005.mspx, note, when creating the virtual disks you have a choice of IDE or SCSI bus - use IDE otherwise you won't be able to create the shared SCSI bus later on.

If you need help setting up the SYSPREP.INF (as I did being a developer) then check out: http://support.microsoft.com/default.aspx?scid=kb;en-us;302577.

Also, when you create the 'quorum' make it big, not the 500MB they say but enough to hold the databases you want on the shared disks, I know its not good practice putting stuff on the quorum but i can't see any other way to have multiple shared disks.

Once you have the cluster set up and working then follow these instructions: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/howtosql/ht_clustering_51rm.asp.

Remember to install SP4 at the end of it.

Once you've got it all installed remember to shut everything down and take a copy of the VHD and config files so you can play away!

Now, off to upgrade my 2000 to 2005 :)

Tony.

I wanted to compare Oracle Express with SQL Server 2005 Express editions but Oracle's aggreement definitely discourages me, quote from the eula....

Other
Upon 45 days written notice Oracle may audit the use of the program.  You agree to cooperate with Oracle's audit and provide reasonable assistance and access to information. You agree that Oracle shall not be responsible for any of your costs incurred in cooperating with the audit.

What on earth is that about? The words shove it come to mind....

 

I've been using it for a couple of days now in SQL Management Studio and it seems to work really well! It's not getting in the way and adds value most of the time.

On my super duper workstation its very quick, remarkedly actually, not sure on a slower box.

Anyway, its worth a try.

You can download it from here : http://www.red-gate.com/products/SQL_Prompt/index.htm

From Red-Gate's website:

SQL Prompt provides Intellisense® style auto-completion for Microsoft SQL Server editors. It will tell you the exact formatting you should use for your SQL commands, while you are writing them and will therefore help you write fast, perfectly formatted SQL statements. SQL Prompt improves the productivity of all SQL script creation. SQL Prompt can be downloaded free until 1st September 2006 and we are providing forum support on the SQL Prompt support forum.

SQL Prompt simply sits behind the scenes and provides unobtrusive help when you press Ctrl-Space or when you type "." after a table/view/alias name.

Features include :

  • Table/View name completion
  • Column name completion
  • Stored procedure name completion
  • USE completion
  • JOIN/JOIN ON completion
  • Auto-uppercasing of keywords
  • Auto-popup after keywords

SQL Prompt works with Microsoft Query Analyzer, SQL Server 2005 Management Studio, Visual Studio 2005, Visual Studio .NET 2003, SQL Server 2000 Enterprise Manager, UltraEdit32. Please note that Intellisense is a registered trademark of the Microsoft Corporation, we have a dedicated web page describing Red Gate's relationship with Microsoft.

 

Posted 23 May 2006 09:40 by tonyrogerson | with no comments
Filed under:

Its been a while but I needed to do this for a client this week where I needed to pick up a list of database backups and restore each one, so instead of hard wiring all 280+ I used the file system to do the donkey work.

First of all you need the Indexing Service installed and running.

Now, create a linked server to access Index Server:

EXEC sp_AddLinkedserver 'IDXServer', 'Indexing Service', 'MSIDXS', 'System'

You can now access the File System by doing simple queries for instance this query lists all the .TXT files in the c:\windows\system32 directory:

SELECT *
FROM OPENQUERY( IDXServer,
 
   'SELECT Directory, 
           FileName, 
           Size 
    FROM SCOPE('' "c:\" '') 
    WHERE FileName LIKE ''%.txt''
      AND Directory = ''c:\windows\system32'' '
)

Hopefully you can see the power and usefulness of this, notice you can get the Size of the files, I remember working the ATE booths at Tech-Ed in Barcelona last year and one of the things I did was to create a stored procedure that created a fact table based on the system you could then analyse your file system in terms of size of files, by extension etc... I know, annorak but I think I'll do the example again because it was really useful and shows how to build a cube, you can even create KPI's etc...

 

On occasion advice is given never to use SELECT when you are assigning values because it allows more than one row to be returned and that puts a question on what values the variables get assigned to. SET on the other hand will fail if more than one row is returned.

 

For example, the query below using SELECT actually returns 3 rows and @reserved gets set to the value of (randomly) one of those 3 rows.

declare @reserved int

select @reserved = reserved
from sysobjects so
   inner join sysindexes si on si.id = so.id
where so.name = 'sysobjects'
  and so.type = 'S'

print '@@rowcount = ' + cast( @@rowcount as varchar(10) )
print '@reserved = ' + cast( @reserved as varchar(10) )

On the other hand using SET will give an error and @reserved will be left as it was before the SET statement executed, I must admit I thought it would get set to NULL until I tried it!

declare @reserved int
set @reserved =
0
set @reserved = (
 
      select
reserved
      from sysobjects so
            inner join sysindexes si on si.id = so.
id
      where so.name = 'sysobjects
'
          and so.type = 'S' )

print '@@rowcount = ' + cast( @@rowcount as varchar(10) )
print '@reserved = ' + cast( @reserved as varchar(10)
)

Msg 512, Level 16, State 1, Line 3

Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.

@@rowcount = 0

 

The SET statement is great if you are only assigning one value and that’s where I tend to use it, however if you want to assign many values from a singleton select to many variables then you’d need to have multiple SET statements and multiple executions of the same query.

 

Check out we are using SELECT to assign multiple variables, but we still get the logic error of multiple rows being returned…

declare @reserved int,
        @rowcnt int,
        @used int

select @reserved = reserved,
       @rowcnt = rowcnt,
       @used = used
from sysobjects so
   inner join sysindexes si on si.id = so.id
where so.name = 'sysobjects'
  and so.type = 'S'

print '@@rowcount = ' + cast( @@rowcount as varchar(10) )
print '@reserved = ' + cast( @reserved as varchar(10) )
print '@rowcnt = ' + cast( @rowcnt as varchar(10) )
print '@used = ' + cast( @used as varchar(10) ) 

Now to do the same with SET you’d need to do this…

declare @reserved int,
        @rowcnt int,
        @used int

set
@reserved = ( 
   
select reserved
   from sysobjects so
      
inner join sysindexes si on si.id = so.id
   where so.name = 'sysobjects'
     and so.type = 'S' )

set @rowcnt = ( 
   select rowcnt
   from sysobjects so
      
inner join sysindexes si on si.id = so.id
where so.name = 'sysobjects'
  and so.type = 'S' )

set @used = ( 
   select used
   from sysobjects so
      inner join sysindexes si on si.id = so.id
   where so.name = 'sysobjects'
     and so.type = 'S' )

print '@@rowcount = ' + cast( @@rowcount as varchar(10) )
print '@reserved = ' + cast( @reserved as varchar(10) )
print '@rowcnt = ' + cast( @rowcnt as varchar(10) )
print '@used = ' + cast( @used as varchar(10) )

I don’t like having to rerun the query, it causes additional IO and CPU just for the sake of getting round the multi row problem that SELECT has, but, there is another solution!

declare @reserved int,
        @rowcnt int,
        @used int

select @reserved = reserved,
       @rowcnt = rowcnt,
       @used = used
from sysobjects so
      inner join sysindexes si on si.id = so.id
where so.name = 'sysobjects'
  and so.type = 'S'

if @@rowcount > 1
   raiserror( 'Could not assign because multiple rows returned', 16, 1 )

print '@@rowcount = ' + cast( @@rowcount as varchar(10) )
print '@reserved = ' + cast( @reserved as varchar(10) )
print '@rowcnt = ' + cast( @rowcnt as varchar(10) )
print '@used = ' + cast( @used as varchar(10) )

Ok, so its not really the same as SET, remember SET does not effect the original value if the SET does not work it just continues processing which to my mind is a bad thing anyway. I guess you’d really want to trap @@ERROR > 0 when using SET and trap @@TRANCOUNT > 1 and @@ERROR when using SELECT so either way you need to do some error checking!

 

567 pages of Itzik, Lubor Kollar and Dejan Sarka.

The book is an absolute must for anyone migrating too or using SQL Server 2005.

Good explanations and examples throughout the book, the source code for the examples if you want to take a peak can be found here: http://www.sql.co.il/books/insidetsql2005/

 

Hopefully this will put to bed the lobby that recommends you use COALESCE for queries that have optional specified parameters, the real world example would be a search form where the user can optionally choose from a number of drop down lists and then execute and what you then have to do in the query is to ignore the columns (parameters) they don't want to search upon.

The statement in the title was taken from one of --CELKO-- posts who is the biggest culprit doing this, apparently there are some tricks in DB2 and Ingress to allow a more efficient plan but in SQL Server no such features exist. Even the OPTION( RECOMPILE ) which sniffs the parameter values has no effect here.

Enter our test data - the scraped posts from all the SQL Server usenet groups which at the time I took my database copy was 753868 rows which amounts to 134MBytes, this table doesn't contain the messages but just the tree and some information around the post, create table below:-

CREATE TABLE [dbo].[mb_message](
   [id] [int] NOT
NULL,
   
[entry_date] [datetime] NOT
NULL,
   
[mb_forum_id] [int] NOT
NULL,
   [parent_mb_message_id] [int]
NULL,
   [parent_entry_date] [datetime] NOT
NULL,
   [flattened_tree] [varchar](900) COLLATE Latin1_General_CI_AS NOT
NULL,
   [message_depth] [tinyint] NOT
NULL,
   [message_reads] [int] NOT
NULL,
   [last_posting] [datetime]
NULL,
   [nntp_message_id] [varchar](200) COLLATE Latin1_General_CI_AS
NULL,
   [top_parent_mb_message_id] [varchar](900) COLLATE Latin1_General_CI_AS
NULL,
   [is_private] [char](1) COLLATE Latin1_General_CI_AS NOT
NULL,
   
[author_id] [int]
NULL
)
ON [PRIMARY]

There are no indexes on the table, its just a heap, so for the purpose of this example we create two indexes, one for each parameter we are going to use, note I'm not going to bother putting a clustered index on the table because I am simply illustrating the problem using COALESCE and its effect on index usage and having a clustered index will simply confuse the picture.

create index t1 on mb_message( mb_forum_id )
create index t2 on mb_message( top_parent_mb_message_id )

Query 1 - The query built dynamically would end up like this

declare @fid int
declare
@mid varchar(20
)
set @fid =
9
set @mid = '102389'

select top 10 *
from
mb_message
where mb_forum_id =
@fid
   and top_parent_mb_message_id = @mid

As you can see from the execution plan below it correctly uses the index and does an 'Index Seek' and the Logical Reads is just 5 pages (40KBytes).

2 Writing it using COALESCE

declare @fid int
declare
@mid varchar(20
)
set @fid =
9
set @mid = '102389'

select top 10 *
from
mb_message
where mb_forum_id = coalesce( @fid, mb_forum_id
)
    and top_parent_mb_message_id = coalesce( @mid, top_parent_mb_message_id
)
option( recompile )

The option( recompile ) is in there for good measure, it was introduced in SQL Server 2005 and does parameter sniffing but is of little use in this because of COALESCE use.

As you can see from the execution plan below it does not use the index, instead a table scan and costs us 13031 Logical Reads (101MBytes).

3. And finally the query using the ISNULL....

declare @fid int
declare
@mid varchar(20
)
set @fid =
9
set @mid = '102389'

select top 10 *
from
mb_message
where isnull( mb_forum_id, 99 ) = coalesce( @fid, isnull( mb_forum_id, 99 )
)
    and isnull( top_parent_mb_message_id, 99 ) = coalesce( @mid, isnull( top_parent_mb_message_id, 99 )
)
option( recompile )

Gives the same plan and logical reads as (2) above.

Summary

The only way in SQL Server, even 2005, to get an efficient, scalable and performant solution is to either use a ton of IF ELSE control of flow blocks (one per parameter combination) or use dynamic SQL. Erland Sommarskog gives an indepth discussion on dynamic SQL on his website.

Unfortunetly --CELKO-- fails to realise whilst a solution may look 'tidy', tidyness comes at a price and that price is performance and scalability. Would you rather your query read 40KBytes or 104,248KBytes, and all those extra pages that require reading may well be locked by people inserting/updating and deleting.

Reference

--CELKO-- reply to post on 17 May 2006 on microsoft.public.sqlserver.programming

--CELKO-- reply to post on 25 Apr 2006 on comp.databases.ms-sqlserver

Thanks to Dave McMahon of the NxtGenUG for your topic suggestion, I'd recommend people check Dave's group out - looks good!

Setting up SQL Server is a piece of cake, you download the software or put the DVD in and run Setup and it installs it all for you wizard fashion. Ok, you can do that but what do you need to consider before installing and what do you need to consider afterwards?

For your developer machine you probably aren't that bothered but if you are doing something for a client and you are in the hot seat for setting up SQL Server then you should do the best job you can if only to give your client a good experience of the brilliance of your solution ;).

Steps Before

Hardware
   How much memory
   Processor
Disk Configuration
   Size and speed
   SCSI or SATA
   RAID level
   Hot spare
   NT Allocation Size
   Placement of database files
   Controller Cache
Software and OS
   Version of OS
   Version of SQL Server
   32 or 64 bit
   Standalone SQL box or shared Application server

Steps After

Move Tempdb
Size Tempdb
Database files - seperate log and data
Set SQL Server max/min memory
Backups

Lets drill into some of these to give you some ideas.

When considering your hardware sit down and work out how many concurrent users, that is to say connections doing work at a single point in time. Its always worth spending the extra couple of hundred £ on a dual core processor, hyper threading in my opinion didn't really bring that much to the table with SQL Server, except confusion! Memory is so cheap its ridiculous so do a capacity planning excercise on your database and size the memory accordingly, remember to include applications and space for the OS, given 1GB of memory I'd personally always make sure that there is around 128MB of memory available to the OS (measured through task manager). Remember SQL Server uses a cache for data and procedure plans so it will eat memory (no its not a memory leak!). Give SQL Server a max or preferably a fixed memory size so its not relinquishing to other applications which will then cause you performance problems - remember, the more you can do in memory the quicker your application will be, if SQL Server has to keep going out to disk to service your query then performance will be slow.

After memory disks have got to be the most important part of a good SQL install, make sure you invest in a good IO subsystem otherwise you are doomed before you start. SATA is almost comparable to SCSI now-a-days as you can see by the massive drop in price of SCSI disks, where SATA has been let down until a year or two ago was disk access times but you can get 10Krpm disks now - Western Digital do the Raptor WD740G which is a 74GB 10Krpm SATA disk with a average seek time of 4.5 ms which is respectable and for just £102 that compared with the SCSI equivalent of the Seagate Cheetah for £125. If you have some spare cache the 15Krpm shaves another 1ms off the seek time, the Fujitsu do one for £225 and a 3.3ms seek time. Just 1.2ms difference per seek could aggregate to quite a lot on some systems that are disk bound. Personally, I take SATA for mid range systems and also for an online backup RAID 1 array but use SCSI 15Krpm for anything that requires the juice.

RAID 5 is dead and buried, people shouldn't be using it anymore, its a cost saving fault tolerence solution - check out the site Battle Against Any RAID Five for more information, I must remember to book mark that site - guessing at URL's always brings up something suprising, I typed barf.org first which is the American American Resistance Front. Hot spares are a good idea if you don't mind spending a few hundred on a redundant disk, the idea is that if any disk in your array should fail then the hot spare is brought into the array and fault tolerance is restored once the mirror is restored (all done while your system is online).

The biggest thing missed is the allocation size, if you do an install of Windows check out the allocation size on the C: (system drive) its just 4KBytes! Keep your databases off the system drive, make your system drive a good few GB's and put your databases on another logical volume that you have formatted with an allocation size that is balanced with your particular disk controller and RAID stripe, personally I use 64KBytes - this is a whole new topic that can be given pages....

Placing database files accordingly can have a dramatic effect on performance, consider a process that is write bound, one of my clients has devices storing event data in a SQL Server so inserts are happening at a phenominal rate of knotts. There are two parts to a database the data files and the log files, writing to the transaction log is sequential in nature and thus its good if the disk head is close or at the postition it last wrote, if it isn't then it starts costing you disk seek time (remember those 4.5ms average seek times.....). For big tables you can get good results splitting the clustered and nonclustered indexes onto seperate file groups on seperate volumes (preferably physical) - it has a big impact on the checkpoint process, thats the process that runs every minute or so writing dirty pages off to disk (remember, when you do an insert/update/delete then the data isn't written immediately, the pages are marked dirt in cache and written latter, its the log that is written immediately).

I always check and mention controller cache to clients, make absolutely sure the controller cache is battery backed up (on the controller) and that doesn't mean if you have a UPS then you'll be ok! It needs a battery on the controller and that battery needs to be ok. Get writing caching wrong and you'll get a slow system, but even worse you can corrupt your database.

Most new hardware bought now has a 64bit chip in it, if you are buying new software then make full use of your 64bit chip. You need to make sure that any printers or devices you have attached have a 64bit driver otherwise you'll have problems with those devices. The main benefit of 64bit is memory, you don't need to mess about with AWE to get SQL Server to use more than 2GB of RAM.

Consider your version of SQL Server carefully, there are many flavours now including a number of different versions of the free Express edition. The version comparison is interesting, alot of people might now get away with workgroup edition, some even Express Edition but remember there is no SQL Agent / job schedular with Express so you'll need to role your own mechanism for doing backups.

If you are sharing the server with applications including the middle tier then size your memory accordingly, don't squeeze SQL Server out, give it plenty of cache so your application queries are mostly resolved from cache rather than having to go out to disk.

In SQL Server 2005 tempdb plays a bigger part, it holds the version store which is used by the new concurrency features but more importantly the inserted and deleted tables that are used in triggers are held there. Always size tempdb properly, don't leave it at the default. Consider if your application will use any of the features that require tempdb usage and size accordingly.

Placement of files is critical as I've mentioned earlier, try and keep the LDF (transaction log) files on their own disks away from tempdb and the application MDF files.

Anyway, I think thats enough to think about. Sorry for the bullet point, brain dump nature of the post but its to give you food for thought, you can easily google for any of the above to get a full description, or just email me and I'll put meat on the bones.

Introduction

Collation specifies the rules for how strings are compared and how data is stored.

With SQL Server 7.0 came a new feature 'collation', before the introduction of collation we where stuck with the server collation, doing case insenitive comparisons required upper or lower casing the two inputs which led to performance problems.

Collation can be specified as a default server collation, at the database and column levels and since the introudction of SQL Server 2000 sql_variant at the value level - yes each value can have its own collation, consider the following example...

declare @sv1 sql_variant
declare
@sv2
sql_variant

set @sv1 = 'tony' collate Latin1_General_CI_AS
set @sv2 = 'tony' collate SQL_Latin1_General_CP1_CI_AS

if @sv1 = @sv2
      print
'the same'
else
      
print 'different'

set @sv1 = 'tony'
set @sv2 = 'tony'

if @sv1 = @sv2
      print
'the same'
else
      print 'different'

Collation effects all sorts of things too, get a migration wrong i.e. install your new copy of SQL Server with a windows collation rather than a legacy SQL collation and you could find problems using # tables. Tip: when doing a side by side/fresh install migration then use sp_helpdb and check that the collation of the system databases is the same between your old and new servers.

A recent discussion in the usenet forums asked if SQL Server worked like Oracle in that identifiers are case insensitive and data is case sensitive. Oracle simply upper cases all identifiers so there is no problem, whereas we SQL Server you get what you type and are dependent on the collation of the database.

My Opinion

Collation is a big pain the backside, I wish I could turn all the database level and below collation configuration off! I worked on a project where we used sql_variant and quite rightly too but the problem was the database was putting in one collation when you inserted through say Query Analyser but the application was doing a completely different one (SQL collation v Windows collation) so when we compared the values they where different even though they should be the same so you end up correcting the data on insert using the COLLATE operator or using that on the join clause (ouch).

Erland Sommarskog raises a good point that I agree with although I never do, basically the big problem with developing on a case insensitive development database is that if it ever comes to be released on a case sensitive one then you might find that you have spelt an identifier wrong (in terms of case), this can be got round by using all lower case for identifiers, something I practice myself. Erland suggests you test your design on a case sensitive collation before release - what a good idea!

Reference

SQL2005 Collation vs Oracle, Nasir, 11th May 2006

 

Posted 15 May 2006 06:27 by tonyrogerson | with no comments
Filed under:

There are so many occaisons where we want to hold just the date within SQL Server, the time is not part of the modelled entity. The trouble is we only have DATETIME and SMALLDATETIME to play with, these are the datatypes that you can use functions like DATEDIFF, DATEPART, DATENAME etc on, creating time dimensions in cubes etc...

Its my experience that most designers implement them as integer data type and we suffer this because we can then not use any of the good and powerful date functions.

Here is a way round that to please your logical model and yet still have the benefits of using the date functions, note, I've kept the columns in the table to a minimum to show you the technique so don't rant that there are no primary keys so its not a table blah blah blah....

create table trade (
   trade_datetime datetime not
null,
   settlement_date int not null check( isdate( settlement_date ) = 1
),
   sdt_settlement_date as convert( datetime, cast( settlement_date as char(8) ), 112 )
 
   )

create index ncidx_sdt_settlement_date on trade( sdt_settlement_date )

insert trade ( trade_datetime, settlement_date ) values( '2006-04-10T10:00:00', 20060411 )
insert trade ( trade_datetime, settlement_date ) values( '2006-04-10T10:00:00', 20060412
)
insert trade ( trade_datetime, settlement_date ) values( '2006-04-10T10:00:00', 20060413
)
insert trade ( trade_datetime, settlement_date ) values( '2006-04-10T10:00:00', 20060414 )

select diff = datediff( day, sdt_settlement_date, getdate() )
from
trade
where sdt_settlement_date between '20060411' and '20060412'

So, whats happening here?

The CHECK constraint uses the ISDATE function to make sure that only dates are entered into the integer column.

We use a calculated column to CONVERT the integer into a date, note: you cannot use CAST here because only the CONVERT with a style pattern gives a deterministic (precise) result.

Because the computed column is deterministic and precise we can index it!

Checking the plan on the query shows that the optimiser is using the index.

  |--Compute Scalar(DEFINE:([Expr1002]=datediff(day, [trade].[sdt_settlement_date], getdate())))
       |--Index Seek(OBJECT:([TCC_UKUG].[dbo].[trade].[ncidx_sdt_settlement_date]), SEEK:([trade].[sdt_settlement_date] >= 'Apr 11 2006 12:00AM' AND [trade].[sdt_settlement_date] <= 'Apr 12 2006 12:00AM') ORDERED FORWARD)

Ok, so we are only saving 4 bytes per column per row so not excessive unless you are in the millions of trades and then its a bit of a saving.

Another way of making sure there is no time component is this which to some is more elegant which on face value it is because you've less complexity in the table structure and you don't have to worry about the computed column requirements.

create table trade (
   trade_datetime datetime not
null,
   settlement_date datetime not null check( convert( char(12), cast( settlement_date as datetime ), 114 ) = '00:00:00:000'
)
)

insert trade ( trade_datetime, settlement_date ) values( '2006-04-10T10:00:00', '20060410' )
insert trade ( trade_datetime, settlement_date ) values( '2006-04-10T10:00:00', '2006-04-10T10:00:00' )

There are a number of requirements of using computed columns, specifically ansi settings for your application when using insert, update and delete so check books online before trying this at home.

 

Introduction

Application programmers and Business Intelligent professionals are faced with having to format data - taylored into what the users want. We have two choices as to where we do this processing, keep it in the database using the facilities of the database engine, for instance T-SQL, standard SQL dialect, CLR, XML or whatever the product has to offer or we can bring the data out of the database and down into the front end application or middle tier and format the data there (keep the database for store and retreive only).

My Opinion

The IT industry is full of rules and best practices unfortunetly some of these rules and best practices aren't based on current technology or business problems, in fact some of the rules and best practices are based on techniques adopted in the 70's and 80's on mainframes or early client server architecture.

No product is just a database anymore, sure SQL Server stores and retrieves data but it also offers us a lot more, in fact its moving more towards being the middle and data tiers in the three tier architecture now that SQL Server can be a web service and the inclusion of CLR.

Data formatting, be it paging, value concatenation should always been done where it is most efficient to do it. Consider (and benchmark) where its most efficient to do this, would you really drag 1 million rows into the middle tier or client browser only to get page 2 of 20 rows? It doesn't make sense.

Relating this to a well known expert, --CELKO--, he states that you should NEVER do formatting in the database and it should always be done in the front end. Think this through, take value concatenation for instance, say you need to create a list of values for a given product category, for instance for a given person show the mailing lists they belong to. In the database this will be held in rows, so if a person belongs to 5 mailing lists there will be 5 rows, now, say the user requires the values to be normalised so they are displayed on just one line entry. We have two choices, drag the 5 rows down to the front end or middle tier and use a 4GL to process the data or we can use some of the extensions available in SQL Server to do this.

Example

create table mailing_list (
   individual_name nvarchar(100) not
null,
   list_name nvarchar(10) not
null
)

insert mailing_list ( individual_name, list_name ) values( 'tony r', 'List A' )
insert mailing_list ( individual_name, list_name ) values( 'tony r', 'List B'
)
insert mailing_list ( individual_name, list_name ) values( 'tony r', 'List C'
)
insert mailing_list ( individual_name, list_name ) values( 'joe r', 'List A'
)
insert mailing_list ( individual_name, list_name ) values( 'joe r', 'List B'
)
insert mailing_list ( individual_name, list_name ) values( 'alex r', 'List A' )

select distinct 
   individual_name
,
   list = substring(
 
      ( select ', ' + list_name as
[text()]
         from
mailing_list m2
         where m2.individual_name = m1.
individual_name
         for xml path(''), elements
)
      
, 3, 100
)
from mailing_list m1

Gives this result :-

alex r      List A

joe r       List A, List B

tony r      List A, List B, List C

Now, just how easy was that! It only takes a few lines of SQL and you have also saved a lot of network traffic back out to the middle tier or front end.

So, my point is this: whatever you do - always think through what you are doing, don't just follow 'rules' blindly!

References

Quote from 'Row Numbering Unpredicable', Apr 11th, 2006

Original post by Chris Smith

I need to create a stored procedure that returns the row number (for
paging) AFTER the data has been sorted with an order by.  The source is
a