July 2006 - Posts
I really like this new constant update of documentation coming from the SQL team
The BOL can be found here "SQL Server 2005 July Books Online"
Some of the samples have changed in the SQL Server Samples. They can all be found here "SQL Server 2005 Samples"
If you appreciate programming then this is a must read. http://thedailywtf.com/default.aspx
Most of us will be able to chortle with amusement due to havnig come across many of these in our working life (if not having written them ourselves :) )
For those of you that were at the first UK BI usergroup meeting you will have been my whirl wind example of processing an unstructured data file using ssis. The packages are now available.
The presentation is here http://sqlblogcasts.com/files/7/ssispresentations/entry902.aspx
and the package is here http://sqlblogcasts.com/files/8/ssisdemos/entry901.aspx
The Packages cover,
- Parsing unstructured data
- Multiple source files
- Managing lookups
- Incremental processing
The sample file is http://sqlblogcasts.com/files/8/ssisdemos/entry903.aspx and will need to be put in a folder
C:\UserGroup\SSIS - Parsing Unstructured Data\ErrorLogs
Enjoy and any questions please feel free to ping me.
Those of you that have tried posting comments of the email me will have noticed it was bust.
It is now fixed so feel free to contact me with your questions/comments
I recently looked into how ping and tracert worked only know that they use something called ICMP. Its fairly straight forward (well the basics) so thought I would provide a dummies explanation so don't use this a a foundation of a PHD on ICMP.
Ping is the transfer of a packet of data to a destination. It gets passed from router to router until it gets to the destination.
For the ping to work each router needs to be configured to pass on the packet of data and the destination needs to be configured to respond to the ping.
Many people disable the passing of ping and responding to ping requests for security reasons.
Tracert is simply multiple pings.
The packet of data used in a ping includes a number called the TTL.
The router only passes the packet on to the next ip address if the TTL is greater than 1, if it isn't it returns a TTL expired status and the ip address of the router on which it expire.
When the packet is passed on by a router it decreases the TTL by 1.
This means that if you do a ping with increasing TTL values starting at 1 you should find each router that is used to get to the destination ip address.
So this is what tracert does. It actually does each step 3 times so you can get an average
Try it your self "ping www.site.com -i1" will return the ip address corresponding to the first line in the "tracert www.site.com". (include -a to get the name of the router) In this case my firewall (which isn't named)
The final point to note is that
- a router may not decrease the TTL and so you may not get all the routers.
- a router may be configured to pass on the ping request but not respond,
this will result in a timeout in the tracert but subsequent rows returning
an ip address
- the route each ping takes may be different which can result in the same
router appearing twice. (the internet is just a spider web of networks)
You may wonder why a SQL bod is posting about this well keep posted and you will find out.
I've just finished watching the webcast on building a wareshouse according to Kimball principles. In it they generate an audit record by inserting a record and then using IDENT_CURRENT to get the identity value.
A few months ago some one on the forums was screaming because the IDENT_CURRENT was not working on a 64 bit server it was returning NULL.
My point is they both wanted the last identity generated for a table, but both missed the point that IDENT_CURRENT is not specific to a session but is server wide. This means you could be getting the identity value generated for an insert in another session. Pretty much like calling select MAX(identityCol) from table (but without the locking).
So just as people with triggers got burnt with the use of @@identity so are people with IDENT_CURRENT() whats more it probably won't show up as a real problem i.e. a failure, you will just have records associated with the wrong parent, your database will slowly become corrupt.
In both of these situations the user should be using SCOPE_IDENTITY to get the value generated by their last insert statement.
I was listening to a web cast by one of the infamous gurus of SQL and heard again that you should use nvarchar and nchar for your database columns and not varchar and char.
The reasons put forward were that
- your client will be in unicode and so if you don't there will always be conversion going on from non-unicode to unicode.
- you will be able to support languages such as Chinese etc that require unicode.
Again I was wondering how much should I take head and start using unicode because
- I work on central european systems
- the space required for unicode is double that of non-unicode
The only compelling reason I see would be if the performance of the conversion on the client out ways the size required to store the double bytes.
So I did a test. Very cruedly created 2 tables, populated them with 10,000 rows and read them from a .net client app and timed them.
What was the outcome, well on my laptop (so no network latency) the time to process the unicode data was twice that of the non-unicode data. That does make sense as SQL has to return twice the amount of data, that means reading twice the number of pages, send twice the amount of data down the stack and use up twice the amount of your valuable cache.
So it seems that the overhead of processing the double the amount of data far out ways any conversion that goes on.
When ever you design a database you will realise that the size of text based columns will dwarfs the size of your integer key columns, if you double the size of those columns the dwarfing becomes a real David v Goliath. imagine a address table with 2 integer key columns and 5x50 character address fields. thats 250 bytes of text and 8 bytes of integers, with unicode that becomes 500 bytes of text data, thats 16 rows per page. Your 1 million row table now takes up .5 Gb of your cache.
Whilst I know anyone that has used the CLR will be shouting, "but SQLCLR only support unicode", true but it doens't mean you have to use unicode for your columns.
So if you your system isn't likely to need unicode, (if you don't know or are not sure then it probably won't), don't use unicode for your columns. Save your cache, bandwidth and disk space.
In case you had missed it, SQLServerCentral have arranged a deal with Quest to get Litespeed for free.
You can read the review of litespeed here, which highlights the benefits of the compression that litespeed gives.
Because it has to write less dats it is generally quicker to backup, which is always a good thing.
I've had litespeed in my production and test environments for years but never in development. This will be great.