September 2007 - Posts

Recently I found this issue of deadlock variables in SSIS and finally made two solutions

1. Instead of declaring variables in Script properties (ReadOnlyVariables / ReadWriteVariables), better to make use of Dts.VariableDispenser.LockForRead / Dts.VariableDispenser.LockForWrite in the script to lock variables

2. I also solved my problem by running child packages out-of-proc (ExecuteOutOfProcess=TRUE in Execute Package Task), this is more to do with script caching.

But recently I saw something strange and found out that one of my script is making use of this type of code and I am getting this error again.....

"The script threw an exception: A deadlock was detected while trying to lock variables "variable names (comma separated)" for read access and variables "variable names (comma separated)" for read/write access. A lock cannot be acquired after 16 attempts. The locks timed out."

    Dim var As Variables

 

    Dts.VariableDispenser.LockForWrite("User::VariableName")

  

    Dts.VariableDispenser.GetVariables(var)

  

      'Problem Line......Just remove the reference to Dts.Variables

      Dts.Variables("User::VariableName").Value = "Some Value"

 

      'Correct Line.......make use of locally declared "var" variable

      var("User::VariableName").Value = "Some Value"

   

   var.Unlock()

  

To solve the situation I made a quick fix and change my Dts.Variables to local variable collection I defined in the beginning "var".

As we can see in this script that we are locking one variable for write. But after using the GetVariables we MUST use "var" collection to write data and should NOT use Dts.Variables and that fixed my problem.

So I think this is some sort of double locking issue, like if I lock the variable using "Dts.VariableDispenser" and if I try to access the same variable again with Dts.Variables instead of "var" then SSIS tries to lock that again......

 

 - Mohit Nayyar

Posted by mohitnayyar | with no comments
Filed under: ,

I just saw this new query tool from SQL Answers.

 

The beauty of this tool is to directly generate the output of any query into PDF / HTML / Excel / Word / CSV format. Please take a look here…....

 

http://www.sqlanswers.com/Software/SAQ/Default.aspx

 

Mohit Nayyar SQL Query Tool

 

This seems to be a lightweight tool to query multiple databases platforms like Microsoft SQL Server, Oracle, DB2, Microsoft Access and MySQL, but I just tested this for Microsoft SQL Server 2005 and I am really happy with this.

 

This makes things really easy when you need to send the output in PDF or HTML without any extra effort.

 

On the similar lines Microsoft also offers a “SQL Server Web Data Administrator” tool, which is very useful because of its basic web interface. You can find this here…….

 

http://www.microsoft.com/Downloads/details.aspx?familyid=C039A798-C57A-419E-ACBC-2A332CB7F959&displaylang=en

 

 

- Mohit Nayyar 

Posted by mohitnayyar | with no comments
Filed under:

Today I saw a new product coming soon to do DTS to SSIS migration called DtsXchange

Looks promising and this is also capable to do something extra which built-in DTS migration doesn’t do like Dynamic Properties. Have a look here http://www.pragmaticworks.com/dtsxchange.htm

Users can also migrate existing DTS packages into SSIS using the migration wizard provided by SQL Server 2005 but this wizard doesn’t cover complete DTS package. So in some cases manual effort is required, and SSIS doesn’t support some of the DTS features, so the user has to manually implement the functionality in new SSIS package.

You can find some known issues in migration here http://technet.microsoft.com/en-us/library/ms143462.aspx

But on top of this SSIS allows you to run the existing DTS package even without any change, using a wrapper called “Execute DTS 2000 Package Task” in the new SSIS package.

So if you don’t have the bandwidth to migrate the existing package then keep on running the old DTS packages in SSIS and start making new packages in SSIS.

Obviously you will miss the “All NEW SSIS” but it’s an easy work around. But I think you will miss a lot, because SSIS is NOT a new name of DTS but it's a truly enterprise level ETL tool. So its always better to migrate to SSIS as soon as possible.


- Mohit Nayyar

Posted by mohitnayyar | with no comments

At some point of time we all have to fine tune our old SQL Server queries or some of our colleague’s queries ;)

 

So I thought of writing something about common query performance problems and a checklist which can be used a make sure we didn’t miss anything.

 

1.

Large Result Set

Common mistake to include * instead of specific column names takes first place in investigation for performance issue.

 

We should always use specific column required by the application instead of asking all available columns.

 

Using * get worst when there are image/text/binary or very large data

columns available in the table.

 

Making use of TOP / TOP Percent / SET ROWCOUNT is another smart move to restrict the data based on technical solution.

2.

Missing Where Clause

We should always include the specific criteria when asking for rows.

Making use of WHERE clause is very beneficial when we have appropriate indexes on the columns in WHERE and SELECT clause.

3.

Lack of useful statistics

This is again very important point, statistics are the key for query optimizer to make an execution plan and get the data out.

 

SQL Server automatically creates useful statistics, but just makes sure that “Auto Create Statistics” is set to “TRUE” in database properties.

4.

Out-of-date statistics

This is again very important point, statistics are the key for query optimizer to make an execution plan and get the data out.

 

SQL Server automatically creates useful statistics, but just makes sure that “Auto Update Statistics” is set to “TRUE” in database properties.

5.

Lack of useful indexes

Again Indexes are the key to do any data I/O related operation. So just make sure you have all required indexes in place.

 

Just make sure you have created required clustered / non-clustered indexes based on the business requirement and technical solution implementation.

 

And try to create indexes on smallest possible numeric columns with respect to indexing requirement. I recommend reading BOL to understand the indexes thoroughly.

6.

Missing Joins

Yeah, that’s again is a good point to check. I have seen developers who simply put statements like “select * from table1, table2, table3”.

 

Never do that, just make sure you have followed the proper join and I recommend ANSI joins instead of “=, *=, =*” type joins. They are easy to read/manage and sort of required for future SQL Server versions.

7.

Blocking

Transactions’ fighting for resources is nothing new in any RDBMS application but we need to make sure that make use of all possible options to avoid excessive blocking.

8.

Query / Table Hints

They can be good or bad, based on the scenario. But it’s recommended that we should not use them extensively unless we have excellent understanding of technical implantation and future business requirement and data load trends.

SQL Server takes care of this very well, so better don’t use.

9.

Use best possible data type

We are talking about database systems, so taking care of what we are trying to store is very important in designing the whole system.

 

We should always choose the best possible smallest data type based on the scenario. Using INT for storing something like number 2000 is not a good option.

10.

Index Fragmentation

Hmm, not only having indexes solves the whole problem, we need to make sure that indexes are not fragmentated as well.

 

DBCC SHOWCONTIG and some of the latest DMVs are very useful in checking the fragmentation. And then re-building or defragmenting the index will solve the problem easily.

11.

Cursors

We should use cursors only when there is no option left and that too using READ ONLY, FAST FORWARD is recommended. And please make sure that we should only ask for smallest result-set possible using WHERE clause and with specific columns name instead of using star (*) in cursor definition.

 

Otherwise cursors put extra overhead on SQL Server and make extensive use of TempDB.

12.

Extensive temp table (tempdb) usage

Big bulky queries with lots of ordering, grouping clauses which result in huge result set can make use of TempDB extensively.

 

So try to use simple, smart queries which generate small result set and if this is not possible then make sure TempDb is on fastest possible disk system with setting a optimum size for the TempDB based on the technical solution.

 

Using TABLE data type is another good option for storing temporary result set. CTE (Common Table Expression), new to Microsoft SQL Server 2005 can also be used to achieve the same to some extent.

13.

Triggers

Try to use Constraints instead of triggers, like for referential integrity task using triggers is not a good option and that can be easily achievable by using constraints.

 

Triggers are more resource expensive and they work as integral part of transaction, which will makes transaction lengthy as well.

14.

Dynamic SQL Statements

Using Views, stored procedures are always beneficial over using dynamic SQL. And using dynamic SQL is an open invitation to SQL injection attacks.

 

And when there is no option left then we should make use of sp_ExecuteSQL instead of using EXEC command.

15.

Lack of Views / Stored Procedures

Accessing direct table is not a good practice in any engagement, and a developer should always access data through views and stored procedures.

 

SP also provides additional benefits like modular programming, security later, saves network traffic, and faster execution. SQL Server 2005 also introduces a statement level re-compilation, which is again very useful for bulky procedures.

16.

Use Distinct clause

Deleting duplicate data itself is a resource intensive operation, so making use of DISTINCT clause should be minimal in all queries.

17.

Missing SET NOCOUNT ON statement

Setting this option will stop sending the rows affected message to the calling client, which itself is a network intensive operation and moreover client applications doesn’t make use of those messages.

18.

Using UNION clause

Again using UNION will delete the duplicate rows in the result set, which is a resource intensive operation, so using UNION ALL is preferred way to doing the same thing.

19.

Multiple Network trips

Making use of stored procedures, views and small result set can save lots of network traffic.

 

Instead of sending a lengthy query to SQL server we should send procedure name with parameters to SQL Server and then get the smallest possible result set by using all possible options I explained here.

20.

Lack of useful data striping

Data partition is the very useful option to reduce the size of the table. We can do the same using horizontal or vertical partition.

 

This way we can save a lot of IO overhead and other related resources. There are various options available in SQL Server 2005 to do the same stuff and developers can also make their custom solutions based on business requirement.

 

Remember, small tables are always faster than bulky ones.

21.

Front-end Application issues like looping statements

Sometimes things are going well at SQL server end but because of some problem at front-end application we assume that SQL server is not doing his job.

 

So it’s always better to test the query at SQL Server end and then checking network resources before blaming poor SQL Server.

22.

Third party applications like anti-virus or some other services like IIS

Sometimes resource overhead because of other application running on same SQL Server is the main cause of poor performance.

 

So we should always check the memory and CPU utilization of other applications on the server.

23.

Slow network communication

Network bandwidth plays a very vital role in whole solution and this should be tested thoroughly in the solution.

24.

Inadequate memory in the server computer or not enough memory available for SQL Server

Setting appropriate settings for SQL Server is an art; SQL Server will NOT utilize all memory available until asked to do so.

 

Using 3GB, PAE, AWE options are the key to memory utilization for Microsoft SQL Server.

25.

High Processor usage by other applications

Sometimes resource overhead because of other application running on same SQL Server is the main cause of poor performance.

 

So we should always check the memory and CPU utilization of other applications on the server.

26.

Incorrectly configured system parameters

Setting SQL Server system parameters is a highly skilled job, and in some case it’s an art. Things works well on one server but this doesn’t out be true on another server because of changed business/technical requirement.

 

So it’s always better NOT to change any default option and even if we need to change some option then please read thoroughly about the option and test on development server before putting the same on production server.

27.

Disk bottlenecks

Disk IO is the key to any RDBMS application, because all we have is data and lots of data. So make sure you have used best possible RAID along with fastest possible Disk IO.

 

- Mohit Nayyar

 

 

 

Posted by mohitnayyar | with no comments

Recently I was proposing some of the high availability options for Microsoft SQL Server 2005, so thought of summarizing the same here....

SQL Server 2005 provides 4 basic options for high availability.

 

1.    SQL Clustering

2.    Database Mirroring

3.    Transactional Replication

4.    Log Shipping

 

Other than these 4 options some people also consider backup-restore and Database snapshots as other high availability options, which is really not true.....

 

"Database snapshots can be used as a recovery option from user/application specific errors or for reporting purposes when combined with other technologies. A snapshot is simply a read-only view of a database at a particular point in time, that’s why called - Database snapshots"

So let’s see all of them in more detail...

I will evaluate each high availability option on these parameters

 

1.

Failover type

Automatic or Manual

2.

Downtime period

Time to recover or get back the system in place

3.

Availability Scope

Database level or Server level

4.

Special hardware

Any special hardware required

5.

Standby Type

Hot / Warm

6.

Complexity

Configuration/maintenance complexity

7.

Accessibility

Is it possible to check the other server or can we make use of that anyway

8.

Client Access

Client redirection, Automatic or Manual

9.

Additional servers

Any additional Microsoft SQL Server required

10.

Distance limit

How far the servers can sit

11.

Automatic Role Change

Is it possible for the server to resume work automatically or need manual intervention

12.

Multiple storage location

Multiple data storage locations

13.

Hardware Setup Cost

How expensive the whole system will be

14.

SQL Server Editions support

What all options are available with each edition of Microsoft SQL Server 2005

 

 

Now lets take a closer look at the evaluating the different options…….

 

 

 

 

SQL Clustering

Database Mirroring

Transactional Replication

Log Shipping

1.     

Failover type

Automatic / Manual

Automatic / Manual

Manual

Manual

2.     

Downtime period

30 seconds

30 seconds

Manual effort required

Manual effort required

3.     

Availability Scope

Server

Database

Database / DB Objects

Database

4.     

Special hardware

Yes

No

No

No

5.     

Standby Type

Hot

Hot

Warm

Warm

6.     

Complexity

Complex

Little Complex

Little Complex

Easy

7.     

Data Accessibility (other than primary server)

No

Possible

Possible

Possible

8.     

Client Access

Automatic redirection

Automatic redirection

Manual effort required

Manual effort required

9.     

Additional servers (including base server)

At least >= 2 Servers

At least >= 3 Servers (automatic failover)

At least >= 2 Servers

At least >= 2 Servers

10.   

Distance limit

100 Miles

No Limit but depends upon network bandwidth

No Limit but depends upon network bandwidth

No Limit but depends upon network bandwidth

11.   

Automatic Role Change

Yes

Yes

No

No

12.   

Multiple storage location

No

Yes

Yes

Yes

13.   

Hardware Setup Cost

High, special certified hardware required

Medium

Low

Low

14.   

SQL Server Editions support

SQL Server Enterprise and Standard (2 node only)

SQL Server Enterprise and Standard (Restricted)

SQL Server Enterprise, Standard and Workgroup

SQL Server Enterprise, Standard and Workgroup

 

 

-       Mohit Nayyar