August 2010 - Posts

Partitioning- PART 2: Managing Partition using SPLIT, MERGE and SWITCH
30 August 10 09:33 AM | sarveshsingh | 2 comment(s)

In PART 1 I showed how to partition a table and the steps involved with examples. In this section i will show how you can manage partitions using SPLIT, MERGE and SWITCH.

SPLIT: is used to add new boundary points to a partition. The syntax is:
ALTER PARTITION FUNCTION PartitionFunctionName()
{
SPLIT RANGE (boundary_value)
}

Example 1:
Alter Partition Function PFN_Example()
Split Range (60)

I am using the partition function i created in PART 1. The above example adds another partition to the table. So any data greater than 60 in the table will be moved to the new partition.

MERGE: is used to remove a boundary point from the partition. The syntax is same as SPLIT. Replace SPLIT with MERGE.

ALTER PARTITION FUNCTION PartitionFunctionName()
{
MERGE RANGE (boundary_value)
}
Example 2:
Alter Partition Function PFN_Example()
MERGE Range (10)

The number of partitions for the table decreases by one by running the query in example 2. The data that existed in this partition is combined into a single partititon.

SWITCH: add rows to a table and removes rows from a table.

An INSERT or DELETE operation can be used to perform the same action. Then why use SWITCH? SWITCH is used because it does not cause blocking and lets you add any number of rows in less than a second.

Restrictions on a SWITCH operation:

1. A Full partition must be switched with an empty Partition

2. Both tables must use the same partition function and partition scheme.

3. Both tables must have the same structure for tables as well as indexes

4. The range of values in the partition being switched must not exist in the target table.

Syntax for SWITCH operation:

ALTER TABLE table_name
SWITCH [ PARTITION source_partition_number_expression]
TO [ schema_name.] target_table
[ PARTITION target_partition_number_expression]

DATA FLOW:

Let’s look at the flow of data when loading new data and removing old data. I have taken the concept from SQL SERVER DEEP DIVES, a book that I would highly recommend.

Below are the steps to load new data into a Partitioned Table.

1. Add a staging table in the same database and load the table with the new Data.

2. Add an Empty Partition at the front (Where you want to load the data) using SPLIT as shown in the figure below:

image

3. Use SWITCH to swap empty partition with Staging table. The data is now transferred to Partition 3, and staging table is empty as shown in the figure below:

image

This operation involves metadata only, so it only takes milliseconds to complete irrespective of the size of the table.

Below are the steps to remove old data from a Partitioned Table.

1. Create an empty staging table in the same database

image 

2. Swap the data with the old data using SWITCH operation. The result is shown below:

The STAGING table now contains the old data and the partition 1 is empty.

image

3. MERGE the Old Empty Partition with Partition 2 or another empty partition and effectively disappear. The staging table if needed can be copied or archived and the table truncated and reused later.

SUMMARY: Partitioning is quiet an involved topic, I just went through the basics in two Parts:

PART 1: CREATING PARTITIONS

PART 2 : MANAGING PARTITIONS USING SPLIT SWITCH AND MERGE.

I hope you will find this useful.
For more information on Partitioning please refer SQL Server Books Online.Also see http://msdn.microsoft.com/en-us/library/ms345146.aspx for a full discussion on Table Partitioning.

Filed under: , , ,
Partitioning – PART 1: Creating Partitions
27 August 10 11:26 AM | sarveshsingh | 2 comment(s)

We know that SQL server Databases grow with time. Large SQL Server databases can become difficult to manage due to their size. Usually such large databases have only a couple of large tables that take up most of this size. By using partitioning you can make these tables easier to manage and can improve query performance by splitting the table into multiple storage units called filegroups based on the criteria specified by the user.

I will explain this in two parts:

· Part 1: Creating Partitions

· Part 2: Managing Partitions

Part 1: CREATING PARTITIONS

To partition a table or index you follow the following steps as show below in the flow chart.

image

1st step - CREATING PARTITION FUNCTION:

Partition Function is the first step. It defines the boundary values or points for partitioning data. The syntax for creating partition is:

Create Partition function PartitionFunctionName (date type)
As Range [LEFT/RIGHT]
For Values ( Boundary values...)

NOTE: Data types that you cannot use are text, ntext, image, xml, timestamp,varchar(max), nvarchar(max), varbinary(max), alias data types, or CLR user-defined data types.

Example1:

Create Partition function PFN_Example (int)
as RANGE LEFT for values (10,20,30,40,50)

RANGE LEFT in the above example shows that the boundary point resides in left hand partition.

Partition

Values

1

-infinity to 10

2

11 to 20

3

21 to 30

4

31 to 40

5

41 to 50

6

51 to infinity

If you had specified RANGE RIGHT then the table would look like below:

Partition

Values

1

-infinity to 9

2

10 to 19

3

20 to 29

4

30 to 39

5

40 to 49

6

50 to infinity

2nd Step: CREATING PARTITION SCHEME

Partition scheme is used to maps Partitions to filegroups. The syntax to create Partition Scheme is:

CREATE PARTITION SCHEME PartitionSchemeName
AS PARTITION PartitionFunctionName
TO (filegroupname..)

Example2:

CREATE PARTITION SCHEME PS_Example
AS PARTITION PFN_Example
TO ([FG],[FG2],[FG3],[FG4],[FG5],[FG6])
Go

So a Partition Scheme simply specifies a name for a physical storage structure. In the partition function example we created 6 partitions. So any values in Partition 1 resides in FG1, values in Partition 2 resides in FG2 and so on.

3rd Step: CREATING A PARTITION TABLE.

The syntax for creating a Partition Table is:

CREATE TABLE tablename
({<column_definition>}[ <table_constraint>])
[ON {PartitionSchemeName (PartitionColumnName) ¦Filegroup ¦”default}]

Example3:

CREATE TABLE Orders
( ordered int identity(1,1) primary key clustered,
Orderdetails varchar(30) Not Null,
City varchar(30) Null)
on PS_Example(orderid)

In the above example it stores the table in scheme PS_Example. It uses orderid to determine which rows fall in which partition. If we consider all three examples it will look like below:

Create Partition function PFN_Example (int)
as RANGE LEFT for values (10,20,30,40,50)
go
CREATE PARTITION SCHEME PS_Example
AS PARTITION PFN_Example
TO ([FG],[FG2],[FG3],[FG4],[FG5],[FG6])
Go
CREATE TABLE Orders
( ordered int identity(1,1) primary key clustered,
Orderdetails varchar(30) Not Null,
City varchar(30) Null)
on PS_Example(orderid)

When the data is added to the table and as the data grows, orderid <=10 will be in partition 1 and will be in FG1, orderid between 11 and 20 in partition 2 and FG2 and so on.

Step 4: CREATE PARTITIONED INDEX

Example 4:

CREATE NONCLUSTERED INDEX idx_ordered_city on Orders(City)
on PS_Example(orderid)

SQL Server will store the portion of the index that falls in the criteria orderid<=10 in FG1, store the portion from 11 to 20 in FG2 and so on.

PARTITIONING AN EXISTING TABLE or INDEX

1. Create a Partition Function

2. Create a Partition Scheme

3. Drop the existing Clustered index

4. Re-create the clustered index on the partition scheme.

Summary: In this section you’ve learnt how to create partitions. In part 2  I will cover how you can manage partition using SPLIT, SWITCH and MERGE.

Fun with Dates
25 August 10 10:25 AM | sarveshsingh | with no comments

There’s many occasions when you would want to get the beginning of a particular month,  or end of the month or beginning of the week. Keeping this in mind i would like to share the concept to get this result.

This is achieved by first using the DATEDIFF function to calculate the difference in terms of whole days between anchor date and the date you specify ( I am using current_timestamp as an example). Then the DATEADD function is used to add diff days to the anchor.

select dateadd(dd, datediff(dd, anchor_date, current_timestamp), 'anchor_date')

If for example you wanted beginning of the current month then you can use:

select dateadd(dd, datediff(dd, '20010101', current_timestamp), '20010101')

If you wanted end of the month then change the anchor date to last day of the month as shown below:

select dateadd(dd, datediff(dd, '20011231', current_timestamp), '20011231')

Here’s some examples below.

declare @mydate datetime;
set @mydate = getdate();
select 'Beginning of this day',dateadd(dd, datediff(dd, '20100101', @mydate), '20100101')   
select 'Beginning of next day',dateadd(dd, datediff(dd, '20100101', @mydate) + 1, '20100101')
select 'Beginning of previous day',dateadd(dd, datediff(dd, '20100101', @mydate) - 1, '20100101')
select 'Beginning of this week (Monday)',dateadd(wk, datediff(wk, '20100503', @mydate), '20100503')
select 'Beginning of next week (Monday)',dateadd(wk, datediff(wk, '20100503', @mydate) + 1, '20100503')
select 'Beginning of previous week (Monday)',dateadd(wk, datediff(wk, '20100503', @mydate) - 1, '20100503')
select 'Beginning of this month',dateadd(mm, datediff(mm, '20100101', @mydate), '20100101')
select 'Beginning of next month',dateadd(mm, datediff(mm, '20100101', @mydate) + 1, '20100101')
select 'Beginning of previous month',dateadd(mm, datediff(mm, '20100101', @mydate) - 1, '20100101')
select 'Beginning of this quarter (Calendar)',dateadd(qq, datediff(qq, '20100101', @mydate), '20100101')
select 'Beginning of next quarter (Calendar)',dateadd(qq, datediff(qq, '20100101', @mydate) + 1, '20100101')
select 'Beginning of previous quarter (Calendar)',dateadd(qq, datediff(qq, '20100101', @mydate) - 1, '20100101')
select 'Beginning of this year',dateadd(yy, datediff(yy, '20010101', @mydate), '20010101')
select 'Beginning of next year',dateadd(yy, datediff(yy, '20010101', @mydate) + 1, '20010101')
select 'Beginning of previous year',dateadd(yy, datediff(yy, '20010101', @mydate) - 1, '20010101')

Filed under: ,
High CPU Headache
24 August 10 09:17 PM | sarveshsingh | with no comments

High CPU is a common performance issue in the world of SQL Server. You can easily identify this by opening task manager and looking at the performance tab. But i am going to share some DMV queries which will tell you what’s happening on the server behind the scenes as far as CPU is concerned.

SELECT SUM(signal_wait_time_ms) AS 'signal_wait_time_ms',
CAST(100.0 * SUM(signal_wait_time_ms)/
SUM (wait_time_ms)AS decimal(10,2)) AS 'signal_CPU_waits in percent',
SUM(wait_time_ms - signal_wait_time_ms) AS 'resource_wait_time_ms',
CAST(100.0 * SUM(wait_time_ms - signal_wait_time_ms) /
SUM (wait_time_ms) AS decimal(10,2)) AS 'resource waits in percent'
FROM sys.dm_os_wait_stats

If the Signal Wait > 20 it indicates the server is running on HIGH CPU. The figures generated by the query is since SQL Server was started or Statistics cleared by running:

DBCC SQLPERF('sys.dm_os_wait_stats', CLEAR);

You can also run the below query to confirm that you are experiencing high CPU if runnable_tasks_count>10.


SELECT scheduler_id, current_tasks_count, runnable_tasks_count
FROM sys.dm_os_schedulers
WHERE scheduler_id < 255

One of the things that i check is to see if the stored procedures have been recompiled. If it has, then it will clear out the cached plan and affect the total worker time which is nothing but the overall CPU cost of the query since it has been in the cache. Note the duration in cache column which shows the duration in minutes that the stored procedure was in the cache.

SELECT TOP 10 qt.text AS 'Stored procedure Name',
qs.total_worker_time AS 'TotalWorkerTime',
qs.total_worker_time/qs.execution_count AS 'AvgWorkerTime',
qs.execution_count AS 'Execution_Count',
ISNULL(qs.execution_count/DATEDIFF(Second, qs.creation_time,
GetDate()), 0) AS 'Calls/Second',
ISNULL(qs.total_elapsed_time/qs.execution_count, 0)
AS 'AvgElapsedTime',
qs.max_logical_reads, qs.max_logical_writes,
DATEDIFF(Minute, qs.creation_time, GetDate()) AS 'Duration in Cache'
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS qt
WHERE qt.dbid = db_id()

I also monitor Processor: % Processor Time,  System: Context switches\sec in performance monitor. Please look for more information about these counters here.

Index Usage
23 August 10 10:42 AM | sarveshsingh | with no comments
In versions prior to SQL 2005 it was very difficult to capture critical information about bad or missing indexes. But with DMV’s it has made life easier for the DBAs. In this section I will share the queries which I use to identify Bad and missing Indexes.

BAD INDEXES: If you have an index with a lot of writes and zero or very few reads then that means that you are not getting any benefit from maintaining this index and should consider dropping it. But if the reads go up then it's case of making a judgement call on the workload you've got and see what's best for your production environment.

Script to Identify Bad indexes

SELECT object_name(s.object_id) AS 'Table_Name',
i.name AS 'IndexName', i.index_id,
user_updates AS 'Total_Writes',
user_seeks + user_scans + user_lookups AS 'Total_Reads',
user_updates - (user_seeks + user_scans + user_lookups) AS 'RW_Difference'
FROM sys.dm_db_index_usage_stats AS s
INNER JOIN sys.indexes AS i
ON s.object_id = i.object_id
AND i.index_id = s.index_id
WHERE objectproperty(s.object_id,'IsUserTable') = 1
AND s.database_id = db_id() -- Enter your database name here.
AND user_updates > (user_seeks + user_scans + user_lookups )
ORDER BY 'RW_Difference' DESC, 'Total_Writes' DESC, 'Total_Reads' ASC;

MISSING INDEXES: If you are seeing a lot of user scans in your queries then you probably are missing some important indexes or you could be using a lot of aggregates. If you see a high index advantage with a recent Last_User_seek then consider adding an index.

ORDER BY HIGH INDEX ADVANTAGE:

SELECT user_seeks * avg_total_user_cost * (avg_user_impact * 0.01) AS advantage,
mg_stats.last_user_seek,
mid.statement AS 'TableName',
mid.equality_columns, mid.inequality_columns, mid.included_columns,
mg_stats.unique_compiles, mg_stats.user_seeks, mg_stats.avg_total_user_cost,
mg_stats.avg_user_impact
FROM sys.dm_db_missing_index_group_stats AS mg_stats WITH (NOLOCK)
INNER JOIN sys.dm_db_missing_index_groups AS mg WITH (NOLOCK)
ON mg_stats.group_handle = mg.index_group_handle
INNER JOIN sys.dm_db_missing_index_details AS mid WITH (NOLOCK)
ON mg.index_handle = mid.index_handle
ORDER BY advantage DESC;

ORDER BY LAST_USER_SEEK:
SELECT user_seeks * avg_total_user_cost * (avg_user_impact * 0.01) AS advantage,
mg_stats.last_user_seek, mid.statement AS 'TableName',
mid.equality_columns, mid.inequality_columns, mid.included_columns,
mg_stats.unique_compiles, mg_stats.user_seeks, mg_stats.avg_total_user_cost,
mg_stats.avg_user_impact
FROM sys.dm_db_missing_index_group_stats AS mg_stats WITH (NOLOCK)
INNER JOIN sys.dm_db_missing_index_groups AS mg WITH (NOLOCK)
ON mg_stats.group_handle = mg.index_group_handle
INNER JOIN sys.dm_db_missing_index_details AS mid WITH (NOLOCK)
ON mg.index_handle = mid.index_handle
ORDER BY mg_stats.last_user_seek DESC;

SUMMARY: After running these queries if I find that a common table is coming up with missing indexes, then I look at that table in more detail.
Your aim should be to improve system's overall productivity. So you need to find a balance to provide a system with selects that run faster and updates that still perform well. I would also like to add that in an OLTP system you should use ONLINE index operations to avoid locking and blocking issues and MAXDOP option to prevent CPU bottleneck during index Builds. Please share your experiences as comments.

Beauty of Merge and Output
21 August 10 12:08 AM | sarveshsingh | with no comments
Merge: Merge was introduced in SQL Server 2008. Users could modify data, applying different actions (INSERT, UPDATE, and DELETE) by a single merge statement based on conditional logic.

Demo

create table cricketTeam
(
Orderid int NOT NULL,
Batsman varchar(25) NOT NULL,
Bowler varchar(25) NOT NULL,
CONSTRAINT PK_cricketTeam PRIMARY KEY(Orderid)
);

insert into cricketTeam (Orderid,Batsman,Bowler)
values
(1,'sehwag','zaheer'),
(2,'sachin','bhaji'),
(3,'raina','praveen'),
(4,'yuvraj','ashish'),
(5,'dhoni','jadeja');

Lets look at the result of the cricketTeam
Select * from cricketTeam
Result:
orderid Batsman Bowler
1 sehwag zaheer
2 sachin bhaji
3 raina praveen
4 yuvraj ashish
5 dhoni jadeja

create table cricketTeam2
(
Orderid int NOT NULL,
Batsman varchar(25) NOT NULL,
Bowler varchar(25) NOT NULL,
CONSTRAINT PK_cricketTeam2 PRIMARY KEY(Orderid)
);

insert into cricketTeam2(Orderid,Batsman,Bowler)
values
(2,'cook','swan'),
(3,'raina','praveen'),
(5,'peterson','broad'),
(6,'morgan','Finn'),
(7,'collingwood','panessar');

Lets look at the result of cricketTeam2
Select * from cricketTeam2
Result:
Orderid Batsman Bowler
2 cook swan
3 raina praveen
5 peterson broad
6 morgan Finn
7 collingwood panessar

iLet's make it interesting. Let’s add orderid which is not in cricketTeam and update the existing ones.

Merge into cricketTeam as target
USING cricketTeam2 as source
on target.Orderid=source.Orderid
When MATCHED AND
( target.Batsman<>source.Batsman
or target.Bowler<>source.Bowler
) THEN
UPDATE SET
target.Batsman=source.Batsman,
target.Bowler=source.Bowler
When Not Matched Then
INSERT (orderid,Batsman,Bowler)
values(source.orderid, source. Batsman, source.Bowler);

let’s look at the result of the cricket team after running the merge
select * from cricketTeam
Result:
Orderid Batsman Bowler
1 sehwag zaheer
2 cook swan
3 raina praveen
4 yuvraj ashish
5 peterson broad
6 morgan Finn
7 collingwood panessar

Merge with Output: To identify which DML action produced the output,$action function can be used in the OUTPUT clause, and it will return a string representing the action (‘INSERT’,’UPDATE’,’DELETE’)

Demo:
Merge into cricketTeam as target
USING cricketTeam2 as source
on target.Orderid=source.Orderid
When MATCHED AND
(
target.Batsman<>source.Batsman
or target.Bowler<>source.Bowler
) THEN
UPDATE SET
target.Batsman=source.Batsman,
target.Bowler=source.Bowler
When Not Matched Then
INSERT (orderid,Batsman,Bowler)
values(source.orderid, source. Batsman, source.Bowler)
OUTPUT $action, inserted.orderid,
deleted.Batsman as oldBatsman,
inserted.Batsman as newBatsman,
deleted.Bowler as oldBowler,
inserted.Bowler as newBowler;

Let’s look at the result of merge with Output

Result:
$action orderid oldBatsman newBatsman oldBowler newBowler
UPDATE 2 sachin cook bhaji swan
UPDATE 5 dhoni peterson jadeja broad
INSERT 6 NULL morgan NULL Finn
INSERT 7 NULL collingwood NULL panessar

The Merge statement uses the Output clause to return the old and new values of the modified rows. With insert there were no old values so all references to deleted returned NULL. The $action whether insert or update produced the output.

This is brilliant, you know exactly what were the changes, all in one query. You can find more information about MERGE and OUTPUT in Books online.
http://msdn.microsoft.com/en-us/library/ms177564.aspx
http://msdn.microsoft.com/en-us/library/bb510625.aspx

Filed under: ,
NULL ISSUE in a SUBQUERY
19 August 10 10:16 PM | sarveshsingh | with no comments
I had an interesting training session with one of the customers dealing with Nulls in queries. Below is the demo of the issue when NULLS are involved.

create tables

CREATE TABLE [Cust]( [custid] [int] IDENTITY(1,1) NOT NULL, [companyname] [nvarchar](40) NOT NULL, CONSTRAINT [PK_Customers] PRIMARY KEY CLUSTERED ( [custid] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY]

CREATE TABLE [Orders]( [orderid] [int] IDENTITY(1,1) NOT NULL, [custid] [int] NULL, CONSTRAINT [PK_Orders] PRIMARY KEY CLUSTERED ( [orderid] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] GO

ALTER TABLE [Orders] WITH CHECK ADD CONSTRAINT [FK_Orders_Customers] FOREIGN KEY([custid]) REFERENCES [Cust] ([custid]) GO

INSERT INTO [Cust] ([companyname]) VALUES ('boots'), ('starbucks'), ('Tesco') GO

INSERT INTO Ord ([custid]) VALUES (1), (2) GO

select custid,companyname from Cust as c where custid not in (select o.custid from Ord as o)

Result:

3 Tesco

INSERT INTO Ord ([custid]) VALUES (Null) GO

select custid,companyname from Cust as c where custid not in (select o.custid from Ord as o)

Returns empty set.

Let’s see why this happens:

As a more real example, consider 3 NOT IN (1,2, NULL). This is same as NOT 3 IN (1, 2, NULL). This can be expanded as NOT (3=1 or 3=2 or 3=NULL). If you evaluate this you get NOT (FALSE or FALSE or UNKNOWN) which translates to UNKNOWN. Negating the UNKNOWN with the NOT operator still yields UNKNOWN.

To resolve this issue you will have to include not NULL as shown below:

select custid,companyname from Cust as c where custid not in (select o.custid from Ord as o where o.custid is not null)

You can use NOT EXISTS as well in the above scenario instead of NOT IN a shown below:

select custid, companyname from Cust as c where not exists (select * from Ord as o where o.custid=c.custid)

This is because EXISTS uses two-valued predicate logic. It always returns true or false and never UNKNOWN.

One more happy customer after my training :-)

Filed under: , , ,
Certification
19 August 10 03:05 PM | sarveshsingh | 2 comment(s)
In order to succeed in today's competitive job market, whether you are looking for new job or you want to advance in your current position, you need to be able to build and demonstrate your technology expertise and skills. This is where Microsoft certification plays such a major role. It helps you increase your skills at the same time gain the industry recognized qualification.

I recently passed MCITP 70-450 exam. I thought I’ll share the materials that I followed and the time I spent studying which might be useful to someone. I started preparing for 70-432 exam (a pre-requisite for 70-450) in January. It took me about 4 weeks to prepare. Number of hours varied from day to day as I used to study before starting work (coming in early), lunch time and after work. I just found it impossible to study at home because of family commitments, especially when my two year old wants all my attention. I passed this exam with a score of 929.

I started preparing for 70-450 in July. It took me about 3 weeks to prepare for this exam. I must admit that I got the much required motivation from the guys at coeo (www.coeo.com), which helped me to remain focussed. I would say I spent 2-3 hours every day preparing for this exam. I passed this with a score of 911.

The books/materials I followed: 1. 70-432 MCTS Self Paced Training. 2. Microsoft SQL Server 2008 Administrator's pocket Consultant 3. MVP-Deep-Dives 4. Self test software 5. Books online 6. Create the environment in the laptop and practice.

So what’s next? I'm preparing for the 70-433 exam now and hoping to complete the MCITP: Database Developer 2008 track by early next year ofcourse time permitting...

I hope you will find the information in this blog useful.

Filed under: , ,
My First Post : Joining coeo
19 August 10 09:27 AM | sarveshsingh | with no comments
Hello, My name is Sarvesh Singh and have been a technical consultant for the last three and a half years for CDC software providing application and SQL support for some of the big names in Food Manufacturing industry. (boots, Starbucks,Heinz to name a few). I am now leaving CDC to join coeo (www.coeo.com) as a DBA. I am really excited about joining Coeo and i am looking forward to the new challenge. I've finally got around setting up a blog and i'll be posting my experiences, challenges and knowledge that i have gained and will gain working with the SQL Server experts.
Filed under: