17 August 2006 11:29 tonyrogerson

SQL Server write behaviour with multiple files in a file group

You will have no doubt seen or heard branded about that its good to use multiple files for a file group but you’ve always wondered why?

 

The current recommendation from Microsoft is that you have a file per logical CPU for tempdb so as to balance load not only for object creation but for extent allocation and performance.

 

But what does SQL Server do when you use multiple files? SQL Server round robins the rows being inserted so the rows are distributed across the multiple files.

 

Consider this first database, notice the files have the same initial size and autogrowth-

 

USE [master]

GO

CREATE DATABASE [testfile2] ON  PRIMARY

( NAME = N'testfile2', FILENAME = N'E:\testfile2.mdf' , SIZE = 5MB , MAXSIZE = UNLIMITED, FILEGROWTH = 1024KB ),

 FILEGROUP [FileTest]

( NAME = N'testfile21', FILENAME = N'E:\testfile21.ndf' , SIZE = 1MB , MAXSIZE = UNLIMITED, FILEGROWTH = 1024KB ),

( NAME = N'testfile22', FILENAME = N'E:\testfile22.ndf' , SIZE = 1MB , MAXSIZE = UNLIMITED, FILEGROWTH = 1024KB )

 LOG ON

( NAME = N'testfile2_log', FILENAME = N'E:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\testfile2_log.LDF' , SIZE = 18560KB , MAXSIZE = 2048GB , FILEGROWTH = 10%)

 COLLATE Latin1_General_CI_AS

GO

EXEC dbo.sp_dbcmptlevel @dbname=N'testfile2', @new_cmptlevel=90

go

 

ALTER DATABASE testfile2

   MODIFY FILEGROUP FileTest DEFAULT;

 

We can now do a test to see how the file grows when we do an INSERT….

 

select * from ::fn_virtualfilestats(db_id(), NULL) where FileId > 2

go

select *

into SalesOrderDetail

from AdventureWorks.Sales.SalesOrderDetail

go

select * from ::fn_virtualfilestats(db_id(), NULL) where FileId > 2

go

 

I am using ::fn_virtualfilestats( <dbid>, <fileid> ) to capture the statistics against the file, thanks to Linchi Shea for pointing the best way to do that out.

 

On my test box measurements where

 

Before population

fileId            BytesOnDisk

3                 1048576      

4                 1048576      

After population

fileId            BytesOnDisk

3                 6291456

4                 6291456

 

We can see by the autogrowth that SQL Server has round robin’d between the two files equally because they had an equal amount of freespace in them.

 

Lets now look at a situation where the two files have different amounts of freespace, a situation that can result from a) autogrowth different, b) file added to the file group at different times.

 

Drop the original database and run this…

 

USE [master]

GO

CREATE DATABASE [testfile2] ON  PRIMARY

( NAME = N'testfile2', FILENAME = N'E:\testfile2.mdf' , SIZE = 5MB , MAXSIZE = UNLIMITED, FILEGROWTH = 1024KB ),

 FILEGROUP [FileTest]

( NAME = N'testfile21', FILENAME = N'E:\testfile21.ndf' , SIZE = 10MB , MAXSIZE = UNLIMITED, FILEGROWTH = 1024KB ),

( NAME = N'testfile22', FILENAME = N'E:\testfile22.ndf' , SIZE = 1MB , MAXSIZE = UNLIMITED, FILEGROWTH = 1024KB )

 LOG ON

( NAME = N'testfile2_log', FILENAME = N'E:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\testfile2_log.LDF' , SIZE = 18560KB , MAXSIZE = 2048GB , FILEGROWTH = 10%)

 COLLATE Latin1_General_CI_AS

GO

EXEC dbo.sp_dbcmptlevel @dbname=N'testfile2', @new_cmptlevel=90

go

 

ALTER DATABASE testfile2

   MODIFY FILEGROUP FileTest DEFAULT;

 

Notice we are still creating 2 files, but this time one starts with 10MB free and one with 1MB free – autogrowths are the same.

 

Re-running the select and insert gives this result…

 

Before population

fileId            BytesOnDisk

3                 1048576      

4                 10485760    

 

After population

fileId            BytesOnDisk

3                 3145728

4                 13631488

 

More data has been written to FileId 4 which is the one with more freespace.

 

So, the moral of the story – if you are going to use multiple files then be aware of how the round robin insert stuff works!

 

One last thing, another benefit of using multiple files is that each file has its own NT thread so its good when you have multiple logical CPU’s which is pretty much the norm now-a-days.

 

There are lots of other things I want to show – effects of shrinking, defraging etc… that will need to be another entry!

 

One last thing, the unit is rows, a single row can't span files - I think its done at the page level but I'm not sure - if anybody knows for sure perhaps they'd comment, otherwise I'll get back in the not too distant future with the answer....

 

 

Filed under:

Comments

# re: SQL Server write behaviour with multiple files in a file group

17 August 2006 13:16 by LearnSqlServer.com

Awesome post, Tony. I love reading your blog.

I have another argument for splitting your database into multiple files (once it crosses the 40-60GB barrier) that I thought I'd post. If your production servers have 300GB worth of storage but your development servers have only 100GB, then it matters how you create the data file(s).

If I make my 60GB database have only one data file, when I restore that file, I have to put it onto a single drive. The problem is that I have no single drive on the dev server to put it on (let's say we have three drives of 36GB ea.). Spltting the database into 2 or 3 data files makes it possible for me to restore my database on any server of just about any configuration.

You mentioned the advantages of splitting tempdb into multiple files. I'd love to see you explore that more :)

# re: SQL Server write behaviour with multiple files in a file group

17 August 2006 13:32 by tonyrogerson

Thanks - the kind feedback is appreciated!!

I've been trying to do the tempdb thing but the difficulty is proving it makes a difference; I suppose measuring the transactions/sec throughput on tempdb could do it; I'm seeing blocking which is one of the things multiple files is suppose to help reduce - multiple files have more IAM's so there is less contention when creating objects on mixed extents.... I suppose I'll have a play this afternoon....

# re: SQL Server write behaviour with multiple files in a file group

21 August 2006 12:07 by Olu Adedeji

Excellent blog Tony,

just wanted to say that the real benefit of the Tempdb multiple equally sized files thing can only be attained with the trace flag 1118 as this allows multiple PFS to be created for multiple concurrent searches in order to create IAM and objects on mixed/shared extents. I have seen the symptoms wait resource 2:1:3 or 2:1:1 in sysprocesses due to the contention issues during the search for free space in mixed extents

# re: SQL Server write behaviour with multiple files in a file group

21 August 2006 14:59 by LearnSqlServer.com

For anyone reading this later on: Olu is right about trace flag 1118 helping SQL Server 2000 but trace flag 1118 is really not needed for SQL 2005.

Here's a good article describing -T1118 and SQL 2005: http://www.scalabilityexperts.com/default.asp?action=article&ID=195

Scott Whigham (LearnSqlServer.com)

# Put TEMPDB on a RAM DRIVE; Short lived # tables - do they get written to disk?

24 August 2006 11:30 by Tony Rogerson's ramblings

Grab your attention? This entry came out of trying to reproduce PFS / GAM / SGAM contention and trying...

# re: SQL Server write behaviour with multiple files in a file group

13 September 2006 12:04 by masri999

Is this round robin applicable only for tempdb or other databases also .
Another Question:
Having a single physical disk with logical partitions  D,E, F etc ,
is it better to place Nonclustered Indexes in a  different file group and clustered indexes in primary file group.
Can you provide information how to place indexes (clustered and non clustered ) on different file groups ( or physical disks ) for performance

Srinivas

# SQL 2K5 &#8211; Multiple databases vs. Multiple files Drija

Pingback from  SQL 2K5 &#8211; Multiple databases vs. Multiple files Drija

# SQL 2K5 &#8211; Multiple databases vs. Multiple files - Admins Goodies

Pingback from  SQL 2K5 &#8211; Multiple databases vs. Multiple files - Admins Goodies

# SQL 2K5 &#8211; Multiple databases vs. Multiple files - Admins Goodies

Pingback from  SQL 2K5 &#8211; Multiple databases vs. Multiple files - Admins Goodies

# How can have SQL Server tail a log file and write new rows to a table? | Q Sites

Pingback from  How can have SQL Server tail a log file and write new rows to a table? | Q Sites