Is it important to declare your columns as NOT NULL - SimonS Blog on SQL Server Stuff

Is it important to declare your columns as NOT NULL

Yes Yes Yes.

Nullable columns cause extra data to be stored in your table or index. This extra data is to identify if a column value is NULL. The extra data is called a NULL bitmap. Unlike the data page of a table where there is always a null bitmap, a null bitmap only exists in an index if there are nullable columns in the keys or include columns. The size of the null bitmap is the integer portion of  2 + (number of columns + 7 / 8), i.e. 2 bytes plus 1 byte for each 8 columns, note number of columns not the number of nullable columns.

If you are indexing nullable columns your index contains this null bitmap. What that means is that for an index with 1 column the storage overhead could be an extra 33%. This will translate into extra IO in your queries and we all know that IO is bad.

On a similar note whether you have a clustered index or just a heap will affect the size of the index due to the way the index points to the data in the table. This gets even worse if you have a clustered index that isn't unique, because a uniqueifier value has to be stored to make the value unique (a clustered index has to uniquely identify the row in a table, if the key value isn't unique itself SQL has to store an extra value so the combined values are unique)

The following shows you the difference in index sizes between different combinations of not null and null columns with heaps and clustered tables.

I am using the sort of documented commands DBCC PAGE and DBCC IND to get access to this information. This procedure looks for the first page in the index and extracts the length of the rows and the types of data stored in the row.

Following the procedure is a set of tables and the results for the tables. I have limited the tests to integer keys.

You will see that if you have a not null column on a clustered table then your index could be less than half the size of an index on nullable column with a non unique clustered index.

Table Index rowtype Rows Average Size
[nulltable] ix_nulltable  NULL_BITMAP 4 12
[notnulltable] ix_notnulltable 4 9
[nullheap] ix_nullheap  NULL_BITMAP 5 16
[notnullheap] ix_notnullheap 4 13
[nullnonuniquetable] ix_nullnonuniquetable  NULL_BITMAP 1 12
[nullnonuniquetable] ix_nullnonuniquetable  NULL_BITMAP VARIABLE_COLUMNS 3 20
[notnullnonuniquetable] ix_notnullnonuniquetable 1 9
[notnullnonuniquetable] ix_notnullnonuniquetable  VARIABLE_COLUMNS 3 17
[notnullnonuniquetableid] ix_notnullnonuniquetableid 1 9
[notnullnonuniquetableid] ix_notnullnonuniquetableid  VARIABLE_COLUMNS 3 17

 

drop procedure up_GetIndexRowSize

go

create procedure up_GetIndexRowSize (@db sysname, @table sysname, @index sysname)

as

    begin

    if object_id('tempdb..#pageList') is  null

        create table #pageList(

                            PageFID  tinyint,

                              PagePID int,  

                              IAMFID   tinyint,

                              IAMPID  int,

                              ObjectID  int,

                              IndexID  tinyint,

                              PartitionNumber tinyint,

                              PartitionID bigint,

                              iam_chain_type  varchar(30),   

                              PageType  tinyint,

                              IndexLevel  tinyint,

                              NextPageFID  tinyint,

                              NextPagePID  int,

                              PrevPageFID  tinyint,

                              PrevPagePID int,

                              Primary Key (PageFID, PagePID));

    if object_id('tempdb..#pageDetails') is  null

        create table #pageDetails(ParentObject varchar(100)

                                , Object       varchar(100)

                                , Field        varchar(100)

                                , Value        varchar(100))

    begin

    declare @indid varchar(10)

        set @indid = cast((select index_id from sys.indexes where object_id=object_id(@table) and name = @index)as varchar(10))

        set @table = quotename(@table, '[')

        set @db    = quotename(@db , '[')

        insert into #pageList exec ('dbcc ind(' + @db + ',' + @table + ',' + @indid + ')')

        declare @pageid varchar(10)

        set @pageid = cast((select PagePID from #pageList where PrevPagePID = 0 and IndexLevel = 0) as varchar(10))

        insert into #pageDetails

        exec ('dbcc page(' + @db + ',1,' + @pageId + ',1) with tableresults')

        select    @table, @index, rowtype, count(1) Rows, avg(cast(Length as int))

          from (select substring(Object ,charindex(' ', Object),charindex(',', Object) - charindex(' ', Object)) row

                     , substring(Object ,charindex(' ', Object, charindex('Length',Object)) ,charindex(',', Object,charindex('Length',Object)) - charindex(' ', Object, charindex('Length',Object))) length

                     , value rowtype

                from #pageDetails

                where Field = 'Record Attributes') data

        group by rowtype

    end

end

go

drop table nulltable

drop table notnulltable

drop table nullheap

drop table notnullheap

drop table nullnonuniquetable

drop table notnullnonuniquetable

drop table notnullnonuniquetableid

go

-- NULL TABLE

create table nulltable (id int identity(1,1) primary key, col int null)

go

create index ix_nulltable on nulltable(col)

go

-- NOT NULL TABLE

create table notnulltable (id int identity(1,1) primary key, col int not null)

go

create index ix_notnulltable on notnulltable(col)

go

-- NOT NULL HEAP

create table notnullheap (id int identity(1,1) , col int not null)

go

create index ix_notnullheap on notnullheap(col)

go

-- NULL HEAP

create table nullheap (id int identity(1,1) , col int null)

go

create index ix_nullheap on nullheap(col)

go

-- NULL NON UNIQUE TABLE

create table nullnonuniquetable (id int , col int null)

go

create clustered index icx_nullnonuniquetable on nullnonuniquetable(id)

go

create index ix_nullnonuniquetable on nullnonuniquetable(col)

go

-- NOT NULL UNIQUE TABLE

create table notnullnonuniquetable (id int not null , col int not null)

go

create clustered index icx_notnullnonuniquetable on notnullnonuniquetable (id)

go

create index ix_notnullnonuniquetable  on notnullnonuniquetable (col)

go

-- NOT NULL UNIQUE TABLE WITH IDENTITY

create table notnullnonuniquetableid (id int identity(1,1) , col int not null)

go

create clustered index icx_notnullnonuniquetableid on notnullnonuniquetableid(id)

go

create index ix_notnullnonuniquetableid on notnullnonuniquetableid (col)

go

insert into nulltable (col) values (1)

insert into nulltable (col) values (2)

insert into nulltable (col) values (2)

insert into nulltable (col) values (2)

go

insert into notnulltable (col) values (1)

insert into notnulltable (col) values (2)

insert into notnulltable (col) values (2)

insert into notnulltable (col) values (2)

go

insert into notnullheap (col) values (1)

insert into notnullheap (col) values (2)

insert into notnullheap (col) values (2)

insert into notnullheap (col) values (2)

go

insert into nullheap (col) values (1)

insert into nullheap (col) values (2)

insert into nullheap (col) values (2)

insert into nullheap (col) values (2)

insert into nullheap (col) values (null)

go

insert into nullnonuniquetable (id, col) values (1,1)

insert into nullnonuniquetable (id, col) values (1,2)

insert into nullnonuniquetable (id, col) values (1,2)

insert into nullnonuniquetable (id, col) values (1,2)

go

insert into notnullnonuniquetable (id, col) values (1,1)

insert into notnullnonuniquetable (id, col) values (1,2)

insert into notnullnonuniquetable (id, col) values (1,2)

insert into notnullnonuniquetable (id, col) values (1,2)

go

set identity_insert notnullnonuniquetableid  on

insert into notnullnonuniquetableid (id, col) values (1,1)

insert into notnullnonuniquetableid (id, col) values (1,2)

insert into notnullnonuniquetableid (id, col) values (1,2)

insert into notnullnonuniquetableid (id, col) values (1,2)

set identity_insert notnullnonuniquetableid  off

exec up_GetIndexRowSize 'tempdb','nulltable','ix_nulltable'

exec up_GetIndexRowSize 'tempdb','notnulltable','ix_notnulltable'

exec up_GetIndexRowSize 'tempdb','nullheap','ix_nullheap'

exec up_GetIndexRowSize 'tempdb','notnullheap','ix_notnullheap'

exec up_GetIndexRowSize 'tempdb','nullnonuniquetable','ix_nullnonuniquetable'

exec up_GetIndexRowSize 'tempdb','notnullnonuniquetable','ix_notnullnonuniquetable'

exec up_GetIndexRowSize 'tempdb','notnullnonuniquetableid','ix_notnullnonuniquetableid'

[Update 6 March 2008 : following a kind review from a K Delaney I've corrected and clarified the first paragraph. Thanks Kalen)



-
Published 02 March 2008 20:42 by simonsabin

Comments

03 March 2008 06:45 by GBN

# re: Is it important to declare your columns as NOT NULL

What about the performance increase using the NULL bitmap?

Or the overhead or storing a (variable length) empty string instead of NULL?

03 March 2008 11:08 by GrumpyOldDBA

# re: Is it important to declare your columns as NOT NULL

I blogged this last year as part of my analysing indexes see http://www.grumpyolddba.co.uk/sql2005/working%20with%20indexes%204.mht  for an entire post rather than the parts I had to paste into the blog. It's substantial when you get into 400 million row tables .

11 March 2008 02:54 by Chinh Do

# re: Is it important to declare your columns as NOT NULL

Thanks for the analysis. This is good data to keep in mind when considering whether to make a db column nullable.

12 March 2008 17:06 by Ralph Wilson

# re: Is it important to declare your columns as NOT NULL

So, what do you propose for indicating that data is simply not available for a column (e.g. second address line, secondary email address, number of dependents, ).

Setting a default value may or may not make sense.  For instance, the number of dependents for someone who is not married may or may not be 0 but it may also have not been provided . . . so, setting the value to a default of 0 could result in the correct value _never_ being entered because Single/Dependents=0 is a reasonable result.  However, if one gets Single/Dependents=NULL then the results would indicate that the information has never been provided and someone needs to follow up on it.

Similarly, there is a difference between having a NULL value in a column named InsuredancePolicyNo and having some default value in it . . . especially if the column may have between 9 and 35 characters due to variances in the number of characters in a Policy Number.  (String the NULL could be more efficient than stuffing some arbitrary number of characters that meet the minimum required for a policy number.)

12 March 2008 18:01 by simonsabin

# re: Is it important to declare your columns as NOT NULL

This was more a point of don't be lazy and not other specifying not null. If the domain of your data requires NULL then fine but if it doesn't then make sure you set the column as NOT NULL.

Most databases the default is NULL if nothing is specified thus the reason for many columns incorrectly being set to NULL when they should be NOT NULL