This is Columnstore – Part 1
This weekend I finally spent some time getting up to speed on the new xVelocity Columnstore index feature in SQL Server 2012 Enterprise Edition.
This is part one of three blog articles about my initial experiences of them, but more interestingly, a quick look under the hood at how they work.
What are columnstore indexes?
In summary, the Columnstore indexes feature is both a type of non-clustered index AND a new query processing mode built into the SQL Server 2012 database engine. Their aim is to massively increase (approx. 10x) the performance of the types of queries used within a data warehouse, or in simpler terms – querying very large tables as quickly as possible.
There are many explanations of what columnstore indexes actually are, how they’re different to row-based indexes, and how SQL Server’s implementation works so I’ll point you to perhaps the best links for these topics:
- How do Column Stores work? - Thomas Kejser here
- Inside SQL Server’s implementation – Conor Cunningham here
How would I use them?
Hopefully, the two links above will give you an idea as to what the feature was designed to do and the circumstances in which you might use them. The next section focuses on how you would use them and for this series of blog articles I’m going to use the two simple tables shown below:
Having created and populated the tables, I used random number based values to create 35 million rows in the fact table for 10 products, I then created regular clustered indexes on them:
create table tblFact (rowID bigint identity(1,1), custID int, prodID int, Sale money);
create table DimProd (prodID int,prodName varchar(200));
create unique clustered index idx_cls_tblFact on tblFact(rowID);
create unique clustered index idx_cls_Prod on DimProd(prodID);
insert into DimProd (prodID, prodName)
values (1, 'Car'), (2, 'Van'), (3, 'Truck'),
(4, 'Taxi'), (5, 'Horse'), (6, 'Helicopter'),
(7, 'BizJet'),(8, 'JumboJet'),(9, 'Ferry'),
insert into tblFact
(custID, prodID, Sale)
(select convert(int,rand()*10), convert(int,rand()*100), convert(money,rand()*10000))
insert into tblFact (custID, prodID, Sale)
(select top 500000 custID, prodID, Sale from tblFact)
This will give you 30 million rows for about 1GB of disk space.
I then ran a simple query against the table that’s designed to simulate a simple star schema and the type of query a data warehouse might use:
from tblFact, DimProd
where tblFact.prodID = DimProd.prodID
group by DimProd.prodName
I could have added supporting non-clustered indexes, but traditional index design isn’t the point of this article, but with the 35 million rows I had in my fact table the query takes 4 seconds to run.
Adding a columnstore index
The next step is to actually add a columnstore index, this is quite a big step as:
- You can only have 1 columnstore index per table
- Creating the columnstore index makes the table read-only
The following command adds a columnstore index:
create columnstore index idx_cs_tblFact on tblFact(custID, prodID, Sale)
You can tell from the time it takes to create, on my SSD drive about 50 seconds, that there’s a lot of work going on to create it; for comparison, creating the equivalent regular non-clustered index takes about half the time.
If I then execute the query used above again, the query optimizer will use the columnstore index AND the new batch query processing mode that Conor mentioned, and the query will now execute in less than a second.
Although the performance improvements in our example might not seem great, 4 seconds down to 1 second, in further tests I’ve increased the fact table from 30 million to 90 million rows, yet didn’t seen a 3x increase in query runtime.
Finally, for this article, you can begin to see how columnstore indexes are both different and similar to normal indexes by looking at their properties in SSMS and sys.indexes with this query:
select * from sys.indexes where type = 6
Books OnLine lists Index type 6 books as a “Nonclustered columnstore index”, while Index type 5 interestingly is “Clustered xVelocity memory optimized columnstore index (Reserved for future use.)” This suggests there might be quite a lot more than just updatable columnstore indexes on the roadmap.
First and next articles
This first article was a quick introduction to creating and using a columnstore index, however, it should be enough to allow you to create your own demo environment and begin writing queries to see the performance benefits for yourself.
In the next two articles I’ll show you how columnstore indexes are physically stored within the database and when the query optimizer does and doesn’t decide to use them.
Of course, this is all just my first look at columnstore indexes, if you know something I’ve not mentioned, please feel free to comment on the post!