SSAS Partition Manager for both SSAS Tabular Models and SSAS Multidimensional Cubes

Today I am proud to announce the release of SSAS Partition Manager, which can dynamically create partitions in both SSAS Tabular models and SSAS Multidimensional cubes using a wide variety of partitioning schemes (yearly, monthly, weekly etc.).

What is more, for OLAP cubes it correctly sets the MDX in the PartitionSlice property of the partition so you don't have to rely on SQL Server Analysis Services' auto-slice functionality which has known bugs and limitations.

SSAS Partition Manager is a production-ready application which is simple to configure and can be easily integrated into your deployment scripts and regular data warehouse processing jobs. Indeed, a full set of unit and integration tests has been prepared for the SSAS Partition Manager to ensure the quality of the code.

Deploy and Partition Your Tabular Model or Multidimensional Cube

SSAS Partition Manager provides the ability to deploy a database (either Tabular or Multidimensional), add the partitions and process the newly deployed database, all with one simple command.

Using command-line options you can:

  • Deploy an OLAP cube or Tabular model
  • Create partitions using a wide variety of partitioning schemes
  • Process the newly deployed OLAP or Tabular database using ProcessFull
  • Create partitions in all databases on the server and then process them all using ProcessFull
See SSAS Partition Manager Command-Line Options for further information.

Supports multiple Partitioning Schemes

Each measure group can have its own partitioning scheme, and out of the box the following partitioning schemes are supported:
  • Yearly
  • Quarterly
  • Monthly
  • Weekly
  • Daily
  • Hybrid
  • Matrix
Hybrid partitioning schemes are where some years use one scheme and other years use another. For example, previous years could be partitioned by quarter whereas the current year is partitioned by day. Matrix partitioning schemes are where you partition by date and some other dimension.

Easy to configure

The SSAS Partition Manager is an entirely generic piece of C# code which will work in any situation. With a tiny amount of configuration, you can start using the SSAS Partition Manager with your multidimensional cube and/or tabular models very quickly. You should never need to modify the C# code. All configuration is done in T-SQL, so it is very much in the realm of the average business intelligence consultant. No specialist skills required!

The key to understanding the SSAS Partition Manager is to know that it creates partitions based on the content of a view in the SQL database called SsasPartitions, which tells it what type of partitions to create and how many. When it connects to the cube or model, it expects to find a template partition whose where clause is set to WHERE 1=0, which it clones to form each new partition. That's it! The SSAS Partition Manager does the rest.
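To make the cloning idea concrete, here is a minimal AMO sketch of the kind of operation involved for a multidimensional measure group. This is an illustration only, not code lifted from the SSAS Partition Manager itself: the server, database, cube, measure group and partition names, and the replacement where clause, are all assumptions, and it assumes the template partition uses a query binding.

using System;
using Microsoft.AnalysisServices;  // AMO

// Connect and navigate to the measure group (names are illustrative)
Server server = new Server();
server.Connect("localhost");
MeasureGroup mg = server.Databases.FindByName("AdventureWorks")
                        .Cubes.FindByName("Adventure Works")
                        .MeasureGroups.FindByName("Internet Sales");

// Find the template partition (its query ends in WHERE 1=0) and clone it
Partition template = mg.Partitions.FindByName("Internet Sales Template");
Partition newPartition = (Partition)template.Clone();
newPartition.Name = "Internet Sales 2014-04";
newPartition.ID = newPartition.Name;

// Swap the WHERE 1=0 clause for the real filter covering this period
QueryBinding binding = (QueryBinding)newPartition.Source;
binding.QueryDefinition = binding.QueryDefinition.Replace(
    "WHERE 1=0", "WHERE OrderDateKey BETWEEN 20140401 AND 20140430");

// Add the new partition to the measure group and create it on the server
mg.Partitions.Add(newPartition);
newPartition.Update();

The real tool drives the partition names, where clauses and slices from the SsasPartitions view rather than hard-coding them as above.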

Program Logic

The SSAS Partition Manager starts by reading from its config file the connection string to the SQL Server database and a flag which indicates which server environment to connect to (i.e. DEV, UAT or PROD). It then connects to the SQL Server database and queries the SsasServer table for the name of the SSAS server using the server environment flag. Thus, DEV would connect to your localhost, UAT to your UAT server and so on.

The program then connects to the SSAS server using the Analysis Management Objects (AMO) library Microsoft.AnalysisServices.
The SSAS Partition Manager then queries the SsasPartitions view to find out what partitions are needed. This view lists all the partitions in all the measure groups in all the cubes/models in all the databases on the SSAS server. Thus the SSAS Partition Manager can handle a multi-cube/model, multi-database environment very easily.

The SSAS Partition Manager will add partitions to all cubes/models in all databases listed in the SsasPartitions view, so it will work out of the box even in multi-cube, multi-database environments. The SSAS Partition Manager will also delete partitions which are not listed in the SsasPartitions view with the exception of any template partition (i.e. a partition with the word template in its name). Thus you can implement a rolling-partition scheme whereby old data is removed from the cube automatically.
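The deletion side of that reconciliation can be sketched as follows. Again this is only an illustration of the approach: the set of required partition names is assumed to have been read from the SsasPartitions view by a hypothetical helper method, and template partitions are recognised by a simple name check.

using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.AnalysisServices;

// requiredPartitions: partition names read from the SsasPartitions view
// for this measure group (hypothetical helper method)
HashSet<string> requiredPartitions = ReadRequiredPartitionNames(measureGroup.Name);

// Find partitions that are neither required nor the template, then drop them
List<Partition> stalePartitions = measureGroup.Partitions.Cast<Partition>()
    .Where(p => !requiredPartitions.Contains(p.Name)
             && p.Name.IndexOf("template", StringComparison.OrdinalIgnoreCase) < 0)
    .ToList();

foreach (Partition stale in stalePartitions)
{
    stale.Drop();   // removes the partition (and its data) from the cube/model
}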

The program logic is comprehensively tested by a full set of unit and integration tests before each release to ensure the quality of the code, so you can rest assured it won't go wrong on your server.

Validation Phase

The SSAS Partition Manager has a validation phase which checks your configuration is correct. So for example, if you misspell the name of a measure group, a warning message will be written to the log. The validation covers every aspect of the configuration, including checking your template partition contains WHERE 1=0, so just check the log file to see if everything is correctly set up in your environment.

Database Objects to support the SSAS Partition Manager

The SSAS Partition Manager relies on the presence of a few key database objects. You can choose which database you put these in yourself. The main thing is that you customize the SsasDateDimWrapper view to work with your own date dimension or equivalent. All this is explained in Configuring the SSAS Partition Manager.

Replacement for OlapPartitionManager

SSAS Partition Manager replaces the OlapPartitionManager which I discussed in previous posts as it is more generic and caters for both Tabular and Multidimensional databases.

So read the documentation and go to the download page and start using it today!

SSAS: Automating OLAP cube deployment, dynamic partitioning and processing

Imagine being able to deploy your OLAP database, dynamically partition the cube and then process the new database all in one step from an .asDatabase file with minimal configuration. Well now you can! Today’s release of the OlapPartitionManager provides a set of features which allow you to do just that. What is more, for the first time you can easily integrate the build and deployment of OLAP databases into your continuous integration tasks so you can write regression tests that check the figures in your cube.

OLAP cube deployment is usually done using the Analysis Services Deployment Wizard. However, the wizard only deploys the partitions that were part of the original cube design; it will not dynamically add the new partitions required by your partitioning scheme.

The OlapPartitionManager

I introduced the OlapPartitionManager in my previous post: Dynamic Partition Creation in SSAS multidimensional using the OlapPartitionManager generic C# AMO app. The OlapPartitionManager allows each measure group to have its own partitioning scheme and out of the box it supports the following partitioning schemes:

  • Yearly
  • Quarterly
  • Monthly
  • Weekly
  • Daily
  • Hybrid
  • Matrix

Hybrid partitioning schemes are where some years use one scheme and other years use another. For example, previous years could be partitioned by quarter whereas the current year is partitioned by day. Matrix partitioning schemes are where you partition by date and some other dimension.  Note that the last two options require some configuration on your part which is all explained on Configuring the OlapPartitionManager.

Easy to configure

All of this power is easy to configure by changing two SQL views and the content of three tables - well within the realms of the average business intelligence consultant!

Deploying your cube

Today’s release takes the project a step further by allowing you to deploy a fresh copy of your OLAP database, partition all of the cubes and then process the database using one simple command-line such as:

OlapPartitionManager.exe /d AdventureWorks /a "C:\Dev\Samples\AdventureWorks.asdatabase" /p ProcessDeployed

The above command-line will deploy a cube called AdventureWorks using the AdventureWorks.asdatabase OLAP database definition file onto the SSAS server listed in the OlapServer table.  The OlapPartitionManager will then dynamically add all the partitions listed in the OlapPartitions view.  Finally it will process the new OLAP database.  That’s it!  Dead simple!  Exactly how to set everything up is all explained on Configuring the OlapPartitionManager.

There is no need for you to delve into the C# code behind the OlapPartitionManager as all the functionality is easily configurable, as explained on Configuring the OlapPartitionManager. However, if you do wish to delve a little deeper, you will notice that the new DeployDatabase method is simply a new implementation of some code I released as part of this blog post: SSIS: Deploying OLAP cubes using C# script tasks and AMO.

Example command-lines

Deploy a new OLAP database and add partitions:

OlapPartitionManager.exe /d AdventureWorks /a "C:\Dev\Samples\AdventureWorks.asdatabase"

Deploy a new OLAP database, add partitions and process the newly deployed OLAP database:

OlapPartitionManager.exe /d AdventureWorks /a "C:\Dev\Samples\AdventureWorks.asdatabase" /p ProcessDeployed

Deploy a new OLAP database, override the standard connection string to the Data Mart, add partitions and process the newly deployed OLAP database:

OlapPartitionManager.exe /d AdventureWorks /a "C:\Dev\Samples\AdventureWorks.asdatabase" /c "Provider=SQLNCLI11.1;Data Source=localhost;Integrated Security=SSPI;Initial Catalog=AdventureWorksDW2014" /p ProcessDeployed

Perform a full process on all OLAP databases found in the OlapPartitions view:

OlapPartitionManager.exe /p ProcessAll

All of the arguments are explained on OlapPartitionManager Command-Line Options.

So download your copy and start using the OlapPartitionManager today!


Dynamic Partition Creation in SSAS multidimensional using the OlapPartitionManager generic C# AMO app

There are plenty of blog entries out there that outline how to dynamically create partitions in SSAS multidimensional databases. However, all of them suffer the same drawback: they always hard-code some part of the logic into the programming language in the form of hard-coded names or hard-coded partition queries. None of them provide a generic solution that you could just pick up and apply to any project. Perhaps this is a reflection of the usual implementation language of choice: SSIS. Let's face it, SSIS is famous for being brittle; packages break when moved to a new environment. They are rarely reusable without modification. Generating SSIS packages from BIML has solved some of these problems, but even BIML does not handle C# script tasks well; it is rather like painting a room through a keyhole!

Like many business intelligence consultants, I take example scripts from project to project, each time extending their functionality a little more or adapting them to meet new requirements. For many years, I have used C# and AMO to create partitions and process cubes. Indeed, I have presented on this at SQLBits.  I do find the AMO object model extremely intuitive and C# much easier to code than doing the equivalent partition creation logic in an SSIS package.

After several iterations, I now have an entirely generic piece of C# code which will work in any situation.  The only configuration required is to change the content of a set of tables and edit two views so they generate the correct MDX member keys to match the date dimension in your cube. Fortunately, this is very simple and very much in the realm of the average business intelligence consultant as it is pure T-SQL.

Once configured the OlapPartitionManager will add partitions to all cubes in all databases listed in the OlapPartitions view, so it will work out of the box even in multi-cube, multi-database environments.  The application will also delete partitions that are no longer required, thus removing redundant data from the cube.

To illustrate, I have chosen to partition each measure group in the AdventureWorks sample cube by different criteria. So the Internet Customers measure group is partitioned by year, Internet Sales by quarter, Internet Orders by month and Reseller Orders by week. Finally, the Sales Order measure group is partitioned by a hybrid scheme whereby the years from 2014 onwards have monthly partitions and prior years are partitioned by quarter. Of course, all this partitioning is completely over the top for the tiny amount of data held by the AdventureWorks sample cube, but it does illustrate the power of the OlapPartitionManager.

All this is encapsulated in the OlapPartitionDefinitions view which allows you to review your configuration (shown below).

OlapPartitionDefinitions

Note that the PartitionStartDate is set to 2011-01-01 for all partition schemes, but this could easily be changed by editing the OlapMeasureGroup table.

The PartitionSliceIsRange column is a powerful new feature of my latest implementation and is discussed below.

Partition Elimination

Correctly setting the partition slice property of each partition helps the storage engine eliminate the partitions that do not contain relevant data. Referred to as partition elimination, this greatly increases the performance of MDX queries as less data needs to be read by the SSAS storage engine. This is one of the main reasons why cubes should be partitioned. However, this does rely on the end-user using the dimension upon which the partitioning is based in their MDX queries. Partitioning by the date dimension is the typical approach. But be warned: cubes often contain several date dimensions all playing different roles (e.g. order date, delivery date, ship date etc.), so be careful to base your partitioning strategy on the right date dimension. I have known situations where the cube designer inadvertently picked the wrong date dimension and so the storage engine could never employ partition elimination. Big mistake! Huge!

The problems with Auto-Slice

The auto-slice functionality in SSAS does not always get it right. During processing, SSAS internally identifies the range of data contained in each partition by using the Min and Max DataIDs to set the partition slice. Unfortunately, DataIDs are assigned during dimension processing as new members are encountered, and we have no explicit control over how they are allocated. Because SSAS just looks at the minimum and maximum value of the DataID in the partition, you can end up reading partitions that don't contain relevant data. Thus it is always best practice to explicitly assign a value to the partition slice property of each partition.

Setting a valid Partition Slice

You can set the partition slice property of a partition to be a single MDX member, or a set of members. However, MDX range operators (i.e. the colon, :) are not allowed. So, for example, the following two statements work fine:

{[Date].[Calendar].[Month].&[2014]&[4]}

{[Date].[Calendar].[Date].&[20140101],[Date].[Calendar].[Date].&[20140102],[Date].[Calendar].[Date].&[20140103],[Date].[Calendar].[Date].&[20140104],[Date].[Calendar].[Date].&[20140105],-- etc. --,[Date].[Calendar].[Date].&[20140131]}

Whereas the following range statement will cause the partition processing to fail:

{[Date].[Calendar].[Date].&[20140101] : [Date].[Calendar].[Date].&[20140131]}

Of course the above MDX statements were generated from the Adventure Works cube date dimension.

If you get your partition slice wrong, you will get an error such as: "Errors in the OLAP storage engine: The restrictions imposed on the partition slice were violated". Note that you will ONLY get such errors if the partition contains over 4096 rows, which is the IndexBuildThreshold defined in the msmdsrv.ini file. In the Adventure Works database, partitions rarely get above 5000 rows, so the partition slice setting is ignored. In reality, the Adventure Works database has such low data volumes there is no need to partition the cube, but then it is just a sample database!
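For completeness, explicitly assigning the slice via AMO is a one-property change. A minimal sketch follows; the partition name is illustrative and measureGroup is assumed to be an AMO MeasureGroup object obtained in the usual way.

// Assign the MDX slice explicitly rather than relying on auto-slice
Partition partition = measureGroup.Partitions.FindByName("Internet Sales 2014-04");
partition.Slice = "{[Date].[Calendar].[Month].&[2014]&[4]}";
partition.Update();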

Partition Definition Tables

For many years now, I have used a set of SQL tables to hold the definition of each partitioning scheme required in the cube. A single view then combines the content of these tables with the date dimension to create a list of partitions that need to be created in the cube. This partition definition view even generates the MDX required to set the partition slice property correctly. Thus the OlapPartitionManager program is entirely generic; it simply connects to the SSAS database and compares the partitions it finds in each measure group with the list of desired partitions in the partition definition view: OlapPartitions.

Instead of holding one row per partition in a static table, we get the partition definition view to combine the underlying partition definition tables to generate the list of partitions we need.  A CROSS JOIN to the date dimension simply fills in the gaps.  The view is entirely dynamic, giving you all the partitions you need for today.  Tomorrow the view will add on any new partitions as they are required by the partitioning scheme: monthly, quarterly, yearly.

OlapPartitions view

This view is read by the OlapPartitionManager program which compares the partitions in the cube with the ones listed by the view.  If the partition does not exist in the cube, then it is created.  Note that the C# program does not update existing partitions.  It simply creates new ones or deletes redundant partitions.

The clever thing about this view is not only how it uses the date dimension to generate a list of all partitions up to the present day, but also how it uses it again to generate the list of keys for MDX ranges. Note the content of the PartitionSlice column in the case of the Internet Orders measure group. This was generated because the PartitionSliceIsRange flag is set to True (1).

OlapPartitionsIsRange

The OlapPartitions view only generates partitions between the PartitionStartDate and today's date, so tomorrow the view will add on any new partitions as they are required by the partitioning scheme: monthly, quarterly, yearly. Unfortunately the AdventureWorks date dimension table ends in 2014, so we never see any new partitions being created as we roll over to a new week, month, quarter or year. Clearly this would not be the case in a true production cube!

Once the OlapPartitionManager has been run against the AdventureWorks cube, this is what you see in SSMS:

PartitionedMeasureGroupsInAdventureWorks

I provide the T-SQL for these database objects as a SQL Server Data Tools (SSDT) project called OlapPartitionDefinition which is part of the download for the OlapPartitionManager hosted on Codeplex.  Note that I include DimDate from the AdventureWorksDW2014 database just so my SSDT project compiles correctly.

The Populate_OLAP_Partition_Definitions.sql file populates the three partition definition tables with the partitioned measure groups from the AdventureWorks cube.  This can easily be adapted to your own cube(s).

Note that this table structure supports many measure groups in many cubes in many databases, so with one run of the OlapPartitionManager you can create partitions in hundreds of cubes, all with the correct partition slice set!

Configuring the OlapPartitionManager to work with your project

To get yourself up and running with the OlapPartitionManager this is all you have to do:

  1. Edit the OlapPartitionManager.exe.Config file to change the database connection string and identify the OlapServerType (i.e. DEV, UAT or PROD).
  2. Change the OlapDateDimWrapper view to work with your own date dimension table or equivalent
  3. Change the OlapPartitionPeriods view so that it contains the correct MDX members for your date dimension. You can also add your own partitioning schemes if you need something special.
  4. Change your OLAP cube project to have template partitions with the where clause set to WHERE 1=0
  5. Finally you may need to change the OlapPartitions view if you have some special requirements for the way the PartitionWhereClause is generated

All this is explained in detail on the following page: Configuring the OlapPartitionManager.

Full validation logging

The first step taken by the OlapPartitionManager is to validate your configuration.  Thus it will check that all the measure groups exist in the relevant cubes in the relevant databases.  It even checks the measure groups have a template partition with its where clause set to WHERE 1=0.   All this is logged along with the creation/deletion of each partition.  So check the log files to ensure your initial configuration is correct.
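As an illustration of the sort of check involved, the template test boils down to something like the following AMO snippet. This is a simplified sketch, not the OlapPartitionManager's actual code: logging is reduced to console output, measureGroup is assumed to be an AMO MeasureGroup object, and the template is assumed to use a query binding.

using System;
using System.Linq;
using Microsoft.AnalysisServices;

// Warn if a measure group has no template partition, or its template lacks WHERE 1=0
Partition template = measureGroup.Partitions.Cast<Partition>()
    .FirstOrDefault(p => p.Name.IndexOf("template", StringComparison.OrdinalIgnoreCase) >= 0);

if (template == null)
{
    Console.WriteLine(string.Format("WARNING: no template partition found in '{0}'", measureGroup.Name));
}
else
{
    QueryBinding qb = template.Source as QueryBinding;
    if (qb == null || qb.QueryDefinition.IndexOf("WHERE 1=0", StringComparison.OrdinalIgnoreCase) < 0)
        Console.WriteLine(string.Format("WARNING: template partition in '{0}' does not contain WHERE 1=0", measureGroup.Name));
}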

Download from Codeplex

The OlapPartitionManager is available for download from Codeplex. I would be glad to have your feedback on the project via the project’s discussions tab.

SSAS: Utility to check you have the correct data types and sizes in your cube definition

This blog describes a tool I developed which allows you to compare the data types and data sizes found in the cube’s data source view with the data types/sizes of the corresponding dimensional attribute. 

Why is this important?  Well when creating named queries in a cube’s data source view, it is often necessary to use the SQL CAST or CONVERT operation to change the data type to something more appropriate for SSAS.  This is particularly important when your cube is based on an Oracle data source or using custom SQL queries rather than views in the relational database.   The problem with BIDS is that if you change the underlying SQL query, then the size of the data type in the dimension does not update automatically.  This then causes problems during deployment whereby processing the dimension fails because the data in the relational database is wider than that allowed by the dimensional attribute.

In particular, if you use some of the string manipulation functions provided by SQL Server or Oracle in your queries, you may find that the 10-character string you expect suddenly turns into an 8,000-character monster. For example, the SQL Server function REPLACE returns a column with a width of 8,000 characters. So if you use this function in a named query in your DSV, you will get a column width of 8,000 characters. Although the Oracle REPLACE function is far more intelligent, the generated column size could still be way bigger than the maximum length of the data actually in the field.

Now this may not be a problem when prototyping, but in your production cubes you really should clean up this kind of thing as these massive strings will add to processing times and storage space. Similarly, you do not want to forget to change the size of the dimension attribute if your database columns increase in size.

Introducing the CheckCubeDataTypes Utility

The CheckCubeDataTypes application extracts all the data types and data sizes for all attributes in the cube and compares them to the data types and data sizes in the cube’s data source view.  It then generates an Excel CSV file which contains all this metadata along with a flag indicating if there is a mismatch between the DSV and the dimensional attribute.  Note that the app not only checks all the attribute keys but also the name and value columns for each attribute.
Another benefit of having the metadata held in a CSV text file format is that you can place the file under source code control.  This allows you to compare the metadata of the previous cube release with your new release to highlight problems introduced by new development.
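The heart of the comparison is straightforward with AMO. The sketch below shows the idea for dimension key columns only and is not the utility's actual code: the real tool also checks name and value columns and writes everything to a CSV file. It assumes olapDatabase is an AMO Database object obtained in the usual way and that each dimension is bound to a data source view.

using System;
using System.Data;
using Microsoft.AnalysisServices;

// Compare each attribute key column's declared size with the matching DSV column
foreach (Dimension dim in olapDatabase.Dimensions)
{
    DataSourceViewBinding dsvBinding = dim.Source as DataSourceViewBinding;
    if (dsvBinding == null) continue;
    DataSourceView dsv = olapDatabase.DataSourceViews[dsvBinding.DataSourceViewID];

    foreach (DimensionAttribute attr in dim.Attributes)
    {
        foreach (DataItem keyColumn in attr.KeyColumns)
        {
            ColumnBinding cb = keyColumn.Source as ColumnBinding;
            if (cb == null) continue;   // skip expressions, row-number bindings etc.

            DataColumn dsvColumn = dsv.Schema.Tables[cb.TableID].Columns[cb.ColumnID];
            if (dsvColumn.DataType == typeof(string) && dsvColumn.MaxLength != keyColumn.DataSize)
            {
                Console.WriteLine(string.Format("Mismatch on {0}.{1}: DSV size {2}, attribute size {3}",
                    dim.Name, attr.Name, dsvColumn.MaxLength, keyColumn.DataSize));
            }
        }
    }
}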

You can download the C# source code from here: CheckCubeDataTypes.zip

A typical example of the output Excel CSV file is shown below - note that the last column flags a data size mismatch with TRUE.


SSAS: Utility to export SQL code from your cube's Data Source View (DSV)

When you are working on a cube, particularly in a multi-person team, it is sometimes necessary to review the changes that have been made to the SQL queries in the cube's data source view (DSV). This can be a problem as the SQL editor in the DSV is not the best interface to review code. Now of course you can cut and paste the SQL into SSMS, but you have to do each query one-by-one. What is worse, your DBA is unlikely to have BIDS installed, so you will have to manually export all the SQL yourself and send him the files. To make it easy to get hold of the SQL in a Data Source View, I developed a C# utility which connects to an OLAP database and uses Analysis Management Objects (AMO) to obtain and export all the SQL to a series of files. The added benefit of this approach is that these SQL files can be placed under source code control, which means the DBA can easily compare one version with another.

The Trick

When I came to implement this utility, I quickly found that the AMO API does not give direct access to anything useful about the tables in the data source view. Iterating through the DSVs and tables is easy, but getting to the SQL proved to be much harder. My Google searches returned little of value, so I took a look at the idea of using the XmlDom to open the DSV’s XML and obtain the SQL from that. This is when the breakthrough happened. Inspecting the DSV’s XML, I saw that the things I was interested in were called:

  • TableType
  • DbTableName
  • FriendlyName
  • QueryDefinition

Searching Google for FriendlyName returned this page: Programming AMO Fundamental Objects, which hinted at the fact that I could use something called ExtendedProperties to obtain these XML attributes. This simplified my code tremendously, making the implementation almost trivial. So here is my code with appropriate comments. The full solution can be downloaded from here: ExportCubeDsvSQL.zip

 

using System;
using System.Data;
using System.IO;
using Microsoft.AnalysisServices;

... class code removed for clarity

// connect to the OLAP server
Server olapServer = new Server();
olapServer.Connect(config.olapServerName);
if (olapServer != null)
{
    // connected to server ok, so obtain reference to the OLAP database
    Database olapDatabase = olapServer.Databases.FindByName(config.olapDatabaseName);
    if (olapDatabase != null)
    {
        Console.WriteLine(string.Format("Successfully connected to '{0}' on '{1}'",
            config.olapDatabaseName,
            config.olapServerName));

        // export SQL from each data source view (usually only one, but can be many!)
        foreach (DataSourceView dsv in olapDatabase.DataSourceViews)
        {
            Console.WriteLine(string.Format("Exporting SQL from DSV '{0}'", dsv.Name));

            // for each table in the DSV, export the SQL to a file
            foreach (DataTable dt in dsv.Schema.Tables)
            {
                Console.WriteLine(string.Format("Exporting SQL from table '{0}'", dt.TableName));

                // get name of the table in the DSV
                // use the FriendlyName as the user inputs this and therefore has control of it
                string queryName = dt.ExtendedProperties["FriendlyName"].ToString().Replace(" ", "_");
                string sqlFilePath = Path.Combine(targetDir.FullName, queryName + ".sql");

                // delete the sql file if it exists
                ... file deletion code removed for clarity

                // write out the SQL to a file
                if (dt.ExtendedProperties["TableType"].ToString() == "View")
                {
                    File.WriteAllText(sqlFilePath, dt.ExtendedProperties["QueryDefinition"].ToString());
                }
                if (dt.ExtendedProperties["TableType"].ToString() == "Table")
                {
                    File.WriteAllText(sqlFilePath, dt.ExtendedProperties["DbTableName"].ToString());
                }
            }
        }

        Console.WriteLine(string.Format("Successfully written out SQL scripts to '{0}'", targetDir.FullName));
    }
}

 

Of course, if you are following industry best practice, you should be basing your cube on a series of views. This will mean that this utility will be of limited practical value unless of course you are inheriting a project and want to check if someone did the implementation correctly.


Building a Dynamic OLAP Environment at #SQLbits8

On Saturday I attended the SQLBits conference in sunny Brighton. Great conference, well organized with lots of great speakers presenting high quality content.

For me the highlight was meeting many of the senior guys from the Redmond SQL Server product team including Richard Tkachuk, Mark Souza and Ross Mistry.

Marco Russo presented a great session called Analysis Services Advanced Best Practice. Ok so I have read his book and implement most of the best practices already, but it was great to hear it all from the author himself!

I received very positive feedback on my own session Building a Dynamic OLAP Environment which I had discussed in a previous post. Sometime soon, the slide deck will be made available on the SQLBits web site along with the video of my presentation. However, if you are eager to get your hands on it now, it can be downloaded from here: BuildingDynamicOLAP.zip.

All in all, a great conference! I am looking forward to the next already.


The most dangerous SQL Script in the world!

In my last blog entry, I outlined how to automate SQL Server database builds from concatenated SQL Scripts. However, I did not mention how I ensure the database is clean before I rebuild it. Clearly a simple DROP/CREATE DATABASE command would suffice; but you may not have permission to execute such commands, especially in a corporate environment controlled by a centralised DBA team. However, you should at least have database owner permissions on the development database so you can actually do your job! Then you can employ my universal "drop all" script which will clear down your database before you run your SQL Scripts to rebuild all the database objects.

Why start with a clean database?

During the development process, it is all too easy to leave old objects hanging around in the database which can have unforeseen consequences. For example, when you rename a table you may forget to delete the old table and change all the related views to use the new table. Clearly this will mean an end-user querying the views will get the wrong data and your reputation will take a nose dive as a result!

Starting with a clean, empty database and then building all your database objects using SQL Scripts using the technique outlined in my previous blog means you know exactly what you have in your database. The database can then be repopulated using SSIS and bingo; you have a data mart "to go".

My universal "drop all" SQL Script

To ensure you start with a clean database run my universal "drop all" script which you can download from here: 100_drop_all.zip

By using the database catalog views, the script finds and drops all of the following database objects:

  1. Foreign key relationships
  2. Stored procedures
  3. Triggers
  4. Database triggers
  5. Views
  6. Tables
  7. Functions
  8. Partition schemes
  9. Partition functions
  10. XML Schema Collections
  11. Schemas
  12. Types
  13. Service broker services
  14. Service broker queues
  15. Service broker contracts
  16. Service broker message types
  17. SQLCLR assemblies

There are two optional sections to the script: drop users and drop roles. You may use these at your peril, particularly as you may well remove your own permissions!

Note that the script has a verbose mode which displays the SQL commands it is executing. This can be switched on by setting @debug=1.

Running this script against one of the system databases is certainly not recommended! So I advise you to keep a USE database statement at the top of the file.

Good luck and be careful!!


SSIS: Building SQL databases on-the-fly using concatenated SQL scripts

Over the years I have developed many techniques which help automate the whole SQL Server build process. In my current process, where I need to build entire OLAP data marts on-the-fly, I make regular use of a simple but very effective mechanism to concatenate all the SQL Scripts together from my SSMS (SQL Server Management Studio) projects. This proves invaluable because in two clicks I can redeploy an entire SQL Server database with all tables, views, stored procedures etc. Indeed, I can also use the concatenated SQL scripts with SSIS to build SQL Server databases on-the-fly.

You may be surprised to learn that I often redeploy the database several times per day, or even several times per hour, during the development process. This is because the deployment errors are logged and you can quickly see where SQL Scripts have object dependency errors. For example, after changing a table structure you may have forgotten to change any related views. The deployment log immediately points out all the objects which failed to build so you can fix and redeploy the database very quickly. The alternative approach (i.e. doing changes in the database directly using the SSMS UI) would require you to check all dependent objects before making changes. The chances are that you will miss something and wonder why your app returns the wrong data – a common problem caused by changing a table without re-creating dependent views.

Using SQL Projects in SSMS

A great many developers fail to make use of SQL Projects in SSMS (SQL Server Management Studio). To me they are an invaluable way of organizing your SQL Scripts. The screenshot below shows a typical SSMS solution made up of several projects – one project for tables, another for views etc. The key point is that the projects naturally fall into the right order in the file system because of the project name. The number in the folder or file name ensures that the SQL scripts are concatenated together in the order in which they need to be executed. Hence the script filenames start with 100, 110 etc.


Concatenating SQL Scripts

To concatenate the SQL Scripts together into one file, I use notepad.exe to create a simple batch file (see example screenshot) which uses the TYPE command to write the content of the SQL Script files into a combined file. As the SQL Scripts are in several folders, I simply use the TYPE command multiple times and append the output together.

If you are unfamiliar with batch files, you may not know that the angled bracket (>) means write the output of the program to a file. Two angled brackets (>>) means append the output of the program to a file. So the command-line

DIR > filelist.txt

would write the output of the DIR command into a file called filelist.txt. In the example shown above, the concatenated file is called SB_DDS.sql.
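So a batch file built along these lines is just a series of commands like the following, one per project folder (the folder name here is illustrative rather than taken from my actual solution):

TYPE 100_Tables\*.sql >> SB_DDS.sql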

If, like me you place the concatenated file under source code control, then the source code control system will change the file's attribute to "read-only" which in turn would cause the TYPE command to fail. The ATTRIB command can be used to remove the read-only flag.

Using SQLCmd to execute the concatenated file

Now that the SQL Scripts are all in one big file, we can execute the script against a database using SQLCmd using another batch file as shown below:

SQLCmd has numerous options, but the script shown above simply executes the SB_DDS.sql file against the SB_DDS_DB database on the local machine and logs the errors to a file called SB_DDS.log. So after executing the batch file you can simply check the error log to see if your database built without a hitch. If you have errors, then simply fix the source files, re-create the concatenated file and re-run SQLCmd to rebuild the database. This two-click operation allows you to quickly identify and fix errors in your entire database definition.
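For reference, the SQLCmd invocation described above would look something like this (the server name is an assumption; here it is simply the local default instance):

SQLCmd -S localhost -d SB_DDS_DB -E -i SB_DDS.sql -o SB_DDS.log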


Using SSIS to execute the concatenated file

To execute the concatenated SQL script using SSIS, you simply drop an Execute SQL task into your package and set the database connection as normal and then select File Connection as the SQLSourceType (as shown below). Create a file connection to your concatenated SQL script and you are ready to go.

 

Tips and Tricks

  • Add a new-line at end of every file
    The most common problem encountered with this approach is that the GO statement on the last line of one file is placed on the same line as the comment at the top of the next file by the TYPE command. The easy fix to this is to ensure all your files have a new-line at the end.
  • Remove all USE database statements
    The SQLCmd command-line already identifies which database the script should be run against. So you should remove all USE database commands from your scripts - otherwise you may get unintentional side effects!!
  • Do the Create Database separately
    If you are using SSIS to create the database as well as create the objects and populate the database, then invoke the CREATE DATABASE command against the master database using a separate package before calling the package that executes the concatenated SQL script.  

 


SSIS: Deploying OLAP cubes using C# script tasks and AMO

As part of the continuing series on Building dynamic OLAP data marts on-the-fly, this blog entry will focus on how to automate the deployment of OLAP cubes using SQL Server Integration Services (SSIS) and Analysis Services Management Objects (AMO).

OLAP cube deployment is usually done using the Analysis Services Deployment Wizard. However, this option was dismissed for a variety of reasons. Firstly, invoking external processes from SSIS is fraught with problems as (a) it is not always possible to ensure SSIS waits for the external program to terminate; (b) we cannot log the outcome properly and (c) it is not always possible to control the server's configuration to ensure the executable works correctly. Another reason for rejecting the Deployment Wizard is that it requires the 'answers' to be written into four XML files. These XML files record the three things we need to change: the name of the server, the name of the OLAP database and the connection string to the data mart. Although it would be reasonably straightforward to change the content of the XML files programmatically, this adds another layer of complication and obscurity to the overall process.

When I first investigated the possibility of using C# to deploy a cube, I was surprised to find that there were no other blog entries about the topic. I can only assume everyone else is happy with the Deployment Wizard!

SSIS "forgets" assembly references

If you build your script task from scratch, you will have to remember how to overcome one of the major annoyances of working with SSIS script tasks: the forgetful nature of SSIS when it comes to assembly references. Basically, you can go through the process of adding an assembly reference using the Add Reference dialog, but when you close the script window, SSIS "forgets" the assembly reference so the script will not compile. After repeating the operation several times, you will find that SSIS only remembers the assembly reference when you specifically press the Save All icon in the script window. This problem is not unique to the AMO assembly and has certainly been a "feature" since SQL Server 2005, so I am not amazed it is still present in SQL Server 2008 R2!

Sample Package

So let's take a look at the sample SSIS package I have provided, which can be downloaded from here: DeployOlapCubeExample.zip. Below is a screenshot after a successful run.

Connection Managers

The package has three connection managers:

  • AsDatabaseDefinitionFile is a file connection manager pointing to the .asdatabase file you wish to deploy. Note that this can be found in the bin directory of your OLAP database project once you have clicked the "Build" button in Visual Studio.
  • TargetOlapServerCS is an Analysis Services connection manager which identifies both the deployment server and the target database name.
  • SourceDataMart is an OLEDB connection manager pointing to the data mart which is to act as the source of data for your cube. This will be used to replace the connection string found in your .asdatabase file.

Once you have configured the connection managers, the sample should run and deploy your OLAP database in a few seconds. Of course, in a production environment, these connection managers would be associated with package configurations or set at runtime.

When you run the sample, you should see that the script logs its activity to the output screen (see screenshot above). If you configure logging for the package, then these messages will also appear in your SSIS logging.

Sample Code Walkthrough

Next let's walk through the code. The first step is to parse the connection string provided by the TargetOlapServerCS connection manager and obtain the name of both the target OLAP server and also the name of the OLAP database. Note that the target database does not have to exist to be referenced in an AS connection manager, so I am using this as a convenient way to define both properties.
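The parsing itself can be done with the generic DbConnectionStringBuilder class from the .NET Framework rather than hand-rolled string splitting. The snippet below is a sketch of the idea and is not necessarily the exact code in the sample package.

using System.Data.Common;

// Pull the server and database names out of the AS connection manager's connection string
string connectionString = Dts.Connections["TargetOlapServerCS"].ConnectionString;

DbConnectionStringBuilder builder = new DbConnectionStringBuilder();
builder.ConnectionString = connectionString;

object value;
string olapServerName = builder.TryGetValue("Data Source", out value) ? value.ToString() : null;
string olapDatabaseName = builder.TryGetValue("Initial Catalog", out value) ? value.ToString() : null;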

We now connect to the server and check for the existence of the OLAP database. If it exists, we drop the database so we can re-deploy.

svr.Connect(olapServerName);
if (svr.Connected)
{
    // Drop the OLAP database if it already exists
    Database db = svr.Databases.FindByName(olapDatabaseName);
    if (db != null)
    {
        db.Drop();
    }
    // rest of script
}

Next we start building the XMLA command that will actually perform the deployment. Basically this is a small chunk of XML which we need to wrap around the large .asdatabase file generated by the Visual Studio build process.

// Start generating the main part of the XMLA command
XmlDocument xmlaCommand = new XmlDocument();
xmlaCommand.LoadXml(string.Format("<Batch Transaction='false' xmlns='http://schemas.microsoft.com/analysisservices/2003/engine'><Alter AllowCreate='true' ObjectExpansion='ExpandFull'><Object><DatabaseID>{0}</DatabaseID></Object><ObjectDefinition/></Alter></Batch>", olapDatabaseName));

 

Next we need to merge the two XML files, which we can do by simply setting the InnerXml property of the ObjectDefinition node as follows:

// load the OLAP database definition from the .asdatabase file identified by the connection manager
XmlDocument olapCubeDef = new XmlDocument();
olapCubeDef.Load(Dts.Connections["AsDatabaseDefinitionFile"].ConnectionString);

// merge the two XML files by obtaining a reference to the ObjectDefinition node
oaRootNode.InnerXml = olapCubeDef.InnerXml;

 

One hurdle I had to overcome was removing detritus from the .asdatabase file left by the Visual Studio build. Through an iterative process, I found I needed to remove several nodes as they caused the deployment to fail. The XMLA error message read "Cannot set read-only node: CreatedTimestamp" or similar. In comparing the XMLA generated by the Deployment Wizard with that generated by my code, these read-only nodes were missing, so clearly I just needed to strip them out. This was easily achieved using XPath to find the relevant XML nodes, of which I show one example below:

foreach (XmlNode node in rootNode.SelectNodes("//ns1:CreatedTimestamp", nsManager))
{
    node.ParentNode.RemoveChild(node);
}

Now we need to change the database name in both the ID and Name nodes using code such as:

XmlNode databaseID = xmlaCommand.SelectSingleNode("//ns1:Database/ns1:ID", nsManager);
if (databaseID != null)
    databaseID.InnerText = olapDatabaseName;

Finally we need to change the connection string to point at the relevant data mart. Again this is easily achieved using XPath to search for the relevant nodes and then replace the content of the node with the new name or connection string.

XmlNode connectionStringNode = xmlaCommand.SelectSingleNode("//ns1:DataSources/ns1:DataSource/ns1:ConnectionString", nsManager);
if (connectionStringNode != null)
{
    connectionStringNode.InnerText = Dts.Connections["SourceDataMart"].ConnectionString;
}

Finally we need to perform the deployment using the Execute XMLA command and check the returned XmlaResultCollection for errors before setting the Dts.TaskResult.

XmlaResultCollection oResults = svr.Execute(xmlaCommand.InnerXml);

// check for errors during deployment
foreach (Microsoft.AnalysisServices.XmlaResult oResult in oResults)
{
    foreach (Microsoft.AnalysisServices.XmlaMessage oMessage in oResult.Messages)
    {
        if (oMessage.GetType().Name == "XmlaError")
        {
            FireError(oMessage.Description);
            HadError = true;
        }
    }
}

If you are not familiar with XML programming, this may all seem a bit daunting, but persevere as the sample code is pretty short.

If you would like the script to process the OLAP database, simply uncomment the lines in the vicinity of the Process method. Of course, you can extend the script to perform your own custom processing and even to synchronize the database to a front-end server. Personally, I like to keep the deployment and processing separate as the code can become overly complex for support staff.
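For reference, processing the freshly deployed database via AMO is essentially a single call along the following lines. This is a sketch only and the commented-out lines in the sample may differ slightly.

// Re-fetch the newly deployed database and process it in full
Database deployedDb = svr.Databases.FindByName(olapDatabaseName);
deployedDb.Process(ProcessType.ProcessFull);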

If you want to know more, come see my session at the forthcoming SQLBits conference.


Building dynamic OLAP data marts on-the-fly

At the forthcoming SQLBits conference, I will be presenting a session on how to dynamically build an OLAP data mart on-the-fly. This blog entry is intended to clarify exactly what I mean by an OLAP data mart, why you may need to build them on-the-fly and finally outline the steps needed to build them dynamically. In subsequent blog entries, I will present exactly how to implement some of the techniques involved.

What is an OLAP data mart?

In data warehousing parlance, a data mart is a subset of the overall corporate data provided to business users to meet specific business needs. Of course, the term does not specify the technology involved, so I coined the term "OLAP data mart" to identify a subset of data which is delivered in the form of an OLAP cube, which may be accompanied by the relational database upon which it was built. To clarify, the relational database is specifically created and loaded with the subset of data and then the OLAP cube is built and processed to make the data available to the end-users via standard OLAP client tools.

Why build OLAP data marts?

Market research companies sell data to their clients to make money. To gain competitive advantage, market research providers like to "add value" to their data by providing systems that enhance analytics, thereby allowing clients to make best use of the data. As such, OLAP cubes have become a standard way of delivering added value to clients. They can be built on-the-fly to hold specific data sets and meet particular needs and then hosted on a secure intranet site for remote access, or shipped to clients' own infrastructure for hosting. Even better, they support a wide range of different tools for analytical purposes, including the ever popular Microsoft Excel.

Extension Attributes: The Challenge

One of the key challenges in building multiple OLAP data marts based on the same 'template' is handling extension attributes. These are attributes that meet the client's specific reporting needs, but do not form part of the standard template. Now clearly, these extension attributes have to come into the system via additional files and ultimately be added to relational tables so they can end up in the OLAP cube. However, processing these files and filling dynamically altered tables with SSIS is a challenge as SSIS packages tend to break as soon as the database schema changes.

There are two approaches to this: (1) dynamically build an SSIS package in memory to match the new database schema using C#, or (2) have the extension attributes provided as name/value pairs so the file's schema does not change and can easily be loaded using SSIS. The problem with the first approach is the complexity of writing an awful lot of complex C# code. The problem with the second approach is that name/value pairs are useless to an OLAP cube; so they have to be pivoted back into a proper relational table somewhere in the data load process WITHOUT breaking SSIS. How this can be done will be part of a future blog entry.

What is involved in building an OLAP data mart?

There are a great many steps involved in building OLAP data marts on-the-fly. The key point is that all the steps must be automated to allow for the production of multiple OLAP data marts per day (i.e. many thousands, each with its own specific data set and attributes).

Now most of these steps have a great deal in common with standard data warehouse practices. The key difference is that the databases are all built to order. The only permanent database is the metadata database (shown in orange) which holds all the metadata needed to build everything else (i.e. client orders, configuration information, connection strings, client specific requirements and attributes etc.). The staging database (shown in red) has a short life: it is built, populated and then ripped down as soon as the OLAP Data Mart has been populated. In the diagram below, the OLAP data mart comprises the two blue components: the Data Mart which is a relational database and the OLAP Cube which is an OLAP database implemented using Microsoft Analysis Services (SSAS). The client may receive just the OLAP cube or both components together depending on their reporting requirements.

 

So, in broad terms the steps required to fulfil a client order are as follows:

Step 1: Prepare metadata

  • Create a set of database names unique to the client's order
  • Modify all package connection strings to be used by SSIS to point to new databases and file locations.

Step 2: Create relational databases

  • Create the staging and data mart relational databases using dynamic SQL and set the database recovery mode to SIMPLE as we do not need the overhead of logging anything
  • Execute SQL scripts to build all database objects (tables, views, functions and stored procedures) in the two databases

Step 3: Load staging database

  • Use SSIS to load all data files into the staging database in a parallel operation
  • Load extension files containing name/value pairs. These will provide client-specific attributes in the OLAP cube.

Step 4: Load data mart relational database

  • Load the data from staging into the data mart relational database, again in parallel where possible
  • Allocate surrogate keys and use SSIS to perform surrogate key lookup during the load of fact tables

Step 5: Load extension tables & attributes

  • Pivot the extension attributes from their native name/value pairs into proper relational tables
  • Add the extension attributes to the views used by OLAP cube

Step 6: Deploy & Process OLAP cube

  • Deploy the OLAP database directly to the server using a C# script task in SSIS
  • Modify the connection string used by the OLAP cube to point to the data mart relational database
  • Modify the cube structure to add the extension attributes to both the data source view and the relevant dimensions
  • Remove any standard attributes that are not required
  • Process the OLAP cube

Step 7: Backup and drop databases

  • Drop staging database as it is no longer required
  • Backup data mart relational and OLAP database and ship these to the client's infrastructure
  • Drop data mart relational and OLAP database from the build server
  • Mark order complete

Start processing the next order, ad infinitum.

So my future blog posts and my forthcoming session at the SQLBits conference will all focus on some of the more interesting aspects of building OLAP data marts on-the-fly such as handling the load of extension attributes and how to dynamically alter the structure of an OLAP cube using C#.
