When it comes to system integration, Microsoft provides such a plethora of options it is far too easy to forget about some of them and/or think about their practical application. This is particularly true of the SQL Server BI stack which is so huge in its scope.

Our need was to populate a relational database using the results of an MDX query so we could do some data reconciliation to prove the new cube provided the results we wanted. Given my skills in C#, my immediate thought was to write a simple console application which performed an MDX query and populate a database table using SQL Bulk Insert. However, this option would not be too easy the rest of the team to support as they do not have the relevant C# skills or experience of ADOMD.NET.

Vincent Rainardi (author of Building a Data Warehouse: with examples in SQL Server) pointed me the direction of OpenQuery and linked servers. This is a remarkably simple and flexible solution to a great many system integration problems, so I thought I had better share it with a wider audience.

OpenQuery and OpenRowSet both execute pass-through queries against a specific server. The 'query string' to be executed is not evaluated by the local server but simply 'passed through' to the target server which is expected to return a row set. The 'query string' can be anything, including MDX, so long as it is understood by the target server. Well this works a treat with MDX and Analysis Services.

To use this technique, you will first need to create a linked SSAS server using either SQL Server Management Studio or in script using the stored procedure master.dbo.sp_addlinkedserver. Here is an example:

Once linked, you can then query your cube using MDX and combine the results with the content of a SQL Server table. Alternatively you can simply insert the results of the MDX query into a database table. For example, the following screenshot shows the MDX executed in SQL Server Management Studio:

Whereas the following screenshot shows the results of the equivalent OpenQuery:

 

Note now the column names contain the unique name of the attribute. Before showing how to deal with these, let's just look one surprising element of the functionality offered by this technique.

If you use a hierarchy in your query, then you will get extra columns describing the values of each hierarchy level. For example, if we change the above query to use the [Customer].[Customer Geography] hierarchy then you get an extra column in the query results describing the [Customer].[Customer Geography].[Country] level.

This is even more dramatic when using a parent-child hierarchy; a query against the Accounts dimension will bring back all six levels.

Dealing with Attribute Unique Names

Of course, you may find the attribute unique name a little long and cumbersome to use in your T-SQL. The easiest way to remove them is to use an alias. However, you still need to type that lengthy name in the first place.

The easy way around this is to copy the query results into Excel and then obtain the column header text from there. However, by default, the copying of column headers is switched off, so you may want to switch on the "Include column headers when copying or saving results" option in SQL Server Management Studio which can be found under Tools->Options->Query Results->Results to Grid. Here is a screenshot to make it easier:

Using an Alias 

As you will know, the square bracket in T-SQL is used as a delimiter in much the same way as it is in MDX. So simply typing:

[Customer].[Customer Geography].[Country].[MEMBER_CAPTION] AS Country

will cause a "multi-part identifier could not be bound" error. To avoid this, use double quotes around the attribute unique name as follows:

"[Customer].[Customer Geography].[Country].[MEMBER_CAPTION]" AS Country,

Numerical Advantage

The problem with OpenQuery is that every column has a data type of string, including the numerical columns. This means that the data type must be changed if you want to perform numerical computations on the figures using T-SQL. So our two numerical columns need to be wrapped in CAST & CONVERT as follows:

CAST(CONVERT(float, "[Measures].[Internet Gross Profit]") AS money) AS GrossProfit

Combining multi-dimensional and relational data together

In the case where you are creating a management report that combines the profitability figures of each division with the relevant manager's comments as to why the figure have (or have not) been achieved, then the use of OpenQuery becomes invaluable. This is because the only way to handle such a situation in the cube would be to create another dimension which links at the relevant level of granularity and enable cube write-back so the manager can update dimensional properties. A lot of hassle for a few pieces of textual information!

With OpenQuery, this becomes trivial as the text can be held in a supplementary database table and combined with the results of the MDX query.

Now the problem you may have spotted is that OpenQuery returns the MEMBER_CAPTION which will probably not be sufficiently unique to match an entry in the relational database. What you really need is the MEMBER_KEY. This can be added as a column using a calculated member as follows:

WITH MEMBER [Measures].[OrganisationID] AS [Organization].[Organizations].MEMBER_KEY

As the Adventure Works database does not contain a 'comments' table, let's create one and insert some rows using the new INSERT feature in SQL Server 2008.

CREATE TABLE dbo.ManagersComments(
    OrganisationID int NOT NULL,
    [Manager's Name] nvarchar(50) NULL,
    [Manager's Comments] nvarchar(255) NULL
);    
INSERT INTO ManagersComments(OrganisationID, [Manager's Name], [Manager's Comments])
    VALUES (6, 'John Doe', 'My team failed again. You know how it is...'),
                (5, 'John Smith', 'My team are all stars.');

So let's bring MDX and SQL data together into one query. This is best achieved with a common table expression as the layout can be far more readable:

WITH MdxQuery
(    
    [Region],
    OrganisationID,
    [Net Sales],
    [Cost of Sales],
    [Gross Margin],
    [Gross Margin%]
)
AS
(
    SELECT
        "[Organization].[Organizations].[Organization Level 04].[MEMBER_CAPTION]" AS [Region],
        CONVERT(int, "[Measures].[OrganisationID]") AS OrganisationID,
        CAST(CONVERT(float, "[Measures].[Net Sales]") AS money) AS [Net Sales],
        --"[Measures].[Net Sales]" AS [Net Sales],
        CAST(CONVERT(float, "[Measures].[Cost of Sales]") AS money) AS [Cost of Sales],
        CAST(CONVERT(float, "[Measures].[Gross Margin]") AS money) AS [Gross Margin],
        CONVERT(decimal(4, 2), (CONVERT(decimal(19, 17), "[Measures].[Gross Margin%]") * 100)) AS [Gross Margin%]
    FROM OPENQUERY(AdvertureWorksServer,
        'WITH MEMBER [Measures].[OrganisationID] AS
            [Organization].[Organizations].MEMBER_KEY
        MEMBER [Measures].[Net Sales] AS
            ([Account].[Accounts].&[50], [Measures].[Amount])
        MEMBER [Measures].[Cost of Sales] AS
            ([Account].[Accounts].&[55], [Measures].[Amount])
        MEMBER [Measures].[Gross Margin] AS
            ([Account].[Accounts].&[49], [Measures].[Amount])
        MEMBER [Measures].[Gross Margin%] AS
            [Measures].[Gross Margin] / [Measures].[Net Sales]
        SELECT
        {
            [Measures].[OrganisationID], [Measures].[Net Sales], [Measures].[Cost of Sales],
            [Measures].[Gross Margin], [Measures].[Gross Margin%]
        } ON 0,
        {
            [Organization].[Organizations].&[14].Children
        }
        ON 1
        FROM [Finance]
        WHERE ([Date].[Fiscal].[Fiscal Quarter].&[2004]&[1])'
    )
),
CombineMdxWithTable (    
    [Region],
    [Manager's Name],
    [Net Sales],
    [Cost of Sales],
    [Gross Margin],
    [Gross Margin%],
    [Manager's Comments]
)
AS
(
    SELECT
        A.[Region],
        B.[Manager's Name],
        A.[Net Sales],
        A.[Cost of Sales],
        A.[Gross Margin],
        A.[Gross Margin%],
        B.[Manager's Comments]
FROM MdxQuery A
    LEFT JOIN dbo.ManagersComments B ON
        A.OrganisationID = B.OrganisationID
)
SELECT * FROM CombineMdxWithTable

 

Here are the results of our query:

The Big Surprise - Support for more than two axes!

What I found most surprising about the functionality offered by OpenQuery is that it will deal with extra dimensions/axes sensibly.  Of course any multi-dimensional query can always be transformed into a two-dimensional recordset; each cell simply becomes a row in the recordset, with one column for each dimension. So for example, a four dimensional MDX query returning a two measures can be easily transformed into a two dimensional recordset containing six columns. Each row represents a cell and has two numerical columns.

Of course, if you try adding more than two axes to an MDX query in SQL Server Management Studio, this is the resulting error you will see:

Results cannot be displayed for cellsets with more than two axes.

Given that OpenQuery does such a good job of flattening the resultset into two dimensions, SQL Server Management Studio could really make more of an effort!

In reality, similar results are returned by a two-dimensional MDX query where the rows axes is a CrossJoin of all the relevant dimensions. It also has the advantage that the query would be able to take advantage of the server's AutoExists performance optimization and therefore fewer rows will be returned. However, it does not sound quite so sexy as a true multi-dimensional query. Ah, well. Technology cannot always be sexy!

Posted by DrJohn | with no comments
Filed under: ,

Vista was driving me nuts! I could not believe how slow it was. The hard disk was always thrashing, the machine ran hot and the battery life was pants. I had spent hours optimizing my environment; first deleting all the 'bloat ware' installed by Dell, then stopping every service I did not recognise and finally removing every entry I could from the start-up registry entries (i.e. HKEY_CURRENT_USER and HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Run).

The biggest culprit for wasting system resources was Vista's SuperFetch. Which idiot thought this up?? The service reads all the files on the computer and 'analyses' what you have done before and predicts what you may need in the immediate future and then loads the application and data into memory so you get fantastic performance if, and this is a big 'if', you happen to ask for the application or data that it thinks you want. In reality, the performance of the whole PC suffers as this silly service just wastes your PC's resources doing things that you do not need. Ahhhhh! I wish I had a shotgun and a time machine to get that S.O.A.B.!

My laptop is no slouch either. A Dell XPS M1530 laptop with Intel Core 2 Duo 2.00GHz, 4GB RAM, 250GB 5,600RPM hard drive and a 256MB NVIDIA GeForce 8600M GT graphics card driving a 1680x1050 pixel TrueLife TFT. So I was particularly annoyed by the lack of performance provided by Vista. The laptop was rated as 4.9 out of 5 in Vista's own performance benchmark; 5 been the best machines that were around when they were developing Vista. Clearly they were benchmarking how much system resource they could waste rather than focus on end-user performance!

As I develop server-based applications on the SQL Server 2005 BI suite, I had installed VMWare Workstation so I could run multiple virtual machines – one for each of my client's server configurations. I find VMWare much more flexible than Virtual PC, but then again, it's not free! What seemed silly is that the Windows Server 2003 instances running inside the virtual machine seemed much faster than Vista itself, especially when copying files across the network!

When I heard the news that SQL Server 2008 had finally shipped, I decided that it was time to do something radical: upgrade to Windows Server 2008 as the native OS and install SQL Server 2008! As a consultant on the road, I cannot afford to be without my machine for too long, so I decided the safest option would be to buy another hard drive and take the Vista disk out and replace it with a shiny new 7,200RPM hard drive. Well performance is everything!

Before taking the Vista hard drive out, I created backups onto my NAS drive and also copied onto a memory stick all the device drivers that Dell had kindly created in the c:\drivers folder (after checking they were the latest versions of course!).

The installation of Windows Server 2008 on my laptop went like a dream. Every time the OS found a new device, I simply pointed it at the root of the memory stick and, bingo, it installed without a glitch. The only devices that failed were the ones that Dell had forgotten to place in the c:\drivers folder or for my own printer/scanner. These were soon downloaded from Dell and Canon's web sites.

Why did it go so smoothly? Well Windows Server 2008 shares the same kernel as Vista and the device driver model is the same. So a Vista device driver will work with Windows Server 2008 and visa versa. However, be aware that this will only work with the 32-bit version of Windows Server 2008 as very few device drivers are available for 64-bit Windows.

A very Desktop Experience

Of course Vista does have some charms and I soon came to miss certain applications like Windows Photo Gallery and Windows Media Player. I tried finding them on the Microsoft Download site but to no avail. Windows Media Player is built into Vista and therefore not provided as a separate download. I searched the internet for clues as to where I could get them from, but to no avail. Several days went by before I found www.win2008workstation.com which has many tips on how to turn Windows Server 2008 into a workstation. This includes how to turn on something called the Desktop Experience. I followed the procedure and my desired applications magically appeared. However, some undesirable applications also popped up like Windows Defender (what junk!), Windows Mail, Windows Contacts and Windows Calendar. Unfortunately, you get all or nothing. There is no fine control over which of the Vista applications get installed. At least you can disable Windows Defender by disabling the related service.

Virtual CD-ROM

Eager to install SQL Server 2008 from a freshly downloaded ISO image, I started scouting for a way to load an ISO images as a virtual drive. I figured there must be a way to load an ISO image built into the OS itself. However, I could not find the right command-line, but I am sure there is someone out there that knows how to do it! So I tried my old favourite Virtual CD-ROM Control Panel which is a free download originally developed by Microsoft but now provided as an unsupported utility. To my surprise this worked a treat. Amazing since it was last updated in 2006!

Burning Desire

Over the years I have been a great lover of Nero Burning ROM. The main application had a simple but powerful user interface. However, the Nero package has grown out of all recognition over the years and offers so many utilities to do things I never imagined I needed (i.e. bloat ware) that I refuse to install it on my machine. Unfortunately the original Nero Burning ROM cannot be purchased separately, so I needed an alternative. After a perusal of Wikipedia, I found some freeware called Express Burn. The UI is almost identical to Nero Burning ROM and seems to work well. However, beware that many of the menu options are actually links to upgrade you to the paid for version. If anyone has an alternative suggestion, please let me know.

One frustration is that the DVD+/-RW drive was not originally visible to Express Burn unless I am logged in as Administrator, even though my own account belongs to the local administrators group! Clearly this is due to some obscure Local Security Policy. I did some fiddling about and amazingly today, as I write this blog entry I find that the DVD+/-RW drive has become visible to Express Burn again. Perhaps it was something I did (although I know not what) or the pressure of public exposure that made my device behave itself.

Well Connected

One slightly problematic device I had was my 3 USB mobile internet modem. This worked fine on first installation then stopped working for no apparent reason. I figured out it may be something to do with the Windows Firewall which, if I remember correctly, first appeared when I switched on the Desktop Experience. However, after tinkering with the firewall settings the device would still not work. I tried installing the latest device drivers, but to no avail. I uninstalled the device software and decided to ring 3's support line, confident in the knowledge that they would be no help at all; after all how many customers have mobile internet installed on a server?? Of course, several days passed before I resolved to get the problem fixed. Just prior to the call I thought I would prepare by reinstalling the device so the details of the problem would be fresh in my mind. Well blow me, the damn thing started working!

Mouse Problems

The one device that I have real problems with is my Logitech V470 Cordless Laser Mouse for Bluetooth. I can add it as a Bluetooth device, but it will not actually work. If I install the latest SetPoint software from Logitech, the mouse pointer jumps all over the place like a demented wasp. Ideas anyone?

My tool bag

I often find it useful to know what other consultants in the field carry around in their 'tool bag'. Well, here is my set of applications currently installed on the laptop:

  • Microsoft Windows Server 2008 including IIS 7.0
  • Microsoft SQL Server 2008 Developer Edition including SSAS, SSRS and SSIS and the new Adventure Works 2008 database samples.
  • Microsoft Visual Studio 2008
  • MSDN Library
  • Microsoft Office 2007 with "Save as PDF" enhancement
  • Microsoft Data Mining Add-ins for Excel 2007
  • UltraEdit – fantastic text and code editor I have been using for many, many years
  • Password Safe – I use this freeware to hold the username/password to over 100 web sites and bank accounts. They are all safe inside a heavily encrypted file which can be safely backed up to my NAS drive or memory stick without any danger of someone else been able to open the file.
  • PowerMarks – the easy way to move your internet favourites from machine to machine
  • Adobe Acrobat Reader – of course!
  • Primo PDF – freeware alternative to Adobe Acrobat Distiller for creating PDF files
  • Paint.NET – what a great free alternative to Adobe Photoshop Elements
  • Ultratagger – easy to use MP3 tagging software
  • FileZilla – great freeware FTP application
  • WinZip – the de facto standard
  • Apple iTunes for my iPod and (future) iPhone
  • www.iDrive.com on-line backup
  • Express Burn DVD/CD-ROM burning software
  • 3 USB mobile internet

 

So if you are thinking of making the move from Vista to Windows Server 2008, do it now! The stability, performance, flexibility and power of Windows Server 2008 can be yours.

Good luck!

Posted by DrJohn | 1 comment(s)

"I'm busting for a Wii!" was my wife's exclamation in the centre of Warrington. We had been an attempting to avoid traffic on the M6 but got snarled up in even more traffic in central Warrington. I did not understand at first, but then I saw the cheeky kid's grin on the advert on the back of a bus. In good northern dialect, he was not just bursting, but "busting" with excitement about the latest Nintendo games machine. What surprised me more was what followed. My wife gave me a 30 minute rant questioning exactly why I had not yet brought her a games machine. Of course my teenage daughter joined in the fray whilst my teenage son remained quite. Games machines have long been the domain of spotty teenage boys or young men with time on their hands and few friends. They are certainly not the domain of middle age females with a technology adversion! So I was rather taken aback by the whole situation.

However, this weekend the SQL Bits conferance delivered in more ways than I could possibly imagine. The event was well organised and well attended and all the lectures on the "track" I was on were well presented.

Andrew Sadler (formerly of ProClarity and now a consultant with Edenbrook) did a breakneck demonstration of the Microsoft BI stack including PerformancePoint and Excel Services. As I work with this stuff every day, I did not learn anything particularly new; it is just great to see someone else's take of the Microsoft offerings and exactly how they present it all as a 'coherent' strategy. There was clearly pent up frustration in the audience that the ProClarity product had been left on the shelf by Microsoft. A great tool been suffocated by the giant. I just hope that PerformancePoint v2 (whenever it surfaces) actually provides functionality as good, or better, than ProClarity's.

Simon Sabin did a great job of introducing some of the new features in the T-SQL language. To be honest, the audience got most animated when they realised Simon had simply highlighted the name of a table and hit some key and SQL Server Management Studio performed a query that listed not only the table definition but primary keys etc. He then showed us all how to configure keyboard shortcuts in Tools->Options->Environment->Keyboard. It seems that any single parameter stored proc (including your own) can be mapped to a keyboard shortcut and it will be passed the highlighted text when invoked. Simple, effective and one of the best things I learnt out of the session! Not that I am decrying Simon's excellent session, but something simple like a keyboard shortcut can save a lot of time in the long run, and these things are invaluable.

Allan Mitchell performed a great demo of the Data Mining Add-In for Excel. This is a tool that I always keep meaning to use on a project, but somehow never get around to getting it out of the bag and dusting it off. The new version of the DM Add-In for Excel 2007 demos really well. I will have to make a determined effort to do something with this technology soon!

It was great to meet Chris Webb for the first time. Having read the blogs and the book, it was good to get to know the guy in person. I will be using Intelligencia Query for BIDS and Reporting Services in my next project jointly developed by Chris and Andrew Wiles who also presented his 'reusable' approach to BI the day. Colin Hardie of the IMGroup ably demonstrated the new MERGE feature in T-SQL as well as several other approaches to handling Type 2 slowly changing dimensions.

It was great to meet old friends, put faces to people I had only previously met in 'cyberspace' and meet some new contacts. Microsoft provided free drinks after the gig and, even better, I won a Wii in the RedGate prize draw. Even my wife agrees that it was a good conference!

Posted by DrJohn | with no comments

There are often circumstances where an SSAS stored procedure needs to connect to the transactional database to perform a query.  This is certainly the case in a near-real-time OLAP solution where the cube sits directly on top of the application database and dynamic dimension security is implemented by a stored proc. Here the SSAS stored procedure has to query the transactional database in order to get the user's authorisation. Clearly the SSAS database has a data source and it uses this connection to process the cube. So how do we get the connection string?

Well the short code snippet below provides the solution. The code simply connects back to the current database using AMO and obtains the connection string from the data source object. It then removes the redundant "Provider=SQLNCLI.1;" before returning a valid SQL connection string that can be used by SqlConnection().

Of course, you must provide your stored proc assembly with the "Unrestricted" permission set and set its impersonation mode to "Service Account" for the connection string to work.

using AMO = Microsoft.AnalysisServices;

using Microsoft.AnalysisServices.AdomdServer;

...

/// <summary>

/// Obtains the SqlConnection string used by the OLAP database

/// </summary>

/// <returns></returns>

public static string GetSqlConnectionString()

{

    // connect to the current instance of Analysis Services and return the transactional data source

    AMO.Server svr = new AMO.Server();

    svr.Connect("Data Source=" + Context.CurrentServerID + ";Initial Catalog=" + Context.CurrentDatabaseName);

    // get the connection to the transactional database from the OLAP connection manager

    string connStr = svr.Databases.GetByName(Context.CurrentDatabaseName).DataSources[0].ConnectionString.Replace("Provider=SQLNCLI.1;", "");

    svr.Disconnect();

    return (connStr);

}

Posted by DrJohn | with no comments
Filed under: ,

I have always looked down upon the Fact and Dim prefixes applied to the tables within the AdventureWorksDW database with derision. "Why," I asked myself, "would anyone feel the need to use such prefixes? Clearly this is BI for dummies." Given that most BI projects use a nice clean, dedicated data warehouse database where all the tables are imported into the cube, perhaps my derision was justified.

However, in my current project I have been building a near-real-time OLAP solution directly on top of an application database. Of course, to isolate the cube from potential schema changes, I created an abstraction layer in the form of a set of views. However, as the project has progressed and the large team of developers has created more and more database objects, it has become increasingly difficult to find the views created for the cube among the debris.

I recently had to re-build the entire cube from scratch due to a bug caused by reverting to a previous version in source control. Of course, I took the opportunity to consolidate all my T-SQL code into a few files and adopt a consistent naming convention for all my views. So what naming convention did I adopt? Well Fact and Dim of course! Why? Well because anyone following on from my work who is half-familiar with the BI sample databases will immediately recognise the relevance.

I subsequently found out the big benefit of adopting this convention – it makes creating a DSV much, much easier! A quick search for all objects containing 'Fact' and 'Dim' soon had my new DSV populated with all the relevant objects.

So the Fact is that this practice is not so Dim after all!

Posted by DrJohn | 2 comment(s)
Filed under:

There are some things in the life that are so implicit and taken for granted that nobody ever bothers to tell you about them or write up any documentation. This is how I feel the implicit support for date and time measures must have come about in Microsoft SQL Server 2005 Analysis Services (SSAS). Certainly I could not find any documentation about the topic and yet intriguingly both Measures and Calculated Members have "Simple Date" and "Simple Time" in their drop down list of possible text formats.

Clearly SSAS will allow you to add attributes to a dimension that have the data type DateTime. However, SSAS does not allow you to create a measure based on DateTime data type. Now this is a bit of a drawback when you want to calculate the average time a process took, which is exactly what I needed to do. After searching the internet, I was none the wiser as to how to solve the problem so some experimentation was in order.

An obvious approach would be to create a measure which contains the elapse time for the process in minutes. This would certainly give the right answer when averaged. However, presenting the resulting number in a format that was meaningful to the end-user is beyond standard out-of-the-box functionality. Especially when the elapse time needs to be presented in financial speak such as T+1 15:30 (i.e. the next day at 3:30pm). And since the end-user would be browsing the cube with Excel 2007, standard out-of-the-box functionality is all I could use.

My data was pretty simple:

CREATE TABLE dbo.WhenEventHappened(

SurrogateKey int NOT NULL,

MeasurementTypeID int NOT NULL,

ReferenceDate datetime NOT NULL,

DateTimeEventHappened datetime NOT NULL

) ON [PRIMARY]

 

Where ReferenceDate holds the date the event is relative to and DateTimeEventHappened clearly holds the date & time when the event took place. So for example, if we use the financial services analogy the trade took place on the ReferenceDate and it was settled on DateTimeEventHappened. So the elapse time is (DateTimeEventHappened – ReferenceDate) in days, hours and minutes. In my data, ReferenceDate is always a date with no time element.

The SQL Server DateTime data type

You probably know that behind the scenes SQL Server holds dates as a decimal number. The whole number contains the number of days since 1900 and the fraction represents the date. For example, 39415.5 represents 2007-12-01 12:00:00

So, if

ReferenceDate= 2007-12-01 00:00:00

and

DateTimeEventHappened=2007-12-01 12:00:00

then

(DateTimeEventHappened – ReferenceDate)= 1900-01-01 12:00:00 which is held as 0.5

OK so this gives me the elapse time, but now do I get it into the cube? Well the answer is simple. Convert the DateTime into a decimal number. To do this you need to apply the relevant cast/convert such as:

CONVERT(float, (A.DateTimeEventHappened - A.ReferenceDate)) AS EventDateDiffDecimal

To import this data into the cube, I created a view that performed the relevant cast/converts.


Now, for illustration of what happens between SQL and SSAS, I have added some redundant columns to my view just to make it obvious what is really going on here. So here is my data, including the extra ReferenceDateDecimal and DateTimeEventHappenedDecimal columns.  I have also created some simple data which is shown below:

SQL Data

Next I simply added this new view to the cube's Data Source View and added the EventTimeDecimal and EventDateDiffDecimal plus the extra columns to my example cube as a new measure group which automatically sets the AggregateFunction to Sum. So here is the result:

Cube Measures

Now apply magic!

OK, so the above is not so useful, so let's apply some magic. Simply set the FormatString and AggregateFunction for each measure as follows:

Column

Format String

AggregateFunction

Reference Date Decimal

dd-mmm-yyyy

AverageOfChildren

Date Time Event Happened Decimal

dd-mmm-yyyy hh:mm

AverageOfChildren

Event Time Decimal

hh:mm

AverageOfChildren

Event Date Diff Decimal

dd-mmm-yyyy hh:mm

AverageOfChildren

So after re-deploying the cube, we get:

Formatted Cube Data 

 

Oh look, wrong dates!

Well we have dates, but not the correct ones! It seems that date zero in SQL Server is 01-Jan-1990 and date zero in SSAS is 30-Dec-1899! To fix the problem, we simply add two to the values provided by our EventElapsePeriod view. After processing the cube we now get the correct dates:

Cube dates plus 2

However, I want the EventDateDiffDecimal to be presented in the format T+1 15:30. So I need to subtract one from the EventDateDiffDecimal value so that I get 1-Jan as my date. Then I can format the value, ignoring the month and year using the FormatString "\T\+d hh:mm". Now purist among you may point out that this strategy will not work when the date gets bigger than 31 as it will flick over T+1 again. This is true, but I have defined EventDateDiffDecimal as a semi-additive measure so I will not hit this problem.

So here is the final formatted data:

CubeDateTPlus1

Note how the Grand Total is correctly calculated. Just magic!

If you would like to try this for yourself, I have published the T-SQL scripts and cube for download here.  Note that the ZIP file also contains the SQL script to re-create the sample table and view used in this article.

Good Luck!

Posted by DrJohn | 2 comment(s)
Filed under:

Jamie Thompson gave me some very helpful feedback on my recently published article over at SQLCentral.com called Cleaning Address Data with SSIS Using a Web Service and my previous blog entry on Calling SharePoint web services from the data flow. Jamie pointed out that I had assumed my audience would be fully familiar with the .NET framework and terms such as WSDL and Proxy Class need further explanation. So here is my attempt to clarify some issues he pointed out.

  • WSDL
    WSDL is short for Web Service Description Language and is the definition of what the web service provides and how it can be invoked. A WSDL document is an XML file which lists all the methods provided by the web service and their parameters. The WSDL document is returned by a web service when "?wsdl" is appended to the web service's URL (e.g. http://services.postcodeanywhere.co.uk/uk/lookup.asmx?wsdl).
  • Proxy Class
    A proxy class simplifies the interaction between your application and a web service. The .NET Framework provides a command-line utility called wsdl.exe which generates the code for a proxy class automatically. In Visual Studio, the Add Web Reference command automatically executes wsdl.exe and generated the proxy class for you. However, with SSIS you need to create a proxy class manually using a command-line such as this:
    e.g.
    wsdl /language:VB http://services.postcodeanywhere.co.uk/uk/lookup.asmx?wsdl /out:c:\PostcodeAnywhere.vb
  • GAC
    The Global Assembly Cache (GAC) is the central repository for sharing assemblies (DLLs). In order to register an assembly in the GAC, it must be strong named (i.e. must have a unique hash value, public key, locale and version number).

If you want to know more about the .NET Framework, then read O'Reilly's .NET Framework Essentials, 3rd Edition. 

 

Posted by DrJohn | with no comments
Filed under: , ,

Excel spreadsheets with complicated layouts really do not lend themselves to processing with the standard Microsoft OLE DB drivers. Indeed, extracting one or two cells from different worksheets can be positively painful! This is where the power and flexibility of Aspose.Cells comes into its own.

Aspose.Cells provides an API that looks very similar to the OLE automation API exposed by Excel in the Office InterOp assemblies. However, you do not get all the headaches normally associated with the InterOp assemblies (i.e. modal dialogs, memory leaks, multiple Excel instances etc. etc.). With Aspose.Cells you do not need Excel to be installed on your server. Instead you only need a single assembly (Aspose.Cells.dll ) and the accompanying license file (Aspose.Cells.lic). Simple!

Working with Aspose.Cells in SSIS

To work with Aspose.Cells, you need to use the Script Task or Script Component. My personal preference is to use the Script Component in its asynchronous transformation mode. This way I can feed in a list of Workbooks that I want the component to process on its input buffer and the component can write the data read from the Workbooks onto multiple output buffers (one buffer for each type of data).

As SSIS uses Visual Studio for Applications (VSA), you do not have the flexible "Add Reference" dialog found in Visual Studio. In order to get around this issue while you are developing, you need to copy the Aspose.Cells assembly and associated license file into your .NET folder, which is usually:

C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727

Restart Visual Studio and you will see Aspose.Cells listed on VSA's "Add Reference" dialog. Now you can add the appropriate "imports" statement to the top of your module and start coding. Easy!

Later, when you move to production, you will need to register the Aspose.Cells assembly in the GAC by simply dropping the assembly into the following folder:

C:\WINDOWS\assembly

Dealing with the Aspose.Cells License file

Aspose.Cells has a license file which can either be in the same directory as the assembly, or in your own location. The easiest way to handle this in SSIS is to use a File Connection Manager which records the location of the license file. Then in code, you will need to add:

Dim license As Aspose.Cells.License = New Aspose.Cells.License()

license.SetLicense(Me.Connections.AsposeCellsLicense.ConnectionString)

 

Example Code

My sample code reads a list of workbooks from the input buffer and writes the extracted data to the output buffer.  The data flow is shown below along with the code from my asynchronous Script Component. Enjoy!

Aspose.Cells Script Component in the Data Flow 

' Aspose.Cells.dll must be registered in the GAC for production

' AND the C:\WINNT\Microsoft.NET\Framework\v2.0.50727 for development

 

' standard libraries

Imports System

Imports System.Data

Imports System.Math

Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper

Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

' additional libraries

Imports System.IO

Imports System.Collections.Generic

Imports Aspose.Cells

 

Public Class ScriptMain

Inherits UserComponent

 

Public Overrides Sub WorkbookList_ProcessInputRow(ByVal Row As WorkbookListBuffer)

 

' Imports all data from the the Excel workbooks provided on the input

 

' Note that each workbook contains several worksheets.

' For efficiency, we open the workbook once and then

' process all the worksheets in the file

 

Try

' The previous step in the package downloaded the workbook.

Dim fInfo As FileInfo = New FileInfo(Path.Combine(Me.Variables.ImportFolder, Row.URL))

If fInfo.Exists Then

 

' Set Aspose.Cells license

Dim license As Aspose.Cells.License = New Aspose.Cells.License()

license.SetLicense(Me.Connections.AsposeCellsLicense.ConnectionString)

 

' Open the workbook

Dim wrkBook As Workbook = New Workbook()

wrkBook.Open(fInfo.FullName)

 

' process each worksheet

For Each wrkSheet As Worksheet In wrkBook.Worksheets

Try

' Check we have a valid worksheet by testing

' for text prompts in specific locations

If ValidWorksheetLayout(wrkSheet) Then

' process each worksheet found in the workbook

' that has the correct format

ImportWorksheetData(Row.WorkbookID, wrkSheet, Row)

End If

Catch exi As Exception

Me.ComponentMetaData.FireInformation(0, Me.ComponentMetaData.Name, "WorkbookList_ProcessInputRow: Invalid Worksheet Layout: " & wrkSheet.Name, String.Empty, 0, True)

End Try

Next

Else

Me.ComponentMetaData.FireError(0, Me.ComponentMetaData.Name, "WorkbookList_ProcessInputRow: Missing Excel File: " & fInfo.FullName, String.Empty, 0, True)

End If

Catch ex As Exception

Me.ComponentMetaData.FireError(0, Me.ComponentMetaData.Name, "WorkbookList_ProcessInputRow: " & ex.Message, String.Empty, 0, True)

End Try

End Sub

 

Private Function ValidWorksheetLayout(ByVal wrkSheet As Worksheet) As Boolean

' checks if data is in expected locations. If not, then return false.

If wrkSheet.Cells("B3").Value.ToString = "Profit and loss projections" And wrkSheet.Cells("B14").Value.ToString = "Revenue" Then

Return True

End If

Return False

End Function

 

Private Sub ImportWorksheetData(ByVal WorkbookID As Integer, ByVal wrkSheet As Worksheet, ByVal Row As WorkbookListBuffer)

' imports all data from the worksheet

Dim iRow As Integer

Dim iCol As Integer

Dim iCount As Integer

Dim dataRange As DataTable

Dim dr As DataTableReader

Dim HeaderRow As List(Of Integer)

Try

' heading row is in seperate location to data, so import heading first

' note that Aspose.Cells uses zero-based row and column numbers in ExportDataTable call

' so Cell=4,Row=12 is actually cell E13

Try

' read single header row into generic list

HeaderRow = New List(Of Integer)

dataRange = wrkSheet.Cells.ExportDataTable(12, 4, 1, 5)

dr = dataRange.CreateDataReader()

If dr.Read Then

For iCount = 0 To 4

HeaderRow.Add(GetInteger(dr, iCount))

Next

End If

 

Catch ex0 As Exception

Me.ComponentMetaData.FireError(0, Me.ComponentMetaData.Name, "ImportWorksheetData ex0: " & ex0.Message, String.Empty, 0, True)

End Try

 

Try

dataRange = wrkSheet.Cells.ExportDataTable(14, 2, 2, 7)

dr = dataRange.CreateDataReader()

While dr.Read

ProcessDataTable(WorkbookID, dr, 2, 6, HeaderRow)

End While

 

Catch ex1 As Exception

Me.ComponentMetaData.FireError(0, Me.ComponentMetaData.Name, "ImportWorksheetData ex1: " & ex1.Message, String.Empty, 0, True)

End Try

 

Try

dataRange = wrkSheet.Cells.ExportDataTable(19, 2, 2, 7)

dr = dataRange.CreateDataReader()

While dr.Read

ProcessDataTable(WorkbookID, dr, 2, 6, HeaderRow)

End While

Catch ex2 As Exception

Me.ComponentMetaData.FireError(0, Me.ComponentMetaData.Name, "ImportWorksheetData ex2: " & ex2.Message, String.Empty, 0, True)

End Try

 

Try

dataRange = wrkSheet.Cells.ExportDataTable(23, 2, 9, 7)

dr = dataRange.CreateDataReader()

While dr.Read

ProcessDataTable(WorkbookID, dr, 2, 6, HeaderRow)

End While

Catch ex3 As Exception

Me.ComponentMetaData.FireError(0, Me.ComponentMetaData.Name, "ImportWorksheetData ex3: " & ex3.Message, String.Empty, 0, True)

End Try

 

Catch exOuter As Exception

Me.ComponentMetaData.FireError(0, Me.ComponentMetaData.Name, "ImportWorksheetData: outer " & exOuter.Message, String.Empty, 0, True)

End Try

End Sub

 

Public Sub ProcessDataTable(ByVal WorkbookID As Integer, ByVal dr As DataTableReader, ByVal iStartDataCol As Integer, ByVal iDataColCount As Integer, ByVal HeaderRow As List(Of Integer))

Dim iCol As Integer

Dim DataValue As Decimal

Dim bAddRow As Boolean

Dim TimePeriodID As Integer

Dim OutputCount As Integer = 0

 

Try

For iCol = iStartDataCol To iDataColCount

bAddRow = False

Try

If Not dr.IsDBNull(iCol) Then

DataValue = GetDecimal(dr, iCol)

If DataValue <> 0 Then bAddRow = True

If bAddRow Then

With Me.WorksheetDataBuffer

.AddRow()

' get header from the row we read into a list earlier

.FinancialYear = HeaderRow(iCol - 2)

.WorkbookID = WorkbookID

.RowName = dr.GetString(0)

.DataValue = DataValue

End With

End If

End If

Catch ex1 As Exception

Me.ComponentMetaData.FireError(0, Me.ComponentMetaData.Name, "ProcessDataTable: Inner " & ex1.Message, String.Empty, 0, True)

End Try

Next

 

Catch ex As Exception

Me.ComponentMetaData.FireError(0, Me.ComponentMetaData.Name, "ProcessDataTable: Outer " & ex.Message, String.Empty, 0, True)

End Try

End Sub

 

Function GetDecimal(ByVal dr As DataTableReader, ByVal iCol As Integer) As Decimal

Dim DataValue As Decimal

Try

Select Case dr.GetDataTypeName(iCol)

Case "String"

If IsNumeric(dr.GetString(iCol)) Then

DataValue = CDec(dr.GetString(iCol))

End If

Case "Integer"

DataValue = CDec(dr.GetInt32(iCol))

Case "Double"

DataValue = CDec(dr.GetDouble(iCol))

Case "Decimal"

DataValue = dr.GetDecimal(iCol)

End Select

Catch ex As Exception

Me.ComponentMetaData.FireError(0, Me.ComponentMetaData.Name, "GetDecimal " & ex.Message, String.Empty, 0, True)

End Try

Return DataValue

End Function

 

Function GetInteger(ByVal dr As DataTableReader, ByVal iCol As Integer) As Integer

Dim DataValue As Integer

Try

Select Case dr.GetDataTypeName(iCol)

Case "String"

If IsNumeric(dr.GetString(iCol)) Then

DataValue = CInt(dr.GetString(iCol))

End If

Case "Integer"

DataValue = dr.GetInt32(iCol)

Case "Double"

DataValue = CInt(dr.GetDouble(iCol))

Case "Decimal"

DataValue = CInt(dr.GetDecimal(iCol))

End Select

Catch ex As Exception

Me.ComponentMetaData.FireError(0, Me.ComponentMetaData.Name, "GetInteger " & ex.Message, String.Empty, 0, True)

End Try

Return DataValue

End Function

End Class

Posted by DrJohn | with no comments
Filed under: ,

In my previous post, I outlined how to obtain a recursive list of all Excel files held in a SharePoint document library. In SSIS, I use this code to upsert records into a table which identifies which files are new or modified. Clearly I could process the file in-situ, but that would over complicate my code with data streams which SSIS would not understand – especially if you are attempting to use the standard Microsoft.Jet.OLEDB.4.0 engine to process the file! So downloading the file to a local folder is the easiest option. Fortunately, I spotted Greg Enslow's post which pointed me in the direction of using the WebClient library.

In my control flow I obtain a list all the files I need to process and store it in an ADO.NET recordset. I then iterate over this list using a ForEach task, which sets various variables containing the URL, file extension and the document's name and it's GUID in SharePoint. My script task then downloads each file so it can be processed by subsequent tasks in the control flow.

Control Flow 

I have expanded on Greg's original code to support file system locations as well as SharePoint URLs. Enjoy!

Imports System

Imports System.Data

Imports System.Math

Imports Microsoft.SqlServer.Dts.Runtime

Imports System.Net

Imports System.IO

 

Public Class ScriptMain

 

Public Sub Main()

' downloads the file from SharePoint or a file system location to a local folder

 

Dim taskResult As Integer = Dts.Results.Success

 

Try

' obtain location of local folder from variable

Dim dir As DirectoryInfo = New DirectoryInfo(Dts.Variables("ImportFolder").Value.ToString())

If dir.Exists Then

 

' Create the filename for local storage using

' the GUID from SharePoint as this will be unique.

Dim file As FileInfo =