Sparks from the Anvil

SSAS: Kerberos kills ‘large’ MDX queries on Windows Server 2008

Kerberos is nasty! I have had several encounters with Kerberos and none of them have been pleasant. I try to avoid Kerberos like I try to avoid the dentist. However, some projects demand that Kerberos authentication be used to cater for the 'third hop' and then the pain begins.

The symptoms of my latest encounter were somewhat odd. Local MDX queries worked fine, but queries performed over the network failed with the message:

Executing the query ...
The connection either timed out or was lost.
Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host.
An existing connection was forcibly closed by the remote host
Execution complete

Of course the error message does not tell us anything useful. Just that someone does not like us today!

At first we thought there was a fault with the MDX query. But no, the MDX query ran fine against the local development server and also ran fine against the production server when run in SQL Server Management Studio in local mode (i.e. not across the network). But as soon as we ran the query across the network, we got the error.

Suspecting a hardware fault on the brand new server, the IT support team checked out the memory and changed the network cards. No change; the queries still failed.

Anyone familiar with SQL Server Analysis Services (SSAS) knows that it uses a Windows-based security model (i.e. the user must have a Windows account in order to connect to the OLAP database). Unfortunately, this becomes a problem when the user's credentials need to be passed from machine to machine before finally arriving at the OLAP server. Basically, the NTLM protocol does not support delegation across more than two hops.

To cater for the third 'hop' you have to co-opt the help of the 'dark side': i.e. Kerberos and SPNs. Servers have to be 'registered for delegation' in Active Directory and SPNs have to be set up to configure Kerberos. This is a black art and my colleague and I have the battle scars to prove it!

Our configuration was not unusual. Users connected to a web server running SharePoint 2007 and SQL Server Reporting Services (SSRS). SSRS then queries the SQL Server Analysis Services (SSAS) database, which resides on the backend 'data server'. The complication occurs when users connect over the internet; they are authenticated by Microsoft ISA Server, which passes their credentials on to the web server, which in turn passes their credentials to SSRS, which in turn passes their credentials on to SSAS.

Now, this infrastructure was working fine in the current SQL Server 2005 / Windows Server 2003 implementation. However, the new SQL Server 2008 / Windows Server 2008 implementation was having trouble with a single hop, so we were in big trouble!

Suspecting a corrupt installation of SSAS, we did a re-installation of the whole SQL Server 2008 suite, but the MDX queries still failed. So next we did a fresh install of the Windows Server 2008 operating system along with SQL Server 2008, but the MDX queries still failed!

At this point we still believed that all MDX queries failed. But then I observed that the filters on the SSRS reports were being populated, whereas running the main report query generated the same ugly error message.

Being convinced that my MDX was at fault, I started a trace of activity on the Analysis Services service, which showed that small MDX queries worked fine (all the relevant log entries were present) whereas large MDX queries failed and the only log entry was an "Audit Logout notification" event.

I started to play with the MDX and found that if I padded out a very simple query with comments it started to fail around the 1,096-character mark, i.e. simply adding /******/ until the whole statement was over 1 KB caused the query to fail! Clearly something major was going wrong!

Still believing it to be a hardware or software implementation error, we proceeded to install the SQL Server 2008 OLAP database on another identical box. No problem! MDX of any size worked fine. So it must be a hardware or software implementation fault????

After much scratching of his head, my good friend and colleague Eric Moutell eventually announced that the only difference between the two machines was that one had been set up with SPN entries to allow Kerberos authentication. So he deleted the entries, rebooted the world, rebooted the world again and finally any size of MDX query worked on the original box. Ah ha! We were getting somewhere. So now we knew that nasty old Kerberos was having a laugh at our expense!

We got in contact with Microsoft support and after sending detailed information about the bug and several gigabytes of server logs, they eventually admitted we had found a bug in Windows Server 2008 which they have agreed to fix!

For those of you experiencing a similar problem, I reproduce the email from Microsoft support which you may find useful. In our environment, the server is running Windows Server 2008 64-bit with Microsoft SQL Server 2008 with the latest cumulative update patch CU3. The client machines we tried were either Windows Server 2008 64-bit or 32-bit running Microsoft SQL Server 2008 CU3.

From: Camino De Vicente Rodriguez
Sent: 16 March 2009 13:58
Subject: RE: Your case with Microsoft (SRZ090205000308)
Escalation engineers have confirmed that it is a problem in Kerberos.dll related to encryption with AES on Windows 2008 and Vista. They have reported it to the Product Group and, after analyzing it, they have accepted to fix it. The estimated time for the fix is May (a two-month hotfix test pass cycle is required). I am sending you the workarounds if you cannot wait until May:

  • If Kerberos authentication is a requirement, run Analysis Services on a Windows 2003 Server, since Windows 2003 Server is not AES aware.
  • Use Windows 2003 Server, Windows XP, or Windows 2000 Server to run client applications that will be connecting to the Analysis Server configured for Kerberos authentication and running on Windows 2008. Since these operating systems are not AES aware, that will avoid use of AES for encryption/decryption.
  • Avoid configuring the Analysis Server to use Kerberos authentication, since this will result in AES not being used for encryption/decryption.
  • Add ";SSPI=NTLM" or ";Integrated Security=SSPI" to the connection string, which will force use of NTLM and avoid use of AES for encryption/decryption.
Best regards,
Camino de Vicente
EMEA GTSC Development Support Engineer
Posted by DrJohn with 12 comment(s)

SSAS writeback error was ‘an accident with a contraceptive and a time machine’

Those of you who heard the hit radio series 'The Hitchhiker's Guide to the Galaxy' or read Douglas Adams' book will recognise the title of my blog entry as being the reason quoted by Zaphod when asked to explain why he was Zaphod Beeblebrox the First and his father was Zaphod Beeblebrox the Third. Apparently it was all down to 'an accident with a contraceptive and a time machine' whereby Zaphod Beeblebrox had travelled back in time and fathered his own ancestor.

In my case the 'time machine' was a source code control system that provided the ability to go 'back in time' and revert my SSAS project to a previous incarnation. My 'accident' was to combine some of the 'past' with the 'present' to end up with a creation of which Frankenstein would have been proud.

The key problem of using a source code control system with an Analysis Services project is that the file-based source code control model just does not fit in with the way SSAS works. For example, adding a security role to a cube not only creates a new role file but also fundamentally alters the content of both the .cube file and all the dimension files for which you define attribute security. So when you come to check-in the project, you find that two or three files have changed when you only expected to alter one! So when it comes to reverting to a previous version you really have to take all the files from a specific date, not just one or two files.

The horror of my own Frankenstein creation was not recognised until several days after its birth. When testing part of the application's functionality that used cube writeback, we got the obscure message:

Errors in the high-level relational engine.
Only one-to-many relationships are supported for parent-child dimensions.
The '' table contains many-to-many relationships between parent and child columns.

As you see, the table name was missing from the error message, so we were bemused as to which table was at fault. If the rest of the error message were to be believed, the data was incorrect. Well, we checked the data and it was fine. Moreover, the cube did not complain when processing the data. We deleted and recreated the writeback tables, so they were not at fault. When we disabled writeback, the error message disappeared.

Well, there is nothing more frustrating or time-consuming than a misleading error message and a bunch of red herrings. We wasted many, many hours trying to figure out the cause of the error. A search of the forums revealed nothing, so I posted my own entry – to which no one replied. Eventually the issue was raised with Microsoft. After much gnashing of teeth, they found the source of the problem: the lack of a primary key on a table in the DSV. Well, the primary key was certainly there at some point, as the writeback had worked in the past. However, because I had used source control to 'travel back in time' and revert some files to an earlier version, I had clearly got my revisions well and truly mixed up. My 'contraceptive' had not worked and Frankenstein's monster was duly born.

So beware all of you who play with 'contraceptives' and 'time-machines'!

Posted by DrJohn with 1 comment(s)

MDX + T-SQL: Combining relational and multi-dimensional data into one query result set

When it comes to system integration, Microsoft provides such a plethora of options that it is far too easy to forget about some of them or fail to consider their practical application. This is particularly true of the SQL Server BI stack, which is so huge in its scope.

Our need was to populate a relational database using the results of an MDX query so we could do some data reconciliation to prove the new cube provided the results we wanted. Given my skills in C#, my immediate thought was to write a simple console application which performed an MDX query and populated a database table using SQL Bulk Insert. However, this option would not be easy for the rest of the team to support as they do not have the relevant C# skills or experience of ADOMD.NET.

Vincent Rainardi (author of Building a Data Warehouse: with examples in SQL Server) pointed me in the direction of OpenQuery and linked servers. This is a remarkably simple and flexible solution to a great many system integration problems, so I thought I had better share it with a wider audience.

OpenQuery and OpenRowSet both execute pass-through queries against a specific server. The 'query string' to be executed is not evaluated by the local server but simply 'passed through' to the target server which is expected to return a row set. The 'query string' can be anything, including MDX, so long as it is understood by the target server. Well this works a treat with MDX and Analysis Services.

To use this technique, you will first need to create a linked SSAS server using either SQL Server Management Studio or in script using the stored procedure master.dbo.sp_addlinkedserver. Here is an example:
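This is only a sketch: the linked server name matches the OPENQUERY examples later in this post, while the SSAS instance and catalog names are placeholders that you would replace with your own.

EXEC master.dbo.sp_addlinkedserver
    @server = N'AdventureWorksServer',      -- name used in the OPENQUERY calls below
    @srvproduct = N'',
    @provider = N'MSOLAP',                  -- Analysis Services OLE DB provider
    @datasrc = N'localhost',                -- your SSAS instance (placeholder)
    @catalog = N'Adventure Works DW 2008';  -- your SSAS database (placeholder)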

Once linked, you can then query your cube using MDX and combine the results with the content of a SQL Server table. Alternatively you can simply insert the results of the MDX query into a database table. For example, the following screenshot shows the MDX executed in SQL Server Management Studio:

Whereas the following screenshot shows the results of the equivalent OpenQuery:
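In code terms, the pair looks something like this (a sketch only: the measure and attribute names are illustrative picks from the standard Adventure Works sample, and the linked server is the one created above):

-- MDX as you would run it directly against the cube in SSMS
SELECT { [Measures].[Internet Sales Amount], [Measures].[Internet Gross Profit] } ON 0,
       [Customer].[Country].[Country].Members ON 1
FROM [Adventure Works]

-- The same MDX wrapped in OpenQuery against the linked server
SELECT *
FROM OPENQUERY(AdventureWorksServer,
    'SELECT { [Measures].[Internet Sales Amount], [Measures].[Internet Gross Profit] } ON 0,
            [Customer].[Country].[Country].Members ON 1
     FROM [Adventure Works]');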

 

Note how the column names contain the unique name of the attribute. Before showing how to deal with these, let's just look at one surprising element of the functionality offered by this technique.

If you use a hierarchy in your query, then you will get extra columns describing the values of each hierarchy level. For example, if we change the above query to use the [Customer].[Customer Geography] hierarchy then you get an extra column in the query results describing the [Customer].[Customer Geography].[Country] level.

This is even more dramatic when using a parent-child hierarchy; a query against the Accounts dimension will bring back all six levels.

Dealing with Attribute Unique Names

Of course, you may find the attribute unique names a little long and cumbersome to use in your T-SQL. The easiest way to remove them is to use an alias. However, you still need to type that lengthy name in the first place.

The easy way around this is to copy the query results into Excel and then obtain the column header text from there. However, by default, the copying of column headers is switched off, so you may want to switch on the "Include column headers when copying or saving results" option in SQL Server Management Studio which can be found under Tools->Options->Query Results->Results to Grid. Here is a screenshot to make it easier:

Using an Alias 

As you will know, the square bracket in T-SQL is used as a delimiter in much the same way as it is in MDX. So simply typing:

[Customer].[Customer Geography].[Country].[MEMBER_CAPTION] AS Country

will cause a "multi-part identifier could not be bound" error. To avoid this, use double quotes around the attribute unique name as follows:

"[Customer].[Customer Geography].[Country].[MEMBER_CAPTION]" AS Country,

Numerical Advantage

The problem with OpenQuery is that every column has a data type of string, including the numerical columns. This means that the data type must be changed if you want to perform numerical computations on the figures using T-SQL. So our two numerical columns need to be wrapped in CAST & CONVERT as follows:

CAST(CONVERT(float, "[Measures].[Internet Gross Profit]") AS money) AS GrossProfit

Combining multi-dimensional and relational data together

If you are creating a management report that combines the profitability figures of each division with the relevant manager's comments as to why the figures have (or have not) been achieved, then the use of OpenQuery becomes invaluable. This is because the only way to handle such a situation in the cube would be to create another dimension which links at the relevant level of granularity and enable cube write-back so the manager can update dimensional properties. A lot of hassle for a few pieces of textual information!

With OpenQuery, this becomes trivial as the text can be held in a supplementary database table and combined with the results of the MDX query.

Now the problem you may have spotted is that OpenQuery returns the MEMBER_CAPTION which will probably not be sufficiently unique to match an entry in the relational database. What you really need is the MEMBER_KEY. This can be added as a column using a calculated member as follows:

WITH MEMBER [Measures].[OrganisationID] AS [Organization].[Organizations].MEMBER_KEY

As the Adventure Works database does not contain a 'comments' table, let's create one and insert some rows using the new multi-row VALUES syntax for INSERT in SQL Server 2008.

CREATE TABLE dbo.ManagersComments(
    OrganisationID int NOT NULL,
    [Manager's Name] nvarchar(50) NULL,
    [Manager's Comments] nvarchar(255) NULL
);    
INSERT INTO ManagersComments(OrganisationID, [Manager's Name], [Manager's Comments])
    VALUES (6, 'John Doe', 'My team failed again. You know how it is...'),
                (5, 'John Smith', 'My team are all stars.');

So let's bring MDX and SQL data together into one query. This is best achieved with a common table expression as the layout can be far more readable:

WITH MdxQuery
(    
    [Region],
    OrganisationID,
    [Net Sales],
    [Cost of Sales],
    [Gross Margin],
    [Gross Margin%]
)
AS
(
    SELECT
        "[Organization].[Organizations].[Organization Level 04].[MEMBER_CAPTION]" AS [Region],
        CONVERT(int, "[Measures].[OrganisationID]") AS OrganisationID,
        CAST(CONVERT(float, "[Measures].[Net Sales]") AS money) AS [Net Sales],
        --"[Measures].[Net Sales]" AS [Net Sales],
        CAST(CONVERT(float, "[Measures].[Cost of Sales]") AS money) AS [Cost of Sales],
        CAST(CONVERT(float, "[Measures].[Gross Margin]") AS money) AS [Gross Margin],
        CONVERT(decimal(4, 2), (CONVERT(decimal(19, 17), "[Measures].[Gross Margin%]") * 100)) AS [Gross Margin%]
    FROM OPENQUERY(AdventureWorksServer,
        'WITH MEMBER [Measures].[OrganisationID] AS
            [Organization].[Organizations].MEMBER_KEY
        MEMBER [Measures].[Net Sales] AS
            ([Account].[Accounts].&[50], [Measures].[Amount])
        MEMBER [Measures].[Cost of Sales] AS
            ([Account].[Accounts].&[55], [Measures].[Amount])
        MEMBER [Measures].[Gross Margin] AS
            ([Account].[Accounts].&[49], [Measures].[Amount])
        MEMBER [Measures].[Gross Margin%] AS
            [Measures].[Gross Margin] / [Measures].[Net Sales]
        SELECT
        {
            [Measures].[OrganisationID], [Measures].[Net Sales], [Measures].[Cost of Sales],
            [Measures].[Gross Margin], [Measures].[Gross Margin%]
        } ON 0,
        {
            [Organization].[Organizations].&[14].Children
        }
        ON 1
        FROM [Finance]
        WHERE ([Date].[Fiscal].[Fiscal Quarter].&[2004]&[1])'
    )
),
CombineMdxWithTable (    
    [Region],
    [Manager's Name],
    [Net Sales],
    [Cost of Sales],
    [Gross Margin],
    [Gross Margin%],
    [Manager's Comments]
)
AS
(
    SELECT
        A.[Region],
        B.[Manager's Name],
        A.[Net Sales],
        A.[Cost of Sales],
        A.[Gross Margin],
        A.[Gross Margin%],
        B.[Manager's Comments]
FROM MdxQuery A
    LEFT JOIN dbo.ManagersComments B ON
        A.OrganisationID = B.OrganisationID
)
SELECT * FROM CombineMdxWithTable

 

Here are the results of our query:

The Big Surprise - Support for more than two axes!

What I found most surprising about the functionality offered by OpenQuery is that it will deal with extra dimensions/axes sensibly. Of course, any multi-dimensional query can always be transformed into a two-dimensional recordset; each cell simply becomes a row in the recordset, with one column for each dimension. So, for example, a four-dimensional MDX query returning two measures can easily be transformed into a two-dimensional recordset containing six columns. Each row represents a cell and has two numerical columns.

Of course, if you try adding more than two axes to an MDX query in SQL Server Management Studio, this is the resulting error you will see:

Results cannot be displayed for cellsets with more than two axes.

Given that OpenQuery does such a good job of flattening the resultset into two dimensions, SQL Server Management Studio could really make more of an effort!

In reality, similar results are returned by a two-dimensional MDX query where the rows axis is a CrossJoin of all the relevant dimensions (a sketch is shown below). This approach also has the benefit that the query can take advantage of the server's AutoExists performance optimization, so fewer rows will be returned. However, it does not sound quite so sexy as a true multi-dimensional query. Ah, well. Technology cannot always be sexy!
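Something along these lines (again, the cube and member names are illustrative) flattens two dimensions on the rows axis into separate columns:

SELECT *
FROM OPENQUERY(AdventureWorksServer,
    'SELECT { [Measures].[Internet Sales Amount] } ON 0,
            CROSSJOIN([Customer].[Country].[Country].Members,
                      [Product].[Category].[Category].Members) ON 1
     FROM [Adventure Works]');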

Posted by DrJohn with 9 comment(s)

Goodbye Vista: Installing Windows Server 2008 on a Dell XPS laptop

Vista was driving me nuts! I could not believe how slow it was. The hard disk was always thrashing, the machine ran hot and the battery life was pants. I had spent hours optimizing my environment; first deleting all the 'bloat ware' installed by Dell, then stopping every service I did not recognise and finally removing every entry I could from the start-up registry keys (i.e. the SOFTWARE\Microsoft\Windows\CurrentVersion\Run keys under both HKEY_CURRENT_USER and HKEY_LOCAL_MACHINE).

The biggest culprit for wasting system resources was Vista's SuperFetch. Which idiot thought this up?? The service reads all the files on the computer, 'analyses' what you have done before, predicts what you may need in the immediate future and then loads the applications and data into memory, so you get fantastic performance if, and this is a big 'if', you happen to ask for the application or data that it thinks you want. In reality, the performance of the whole PC suffers as this silly service just wastes your PC's resources doing things that you do not need. Ahhhhh! I wish I had a shotgun and a time machine to get that S.O.A.B.!

My laptop is no slouch either. A Dell XPS M1530 laptop with an Intel Core 2 Duo 2.00GHz, 4GB RAM, a 250GB 5,600RPM hard drive and a 256MB NVIDIA GeForce 8600M GT graphics card driving a 1680x1050 pixel TrueLife TFT. So I was particularly annoyed by the lack of performance provided by Vista. The laptop was rated as 4.9 out of 5 in Vista's own performance benchmark; 5 being the best machines that were around when they were developing Vista. Clearly they were benchmarking how much system resource they could waste rather than focusing on end-user performance!

As I develop server-based applications on the SQL Server 2005 BI suite, I had installed VMWare Workstation so I could run multiple virtual machines – one for each of my client's server configurations. I find VMWare much more flexible than Virtual PC, but then again, it's not free! What seemed silly is that the Windows Server 2003 instances running inside the virtual machine seemed much faster than Vista itself, especially when copying files across the network!

When I heard the news that SQL Server 2008 had finally shipped, I decided that it was time to do something radical: upgrade to Windows Server 2008 as the native OS and install SQL Server 2008! As a consultant on the road, I cannot afford to be without my machine for too long, so I decided the safest option would be to buy another hard drive and take the Vista disk out and replace it with a shiny new 7,200RPM hard drive. Well performance is everything!

Before taking the Vista hard drive out, I created backups onto my NAS drive and also copied onto a memory stick all the device drivers that Dell had kindly created in the c:\drivers folder (after checking they were the latest versions of course!).

The installation of Windows Server 2008 on my laptop went like a dream. Every time the OS found a new device, I simply pointed it at the root of the memory stick and, bingo, it installed without a glitch. The only drivers that failed were the ones that Dell had forgotten to place in the c:\drivers folder, plus those for my own printer/scanner. These were soon downloaded from Dell's and Canon's web sites.

Why did it go so smoothly? Well, Windows Server 2008 shares the same kernel as Vista and the device driver model is the same. So a Vista device driver will work with Windows Server 2008 and vice versa. However, be aware that this will only work with the 32-bit version of Windows Server 2008, as very few device drivers are available for 64-bit Windows.

A very Desktop Experience

Of course, Vista does have some charms and I soon came to miss certain applications like Windows Photo Gallery and Windows Media Player. I tried finding them on the Microsoft Download site but to no avail. Windows Media Player is built into Vista and therefore not provided as a separate download. I searched the internet for clues as to where I could get them from, again without success. Several days went by before I found www.win2008workstation.com which has many tips on how to turn Windows Server 2008 into a workstation. This includes how to turn on something called the Desktop Experience. I followed the procedure and my desired applications magically appeared. However, some undesirable applications also popped up, like Windows Defender (what junk!), Windows Mail, Windows Contacts and Windows Calendar. Unfortunately, you get all or nothing. There is no fine control over which of the Vista applications get installed. At least you can disable Windows Defender by disabling the related service.

Virtual CD-ROM

Eager to install SQL Server 2008 from a freshly downloaded ISO image, I started scouting for a way to load an ISO image as a virtual drive. I figured there must be a way to mount an ISO image built into the OS itself. However, I could not find the right command line, but I am sure there is someone out there who knows how to do it! So I tried my old favourite Virtual CD-ROM Control Panel which is a free download originally developed by Microsoft but now provided as an unsupported utility. To my surprise this worked a treat. Amazing, since it was last updated in 2006!

Burning Desire

Over the years I have been a great lover of Nero Burning ROM. The main application had a simple but powerful user interface. However, the Nero package has grown out of all recognition over the years and offers so many utilities to do things I never imagined I needed (i.e. bloat ware) that I refuse to install it on my machine. Unfortunately the original Nero Burning ROM cannot be purchased separately, so I needed an alternative. After a perusal of Wikipedia, I found some freeware called ImgBurn. The UI is almost identical to Nero Burning ROM and seems to work well.

Well Connected

One slightly problematic device I had was my 3 USB mobile internet modem. This worked fine on first installation then stopped working for no apparent reason. I figured out it may be something to do with the Windows Firewall which, if I remember correctly, first appeared when I switched on the Desktop Experience. However, after tinkering with the firewall settings the device would still not work. I tried installing the latest device drivers, but to no avail. I uninstalled the device software and decided to ring 3's support line, confident in the knowledge that they would be no help at all; after all how many customers have mobile internet installed on a server?? Of course, several days passed before I resolved to get the problem fixed. Just prior to the call I thought I would prepare by reinstalling the device so the details of the problem would be fresh in my mind. Well blow me, the damn thing started working!

Mouse Problems

The one device that I have real problems with is my Logitech V470 Cordless Laser Mouse for Bluetooth. I can add it as a Bluetooth device, but it will not actually work. If I install the latest SetPoint software from Logitech, the mouse pointer jumps all over the place like a demented wasp. Ideas anyone?

My tool bag

I often find it useful to know what other consultants in the field carry around in their 'tool bag'. Well, here is my set of applications currently installed on the laptop:

  • Microsoft Windows Server 2008 including IIS 7.0
  • Microsoft SQL Server 2008 Developer Edition including SSAS, SSRS and SSIS and the new Adventure Works 2008 database samples.
  • Microsoft Visual Studio 2008
  • MSDN Library
  • Microsoft Office 2007 with "Save as PDF" enhancement
  • Microsoft Data Mining Add-ins for Excel 2007
  • UltraEdit – fantastic text and code editor I have been using for many, many years
  • Password Safe – I use this freeware to hold the username/password to over 100 web sites and bank accounts. They are all safe inside a heavily encrypted file which can be safely backed up to my NAS drive or memory stick without any danger of someone else being able to open the file.
  • PowerMarks – the easy way to move your internet favourites from machine to machine
  • Adobe Acrobat Reader – of course!
  • Primo PDF – freeware alternative to Adobe Acrobat Distiller for creating PDF files
  • Paint.NET – what a great free alternative to Adobe Photoshop Elements
  • Ultratagger – easy to use MP3 tagging software
  • FileZilla – great freeware FTP application
  • WinZip – the de facto standard
  • Apple iTunes for my iPod and (future) iPhone
  • ImgBurn DVD/CD-ROM burning software
  • 3 USB mobile internet

So if you are thinking of making the move from Vista to Windows Server 2008, do it now! The stability, performance, flexibility and power of Windows Server 2008 can be yours.

Good luck!

 

Updated: 9-Apr-2009

Going 64-bit

If you are thinking of making the switch to 64-bit, then do it!  Originally, I only had the option of 32-bit drivers from Dell.  So as soon as I found out that 64-bit drivers were available, I did the upgrade.  Everything went even more smoothly than the original conversion, so I was very impressed.  See comment below for more info.

Posted by DrJohn with 4 comment(s)

Busting for a Wii at SQL Bits

"I'm busting for a Wii!" was my wife's exclamation in the centre of Warrington. We had been an attempting to avoid traffic on the M6 but got snarled up in even more traffic in central Warrington. I did not understand at first, but then I saw the cheeky kid's grin on the advert on the back of a bus. In good northern dialect, he was not just bursting, but "busting" with excitement about the latest Nintendo games machine. What surprised me more was what followed. My wife gave me a 30 minute rant questioning exactly why I had not yet brought her a games machine. Of course my teenage daughter joined in the fray whilst my teenage son remained quite. Games machines have long been the domain of spotty teenage boys or young men with time on their hands and few friends. They are certainly not the domain of middle age females with a technology adversion! So I was rather taken aback by the whole situation.

However, this weekend the SQL Bits conference delivered in more ways than I could possibly imagine. The event was well organised and well attended, and all the lectures on the "track" I was on were well presented.

Andrew Sadler (formerly of ProClarity and now a consultant with Edenbrook) did a breakneck demonstration of the Microsoft BI stack including PerformancePoint and Excel Services. As I work with this stuff every day, I did not learn anything particularly new; it is just great to see someone else's take on the Microsoft offerings and exactly how they present it all as a 'coherent' strategy. There was clearly pent-up frustration in the audience that the ProClarity product had been left on the shelf by Microsoft. A great tool being suffocated by the giant. I just hope that PerformancePoint v2 (whenever it surfaces) actually provides functionality as good as, or better than, ProClarity's.

Simon Sabin did a great job of introducing some of the new features in the T-SQL language. To be honest, the audience got most animated when they realised Simon had simply highlighted the name of a table and hit some key, and SQL Server Management Studio performed a query that listed not only the table definition but primary keys etc. He then showed us all how to configure keyboard shortcuts in Tools->Options->Environment->Keyboard. It seems that any single-parameter stored proc (including your own) can be mapped to a keyboard shortcut and it will be passed the highlighted text when invoked. Simple, effective and one of the best things I took away from the session! Not that I am decrying the rest of Simon's excellent session, but something simple like a keyboard shortcut can save a lot of time in the long run, and these things are invaluable.

Allan Mitchell performed a great demo of the Data Mining Add-In for Excel. This is a tool that I always keep meaning to use on a project, but somehow never get around to getting it out of the bag and dusting it off. The new version of the DM Add-In for Excel 2007 demos really well. I will have to make a determined effort to do something with this technology soon!

It was great to meet Chris Webb for the first time. Having read the blogs and the book, it was good to get to know the guy in person. In my next project I will be using Intelligencia Query for BIDS and Reporting Services, which was jointly developed by Chris and Andrew Wiles; Andrew also presented his 'reusable' approach to BI on the day. Colin Hardie of the IMGroup ably demonstrated the new MERGE feature in T-SQL as well as several other approaches to handling Type 2 slowly changing dimensions.

It was great to meet old friends, put faces to people I had only previously met in 'cyberspace' and meet some new contacts. Microsoft provided free drinks after the gig and, even better, I won a Wii in the RedGate prize draw. Even my wife agrees that it was a good conference!

Posted by DrJohn with no comments

SSAS Stored Procedures - connecting to the transactional data source

There are often circumstances where an SSAS stored procedure needs to connect to the transactional database to perform a query.  This is certainly the case in a near-real-time OLAP solution where the cube sits directly on top of the application database and dynamic dimension security is implemented by a stored proc. Here the SSAS stored procedure has to query the transactional database in order to get the user's authorisation. Clearly the SSAS database has a data source and it uses this connection to process the cube. So how do we get the connection string?

Well the short code snippet below provides the solution. The code simply connects back to the current database using AMO and obtains the connection string from the data source object. It then removes the redundant "Provider=SQLNCLI.1;" before returning a valid SQL connection string that can be used by SqlConnection().

Of course, you must provide your stored proc assembly with the "Unrestricted" permission set and set its impersonation mode to "Service Account" for the connection string to work.

using AMO = Microsoft.AnalysisServices;
using Microsoft.AnalysisServices.AdomdServer;

...

/// <summary>
/// Obtains the SqlConnection string used by the OLAP database
/// </summary>
/// <returns></returns>
public static string GetSqlConnectionString()
{
    // connect to the current instance of Analysis Services and return the transactional data source
    AMO.Server svr = new AMO.Server();
    svr.Connect("Data Source=" + Context.CurrentServerID + ";Initial Catalog=" + Context.CurrentDatabaseName);

    // get the connection to the transactional database from the OLAP connection manager
    string connStr = svr.Databases.GetByName(Context.CurrentDatabaseName).DataSources[0].ConnectionString.Replace("Provider=SQLNCLI.1;", "");
    svr.Disconnect();

    return (connStr);
}

Posted by DrJohn with 1 comment(s)

The FACT is that it is not so DIM after all

I have always looked down upon the Fact and Dim prefixes applied to the tables within the AdventureWorksDW database with derision. "Why," I asked myself, "would anyone feel the need to use such prefixes? Clearly this is BI for dummies." Given that most BI projects use a nice clean, dedicated data warehouse database where all the tables are imported into the cube, perhaps my derision was justified.

However, in my current project I have been building a near-real-time OLAP solution directly on top of an application database. Of course, to isolate the cube from potential schema changes, I created an abstraction layer in the form of a set of views. However, as the project has progressed and the large team of developers has created more and more database objects, it has become increasingly difficult to find the views created for the cube among the debris.

I recently had to re-build the entire cube from scratch due to a bug caused by reverting to a previous version in source control. Of course, I took the opportunity to consolidate all my T-SQL code into a few files and adopt a consistent naming convention for all my views. So what naming convention did I adopt? Well Fact and Dim of course! Why? Well because anyone following on from my work who is half-familiar with the BI sample databases will immediately recognise the relevance.

I subsequently found out the big benefit of adopting this convention – it makes creating a DSV much, much easier! A quick search for all objects containing 'Fact' and 'Dim' soon had my new DSV populated with all the relevant objects.
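If you want to run the equivalent search in T-SQL rather than in the DSV wizard, something like this sketch against the standard catalog views lists the candidate objects (add a schema filter if you need one):

-- List the views that follow the Fact/Dim prefix convention
SELECT s.name AS SchemaName, v.name AS ViewName
FROM sys.views AS v
    INNER JOIN sys.schemas AS s ON v.schema_id = s.schema_id
WHERE v.name LIKE 'Fact%' OR v.name LIKE 'Dim%'
ORDER BY v.name;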

So the Fact is that this practice is not so Dim after all!

Posted by DrJohn with 2 comment(s)

SSAS: Working with date and time measures to calculate average elapsed time

There are some things in life that are so implicit and taken for granted that nobody ever bothers to tell you about them or write up any documentation. This is how I feel the implicit support for date and time measures must have come about in Microsoft SQL Server 2005 Analysis Services (SSAS). Certainly I could not find any documentation about the topic, and yet intriguingly both Measures and Calculated Members have "Simple Date" and "Simple Time" in their drop-down list of possible text formats.

Clearly SSAS will allow you to add attributes to a dimension that have the data type DateTime. However, SSAS does not allow you to create a measure based on the DateTime data type. Now this is a bit of a drawback when you want to calculate the average time a process took, which is exactly what I needed to do. After searching the internet, I was none the wiser as to how to solve the problem, so some experimentation was in order.

An obvious approach would be to create a measure which contains the elapsed time for the process in minutes. This would certainly give the right answer when averaged. However, presenting the resulting number in a format that was meaningful to the end-user is beyond standard out-of-the-box functionality, especially when the elapsed time needs to be presented in financial speak such as T+1 15:30 (i.e. the next day at 3:30pm). And since the end-user would be browsing the cube with Excel 2007, standard out-of-the-box functionality is all I could use.

My data was pretty simple:

CREATE TABLE dbo.WhenEventHappened(
    SurrogateKey int NOT NULL,
    MeasurementTypeID int NOT NULL,
    ReferenceDate datetime NOT NULL,
    DateTimeEventHappened datetime NOT NULL
) ON [PRIMARY]

 

Where ReferenceDate holds the date the event is relative to and DateTimeEventHappened clearly holds the date & time when the event took place. So, for example, if we use the financial services analogy, the trade took place on the ReferenceDate and it was settled on DateTimeEventHappened. So the elapsed time is (DateTimeEventHappened – ReferenceDate) in days, hours and minutes. In my data, ReferenceDate is always a date with no time element.

The SQL Server DateTime data type

You probably know that behind the scenes SQL Server holds dates as a decimal number. The whole number is the number of days since 1 January 1900 and the fraction represents the time of day. For example, 39415.5 represents 2007-12-01 12:00:00.

So, if

    ReferenceDate = 2007-12-01 00:00:00
and
    DateTimeEventHappened = 2007-12-01 12:00:00
then
    (DateTimeEventHappened – ReferenceDate) = 1900-01-01 12:00:00, which is held as 0.5
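If you want to see the underlying numbers for yourself, a quick query such as the following (with the dates written in the unambiguous yyyymmdd form) shows them:

-- Sanity check of the arithmetic above
SELECT
    CONVERT(float, CONVERT(datetime, '20071201 12:00:00')) AS DateTimeEventHappenedAsFloat,  -- 39415.5
    CONVERT(float, CONVERT(datetime, '20071201 12:00:00')
                 - CONVERT(datetime, '20071201'))          AS ElapsedAsFloat;                -- 0.5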

OK, so this gives me the elapsed time, but how do I get it into the cube? Well, the answer is simple: convert the DateTime into a decimal number. To do this you need to apply the relevant cast/convert, such as:

CONVERT(float, (A.DateTimeEventHappened - A.ReferenceDate)) AS EventDateDiffDecimal

To import this data into the cube, I created a view that performed the relevant cast/converts.
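A sketch of the kind of view involved is shown below; the view and column names are taken from later in this post, the EventTimeDecimal expression is my assumption about how the time-of-day part was derived, and the epoch adjustment that comes up later is deliberately left out at this stage.

CREATE VIEW dbo.EventElapsePeriod
AS
SELECT
    A.SurrogateKey,
    A.MeasurementTypeID,
    CONVERT(float, A.ReferenceDate)                           AS ReferenceDateDecimal,
    CONVERT(float, A.DateTimeEventHappened)                   AS DateTimeEventHappenedDecimal,
    CONVERT(float, A.DateTimeEventHappened)
        - FLOOR(CONVERT(float, A.DateTimeEventHappened))      AS EventTimeDecimal,   -- assumed: time-of-day portion only
    CONVERT(float, A.DateTimeEventHappened - A.ReferenceDate) AS EventDateDiffDecimal
FROM dbo.WhenEventHappened AS A;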


Now, for illustration of what happens between SQL and SSAS, I have added some redundant columns to my view just to make it obvious what is really going on here. So here is my data, including the extra ReferenceDateDecimal and DateTimeEventHappenedDecimal columns.  I have also created some simple data which is shown below:


Next I simply added this new view to the cube's Data Source View and added the EventTimeDecimal and EventDateDiffDecimal plus the extra columns to my example cube as a new measure group which automatically sets the AggregateFunction to Sum. So here is the result:


Now apply magic!

OK, so the above is not so useful, so let's apply some magic. Simply set the FormatString and AggregateFunction for each measure as follows:

Column                              Format String        AggregateFunction
Reference Date Decimal              dd-mmm-yyyy          AverageOfChildren
Date Time Event Happened Decimal    dd-mmm-yyyy hh:mm    AverageOfChildren
Event Time Decimal                  hh:mm                AverageOfChildren
Event Date Diff Decimal             dd-mmm-yyyy hh:mm    AverageOfChildren

So after re-deploying the cube, we get:


 

 

Oh look, wrong dates!

Well, we have dates, but not the correct ones! It seems that date zero in SQL Server is 01-Jan-1900 and date zero in SSAS is 30-Dec-1899! To fix the problem, we simply add two to the values provided by our EventElapsePeriod view. After processing the cube we now get the correct dates:


However, I want the EventDateDiffDecimal to be presented in the format T+1 15:30. So I need to subtract one from the EventDateDiffDecimal value so that I get 1-Jan as my date. Then I can format the value, ignoring the month and year, using the FormatString "\T\+d hh:mm". Now purists among you may point out that this strategy will not work when the date gets bigger than 31, as it will flick over to T+1 again. This is true, but I have defined EventDateDiffDecimal as a semi-additive measure so I will not hit this problem.
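One way to bake both adjustments into the view expressions is sketched below; exactly where you apply the +2 and the -1 is a matter of taste.

-- +2 corrects for the SQL Server vs SSAS epoch difference; the extra -1 on the elapsed time makes "\T\+d hh:mm" read T+1 for next-day events
SELECT
    CONVERT(float, A.ReferenceDate) + 2                           AS ReferenceDateDecimal,
    CONVERT(float, A.DateTimeEventHappened) + 2                   AS DateTimeEventHappenedDecimal,
    CONVERT(float, A.DateTimeEventHappened - A.ReferenceDate) + 1 AS EventDateDiffDecimal   -- +2 - 1
FROM dbo.WhenEventHappened AS A;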

So here is the final formatted data:


Note how the Grand Total is correctly calculated. Just magic!

If you would like to try this for yourself, I have published the T-SQL scripts and cube for download here: DateTimeMeasures.zip  Note that the ZIP file also contains the SQL script to re-create the sample table and view used in this article.

Good Luck!

Posted by DrJohn with 5 comment(s)

Cleaning Address Data with SSIS using a Web Service - Explained

Jamie Thompson gave me some very helpful feedback on my recently published article over at SQLCentral.com called Cleaning Address Data with SSIS Using a Web Service and my previous blog entry on Calling SharePoint web services from the data flow. Jamie pointed out that I had assumed my audience would be fully familiar with the .NET framework and terms such as WSDL and Proxy Class need further explanation. So here is my attempt to clarify some issues he pointed out.

  • WSDL
    WSDL is short for Web Service Description Language and is the definition of what the web service provides and how it can be invoked. A WSDL document is an XML file which lists all the methods provided by the web service and their parameters. The WSDL document is returned by a web service when "?wsdl" is appended to the web service's URL (e.g. http://services.postcodeanywhere.co.uk/uk/lookup.asmx?wsdl).
  • Proxy Class
    A proxy class simplifies the interaction between your application and a web service. The .NET Framework provides a command-line utility called wsdl.exe which generates the code for a proxy class automatically. In Visual Studio, the Add Web Reference command automatically executes wsdl.exe and generates the proxy class for you. However, with SSIS you need to create the proxy class manually using a command line such as this:
    e.g.
    wsdl /language:VB http://services.postcodeanywhere.co.uk/uk/lookup.asmx?wsdl /out:c:\PostcodeAnywhere.vb
  • GAC
    The Global Assembly Cache (GAC) is the central repository for sharing assemblies (DLLs). In order to register an assembly in the GAC, it must be strong named (i.e. must have a unique hash value, public key, locale and version number).

If you want to know more about the .NET Framework, then read O'Reilly's .NET Framework Essentials, 3rd Edition. 

 

Posted by DrJohn with 1 comment(s)

SSIS: Getting data from Excel files using Aspose.Cells

Excel spreadsheets with complicated layouts really do not lend themselves to processing with the standard Microsoft OLE DB drivers. Indeed, extracting one or two cells from different worksheets can be positively painful! This is where the power and flexibility of Aspose.Cells comes into its own.

Aspose.Cells provides an API that looks very similar to the OLE automation API exposed by Excel in the Office InterOp assemblies. However, you do not get all the headaches normally associated with the InterOp assemblies (i.e. modal dialogs, memory leaks, multiple Excel instances etc. etc.). With Aspose.Cells you do not need Excel to be installed on your server. Instead you only need a single assembly (Aspose.Cells.dll ) and the accompanying license file (Aspose.Cells.lic). Simple!

Working with Aspose.Cells in SSIS

To work with Aspose.Cells, you need to use the Script Task or Script Component. My personal preference is to use the Script Component in its asynchronous transformation mode. This way I can feed in a list of Workbooks that I want the component to process on its input buffer and the component can write the data read from the Workbooks onto multiple output buffers (one buffer for each type of data).

As SSIS uses Visual Studio for Applications (VSA), you do not have the flexible "Add Reference" dialog found in Visual Studio. In order to get around this issue while you are developing, you need to copy the Aspose.Cells assembly and associated license file into your .NET folder, which is usually:

C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727

Restart Visual Studio and you will see Aspose.Cells listed on VSA's "Add Reference" dialog. Now you can add the appropriate "imports" statement to the top of your module and start coding. Easy!

Later, when you move to production, you will need to register the Aspose.Cells assembly in the GAC by simply dropping the assembly into the following folder:

C:\WINDOWS\assembly

Dealing with the Aspose.Cells License file

Aspose.Cells has a license file which can either be in the same directory as the assembly, or in your own location. The easiest way to handle this in SSIS is to use a File Connection Manager which records the location of the license file. Then in code, you will need to add:

Dim license As Aspose.Cells.License = New Aspose.Cells.License()
license.SetLicense(Me.Connections.AsposeCellsLicense.ConnectionString)

 

Example Code

My sample code reads a list of workbooks from the input buffer and writes the extracted data to the output buffer.  The data flow is shown below along with the code from my asynchronous Script Component. Enjoy!


 

' Aspose.Cells.dll must be registered in the GAC for production
' AND the C:\WINNT\Microsoft.NET\Framework\v2.0.50727 for development

' standard libraries
Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper
' additional libraries
Imports System.IO
Imports System.Collections.Generic
Imports Aspose.Cells

Public Class ScriptMain
    Inherits UserComponent

    Public Overrides Sub WorkbookList_ProcessInputRow(ByVal Row As WorkbookListBuffer)

        ' Imports all data from the Excel workbooks provided on the input.
        ' Note that each workbook contains several worksheets.
        ' For efficiency, we open the workbook once and then
        ' process all the worksheets in the file.

        Try
            ' The previous step in the package downloaded the workbook.
            Dim fInfo As FileInfo = New FileInfo(Path.Combine(Me.Variables.ImportFolder, Row.URL))
            If fInfo.Exists Then

                ' Set Aspose.Cells license
                Dim license As Aspose.Cells.License = New Aspose.Cells.License()
                license.SetLicense(Me.Connections.AsposeCellsLicense.ConnectionString)

                ' Open the workbook
                Dim wrkBook As Workbook = New Workbook()
                wrkBook.Open(fInfo.FullName)

                ' process each worksheet
                For Each wrkSheet As Worksheet In wrkBook.Worksheets
                    Try
                        ' Check we have a valid worksheet by testing
                        ' for text prompts in specific locations
                        If ValidWorksheetLayout(wrkSheet) Then
                            ' process each worksheet found in the workbook
                            ' that has the correct format
                            ImportWorksheetData(Row.WorkbookID, wrkSheet, Row)
                        End If
                    Catch exi As Exception
                        Me.ComponentMetaData.FireInformation(0, Me.ComponentMetaData.Name, "WorkbookList_ProcessInputRow: Invalid Worksheet Layout: " & wrkSheet.Name, String.Empty, 0, True)
                    End Try
                Next
            Else
                Me.ComponentMetaData.FireError(0, Me.ComponentMetaData.Name, "WorkbookList_ProcessInputRow: Missing Excel File: " & fInfo.FullName, String.Empty, 0, True)
            End If
        Catch ex As Exception
            Me.ComponentMetaData.FireError(0, Me.ComponentMetaData.Name, "WorkbookList_ProcessInputRow: " & ex.Message, String.Empty, 0, True)
        End Try

    End Sub

    Private Function ValidWorksheetLayout(ByVal wrkSheet As Worksheet) As Boolean
        ' checks if data is in expected locations. If not, then return false.
        If wrkSheet.Cells("B3").Value.ToString = "Profit and loss projections" And wrkSheet.Cells("B14").Value.ToString = "Revenue" Then
            Return True
        End If
        Return False
    End Function

    Private Sub ImportWorksheetData(ByVal WorkbookID As Integer, ByVal wrkSheet As Worksheet, ByVal Row As WorkbookListBuffer)
        ' imports all data from the worksheet
        Dim iRow As Integer
        Dim iCol As Integer
        Dim iCount As Integer
        Dim dataRange As DataTable
        Dim dr As DataTableReader
        Dim HeaderRow As List(Of Integer)

        Try
            ' heading row is in a separate location to the data, so import heading first
            ' note that Aspose.Cells uses zero-based row and column numbers in the ExportDataTable call
            ' so Cell=4,Row=12 is actually cell E13
            Try
                ' read single header row into generic list
                HeaderRow = New List(Of Integer)
                dataRange = wrkSheet.Cells.ExportDataTable(12, 4, 1, 5)
                dr = dataRange.CreateDataReader()
                If dr.Read Then
                    For iCount = 0 To 4
                        HeaderRow.Add(GetInteger(dr, iCount))
                    Next
                End If
            Catch ex0 As Exception
                Me.ComponentMetaData.FireError(0, Me.ComponentMetaData.Name, "ImportWorksheetData ex0: " & ex0.Message, String.Empty, 0, True)
            End Try

            Try
                dataRange = wrkSheet.Cells.ExportDataTable(14, 2, 2, 7)
                dr = dataRange.CreateDataReader()
                While dr.Read
                    ProcessDataTable(WorkbookID, dr, 2, 6, HeaderRow)
                End While
            Catch ex1 As Exception
                Me.ComponentMetaData.FireError(0, Me.ComponentMetaData.Name, "ImportWorksheetData ex1: " & ex1.Message, String.Empty, 0, True)
            End Try

            Try
                dataRange = wrkSheet.Cells.ExportDataTable(19, 2, 2, 7)
                dr = dataRange.CreateDataReader()
                While dr.Read
                    ProcessDataTable(WorkbookID, dr, 2, 6, HeaderRow)
                End While
            Catch ex2 As Exception
                Me.ComponentMetaData.FireError(0, Me.ComponentMetaData.Name, "ImportWorksheetData ex2: " & ex2.Message, String.Empty, 0, True)
            End Try

            Try
                dataRange = wrkSheet.Cells.ExportDataTable(23, 2, 9, 7)
                dr = dataRange.CreateDataReader()
                While dr.Read
                    ProcessDataTable(WorkbookID, dr, 2, 6, HeaderRow)
                End While
            Catch ex3 As Exception
                Me.ComponentMetaData.FireError(0, Me.ComponentMetaData.Name, "ImportWorksheetData ex3: " & ex3.Message, String.Empty, 0, True)
            End Try

        Catch exOuter As Exception
            Me.ComponentMetaData.FireError(0, Me.ComponentMetaData.Name, "ImportWorksheetData: outer " & exOuter.Message, String.Empty, 0, True)
        End Try

    End Sub

    Public Sub ProcessDataTable(ByVal WorkbookID As Integer, ByVal dr As DataTableReader, ByVal iStartDataCol As Integer, ByVal iDataColCount As Integer, ByVal HeaderRow As List(Of Integer))
        Dim iCol As Integer
        Dim DataValue As Decimal
        Dim bAddRow As Boolean
        Dim TimePeriodID As Integer
        Dim OutputCount As Integer = 0

        Try
            For iCol = iStartDataCol To iDataColCount
                bAddRow = False
                Try
                    If Not dr.IsDBNull(iCol) Then
                        DataValue = GetDecimal(dr, iCol)
                        If DataValue <> 0 Then bAddRow = True
                        If bAddRow Then
                            With Me.WorksheetDataBuffer
                                .AddRow()
                                ' get header from the row we read into a list earlier
                                .FinancialYear = HeaderRow(iCol - 2)
                                .WorkbookID = WorkbookID
                                .RowName = dr.GetString(0)
                                .DataValue = DataValue
                            End With
                        End If
                    End If
                Catch ex1 As Exception
                    Me.ComponentMetaData.FireError(0, Me.ComponentMetaData.Name, "ProcessDataTable: Inner " & ex1.Message, String.Empty, 0, True)
                End Try
            Next
        Catch ex As Exception
            Me.ComponentMetaData.FireError(0, Me.ComponentMetaData.Name, "ProcessDataTable: Outer " & ex.Message, String.Empty, 0, True)
        End Try

    End Sub

    Function GetDecimal(ByVal dr As DataTableReader, ByVal iCol As Integer) As Decimal
        Dim DataValue As Decimal
        Try
            Select Case dr.GetDataTypeName(iCol)
                Case "String"
                    If IsNumeric(dr.GetString(iCol)) Then
                        DataValue = CDec(dr.GetString(iCol))
                    End If
                Case "Integer"
                    DataValue = CDec(dr.GetInt32(iCol))
                Case "Double"
                    DataValue = CDec(dr.GetDouble(iCol))
                Case "Decimal"
                    DataValue = dr.GetDecimal(iCol)
            End Select
        Catch ex As Exception
            Me.ComponentMetaData.FireError(0, Me.ComponentMetaData.Name, "GetDecimal " & ex.Message, String.Empty, 0, True)
        End Try
        Return DataValue
    End Function

    Function GetInteger(ByVal dr As DataTableReader, ByVal iCol As Integer) As Integer
        Dim DataValue As Integer
        Try
            Select Case dr.GetDataTypeName(iCol)
                Case "String"
                    If IsNumeric(dr.GetString(iCol)) Then
                        DataValue = CInt(dr.GetString(iCol))
                    End If
                Case "Integer"
                    DataValue = dr.GetInt32(iCol)
                Case "Double"
                    DataValue = CInt(dr.GetDouble(iCol))
                Case "Decimal"
                    DataValue = CInt(dr.GetDecimal(iCol))
            End Select
        Catch ex As Exception
            Me.ComponentMetaData.FireError(0, Me.ComponentMetaData.Name, "GetInteger " & ex.Message, String.Empty, 0, True)
        End Try
        Return DataValue
    End Function

End Class

Posted by DrJohn with no comments