Microsoft Access: an elegant solution to Data Warehouse metadata

Before you send me flame mail telling me I must be off my rocker, I am not proposing Microsoft Access store any data! We all know that its JET-based database engine has severe limitations in terms of speed and scalability. No, what I am proposing is that Access is used as is a very quick way for you to deliver an easy-to-use front-end for your business users to edit metadata which is actually stored in a SQL Server back-end database.

Whether you have a classic data warehouse architecture, or some botched system that evolved over the years, you will no doubt have some metadata that is vital to your data warehouse yet did not come from any source system. Usually this data needs to be changed by business users on an infrequent basis.

In big data warehouse (DWH) implementations, where money is no object, you can take the time to build an entire web-based application to edit and manage metadata. This is fantastic for those with deep pockets and long delivery timescales, but realistically not every customer can afford such luxuries. On the other hand Microsoft Access provides a quick and inexpensive way to deliver a mechanism to edit metadata using Linked Tables. This blog provides a tutorial on how to create a front-end metadata editing tool with Microsoft Access and also provides a useful starting template for you to build your own metadata editor. So here goes.

Target Architecture

Before showing how to build your first front-end using my template, let's talk about the technical architecture.

 

The best approach is to store your metadata in a separate database. This offers better control over security as users can be given full permission on the metadata database but limited rights to the other databases in the data warehouse.  Also, the metadata database can have full recovery model, so it can be restored to any point; something that is not always applied to a staging database. However, your architecture may be a sub-set of this. Needless to say, the template I provide will prove useful in any situation.

Example front-end

The following screen shots shows exactly what can be achieved using Microsoft Access. Now these are actually very crude by comparison to what can be achieved by proper MS-Access developers, but they are quick and easy to build for those with limited time and skills. Also business users find them easy to understand.

My example screens allow the business user to edit a Region hierarchy which becomes a dimension in the OLAP cube. The region hierarchy is roughly geographical in nature, but it is business focused and not related to geo-political boundaries and therefore not be available in any source system; hence the need to hold this data in the data warehouse. The hierarchy has three-levels: the top level has Regions, the mid-tier has Locations and the bottom level holds Business Units. Clearly we could have more levels, but this is sufficient for my example.

Been able to edit this kind of structure delivers a lot of power to your business users as they are now able to control how all reports are presented to end-users and how numbers roll-up in the OLAP cube. Nothing to be sneezed at!

The tables in the metadata database that support the region hierarchy are normalized for editing and de-normalized by a view when been sucked into the DWH for use in the OLAP cube. So our MS-Access database has to edit the following SQL Server tables:


Main Menu

The main menu is the first thing the user sees. This is their jumping off point for editing the metadata. Of course the screen must be in their language, not techno-babble!


Edit Regions

Clicking "Regions" will show the following screen.


Although crude, this allows the user to a) change the name of a region and b) add new regions. Foreign key relationships or triggers in the underlying SQL Server database should stop users deleting rows unless you want them too.

In a classic DWH implementation, the user should indicate that a region is defunct by changing its status. In my simple example, they do so by deleting the row. Both approaches can be implemented using MS-Access, it is just a matter of design.

Edit Locations

The Edit Locations form is more sophisticated as it provides drop-down lists for the user to select the region (i.e. from what the user entered on the Edit Regions form). This is because the metadata tables are normalized for editing and de-normalized by a view when been sucked into the OLAP cube.


Edit Location to Business Unit Mapping

The final screen allows the user to map their business data to the bottom level of the regions hierarchy. For my client, this screen related locations to the identifiers used by the finance department to classify all revenue and costs.


The easy way to create Linked Tables

In the template I provide a very simple mechanism to help you create your linked tables as well as switching between servers (i.e. between test and production database servers). However, if you are new to MS-Access you will not appreciate the pain of creating and managing them manually, so let's create a linked table the standard way first.

To create a linked table through MS-Access, you go to the External Data tab and click More and select ODBC Connection as shown below.


You next select Link to a data source by creating a linked table and click OK. This will display the Select Data Source dialog. Now select your data source or create a new ODBC connection file. Be careful to select the correct SQL Server Client Library for your version of SQL Server. For example:


Once you have created your ODBC connection, MS-Access will show you a list of all the tables in your SQL Server database where you can select multiple tables to link.


Click OK and the tables are linked. Well that seemed easy enough, so where is the pain? Well the pain comes when you want to switch servers. The connection string for each table is held in the metadata associated with the table. It is far too easy to connect different tables to different servers during development (e.g. some to localhost, others to a dev server etc.), secondly the Linked Table Manager provided in MS-Access is not the most intuitive UI for changing the database links. Indeed it does not tell you on which server the tables reside! For that, you need to hover over the linked table in the Navigation Pane as shown below. Not too convenient!


The other thing about these newly linked tables is that the schema is displayed, which means the first thing you will do is edit the name displayed in MS-Access. Again a pain if you have to drop and recreate tables when switching between servers.

Change server and create linked tables all in one step

My template provides a far simpler mechanism to simultaneously change server and re-create all the linked tables in one step. You simply edit the SourceServers table and fill in names and descriptions of all your servers.


You then edit the LinkedTables table and fill in the names of the tables you want to link along with the name of the database in which they are stored.


Then select Change Server from the main menu and a list of servers will appear. Simply select your server and click OK. The code will delete all existing linked tables and then re-create them based on the information provided in the LinkedTables table. So changing between development and production servers is now extremely easy!

Basic Editing

The top level of the hierarchy is Region and is held as a simple list of names with an identity column. We only need a basic editor for the user to edit this list, so we will use the default datasheet view provided by MS-Access. To invoke the basic editor, simply double-click on the Region table and you will see the datasheet view (displayed below). Remember, if the left-hand Navigation Pane is not visible, simply click F11.


 

Editor with drop-down selections

Our Location table has a foreign key relationship with the Region table. Rather than have our users remember a lot of meaningless IDs, we want to provide them with a drop down menu. To do this, click on the Location table in the Navigation Pane and then open the Create tab and select the Forms Wizard as shown below.


Next you will see the Form Wizard's field selection dialog. Select all fields.


On the next Layout selection dialog, select Tabular.


On the Style dialog, pick your preferred style. I recommend sticking to the Office theme if you are unsure. On the final dialog, name the new form and click Finish.

You will now see the basic Location editing form ready for you to enter data. It looks much better than the basic form we created for Regions, but has the problem that the user has to remember the RegionID in order to make an entry.


We get around this by changing the RegionID field to use a combo box. Switch into design mode using the View option on the Home tab and select the RegionID field. Right-click and select change to combo box.


Now change the Row Source property by clicking the three dots in the Property Sheet (press F4 if this is not visible).


Use the query designer to create the query:

SELECT Region.RegionID, Region.RegionName FROM Region;

This SQL should be displayed in the Row Source property once you exit the designer.

Finally change the Column Count property to 2 and set the column widths to 0cm;4cm


This has the effect of hiding the RegionID from the user as shown below.


Edit Location Mapping table

The LocationMapping table maps business unit IDs from the source finance system to locations. So here the drop down list we provide to the user has to come from the source finance system. As this is a data warehouse, we have already extracted and cleaned that data with SSIS and written it into the staging database. So we simply need to create a linked table to the relevant table in the staging database. Again this is easily done using the LinkedTables table as this has a SourceDatabase column as shown below. Of course you will need to click the Change Server button to create the new linked table.


Using the same technique as we did with the RegionID column in the Location form, we can create a form containing two drop-down lists as shown below.


Creating the Main Menu

When the user first opens the database, you want to display a menu of options so they can find things easily. To create a new form, simply choose Form on the Create tab. Then switch to design mode and expand the detail section to reveal a white cross-hatched area. As you are in design mode, you will see the Form Design Tools tabs on the ribbon. Select Design tab and click Button. Now highlight an area of form and draw a button. When a dialog pops up, simply click Cancel as we do not want any standard option. Open the Property Sheet and give the button a sensible Name on the Other tab (e.g. cmdRegion). On the Event tab, select [Event Procedure] in the on-click event and click the three dots box to open the Visual Basic editor.


To open the region table directly for editing (i.e. because we do not need a special form) we use the OpenTable command as follows:

DoCmd.OpenTable "Region", acViewNormal, acEdit

We now repeat the steps to create buttons for our other forms. However, as we have created special forms to edit these tables we need to use the OpenForm command as follows:

DoCmd.OpenForm "Location", acNormal, , , acFormEdit

For some obscure reason MS-Access still uses macros to achieve certain tasks. In order to have the main menu appear when the user first opens the database, you need to create macro called AutoExec which does this.


User Permissions

If you have a separate metadata database, then your users only need read-write permissions to that database. If the MS-Access database uses other tables to populate drop-down lists, then the user will need read-only access to those tables. This can be managed on a table by table basis or by simply adding the user to a database role with db_datareader and/or db_datawriter roles.

Gotha: Tables must have a Primary Key

MS-Access can only edit tables that have primary keys defined. If no primary key is defined, then you will find that your form is locked in read-only mode with no indication whatsoever as to why you cannot edit the data. For me this was a painful learning curve, as I had most of my tables with primary keys but the odd one or two without. Given the lack of feedback from MS-Access as to why the forms were locked in read-only mode, it was quite some time until that I realised what caused the problem.

Deployment

Clearly your business users need MS-Access installed on their desktop in order to use you new metadata editor. I would recommend developing the database using the same version of MS-Access as your users have installed. Although my screen shots are of MS-Access 2007, this application can be built using MS-Access 2003 as well.

Although convenient, I would not recommend using a file share to deploy the database as I have found that slow VPN connections can corrupt the MS-Access database making it unusable for everyone. Instead, have the users copy the database to their desktop or install on their PC to a standard location. You could even write an installer, although that is rather over the top as a simple batch file would do!

The other thing to be careful about is the version of SQL Client libraries on the user's desktop. I would recommend using what is already available rather than requiring a separate install. Clearly testing on several machines will highlight any issues in this area. If you need to change the version of SQL Client used by my template, simply edit the ODBC_DRIVER constant in the VB module called Common.

Replication of metadata to the staging database

If you are not happy using cross-database joins in your stored procedures, you may well need to replicate the metadata to the staging database for use in the data warehouse. This can be done in a number of ways using SSIS or SQL replication. However, this is a topic outside the scope of this blog entry.

MS-Access Template

The MS-Access Template and SQL scripts to create the demo region hierarchy is available for download here: Metadata_Editor.zip.

Good luck!

Dr. John

 

Posted by DrJohn with 2 comment(s)
Filed under:

SSAS: Microsoft release fix for “Kerberos killing MDX” issue

In a previous post I reported how small MDX queries worked fine when client and server were Windows Vista/Windows Server 2008 but large MDX queries died due to the connection been forcibly closed by the transport layer. It turned out that this issue was due to a bug in the Kerberos.dll on these AES aware operating systems. AES = Advanced Encryption Standard.

Microsoft have how released a hot fix for this issue.  Read John Desch's blog post for details.

Note that this bug is not present in the Windows Server 2008 R2 or Windows 7 release.

 

Posted by DrJohn with 1 comment(s)
Filed under: ,

SSAS: Kerberos kills ‘large’ MDX queries on Windows Server 2008

Kerberos is nasty! I have had several encounters with Kerberos and none of them have been pleasant. I try to avoid Kerberos like try to avoid the dentist. However, some projects demand that Kerberos authentication be used to cater for the 'third hop' and then the pain begins.

The symptoms of my latest encounter were somewhat odd. Local MDX queries worked fine, but queries performed over the network failed with the message:

Executing the query ...
The connection either timed out or was lost.
Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host.
An existing connection was forcibly closed by the remote host
Execution complete

Of course the error message does not tell us anything useful. Just that someone does not like us today!

At first we thought there was a fault with the MDX query. But no, the MDX query ran fine against the local development server and also ran fine against the production server when run in SQL Server Management Studio in local mode (i.e. not across the network). But as soon as we ran the query across the network, we got the error.

Suspecting a hardware fault on the brand new server, the IT support team checked out the memory and changed the network cards. No change; the queries still failed.

Anyone familiar with SQL Server Analysis Services (SSAS) knows that it uses a Windows-based security model (i.e. the user must have Windows account in order to connect to the OLAP database). Unfortunately, this becomes a problem when the user's credentials need to be passed from machine to machine before finally arriving at the OLAP server. Basically the NTLM protocol does not support delegation across more than two hops.

To cater for the third 'hop' you have to co-opt the help of the 'dark side': i.e. Kerberos and SPNs. Servers have to be 'registered for delegation' in Active Directory and SPNs have to be set up to configure Kerberos. This is a black art and my colleague and I have the battle scars to prove it!

Our configuration was not unusual. Users connected to a web server running SharePoint 2007 and SQL Server Reporting Services (SSRS). SSRS then queries SQL Server Analysis Services (SSAS) database which resides on the backend 'data server'. The complication occurs when users connect over the internet; they are authenticated by Microsoft ISA server which passes their credentials on to the web server which in turn passes their credentials to SSRS which in turn passes their credentials on to SSAS.

Now, this infrastructure was working fine in the current SQL Server 2005 / Windows Serve 2003 implementation. However, the new SQL Server 2008 / Windows Server 2008 implementation was having trouble with a single hop, so we were in big trouble!

Suspecting a corrupt installation of SSAS, we did a re-installation of the whole SQL Server 2008 suite, but the MDX queries still failed. So next we did a fresh install of the Windows Server 2008 operating system along with SQL Server 2008, but the MDX queries still failed!

At this point we still believed that all MDX queries failed. But then I observed that the filters on the SSRS reports were been populated, whereas running the main report query generated the same ugly error message.

Been convinced my MDX was at fault, I started a trace of activity on the Analysis Services service which showed that small MDX queries worked fine (all the relevant log entries were present) whereas, large MDX queries failed and the only log entry was an "Audit Logout notification" event.

I started to play with the MDX and found that if I padded out a very simple query with comments it started to fail around the 1096 character mark i.e. simply adding /******/ until the whole statement was over 1kB caused the query to fail! Clearly something major was going wrong!

Still believing it to be a hardware or software implementation error, we proceeded to install SQL Server 2008 OLAP database on another identical box. No problem! MDX of any size worked fine. So it must be a hardware or software implementation fault????

After much scratching of his head, my good friend and colleague Eric Moutell eventually announced that the only difference between the two machines is that one had been set up with SPN entries to allow Kerberos authentication. So he deleted the entries, rebooted the world, rebooted the world again and finally any size MDX query worked on the original box. Ah ha! We were getting somewhere. So now we knew that nasty old Kerberos was having a laugh at our expense!

We got in contact with Microsoft support and after sending detailed information about the bug and several gigabytes of server logs, they eventually admitted we had found a bug in Windows Server 2008 which they have agreed to fix!

For those of you experiencing a similar problem, I reproduce the email from Microsoft support which you may find useful. In our environment, the server is running Windows Server 2008 64-bit with Microsoft SQL Server 2008 with the latest cumulative update patch CU3. The client machines we tried were either Windows Server 2008 64-bit or 32-bit running Microsoft SQL Server 2008 CU3.

From: Camino De Vicente Rodriguez
Sent: 16 March 2009 13:58
Subject: RE: Your case with Microsoft (SRZ090205000308)
Escalation engineers have confirmed that is a problem on 
Kerberos.dll related to encryption with AES on Windows 2008 and Vista.
They have reported to the Product Group and,
after analyzing it, they have accepted to fix it.
The estimated time for the fix is May 
(it is required two month cycle hotfix test pass).
I am sending you the Workarounds if you cannot wait until May:
If Kerberos authentication is a requirement, run Analysis Services 
on a Windows 2003 Server, since Windows 2003 Server is not AES aware.
Use Windows 2003 Server, Windows XP, or Windows 2000 Server 
to run client applications that will be connecting to the
Analysis Server configured for Kerberos authentication and running on Windows 2008.
Since these operating systems are not AES aware,
that will avoid use of AES for encryption/decryption.
Avoid configuring the Analysis Server to use Kerberos 
Authentication, since this will result in the AES
not being used for encryption/decryption
Add ";SSPI=NTLM" or ";Integrated Security=SSPI" to the connection string, 
which will force use of NTLM and avoid use of AES for
encryption/decryption
Best regards,
Camino de Vicente
EMEA GTSC Development Support Engineer
Posted by DrJohn with 5 comment(s)
Filed under: ,

SSAS writeback error was ‘an accident with a contraceptive and a time machine’

Those of you that heard the hit radio series 'The Hitchhikers Guide to the Galaxy' or read Douglas Adams' book will recognise the title of my blog entry as been the reason quoted by Zaphod when asked to explain why he was Zaphod Beeblebrox the First and his father was Zaphod Beeblebrox the Third. Apparently it was all down to 'an accident with a contraceptive and a time machine' whereby Zaphod Beeblebrox had travelled back in time and fathered his own ancestor.

In my case the 'time machine' was a source code control system that provided the ability to go 'back in time' and revert my SSAS project to a previous incarnation. My 'accident' was to combine some of the 'past' with the 'present' to end up with a creation of which Frankenstein would have been proud.

The key problem of using a source code control system with an Analysis Services project is that the file-based source code control model just does not fit in with the way SSAS works. For example, adding a security role to a cube not only creates a new role file but also fundamentally alters the content of both the .cube file and all the dimension files for which you define attribute security. So when you come to check-in the project, you find that two or three files have changed when you only expected to alter one! So when it comes to reverting to a previous version you really have to take all the files from a specific date, not just one or two files.

The horror of my own Frankenstein creation did not get recognised until several days after its birth. When testing part of the application's functionality that used cube writeback we got the obscure message

Errors in the high-level relational engine.
Only one-to-many relationships are supported for parent-child dimensions.
The '' table contains many-to-many relationships between parent and child columns.

As you see, the table name was missing from the error message, so we were bemused as to which table was at fault. If the rest of the error message were to be believed, the data was incorrect.  Well we checked the data and it was fine. Moreover the cube did not complain when processing the data.  We deleted and recreated the writeback tables, so they are not at fault. When we disabled writeback, the error message disappeared.  

Well, there is nothing more frustrating or time-consuming than a misleading error message and a bunch of red-herrings. We wasted many, many hours trying to figure out the cause of the error. A search the forums revealed nothing so I posted my own entry – to which no one replied. Eventually the issue was raised with Microsoft. After much teeth gnashing, they found the source of the problem: lack of a primary key on a table in the DSV. Well the primary key was certainly there at some point as the writeback had worked in the past. However, because I had used source control to 'travel back in time' and revert some files to an earlier version, I had clearly got my revisions well and truly mixed up. My 'contraceptive' had not worked and Frankenstein's monster was duly born.

So beware all of you who play with 'contraceptives' and 'time-machines'!

Posted by DrJohn with no comments
Filed under:

MDX + T-SQL: Combining relational and multi-dimensional data into one query result set

When it comes to system integration, Microsoft provides such a plethora of options it is far too easy to forget about some of them and/or think about their practical application. This is particularly true of the SQL Server BI stack which is so huge in its scope.

Our need was to populate a relational database using the results of an MDX query so we could do some data reconciliation to prove the new cube provided the results we wanted. Given my skills in C#, my immediate thought was to write a simple console application which performed an MDX query and populate a database table using SQL Bulk Insert. However, this option would not be too easy the rest of the team to support as they do not have the relevant C# skills or experience of ADOMD.NET.

Vincent Rainardi (author of Building a Data Warehouse: with examples in SQL Server) pointed me the direction of OpenQuery and linked servers. This is a remarkably simple and flexible solution to a great many system integration problems, so I thought I had better share it with a wider audience.

OpenQuery and OpenRowSet both execute pass-through queries against a specific server. The 'query string' to be executed is not evaluated by the local server but simply 'passed through' to the target server which is expected to return a row set. The 'query string' can be anything, including MDX, so long as it is understood by the target server. Well this works a treat with MDX and Analysis Services.

To use this technique, you will first need to create a linked SSAS server using either SQL Server Management Studio or in script using the stored procedure master.dbo.sp_addlinkedserver. Here is an example:

Once linked, you can then query your cube using MDX and combine the results with the content of a SQL Server table. Alternatively you can simply insert the results of the MDX query into a database table. For example, the following screenshot shows the MDX executed in SQL Server Management Studio:

Whereas the following screenshot shows the results of the equivalent OpenQuery:

 

Note now the column names contain the unique name of the attribute. Before showing how to deal with these, let's just look one surprising element of the functionality offered by this technique.

If you use a hierarchy in your query, then you will get extra columns describing the values of each hierarchy level. For example, if we change the above query to use the [Customer].[Customer Geography] hierarchy then you get an extra column in the query results describing the [Customer].[Customer Geography].[Country] level.

This is even more dramatic when using a parent-child hierarchy; a query against the Accounts dimension will bring back all six levels.

Dealing with Attribute Unique Names

Of course, you may find the attribute unique name a little long and cumbersome to use in your T-SQL. The easiest way to remove them is to use an alias. However, you still need to type that lengthy name in the first place.

The easy way around this is to copy the query results into Excel and then obtain the column header text from there. However, by default, the copying of column headers is switched off, so you may want to switch on the "Include column headers when copying or saving results" option in SQL Server Management Studio which can be found under Tools->Options->Query Results->Results to Grid. Here is a screenshot to make it easier:

Using an Alias 

As you will know, the square bracket in T-SQL is used as a delimiter in much the same way as it is in MDX. So simply typing:

[Customer].[Customer Geography].[Country].[MEMBER_CAPTION] AS Country

will cause a "multi-part identifier could not be bound" error. To avoid this, use double quotes around the attribute unique name as follows:

"[Customer].[Customer Geography].[Country].[MEMBER_CAPTION]" AS Country,

Numerical Advantage

The problem with OpenQuery is that every column has a data type of string, including the numerical columns. This means that the data type must be changed if you want to perform numerical computations on the figures using T-SQL. So our two numerical columns need to be wrapped in CAST & CONVERT as follows:

CAST(CONVERT(float, "[Measures].[Internet Gross Profit]") AS money) AS GrossProfit

Combining multi-dimensional and relational data together

In the case where you are creating a management report that combines the profitability figures of each division with the relevant manager's comments as to why the figure have (or have not) been achieved, then the use of OpenQuery becomes invaluable. This is because the only way to handle such a situation in the cube would be to create another dimension which links at the relevant level of granularity and enable cube write-back so the manager can update dimensional properties. A lot of hassle for a few pieces of textual information!

With OpenQuery, this becomes trivial as the text can be held in a supplementary database table and combined with the results of the MDX query.

Now the problem you may have spotted is that OpenQuery returns the MEMBER_CAPTION which will probably not be sufficiently unique to match an entry in the relational database. What you really need is the MEMBER_KEY. This can be added as a column using a calculated member as follows:

WITH MEMBER [Measures].[OrganisationID] AS [Organization].[Organizations].MEMBER_KEY

As the Adventure Works database does not contain a 'comments' table, let's create one and insert some rows using the new INSERT feature in SQL Server 2008.

CREATE TABLE dbo.ManagersComments(
    OrganisationID int NOT NULL,
    [Manager's Name] nvarchar(50) NULL,
    [Manager's Comments] nvarchar(255) NULL
);    
INSERT INTO ManagersComments(OrganisationID, [Manager's Name], [Manager's Comments])
    VALUES (6, 'John Doe', 'My team failed again. You know how it is...'),
                (5, 'John Smith', 'My team are all stars.');

So let's bring MDX and SQL data together into one query. This is best achieved with a common table expression as the layout can be far more readable:

WITH MdxQuery
(    
    [Region],
    OrganisationID,
    [Net Sales],
    [Cost of Sales],
    [Gross Margin],
    [Gross Margin%]
)
AS
(
    SELECT
        "[Organization].[Organizations].[Organization Level 04].[MEMBER_CAPTION]" AS [Region],
        CONVERT(int, "[Measures].[OrganisationID]") AS OrganisationID,
        CAST(CONVERT(float, "[Measures].[Net Sales]") AS money) AS [Net Sales],
        --"[Measures].[Net Sales]" AS [Net Sales],
        CAST(CONVERT(float, "[Measures].[Cost of Sales]") AS money) AS [Cost of Sales],
        CAST(CONVERT(float, "[Measures].[Gross Margin]") AS money) AS [Gross Margin],
        CONVERT(decimal(4, 2), (CONVERT(decimal(19, 17), "[Measures].[Gross Margin%]") * 100)) AS [Gross Margin%]
    FROM OPENQUERY(AdvertureWorksServer,
        'WITH MEMBER [Measures].[OrganisationID] AS
            [Organization].[Organizations].MEMBER_KEY
        MEMBER [Measures].[Net Sales] AS
            ([Account].[Accounts].&[50], [Measures].[Amount])
        MEMBER [Measures].[Cost of Sales] AS
            ([Account].[Accounts].&[55], [Measures].[Amount])
        MEMBER [Measures].[Gross Margin] AS
            ([Account].[Accounts].&[49], [Measures].[Amount])
        MEMBER [Measures].[Gross Margin%] AS
            [Measures].[Gross Margin] / [Measures].[Net Sales]
        SELECT
        {
            [Measures].[OrganisationID], [Measures].[Net Sales], [Measures].[Cost of Sales],
            [Measures].[Gross Margin], [Measures].[Gross Margin%]
        } ON 0,
        {
            [Organization].[Organizations].&[14].Children
        }
        ON 1
        FROM [Finance]
        WHERE ([Date].[Fiscal].[Fiscal Quarter].&[2004]&[1])'
    )
),
CombineMdxWithTable (    
    [Region],
    [Manager's Name],
    [Net Sales],
    [Cost of Sales],
    [Gross Margin],
    [Gross Margin%],
    [Manager's Comments]
)
AS
(
    SELECT
        A.[Region],
        B.[Manager's Name],
        A.[Net Sales],
        A.[Cost of Sales],
        A.[Gross Margin],
        A.[Gross Margin%],
        B.[Manager's Comments]
FROM MdxQuery A
    LEFT JOIN dbo.ManagersComments B ON
        A.OrganisationID = B.OrganisationID
)
SELECT * FROM CombineMdxWithTable

 

Here are the results of our query:

The Big Surprise - Support for more than two axes!

What I found most surprising about the functionality offered by OpenQuery is that it will deal with extra dimensions/axes sensibly.  Of course any multi-dimensional query can always be transformed into a two-dimensional recordset; each cell simply becomes a row in the recordset, with one column for each dimension. So for example, a four dimensional MDX query returning a two measures can be easily transformed into a two dimensional recordset containing six columns. Each row represents a cell and has two numerical columns.

Of course, if you try adding more than two axes to an MDX query in SQL Server Management Studio, this is the resulting error you will see:

Results cannot be displayed for cellsets with more than two axes.

Given that OpenQuery does such a good job of flattening the resultset into two dimensions, SQL Server Management Studio could really make more of an effort!

In reality, similar results are returned by a two-dimensional MDX query where the rows axes is a CrossJoin of all the relevant dimensions. It also has the advantage that the query would be able to take advantage of the server's AutoExists performance optimization and therefore fewer rows will be returned. However, it does not sound quite so sexy as a true multi-dimensional query. Ah, well. Technology cannot always be sexy!

Posted by DrJohn with 2 comment(s)
Filed under: ,

Goodbye Vista: Installing Windows Server 2008 on a Dell XPS laptop

Vista was driving me nuts! I could not believe how slow it was. The hard disk was always thrashing, the machine ran hot and the battery life was pants. I had spent hours optimizing my environment; first deleting all the 'bloat ware' installed by Dell, then stopping every service I did not recognise and finally removing every entry I could from the start-up registry entries (i.e. HKEY_CURRENT_USER and HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Run).

The biggest culprit for wasting system resources was Vista's SuperFetch. Which idiot thought this up?? The service reads all the files on the computer and 'analyses' what you have done before and predicts what you may need in the immediate future and then loads the application and data into memory so you get fantastic performance if, and this is a big 'if', you happen to ask for the application or data that it thinks you want. In reality, the performance of the whole PC suffers as this silly service just wastes your PC's resources doing things that you do not need. Ahhhhh! I wish I had a shotgun and a time machine to get that S.O.A.B.!

My laptop is no slouch either. A Dell XPS M1530 laptop with Intel Core 2 Duo 2.00GHz, 4GB RAM, 250GB 5,600RPM hard drive and a 256MB NVIDIA GeForce 8600M GT graphics card driving a 1680x1050 pixel TrueLife TFT. So I was particularly annoyed by the lack of performance provided by Vista. The laptop was rated as 4.9 out of 5 in Vista's own performance benchmark; 5 been the best machines that were around when they were developing Vista. Clearly they were benchmarking how much system resource they could waste rather than focus on end-user performance!

As I develop server-based applications on the SQL Server 2005 BI suite, I had installed VMWare Workstation so I could run multiple virtual machines – one for each of my client's server configurations. I find VMWare much more flexible than Virtual PC, but then again, it's not free! What seemed silly is that the Windows Server 2003 instances running inside the virtual machine seemed much faster than Vista itself, especially when copying files across the network!

When I heard the news that SQL Server 2008 had finally shipped, I decided that it was time to do something radical: upgrade to Windows Server 2008 as the native OS and install SQL Server 2008! As a consultant on the road, I cannot afford to be without my machine for too long, so I decided the safest option would be to buy another hard drive and take the Vista disk out and replace it with a shiny new 7,200RPM hard drive. Well performance is everything!

Before taking the Vista hard drive out, I created backups onto my NAS drive and also copied onto a memory stick all the device drivers that Dell had kindly created in the c:\drivers folder (after checking they were the latest versions of course!).

The installation of Windows Server 2008 on my laptop went like a dream. Every time the OS found a new device, I simply pointed it at the root of the memory stick and, bingo, it installed without a glitch. The only devices that failed were the ones that Dell had forgotten to place in the c:\drivers folder or for my own printer/scanner. These were soon downloaded from Dell and Canon's web sites.

Why did it go so smoothly? Well Windows Server 2008 shares the same kernel as Vista and the device driver model is the same. So a Vista device driver will work with Windows Server 2008 and visa versa. However, be aware that this will only work with the 32-bit version of Windows Server 2008 as very few device drivers are available for 64-bit Windows.

A very Desktop Experience

Of course Vista does have some charms and I soon came to miss certain applications like Windows Photo Gallery and Windows Media Player. I tried finding them on the Microsoft Download site but to no avail. Windows Media Player is built into Vista and therefore not provided as a separate download. I searched the internet for clues as to where I could get them from, but to no avail. Several days went by before I found www.win2008workstation.com which has many tips on how to turn Windows Server 2008 into a workstation. This includes how to turn on something called the Desktop Experience. I followed the procedure and my desired applications magically appeared. However, some undesirable applications also popped up like Windows Defender (what junk!), Windows Mail, Windows Contacts and Windows Calendar. Unfortunately, you get all or nothing. There is no fine control over which of the Vista applications get installed. At least you can disable Windows Defender by disabling the related service.

Virtual CD-ROM

Eager to install SQL Server 2008 from a freshly downloaded ISO image, I started scouting for a way to load an ISO images as a virtual drive. I figured there must be a way to load an ISO image built into the OS itself. However, I could not find the right command-line, but I am sure there is someone out there that knows how to do it! So I tried my old favourite Virtual CD-ROM Control Panel which is a free download originally developed by Microsoft but now provided as an unsupported utility. To my surprise this worked a treat. Amazing since it was last updated in 2006!

Burning Desire

Over the years I have been a great lover of Nero Burning ROM. The main application had a simple but powerful user interface. However, the Nero package has grown out of all recognition over the years and offers so many utilities to do things I never imagined I needed (i.e. bloat ware) that I refuse to install it on my machine. Unfortunately the original Nero Burning ROM cannot be purchased separately, so I needed an alternative. After a perusal of Wikipedia, I found some freeware called ImgBurn. The UI is almost identical to Nero Burning ROM and seems to work well.

Well Connected

One slightly problematic device I had was my 3 USB mobile internet modem. This worked fine on first installation then stopped working for no apparent reason. I figured out it may be something to do with the Windows Firewall which, if I remember correctly, first appeared when I switched on the Desktop Experience. However, after tinkering with the firewall settings the device would still not work. I tried installing the latest device drivers, but to no avail. I uninstalled the device software and decided to ring 3's support line, confident in the knowledge that they would be no help at all; after all how many customers have mobile internet installed on a server?? Of course, several days passed before I resolved to get the problem fixed. Just prior to the call I thought I would prepare by reinstalling the device so the details of the problem would be fresh in my mind. Well blow me, the damn thing started working!

Mouse Problems

The one device that I have real problems with is my Logitech V470 Cordless Laser Mouse for Bluetooth. I can add it as a Bluetooth device, but it will not actually work. If I install the latest SetPoint software from Logitech, the mouse pointer jumps all over the place like a demented wasp. Ideas anyone?

My tool bag

I often find it useful to know what other consultants in the field carry around in their 'tool bag'. Well, here is my set of applications currently installed on the laptop:

  • Microsoft Windows Server 2008 including IIS 7.0
  • Microsoft SQL Server 2008 Developer Edition including SSAS, SSRS and SSIS and the new Adventure Works 2008 database samples.
  • Microsoft Visual Studio 2008
  • MSDN Library
  • Microsoft Office 2007 with "Save as PDF" enhancement
  • Microsoft Data Mining Add-ins for Excel 2007
  • UltraEdit – fantastic text and code editor I have been using for many, many years
  • Password Safe – I use this freeware to hold the username/password to over 100 web sites and bank accounts. They are all safe inside a heavily encrypted file which can be safely backed up to my NAS drive or memory stick without any danger of someone else been able to open the file.
  • PowerMarks – the easy way to move your internet favourites from machine to machine
  • Adobe Acrobat Reader – of course!
  • Primo PDF – freeware alternative to Adobe Acrobat Distiller for creating PDF files
  • Paint.NET – what a great free alternative to Adobe Photoshop Elements
  • Ultratagger – easy to use MP3 tagging software
  • FileZilla – great freeware FTP application
  • WinZip – the de facto standard
  • Apple iTunes for my iPod and (future) iPhone
  • ImgBurn DVD/CD-ROM burning software
  • 3 USB mobile internet

So if you are thinking of making the move from Vista to Windows Server 2008, do it now! The stability, performance, flexibility and power of Windows Server 2008 can be yours.

Good luck!

 

Updated: 9-Apr-2009

Going 64-bit

If you are thinking of making the switch to 64-bit, then do it!  Originally, I only had the option of 32-bit drivers from Dell.  So as soon as I found out that 64-bit drivers were available, I did the upgrade.  Everything went even more smoothly than the original conversion, so I was very impressed.  See comment below for more info.

Posted by DrJohn with 4 comment(s)

Busting for a Wii at SQL Bits

"I'm busting for a Wii!" was my wife's exclamation in the centre of Warrington. We had been an attempting to avoid traffic on the M6 but got snarled up in even more traffic in central Warrington. I did not understand at first, but then I saw the cheeky kid's grin on the advert on the back of a bus. In good northern dialect, he was not just bursting, but "busting" with excitement about the latest Nintendo games machine. What surprised me more was what followed. My wife gave me a 30 minute rant questioning exactly why I had not yet brought her a games machine. Of course my teenage daughter joined in the fray whilst my teenage son remained quite. Games machines have long been the domain of spotty teenage boys or young men with time on their hands and few friends. They are certainly not the domain of middle age females with a technology adversion! So I was rather taken aback by the whole situation.

However, this weekend the SQL Bits conferance delivered in more ways than I could possibly imagine. The event was well organised and well attended and all the lectures on the "track" I was on were well presented.

Andrew Sadler (formerly of ProClarity and now a consultant with Edenbrook) did a breakneck demonstration of the Microsoft BI stack including PerformancePoint and Excel Services. As I work with this stuff every day, I did not learn anything particularly new; it is just great to see someone else's take of the Microsoft offerings and exactly how they present it all as a 'coherent' strategy. There was clearly pent up frustration in the audience that the ProClarity product had been left on the shelf by Microsoft. A great tool been suffocated by the giant. I just hope that PerformancePoint v2 (whenever it surfaces) actually provides functionality as good, or better, than ProClarity's.

Simon Sabin did a great job of introducing some of the new features in the T-SQL language. To be honest, the audience got most animated when they realised Simon had simply highlighted the name of a table and hit some key and SQL Server Management Studio performed a query that listed not only the table definition but primary keys etc. He then showed us all how to configure keyboard shortcuts in Tools->Options->Environment->Keyboard. It seems that any single parameter stored proc (including your own) can be mapped to a keyboard shortcut and it will be passed the highlighted text when invoked. Simple, effective and one of the best things I learnt out of the session! Not that I am decrying Simon's excellent session, but something simple like a keyboard shortcut can save a lot of time in the long run, and these things are invaluable.

Allan Mitchell performed a great demo of the Data Mining Add-In for Excel. This is a tool that I always keep meaning to use on a project, but somehow never get around to getting it out of the bag and dusting it off. The new version of the DM Add-In for Excel 2007 demos really well. I will have to make a determined effort to do something with this technology soon!

It was great to meet Chris Webb for the first time. Having read the blogs and the book, it was good to get to know the guy in person. I will be using Intelligencia Query for BIDS and Reporting Services in my next project jointly developed by Chris and Andrew Wiles who also presented his 'reusable' approach to BI the day. Colin Hardie of the IMGroup ably demonstrated the new MERGE feature in T-SQL as well as several other approaches to handling Type 2 slowly changing dimensions.

It was great to meet old friends, put faces to people I had only previously met in 'cyberspace' and meet some new contacts. Microsoft provided free drinks after the gig and, even better, I won a Wii in the RedGate prize draw. Even my wife agrees that it was a good conference!

Posted by DrJohn with no comments
Filed under:

SSAS Stored Procedures - connecting to the transactional data source

There are often circumstances where an SSAS stored procedure needs to connect to the transactional database to perform a query.  This is certainly the case in a near-real-time OLAP solution where the cube sits directly on top of the application database and dynamic dimension security is implemented by a stored proc. Here the SSAS stored procedure has to query the transactional database in order to get the user's authorisation. Clearly the SSAS database has a data source and it uses this connection to process the cube. So how do we get the connection string?

Well the short code snippet below provides the solution. The code simply connects back to the current database using AMO and obtains the connection string from the data source object. It then removes the redundant "Provider=SQLNCLI.1;" before returning a valid SQL connection string that can be used by SqlConnection().

Of course, you must provide your stored proc assembly with the "Unrestricted" permission set and set its impersonation mode to "Service Account" for the connection string to work.

using AMO = Microsoft.AnalysisServices;

using Microsoft.AnalysisServices.AdomdServer;

...

/// <summary>

/// Obtains the SqlConnection string used by the OLAP database

/// </summary>

/// <returns></returns>

public static string GetSqlConnectionString()

{

    // connect to the current instance of Analysis Services and return the transactional data source

    AMO.Server svr = new AMO.Server();

    svr.Connect("Data Source=" + Context.CurrentServerID + ";Initial Catalog=" + Context.CurrentDatabaseName);

    // get the connection to the transactional database from the OLAP connection manager

    string connStr = svr.Databases.GetByName(Context.CurrentDatabaseName).DataSources[0].ConnectionString.Replace("Provider=SQLNCLI.1;", "");

    svr.Disconnect();

    return (connStr);

}

Posted by DrJohn with no comments
Filed under: ,

The FACT is that it is not so DIM after all

I have always looked down upon the Fact and Dim prefixes applied to the tables within the AdventureWorksDW database with derision. "Why," I asked myself, "would anyone feel the need to use such prefixes? Clearly this is BI for dummies." Given that most BI projects use a nice clean, dedicated data warehouse database where all the tables are imported into the cube, perhaps my derision was justified.

However, in my current project I have been building a near-real-time OLAP solution directly on top of an application database. Of course, to isolate the cube from potential schema changes, I created an abstraction layer in the form of a set of views. However, as the project has progressed and the large team of developers has created more and more database objects, it has become increasingly difficult to find the views created for the cube among the debris.

I recently had to re-build the entire cube from scratch due to a bug caused by reverting to a previous version in source control. Of course, I took the opportunity to consolidate all my T-SQL code into a few files and adopt a consistent naming convention for all my views. So what naming convention did I adopt? Well Fact and Dim of course! Why? Well because anyone following on from my work who is half-familiar with the BI sample databases will immediately recognise the relevance.

I subsequently found out the big benefit of adopting this convention – it makes creating a DSV much, much easier! A quick search for all objects containing 'Fact' and 'Dim' soon had my new DSV populated with all the relevant objects.

So the Fact is that this practice is not so Dim after all!

Posted by DrJohn with 2 comment(s)
Filed under:

SSAS: Working with date and time measures to calculate average elapse time

There are some things in the life that are so implicit and taken for granted that nobody ever bothers to tell you about them or write up any documentation. This is how I feel the implicit support for date and time measures must have come about in Microsoft SQL Server 2005 Analysis Services (SSAS). Certainly I could not find any documentation about the topic and yet intriguingly both Measures and Calculated Members have "Simple Date" and "Simple Time" in their drop down list of possible text formats.

Clearly SSAS will allow you to add attributes to a dimension that have the data type DateTime. However, SSAS does not allow you to create a measure based on DateTime data type. Now this is a bit of a drawback when you want to calculate the average time a process took, which is exactly what I needed to do. After searching the internet, I was none the wiser as to how to solve the problem so some experimentation was in order.

An obvious approach would be to create a measure which contains the elapse time for the process in minutes. This would certainly give the right answer when averaged. However, presenting the resulting number in a format that was meaningful to the end-user is beyond standard out-of-the-box functionality. Especially when the elapse time needs to be presented in financial speak such as T+1 15:30 (i.e. the next day at 3:30pm). And since the end-user would be browsing the cube with Excel 2007, standard out-of-the-box functionality is all I could use.

My data was pretty simple:

CREATE TABLE dbo.WhenEventHappened(

SurrogateKey int NOT NULL,

MeasurementTypeID int NOT NULL,

ReferenceDate datetime NOT NULL,

DateTimeEventHappened datetime NOT NULL

) ON [PRIMARY]

 

Where ReferenceDate holds the date the event is relative to and DateTimeEventHappened clearly holds the date & time when the event took place. So for example, if we use the financial services analogy the trade took place on the ReferenceDate and it was settled on DateTimeEventHappened. So the elapse time is (DateTimeEventHappened – ReferenceDate) in days, hours and minutes. In my data, ReferenceDate is always a date with no time element.

The SQL Server DateTime data type

You probably know that behind the scenes SQL Server holds dates as a decimal number. The whole number contains the number of days since 1900 and the fraction represents the date. For example, 39415.5 represents 2007-12-01 12:00:00

So, if

ReferenceDate= 2007-12-01 00:00:00

and

DateTimeEventHappened=2007-12-01 12:00:00

then

(DateTimeEventHappened – ReferenceDate)= 1900-01-01 12:00:00 which is held as 0.5

OK so this gives me the elapse time, but now do I get it into the cube? Well the answer is simple. Convert the DateTime into a decimal number. To do this you need to apply the relevant cast/convert such as:

CONVERT(float, (A.DateTimeEventHappened - A.ReferenceDate)) AS EventDateDiffDecimal

To import this data into the cube, I created a view that performed the relevant cast/converts.


Now, for illustration of what happens between SQL and SSAS, I have added some redundant columns to my view just to make it obvious what is really going on here. So here is my data, including the extra ReferenceDateDecimal and DateTimeEventHappenedDecimal columns.  I have also created some simple data which is shown below:


Next I simply added this new view to the cube's Data Source View and added the EventTimeDecimal and EventDateDiffDecimal plus the extra columns to my example cube as a new measure group which automatically sets the AggregateFunction to Sum. So here is the result:


Now apply magic!

OK, so the above is not so useful, so let's apply some magic. Simply set the FormatString and AggregateFunction for each measure as follows:

Column

Format String

AggregateFunction

Reference Date Decimal

dd-mmm-yyyy

AverageOfChildren

Date Time Event Happened Decimal

dd-mmm-yyyy hh:mm

AverageOfChildren

Event Time Decimal

hh:mm

AverageOfChildren

Event Date Diff Decimal

dd-mmm-yyyy hh:mm

AverageOfChildren

So after re-deploying the cube, we get:


 

 

Oh look, wrong dates!

Well we have dates, but not the correct ones! It seems that date zero in SQL Server is 01-Jan-1990 and date zero in SSAS is 30-Dec-1899! To fix the problem, we simply add two to the values provided by our EventElapsePeriod view. After processing the cube we now get the correct dates:


However, I want the EventDateDiffDecimal to be presented in the format T+1 15:30. So I need to subtract one from the EventDateDiffDecimal value so that I get 1-Jan as my date. Then I can format the value, ignoring the month and year using the FormatString "\T\+d hh:mm". Now purist among you may point out that this strategy will not work when the date gets bigger than 31 as it will flick over T+1 again. This is true, but I have defined EventDateDiffDecimal as a semi-additive measure so I will not hit this problem.

So here is the final formatted data:


Note how the Grand Total is correctly calculated. Just magic!

If you would like to try this for yourself, I have published the T-SQL scripts and cube for download here: DateTimeMeasures.zip  Note that the ZIP file also contains the SQL script to re-create the sample table and view used in this article.

Good Luck!

Posted by DrJohn with 2 comment(s)
Filed under:

Cleaning Address Data with SSIS using a Web Service - Explained

Jamie Thompson gave me some very helpful feedback on my recently published article over at SQLCentral.com called Cleaning Address Data with SSIS Using a Web Service and my previous blog entry on Calling SharePoint web services from the data flow. Jamie pointed out that I had assumed my audience would be fully familiar with the .NET framework and terms such as WSDL and Proxy Class need further explanation. So here is my attempt to clarify some issues he pointed out.

  • WSDL
    WSDL is short for Web Service Description Language and is the definition of what the web service provides and how it can be invoked. A WSDL document is an XML file which lists all the methods provided by the web service and their parameters. The WSDL document is returned by a web service when "?wsdl" is appended to the web service's URL (e.g. http://services.postcodeanywhere.co.uk/uk/lookup.asmx?wsdl).
  • Proxy Class
    A proxy class simplifies the interaction between your application and a web service. The .NET Framework provides a command-line utility called wsdl.exe which generates the code for a proxy class automatically. In Visual Studio, the Add Web Reference command automatically executes wsdl.exe and generated the proxy class for you. However, with SSIS you need to create a proxy class manually using a command-line such as this:
    e.g.
    wsdl /language:VB http://services.postcodeanywhere.co.uk/uk/lookup.asmx?wsdl /out:c:\PostcodeAnywhere.vb
  • GAC
    The Global Assembly Cache (GAC) is the central repository for sharing assemblies (DLLs). In order to register an assembly in the GAC, it must be strong named (i.e. must have a unique hash value, public key, locale and version number).

If you want to know more about the .NET Framework, then read O'Reilly's .NET Framework Essentials, 3rd Edition. 

 

Posted by DrJohn with no comments
Filed under: , ,

SSIS: Getting data from Excel files using Aspose.Cells

Excel spreadsheets with complicated layouts really do not lend themselves to processing with the standard Microsoft OLE DB drivers. Indeed, extracting one or two cells from different worksheets can be positively painful! This is where the power and flexibility of Aspose.Cells comes into its own.

Aspose.Cells provides an API that looks very similar to the OLE automation API exposed by Excel in the Office InterOp assemblies. However, you do not get all the headaches normally associated with the InterOp assemblies (i.e. modal dialogs, memory leaks, multiple Excel instances etc. etc.). With Aspose.Cells you do not need Excel to be installed on your server. Instead you only need a single assembly (Aspose.Cells.dll ) and the accompanying license file (Aspose.Cells.lic). Simple!

Working with Aspose.Cells in SSIS

To work with Aspose.Cells, you need to use the Script Task or Script Component. My personal preference is to use the Script Component in its asynchronous transformation mode. This way I can feed in a list of Workbooks that I want the component to process on its input buffer and the component can write the data read from the Workbooks onto multiple output buffers (one buffer for each type of data).

As SSIS uses Visual Studio for Applications (VSA), you do not have the flexible "Add Reference" dialog found in Visual Studio. In order to get around this issue while you are developing, you need to copy the Aspose.Cells assembly and associated license file into your .NET folder, which is usually:

C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727

Restart Visual Studio and you will see Aspose.Cells listed on VSA's "Add Reference" dialog. Now you can add the appropriate "imports" statement to the top of your module and start coding. Easy!

Later, when you move to production, you will need to register the Aspose.Cells assembly in the GAC by simply dropping the assembly into the following folder:

C:\WINDOWS\assembly

Dealing with the Aspose.Cells License file

Aspose.Cells has a license file which can either be in the same directory as the assembly, or in your own location. The easiest way to handle this in SSIS is to use a File Connection Manager which records the location of the license file. Then in code, you will need to add:

Dim license As Aspose.Cells.License = New Aspose.Cells.License()

license.SetLicense(Me.Connections.AsposeCellsLicense.ConnectionString)

 

Example Code

My sample code reads a list of workbooks from the input buffer and writes the extracted data to the output buffer.  The data flow is shown below along with the code from my asynchronous Script Component. Enjoy!


 

' Aspose.Cells.dll must be registered in the GAC for production

' AND the C:\WINNT\Microsoft.NET\Framework\v2.0.50727 for development

 

' standard libraries

Imports System

Imports System.Data

Imports System.Math

Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper

Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

' additional libraries

Imports System.IO

Imports System.Collections.Generic

Imports Aspose.Cells

 

Public Class ScriptMain

Inherits UserComponent

 

Public Overrides Sub WorkbookList_ProcessInputRow(ByVal Row As WorkbookListBuffer)

 

' Imports all data from the the Excel workbooks provided on the input

 

' Note that each workbook contains several worksheets.

' For efficiency, we open the workbook once and then

' process all the worksheets in the file

 

Try

' The previous step in the package downloaded the workbook.

Dim fInfo As FileInfo = New FileInfo(Path.Combine(Me.Variables.ImportFolder, Row.URL))

If fInfo.Exists Then

 

' Set Aspose.Cells license

Dim license As Aspose.Cells.License = New Aspose.Cells.License()

license.SetLicense(Me.Connections.AsposeCellsLicense.ConnectionString)

 

' Open the workbook

Dim wrkBook As Workbook = New Workbook()

wrkBook.Open(fInfo.FullName)

 

' process each worksheet

For Each wrkSheet As Worksheet In wrkBook.Worksheets

Try

' Check we have a valid worksheet by testing

' for text prompts in specific locations

If ValidWorksheetLayout(wrkSheet) Then

' process each worksheet found in the workbook

' that has the correct format

ImportWorksheetData(Row.WorkbookID, wrkSheet, Row)

End If

Catch exi As Exception

Me.ComponentMetaData.FireInformation(0, Me.ComponentMetaData.Name, "WorkbookList_ProcessInputRow: Invalid Worksheet Layout: " & wrkSheet.Name, String.Empty, 0, True)

End Try

Next

Else

Me.ComponentMetaData.FireError(0, Me.ComponentMetaData.Name, "WorkbookList_ProcessInputRow: Missing Excel File: " & fInfo.FullName, String.Empty, 0, True)

End If

Catch ex As Exception

Me.ComponentMetaData.FireError(0, Me.ComponentMetaData.Name, "WorkbookList_ProcessInputRow: " & ex.Message, String.Empty, 0, True)

End Try

End Sub

 

Private Function ValidWorksheetLayout(ByVal wrkSheet As Worksheet) As Boolean

' checks if data is in expected locations. If not, then return false.

If wrkSheet.Cells("B3").Value.ToString = "Profit and loss projections" And wrkSheet.Cells("B14").Value.ToString = "Revenue" Then

Return True

End If

Return False

End Function

 

Private Sub ImportWorksheetData(ByVal WorkbookID As Integer, ByVal wrkSheet As Worksheet, ByVal Row As WorkbookListBuffer)

' imports all data from the worksheet

Dim iRow As Integer

Dim iCol As Integer

Dim iCount As Integer

Dim dataRange As DataTable

Dim dr As DataTableReader

Dim HeaderRow As List(Of Integer)

Try

' heading row is in seperate location to data, so import heading first

' note that Aspose.Cells uses zero-based row and column numbers in ExportDataTable call

' so Cell=4,Row=12 is actually cell E13

Try

' read single header row into generic list

HeaderRow = New List(Of Integer)

dataRange = wrkSheet.Cells.ExportDataTable(12, 4, 1, 5)

dr = dataRange.CreateDataReader()

If dr.Read Then

For iCount = 0 To 4

HeaderRow.Add(GetInteger(dr, iCount))

Next

End If

 

Catch ex0 As Exception

Me.ComponentMetaData.FireError(0, Me.ComponentMetaData.Name, "ImportWorksheetData ex0: " & ex0.Message, String.Empty, 0, True)

End Try

 

Try

dataRange = wrkSheet.Cells.ExportDataTable(14, 2, 2, 7)

dr = dataRange.CreateDataReader()

While dr.Read

ProcessDataTable(WorkbookID, dr, 2, 6, HeaderRow)

End While

 

Catch ex1 As Exception

Me.ComponentMetaData.FireError(0, Me.ComponentMetaData.Name, "ImportWorksheetData ex1: " & ex1.Message, String.Empty, 0, True)

End Try

 

Try

dataRange = wrkSheet.Cells.ExportDataTable(19, 2, 2, 7)

dr = dataRange.CreateDataReader()

While dr.Read

ProcessDataTable(WorkbookID, dr, 2, 6, HeaderRow)

End While

Catch ex2 As Exception

Me.ComponentMetaData.FireError(0, Me.ComponentMetaData.Name, "ImportWorksheetData ex2: " & ex2.Message, String.Empty, 0, True)

End Try

 

Try

dataRange = wrkSheet.Cells.ExportDataTable(23, 2, 9, 7)

dr = dataRange.CreateDataReader()

While dr.Read

ProcessDataTable(WorkbookID, dr, 2, 6, HeaderRow)

End While

Catch ex3 As Exception

Me.ComponentMetaData.FireError(0, Me.ComponentMetaData.Name, "ImportWorksheetData ex3: " & ex3.Message, String.Empty, 0, True)

End Try

 

Catch exOuter As Exception

Me.ComponentMetaData.FireError(0, Me.ComponentMetaData.Name, "ImportWorksheetData: outer " & exOuter.Message, String.Empty, 0, True)

End Try

End Sub

 

Public Sub ProcessDataTable(ByVal WorkbookID As Integer, ByVal dr As DataTableReader, ByVal iStartDataCol As Integer, ByVal iDataColCount As Integer, ByVal HeaderRow As List(Of Integer))

Dim iCol As Integer

Dim DataValue As Decimal

Dim bAddRow As Boolean

Dim TimePeriodID As Integer

Dim OutputCount As Integer = 0

 

Try

For iCol = iStartDataCol To iDataColCount

bAddRow = False

Try

If Not dr.IsDBNull(iCol) Then

DataValue = GetDecimal(dr, iCol)

If DataValue <> 0 Then bAddRow = True

If bAddRow Then

With Me.WorksheetDataBuffer

.AddRow()

' get header from the row we read into a list earlier

.FinancialYear = HeaderRow(iCol - 2)

.WorkbookID = WorkbookID

.RowName = dr.GetString(0)

.DataValue = DataValue

End With

End If

End If

Catch ex1 As Exception

Me.ComponentMetaData.FireError(0, Me.ComponentMetaData.Name, "ProcessDataTable: Inner " & ex1.Message, String.Empty, 0, True)

End Try

Next

 

Catch ex As Exception

Me.ComponentMetaData.FireError(0, Me.ComponentMetaData.Name, "ProcessDataTable: Outer " & ex.Message, String.Empty, 0, True)

End Try

End Sub

 

Function GetDecimal(ByVal dr As DataTableReader, ByVal iCol As Integer) As Decimal

Dim DataValue As Decimal

Try

Select Case dr.GetDataTypeName(iCol)

Case "String"

If IsNumeric(dr.GetString(iCol)) Then

DataValue = CDec(dr.GetString(iCol))

End If

Case "Integer"

DataValue = CDec(dr.GetInt32(iCol))

Case "Double"

DataValue = CDec(dr.GetDouble(iCol))

Case "Decimal"

DataValue = dr.GetDecimal(iCol)

End Select

Catch ex As Exception

Me.ComponentMetaData.FireError(0, Me.ComponentMetaData.Name, "GetDecimal " & ex.Message, String.Empty, 0, True)

End Try

Return DataValue

End Function

 

Function GetInteger(ByVal dr As DataTableReader, ByVal iCol As Integer) As Integer

Dim DataValue As Integer

Try

Select Case dr.GetDataTypeName(iCol)

Case "String"

If IsNumeric(dr.GetString(iCol)) Then

DataValue = CInt(dr.GetString(iCol))

End If

Case "Integer"

DataValue = dr.GetInt32(iCol)

Case "Double"

DataValue = CInt(dr.GetDouble(iCol))

Case "Decimal"

DataValue = CInt(dr.GetDecimal(iCol))

End Select

Catch ex As Exception

Me.ComponentMetaData.FireError(0, Me.ComponentMetaData.Name, "GetInteger " & ex.Message, String.Empty, 0, True)

End Try

Return DataValue

End Function

End Class

Posted by DrJohn with no comments
Filed under: ,

Downloading Excel files from SharePoint using SSIS

In my previous post, I outlined how to obtain a recursive list of all Excel files held in a SharePoint document library. In SSIS, I use this code to upsert records into a table which identifies which files are new or modified. Clearly I could process the file in-situ, but that would over complicate my code with data streams which SSIS would not understand – especially if you are attempting to use the standard Microsoft.Jet.OLEDB.4.0 engine to process the file! So downloading the file to a local folder is the easiest option. Fortunately, I spotted Greg Enslow's post which pointed me in the direction of using the WebClient library.

In my control flow I obtain a list all the files I need to process and store it in an ADO.NET recordset. I then iterate over this list using a ForEach task, which sets various variables containing the URL, file extension and the document's name and it's GUID in SharePoint. My script task then downloads each file so it can be processed by subsequent tasks in the control flow.


 

I have expanded on Greg's original code to support file system locations as well as SharePoint URLs. Enjoy!

Imports System

Imports System.Data

Imports System.Math

Imports Microsoft.SqlServer.Dts.Runtime

Imports System.Net

Imports System.IO

 

Public Class ScriptMain

 

Public Sub Main()

' downloads the file from SharePoint or a file system location to a local folder

 

Dim taskResult As Integer = Dts.Results.Success

 

Try

' obtain location of local folder from variable

Dim dir As DirectoryInfo = New DirectoryInfo(Dts.Variables("ImportFolder").Value.ToString())

If dir.Exists Then

 

' Create the filename for local storage using

' the GUID from SharePoint as this will be unique.

Dim file As FileInfo = New FileInfo(dir.FullName & "\" & Dts.Variables("WorkbookGUID").Value.ToString() & Dts.Variables("Extension").Value.ToString())

If Not file.Exists Then

 

' get the path of the file we need to download

Dim fileUrl As String = Dts.Variables("EncodedAbsUrl").Value.ToString()

If fileUrl.Length <> 0 Then

' download the file from SharePoint or Archive file system to local folder

Dim client As New WebClient()

 

If Left(fileUrl, 4).ToLower() = "http" Then

'download the file from SharePoint

client.Credentials = System.Net.CredentialCache.DefaultCredentials

client.DownloadFile(fileUrl, file.FullName)

Else

' copy file from remote file system

System.IO.File.Copy(fileUrl, file.FullName)

End If

Else

Throw New ApplicationException("EncodedAbsUrl variable does not contain a value!")

End If

End If

Else

Throw New ApplicationException("ImportFolder does not exist!")

End If

Catch ex As Exception

 

Dts.Events.FireError(0, String.Empty, ex.Message, String.Empty, 0)

 

taskResult = Dts.Results.Failure

 

End Try

Dts.TaskResult = taskResult

End Sub

 

End Class

Posted by DrJohn with no comments
Filed under: ,

SSIS: Calling SharePoint web services from the data flow

In my previous post, I outlined how to call a SharePoint web service to obtain a list of all Excel files held in a document library.  However, that code was developed in Visual Studio where adding a reference to a web service is simplicity itself.  Doing the same in SSIS is more of a challenge as Visual Studio for Applications (VSA) does not have the Add Web Reference command familiar to Visual Studio developers.

To keep my script component simple, I decided to place it in the data flow so that I could pass a list of SharePoint sites on the input buffer and the component could output a list of Excel files on the output buffer.  If you are familiar with the Script Component you will know that it can be used in three forms: as a Source, a Destination and as a Transformation.  In its default form, the Transformation is synchronous (i.e. one line of output is written for every line of input).  As I wanted the script component to generate more rows on its output than it received, the first thing I needed to do was to switch it to asynchronous mode by changing the SynchronousInputID of the output buffer to zero.

To call the SharePoint web service from within my SSIS script component I needed to create a Visual Basic proxy class using wsdl.exe using the following command-line:

wsdl /language:VB http://yourhost/_vti_bin/lists.asmx?wsdl /out:SharePointList.vb

This proxy class was then imported into the Script Component by selecting "Add Existing Item..." from the Project menu.   I needed to delete the first few "garbage" characters and add references to the System.Xml and System.Web.Services .NET assemblies before the proxy class would compile.

Next I converted the C# code I developed in my console application (see previous post) to VB.NET using the excellent conversion routine from DeveloperFusion and pasted it into my script component.  After modifying the code to work with the input and output buffers the whole package worked fine.  Below is my final data flow.

Beware!  If you need to call two different SharePoint web services from within the same script component, you will need to edit the code generated by wsdl.exe and add a Namespace in order to avoid conflicts.


 

Posted by DrJohn with 3 comment(s)
Filed under: ,

Converting C# to VB.NET for use in SSIS

If you much prefer C# over VB.NET but are forced to use VB.NET as that is the only language supported by SSIS, you will love the tool I discovered.  Some cool guys over at DeveloperFusion have created an on-line C# to VB.NET convertor which makes the transition simplicity itself.  It also works the other way around. So if you want to convert ugly VB.NET into cool C#, this is your answer.  Easy!

Posted by DrJohn with 2 comment(s)
Filed under: ,
More Posts Next page »