There are often circumstances where an SSAS stored procedure needs to connect to the transactional database to perform a query.  This is certainly the case in a near-real-time OLAP solution where the cube sits directly on top of the application database and dynamic dimension security is implemented by a stored proc. Here the SSAS stored procedure has to query the transactional database in order to get the user's authorisation. Clearly the SSAS database has a data source and it uses this connection to process the cube. So how do we get the connection string?

Well the short code snippet below provides the solution. The code simply connects back to the current database using AMO and obtains the connection string from the data source object. It then removes the redundant "Provider=SQLNCLI.1;" before returning a valid SQL connection string that can be used by SqlConnection().

Of course, you must provide your stored proc assembly with the "Unrestricted" permission set and set its impersonation mode to "Service Account" for the connection string to work.

using AMO = Microsoft.AnalysisServices;

using Microsoft.AnalysisServices.AdomdServer;

...

/// <summary>

/// Obtains the SqlConnection string used by the OLAP database

/// </summary>

/// <returns></returns>

public static string GetSqlConnectionString()

{

    // connect to the current instance of Analysis Services and return the transactional data source

    AMO.Server svr = new AMO.Server();

    svr.Connect("Data Source=" + Context.CurrentServerID + ";Initial Catalog=" + Context.CurrentDatabaseName);

    // get the connection to the transactional database from the OLAP connection manager

    string connStr = svr.Databases.GetByName(Context.CurrentDatabaseName).DataSources[0].ConnectionString.Replace("Provider=SQLNCLI.1;", "");

    svr.Disconnect();

    return (connStr);

}

Posted by DrJohn | with no comments
Filed under: ,

I have always looked down upon the Fact and Dim prefixes applied to the tables within the AdventureWorksDW database with derision. "Why," I asked myself, "would anyone feel the need to use such prefixes? Clearly this is BI for dummies." Given that most BI projects use a nice clean, dedicated data warehouse database where all the tables are imported into the cube, perhaps my derision was justified.

However, in my current project I have been building a near-real-time OLAP solution directly on top of an application database. Of course, to isolate the cube from potential schema changes, I created an abstraction layer in the form of a set of views. However, as the project has progressed and the large team of developers has created more and more database objects, it has become increasingly difficult to find the views created for the cube among the debris.

I recently had to re-build the entire cube from scratch due to a bug caused by reverting to a previous version in source control. Of course, I took the opportunity to consolidate all my T-SQL code into a few files and adopt a consistent naming convention for all my views. So what naming convention did I adopt? Well Fact and Dim of course! Why? Well because anyone following on from my work who is half-familiar with the BI sample databases will immediately recognise the relevance.

I subsequently found out the big benefit of adopting this convention – it makes creating a DSV much, much easier! A quick search for all objects containing 'Fact' and 'Dim' soon had my new DSV populated with all the relevant objects.

So the Fact is that this practice is not so Dim after all!

Posted by DrJohn | 2 comment(s)
Filed under:

There are some things in the life that are so implicit and taken for granted that nobody ever bothers to tell you about them or write up any documentation. This is how I feel the implicit support for date and time measures must have come about in Microsoft SQL Server 2005 Analysis Services (SSAS). Certainly I could not find any documentation about the topic and yet intriguingly both Measures and Calculated Members have "Simple Date" and "Simple Time" in their drop down list of possible text formats.

Clearly SSAS will allow you to add attributes to a dimension that have the data type DateTime. However, SSAS does not allow you to create a measure based on DateTime data type. Now this is a bit of a drawback when you want to calculate the average time a process took, which is exactly what I needed to do. After searching the internet, I was none the wiser as to how to solve the problem so some experimentation was in order.

An obvious approach would be to create a measure which contains the elapse time for the process in minutes. This would certainly give the right answer when averaged. However, presenting the resulting number in a format that was meaningful to the end-user is beyond standard out-of-the-box functionality. Especially when the elapse time needs to be presented in financial speak such as T+1 15:30 (i.e. the next day at 3:30pm). And since the end-user would be browsing the cube with Excel 2007, standard out-of-the-box functionality is all I could use.

My data was pretty simple:

CREATE TABLE dbo.WhenEventHappened(

SurrogateKey int NOT NULL,

MeasurementTypeID int NOT NULL,

ReferenceDate datetime NOT NULL,

DateTimeEventHappened datetime NOT NULL

) ON [PRIMARY]

 

Where ReferenceDate holds the date the event is relative to and DateTimeEventHappened clearly holds the date & time when the event took place. So for example, if we use the financial services analogy the trade took place on the ReferenceDate and it was settled on DateTimeEventHappened. So the elapse time is (DateTimeEventHappened – ReferenceDate) in days, hours and minutes. In my data, ReferenceDate is always a date with no time element.

The SQL Server DateTime data type

You probably know that behind the scenes SQL Server holds dates as a decimal number. The whole number contains the number of days since 1900 and the fraction represents the date. For example, 39415.5 represents 2007-12-01 12:00:00

So, if

ReferenceDate= 2007-12-01 00:00:00

and

DateTimeEventHappened=2007-12-01 12:00:00

then

(DateTimeEventHappened – ReferenceDate)= 1900-01-01 12:00:00 which is held as 0.5

OK so this gives me the elapse time, but now do I get it into the cube? Well the answer is simple. Convert the DateTime into a decimal number. To do this you need to apply the relevant cast/convert such as:

CONVERT(float, (A.DateTimeEventHappened - A.ReferenceDate)) AS EventDateDiffDecimal

To import this data into the cube, I created a view that performed the relevant cast/converts.


Now, for illustration of what happens between SQL and SSAS, I have added some redundant columns to my view just to make it obvious what is really going on here. So here is my data, including the extra ReferenceDateDecimal and DateTimeEventHappenedDecimal columns.  I have also created some simple data which is shown below:

SQL Data

Next I simply added this new view to the cube's Data Source View and added the EventTimeDecimal and EventDateDiffDecimal plus the extra columns to my example cube as a new measure group which automatically sets the AggregateFunction to Sum. So here is the result:

Cube Measures

Now apply magic!

OK, so the above is not so useful, so let's apply some magic. Simply set the FormatString and AggregateFunction for each measure as follows:

Column

Format String

AggregateFunction

Reference Date Decimal

dd-mmm-yyyy

AverageOfChildren

Date Time Event Happened Decimal

dd-mmm-yyyy hh:mm

AverageOfChildren

Event Time Decimal

hh:mm

AverageOfChildren

Event Date Diff Decimal

dd-mmm-yyyy hh:mm

AverageOfChildren

So after re-deploying the cube, we get:

Formatted Cube Data 

 

Oh look, wrong dates!

Well we have dates, but not the correct ones! It seems that date zero in SQL Server is 01-Jan-1990 and date zero in SSAS is 30-Dec-1899! To fix the problem, we simply add two to the values provided by our EventElapsePeriod view. After processing the cube we now get the correct dates:

Cube dates plus 2

However, I want the EventDateDiffDecimal to be presented in the format T+1 15:30. So I need to subtract one from the EventDateDiffDecimal value so that I get 1-Jan as my date. Then I can format the value, ignoring the month and year using the FormatString "\T\+d hh:mm". Now purist among you may point out that this strategy will not work when the date gets bigger than 31 as it will flick over T+1 again. This is true, but I have defined EventDateDiffDecimal as a semi-additive measure so I will not hit this problem.

So here is the final formatted data:

CubeDateTPlus1

Note how the Grand Total is correctly calculated. Just magic!

If you would like to try this for yourself, I have published the T-SQL scripts and cube for download here.  Note that the ZIP file also contains the SQL script to re-create the sample table and view used in this article.

Good Luck!

Posted by DrJohn | 1 comment(s)
Filed under:

Jamie Thompson gave me some very helpful feedback on my recently published article over at SQLCentral.com called Cleaning Address Data with SSIS Using a Web Service and my previous blog entry on Calling SharePoint web services from the data flow. Jamie pointed out that I had assumed my audience would be fully familiar with the .NET framework and terms such as WSDL and Proxy Class need further explanation. So here is my attempt to clarify some issues he pointed out.

  • WSDL
    WSDL is short for Web Service Description Language and is the definition of what the web service provides and how it can be invoked. A WSDL document is an XML file which lists all the methods provided by the web service and their parameters. The WSDL document is returned by a web service when "?wsdl" is appended to the web service's URL (e.g. http://services.postcodeanywhere.co.uk/uk/lookup.asmx?wsdl).
  • Proxy Class
    A proxy class simplifies the interaction between your application and a web service. The .NET Framework provides a command-line utility called wsdl.exe which generates the code for a proxy class automatically. In Visual Studio, the Add Web Reference command automatically executes wsdl.exe and generated the proxy class for you. However, with SSIS you need to create a proxy class manually using a command-line such as this:
    e.g.
    wsdl /language:VB http://services.postcodeanywhere.co.uk/uk/lookup.asmx?wsdl /out:c:\PostcodeAnywhere.vb
  • GAC
    The Global Assembly Cache (GAC) is the central repository for sharing assemblies (DLLs). In order to register an assembly in the GAC, it must be strong named (i.e. must have a unique hash value, public key, locale and version number).

If you want to know more about the .NET Framework, then read O'Reilly's .NET Framework Essentials, 3rd Edition. 

 

Posted by DrJohn | with no comments
Filed under: , ,

Excel spreadsheets with complicated layouts really do not lend themselves to processing with the standard Microsoft OLE DB drivers. Indeed, extracting one or two cells from different worksheets can be positively painful! This is where the power and flexibility of Aspose.Cells comes into its own.

Aspose.Cells provides an API that looks very similar to the OLE automation API exposed by Excel in the Office InterOp assemblies. However, you do not get all the headaches normally associated with the InterOp assemblies (i.e. modal dialogs, memory leaks, multiple Excel instances etc. etc.). With Aspose.Cells you do not need Excel to be installed on your server. Instead you only need a single assembly (Aspose.Cells.dll ) and the accompanying license file (Aspose.Cells.lic). Simple!

Working with Aspose.Cells in SSIS

To work with Aspose.Cells, you need to use the Script Task or Script Component. My personal preference is to use the Script Component in its asynchronous transformation mode. This way I can feed in a list of Workbooks that I want the component to process on its input buffer and the component can write the data read from the Workbooks onto multiple output buffers (one buffer for each type of data).

As SSIS uses Visual Studio for Applications (VSA), you do not have the flexible "Add Reference" dialog found in Visual Studio. In order to get around this issue while you are developing, you need to copy the Aspose.Cells assembly and associated license file into your .NET folder, which is usually:

C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727

Restart Visual Studio and you will see Aspose.Cells listed on VSA's "Add Reference" dialog. Now you can add the appropriate "imports" statement to the top of your module and start coding. Easy!

Later, when you move to production, you will need to register the Aspose.Cells assembly in the GAC by simply dropping the assembly into the following folder:

C:\WINDOWS\assembly

Dealing with the Aspose.Cells License file

Aspose.Cells has a license file which can either be in the same directory as the assembly, or in your own location. The easiest way to handle this in SSIS is to use a File Connection Manager which records the location of the license file. Then in code, you will need to add:

Dim license As Aspose.Cells.License = New Aspose.Cells.License()

license.SetLicense(Me.Connections.AsposeCellsLicense.ConnectionString)

 

Example Code

My sample code reads a list of workbooks from the input buffer and writes the extracted data to the output buffer.  The data flow is shown below along with the code from my asynchronous Script Component. Enjoy!

Aspose.Cells Script Component in the Data Flow 

' Aspose.Cells.dll must be registered in the GAC for production

' AND the C:\WINNT\Microsoft.NET\Framework\v2.0.50727 for development

 

' standard libraries

Imports System

Imports System.Data

Imports System.Math

Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper

Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

' additional libraries

Imports System.IO

Imports System.Collections.Generic

Imports Aspose.Cells

 

Public Class ScriptMain

Inherits UserComponent

 

Public Overrides Sub WorkbookList_ProcessInputRow(ByVal Row As WorkbookListBuffer)

 

' Imports all data from the the Excel workbooks provided on the input

 

' Note that each workbook contains several worksheets.

' For efficiency, we open the workbook once and then

' process all the worksheets in the file

 

Try

' The previous step in the package downloaded the workbook.

Dim fInfo As FileInfo = New FileInfo(Path.Combine(Me.Variables.ImportFolder, Row.URL))

If fInfo.Exists Then

 

' Set Aspose.Cells license

Dim license As Aspose.Cells.License = New Aspose.Cells.License()

license.SetLicense(Me.Connections.AsposeCellsLicense.ConnectionString)

 

' Open the workbook

Dim wrkBook As Workbook = New Workbook()

wrkBook.Open(fInfo.FullName)

 

' process each worksheet

For Each wrkSheet As Worksheet In wrkBook.Worksheets

Try

' Check we have a valid worksheet by testing

' for text prompts in specific locations

If ValidWorksheetLayout(wrkSheet) Then

' process each worksheet found in the workbook

' that has the correct format

ImportWorksheetData(Row.WorkbookID, wrkSheet, Row)

End If

Catch exi As Exception

Me.ComponentMetaData.FireInformation(0, Me.ComponentMetaData.Name, "WorkbookList_ProcessInputRow: Invalid Worksheet Layout: " & wrkSheet.Name, String.Empty, 0, True)

End Try

Next

Else

Me.ComponentMetaData.FireError(0, Me.ComponentMetaData.Name, "WorkbookList_ProcessInputRow: Missing Excel File: " & fInfo.FullName, String.Empty, 0, True)

End If

Catch ex As Exception

Me.ComponentMetaData.FireError(0, Me.ComponentMetaData.Name, "WorkbookList_ProcessInputRow: " & ex.Message, String.Empty, 0, True)

End Try

End Sub

 

Private Function ValidWorksheetLayout(ByVal wrkSheet As Worksheet) As Boolean

' checks if data is in expected locations. If not, then return false.

If wrkSheet.Cells("B3").Value.ToString = "Profit and loss projections" And wrkSheet.Cells("B14").Value.ToString = "Revenue" Then

Return True

End If

Return False

End Function

 

Private Sub ImportWorksheetData(ByVal WorkbookID As Integer, ByVal wrkSheet As Worksheet, ByVal Row As WorkbookListBuffer)

' imports all data from the worksheet

Dim iRow As Integer

Dim iCol As Integer

Dim iCount As Integer

Dim dataRange As DataTable

Dim dr As DataTableReader

Dim HeaderRow As List(Of Integer)

Try

' heading row is in seperate location to data, so import heading first

' note that Aspose.Cells uses zero-based row and column numbers in ExportDataTable call

' so Cell=4,Row=12 is actually cell E13

Try

' read single header row into generic list

HeaderRow = New List(Of Integer)

dataRange = wrkSheet.Cells.ExportDataTable(12, 4, 1, 5)

dr = dataRange.CreateDataReader()

If dr.Read Then

For iCount = 0 To 4

HeaderRow.Add(GetInteger(dr, iCount))

Next

End If

 

Catch ex0 As Exception

Me.ComponentMetaData.FireError(0, Me.ComponentMetaData.Name, "ImportWorksheetData ex0: " & ex0.Message, String.Empty, 0, True)

End Try

 

Try

dataRange = wrkSheet.Cells.ExportDataTable(14, 2, 2, 7)

dr = dataRange.CreateDataReader()

While dr.Read

ProcessDataTable(WorkbookID, dr, 2, 6, HeaderRow)

End While

 

Catch ex1 As Exception

Me.ComponentMetaData.FireError(0, Me.ComponentMetaData.Name, "ImportWorksheetData ex1: " & ex1.Message, String.Empty, 0, True)

End Try

 

Try

dataRange = wrkSheet.Cells.ExportDataTable(19, 2, 2, 7)

dr = dataRange.CreateDataReader()

While dr.Read

ProcessDataTable(WorkbookID, dr, 2, 6, HeaderRow)

End While

Catch ex2 As Exception

Me.ComponentMetaData.FireError(0, Me.ComponentMetaData.Name, "ImportWorksheetData ex2: " & ex2.Message, String.Empty, 0, True)

End Try

 

Try

dataRange = wrkSheet.Cells.ExportDataTable(23, 2, 9, 7)

dr = dataRange.CreateDataReader()

While dr.Read

ProcessDataTable(WorkbookID, dr, 2, 6, HeaderRow)

End While

Catch ex3 As Exception

Me.ComponentMetaData.FireError(0, Me.ComponentMetaData.Name, "ImportWorksheetData ex3: " & ex3.Message, String.Empty, 0, True)

End Try

 

Catch exOuter As Exception

Me.ComponentMetaData.FireError(0, Me.ComponentMetaData.Name, "ImportWorksheetData: outer " & exOuter.Message, String.Empty, 0, True)

End Try

End Sub

 

Public Sub ProcessDataTable(ByVal WorkbookID As Integer, ByVal dr As DataTableReader, ByVal iStartDataCol As Integer, ByVal iDataColCount As Integer, ByVal HeaderRow As List(Of Integer))

Dim iCol As Integer

Dim DataValue As Decimal

Dim bAddRow As Boolean

Dim TimePeriodID As Integer

Dim OutputCount As Integer = 0

 

Try

For iCol = iStartDataCol To iDataColCount

bAddRow = False

Try

If Not dr.IsDBNull(iCol) Then

DataValue = GetDecimal(dr, iCol)

If DataValue <> 0 Then bAddRow = True

If bAddRow Then

With Me.WorksheetDataBuffer

.AddRow()

' get header from the row we read into a list earlier

.FinancialYear = HeaderRow(iCol - 2)

.WorkbookID = WorkbookID

.RowName = dr.GetString(0)

.DataValue = DataValue

End With

End If

End If

Catch ex1 As Exception

Me.ComponentMetaData.FireError(0, Me.ComponentMetaData.Name, "ProcessDataTable: Inner " & ex1.Message, String.Empty, 0, True)

End Try

Next

 

Catch ex As Exception

Me.ComponentMetaData.FireError(0, Me.ComponentMetaData.Name, "ProcessDataTable: Outer " & ex.Message, String.Empty, 0, True)

End Try

End Sub

 

Function GetDecimal(ByVal dr As DataTableReader, ByVal iCol As Integer) As Decimal

Dim DataValue As Decimal

Try

Select Case dr.GetDataTypeName(iCol)

Case "String"

If IsNumeric(dr.GetString(iCol)) Then

DataValue = CDec(dr.GetString(iCol))

End If

Case "Integer"

DataValue = CDec(dr.GetInt32(iCol))

Case "Double"

DataValue = CDec(dr.GetDouble(iCol))

Case "Decimal"

DataValue = dr.GetDecimal(iCol)

End Select

Catch ex As Exception

Me.ComponentMetaData.FireError(0, Me.ComponentMetaData.Name, "GetDecimal " & ex.Message, String.Empty, 0, True)

End Try

Return DataValue

End Function

 

Function GetInteger(ByVal dr As DataTableReader, ByVal iCol As Integer) As Integer

Dim DataValue As Integer

Try

Select Case dr.GetDataTypeName(iCol)

Case "String"

If IsNumeric(dr.GetString(iCol)) Then

DataValue = CInt(dr.GetString(iCol))

End If

Case "Integer"

DataValue = dr.GetInt32(iCol)

Case "Double"

DataValue = CInt(dr.GetDouble(iCol))

Case "Decimal"

DataValue = CInt(dr.GetDecimal(iCol))

End Select

Catch ex As Exception

Me.ComponentMetaData.FireError(0, Me.ComponentMetaData.Name, "GetInteger " & ex.Message, String.Empty, 0, True)

End Try

Return DataValue

End Function

End Class

Posted by DrJohn | with no comments
Filed under: ,

In my previous post, I outlined how to obtain a recursive list of all Excel files held in a SharePoint document library. In SSIS, I use this code to upsert records into a table which identifies which files are new or modified. Clearly I could process the file in-situ, but that would over complicate my code with data streams which SSIS would not understand – especially if you are attempting to use the standard Microsoft.Jet.OLEDB.4.0 engine to process the file! So downloading the file to a local folder is the easiest option. Fortunately, I spotted Greg Enslow's post which pointed me in the direction of using the WebClient library.

In my control flow I obtain a list all the files I need to process and store it in an ADO.NET recordset. I then iterate over this list using a ForEach task, which sets various variables containing the URL, file extension and the document's name and it's GUID in SharePoint. My script task then downloads each file so it can be processed by subsequent tasks in the control flow.

Control Flow 

I have expanded on Greg's original code to support file system locations as well as SharePoint URLs. Enjoy!

Imports System

Imports System.Data

Imports System.Math

Imports Microsoft.SqlServer.Dts.Runtime

Imports System.Net

Imports System.IO

 

Public Class ScriptMain

 

Public Sub Main()

' downloads the file from SharePoint or a file system location to a local folder

 

Dim taskResult As Integer = Dts.Results.Success

 

Try

' obtain location of local folder from variable

Dim dir As DirectoryInfo = New DirectoryInfo(Dts.Variables("ImportFolder").Value.ToString())

If dir.Exists Then

 

' Create the filename for local storage using

' the GUID from SharePoint as this will be unique.

Dim file As FileInfo = New FileInfo(dir.FullName & "\" & Dts.Variables("WorkbookGUID").Value.ToString() & Dts.Variables("Extension").Value.ToString())

If Not file.Exists Then

 

' get the path of the file we need to download

Dim fileUrl As String = Dts.Variables("EncodedAbsUrl").Value.ToString()

If fileUrl.Length <> 0 Then

' download the file from SharePoint or Archive file system to local folder

Dim client As New WebClient()

 

If Left(fileUrl, 4).ToLower() = "http" Then

'download the file from SharePoint

client.Credentials = System.Net.CredentialCache.DefaultCredentials

client.DownloadFile(fileUrl, file.FullName)

Else

' copy file from remote file system

System.IO.File.Copy(fileUrl, file.FullName)

End If

Else

Throw New ApplicationException("EncodedAbsUrl variable does not contain a value!")

End If

End If

Else

Throw New ApplicationException("ImportFolder does not exist!")

End If

Catch ex As Exception

 

Dts.Events.FireError(0, String.Empty, ex.Message, String.Empty, 0)

 

taskResult = Dts.Results.Failure

 

End Try

Dts.TaskResult = taskResult

End Sub

 

End Class

Posted by DrJohn | with no comments
Filed under: ,

In my previous post, I outlined how to call a SharePoint web service to obtain a list of all Excel files held in a document library.  However, that code was developed in Visual Studio where adding a reference to a web service is simplicity itself.  Doing the same in SSIS is more of a challenge as Visual Studio for Applications (VSA) does not have the Add Web Reference command familiar to Visual Studio developers.

To keep my script component simple, I decided to place it in the data flow so that I could pass a list of SharePoint sites on the input buffer and the component could output a list of Excel files on the output buffer.  If you are familiar with the Script Component you will know that it can be used in three forms: as a Source, a Destination and as a Transformation.  In its default form, the Transformation is synchronous (i.e. one line of output is written for every line of input).  As I wanted the script component to generate more rows on its output than it received, the first thing I needed to do was to switch it to asynchronous mode by changing the SynchronousInputID of the output buffer to zero.

To call the SharePoint web service from within my SSIS script component I needed to create a Visual Basic proxy class using wsdl.exe using the following command-line:

wsdl /language:VB http://yourhost/_vti_bin/lists.asmx?wsdl /out:SharePointList.vb

This proxy class was then imported into the Script Component by selecting "Add Existing Item..." from the Project menu.   I needed to delete the first few "garbage" characters and add references to the System.Xml and System.Web.Services .NET assemblies before the proxy class would compile.

Next I converted the C# code I developed in my console application (see previous post) to VB.NET using the excellent conversion routine from DeveloperFusion and pasted it into my script component.  After modifying the code to work with the input and output buffers the whole package worked fine.  Below is my final data flow.

Beware!  If you need to call two different SharePoint web services from within the same script component, you will need to edit the code generated by wsdl.exe and add a Namespace in order to avoid conflicts.

Upsert Data Flow 

Posted by DrJohn | 2 comment(s)
Filed under: ,

If you much prefer C# over VB.NET but are forced to use VB.NET as that is the only language supported by SSIS, you will love the tool I discovered.  Some cool guys over at DeveloperFusion have created an on-line C# to VB.NET convertor which makes the transition simplicity itself.  It also works the other way around. So if you want to convert ugly VB.NET into cool C#, this is your answer.  Easy!

Posted by DrJohn | 2 comment(s)
Filed under: ,
My challenge was simple.  I needed to develop an SSIS package that would download and extract data from every Excel file held in document libraries across several SharePoint sites.  SSIS was the natural choice as the data needed to be cleaned and validated before being imported into a database. However, SSIS is not great with web services – especially in the data flow. As I not worked with the SharePoint web services much, I started with a good old Console application.
 
MOSS, or more accurately, WSS provides a whole host of web services to obtain information about SharePoint sites.  However, figuring out which method to invoke and what parameters to pass is more problematic.  Especially as many of the parameters are chunks of Collaborative Application Mark-up Language (CAML) – a dialect of XML developed by Microsoft specifically for use with SharePoint.
 
A False Start
My first console app simply obtained the GUID of the document library using the GetListCollection() method of the Lists web service.  The GUID was then passed to the GetListItems() method which duly provided all documents and folders at the top level of the document library.  It then seemed logical to me to recursively call the GetListItems() method using the GUID of each sub-folder.  On no, how wrong could I be!  The GetListItems() method simply chokes on these folder GUIDs.
 
On searching the internet I found many other incorrect forum posts and blog entries about the same topic – but no working solutions.  I also made an extensive search of my eBook collection – but again no solutions – which overall motivated me to write this blog entry.
 
The solution - RTFM
Well, if I had read the whole page in the manual, I would have got to the solution earlierCrying.  The key to my puzzle was the QueryOptions XML fragment which has both a Folder element and the all important <ViewAttributes Scope="Recursive" /> element. Using these elements together, it is possible to obtain a list of all documents in all subfolders in the list. Indeed, it does not even bother returning the subfolder details!
 
So here is the code for my working C# sample.
 

using System;

using System.Collections.Generic;

using System.Text;

using System.Xml;

using System.Web.Services;

using System.Web;

using System.Net;

 

namespace ConsoleApplication1

{
    class Program
    {

        static void Main(string[] args)
        {

string siteUrl = @"http://yourserver/sites/yoursite";

string documentLibraryName = @"Shared Documents";

SharePointList.Lists wsList = new SharePointList.Lists();

wsList.Credentials = System.Net.CredentialCache.DefaultCredentials;

WebProxy proxyObj = new WebProxy("yourproxy", 80);

wsList.Proxy = proxyObj;

wsList.Url = siteUrl + @"/_vti_bin/lists.asmx";

 

// get a list of all top level lists

XmlNode allLists = wsList.GetListCollection();

// load into an XML document so we can use XPath to query content

XmlDocument allListsDoc = new XmlDocument();

allListsDoc.LoadXml(allLists.OuterXml);

// allListsDoc.Save(@"c:\allListsDoc.xml"); // for debug

XmlNamespaceManager ns = new XmlNamespaceManager(allListsDoc.NameTable);

ns.AddNamespace("d", allLists.NamespaceURI);

 

// now get the GUID of the document library we are looking for

XmlNode dlNode = allListsDoc.SelectSingleNode("/d:Lists/d:List[@Title='" + documentLibraryName + "']", ns);

if (dlNode == null)

{

Console.WriteLine("Document Library '{0}' not found!", documentLibraryName);

}

else

{

// obtain the GUID for the document library and the webID

string documentLibraryGUID = dlNode.Attributes["ID"].Value;

string webId = dlNode.Attributes["WebId"].Value;

 

Console.WriteLine("Opening folder '{0}' GUID={1}", documentLibraryName, documentLibraryGUID);

 

// create ViewFields CAML

XmlDocument viewFieldsDoc = new XmlDocument();

XmlNode ViewFields = AddXmlElement(viewFieldsDoc, "ViewFields", "");

AddFieldRef(ViewFields, "GUID");

AddFieldRef(ViewFields, "ContentType");

AddFieldRef(ViewFields, "BaseName");

AddFieldRef(ViewFields, "Modified");

AddFieldRef(ViewFields, "EncodedAbsUrl");

//viewFieldsDoc.Save(@"c:\viewFields.xml"); // for debug

 

// create QueryOptions CAML

XmlDocument queryOptionsDoc = new XmlDocument();

XmlNode QueryOptions = AddXmlElement(queryOptionsDoc, "QueryOptions", "");

AddXmlElement(QueryOptions, "Folder", documentLibraryName);

AddXmlElement(QueryOptions, "IncludeMandatoryColumns", "FALSE");

// this element is the key to getting the full recusive list

XmlNode node = AddXmlElement(QueryOptions, "ViewAttributes", "");

AddXmlAttribute(node, "Scope", "Recursive");

queryOptionsDoc.Save(@"c:\queryOptions.xml"); // for debug

 

// obtain the list of items in the document library

XmlNode listContent = wsList.GetListItems(documentLibraryGUID, null, null, ViewFields, null, QueryOptions, webId);

 

XmlDocument xmlResultsDoc = new XmlDocument();

xmlResultsDoc.LoadXml(listContent.OuterXml);

ns = new XmlNamespaceManager(xmlResultsDoc.NameTable);

ns.AddNamespace("z", "#RowsetSchema");

// xmlResultsDoc.Save(@"c:\listContent.xml"); // for debug

 

XmlNodeList rows = xmlResultsDoc.SelectNodes("//z:row", ns);

if (rows.Count == 0)

{

Console.WriteLine("No content found");

}

foreach (XmlNode row in rows)

{

Console.WriteLine(row.Attributes["ows_ContentType"].Value + " " + row.Attributes["ows_GUID"].Value + " :: " + row.Attributes["ows_BaseName"].Value);

}

}

Console.WriteLine("Done");

Console.Read();

}

 

public static XmlNode AddXmlElement(XmlNode parent, string elementName, string elementValue)

{

XmlNode element = parent.AppendChild(parent.OwnerDocument.CreateNode(XmlNodeType.Element, elementName, ""));

if (elementValue != "")

element.InnerText = elementValue;

return (element);

}

 

public static XmlNode AddXmlElement(XmlDocument parent, string elementName, string elementValue)

{

XmlNode element = parent.AppendChild(parent.CreateNode(XmlNodeType.Element, elementName, ""));

if (elementValue != "")

element.InnerText = elementValue;

return (element);

}

 

public static XmlNode AddXmlAttribute(XmlNode element, string attrName, string attrValue)

{

XmlNode attr = element.Attributes.Append((XmlAttribute)element.OwnerDocument.CreateNode(XmlNodeType.Attribute, attrName, ""));

if (attrValue != "")

attr.Value = attrValue;

return (attr);

}

 

public static void AddFieldRef(XmlNode viewFields, string fieldName)

{

    XmlNode fieldRef = AddXmlElement(viewFields, "FieldRef", "");

    AddXmlAttribute(fieldRef, "Name", fieldName);

}

}

}


Posted by DrJohn | 2 comment(s)
Filed under: ,