SQL Server Community Blogs

Voices of the SQL Server Community

Jorg Klein's Microsoft Business Intelligence Blog [Macaw]

  • SSIS - Convert various String date formats to DateTime with the Script Task

    The script below is useful if you need to convert String dates that arrive in different (unexpected) formats.
    Copy and paste the function into your Script Task (outside Main()) and it works right away.

    Supported date formats:

    • YYMMDD
    • YY-MM-DD
    • YYYYMMDD
    • YYYY-MM-DD

    Of course it's possible to add code for more date formats yourself. If you want to, copy and paste your code in a comment. I will then add the code to this blog.

    ---------------------------------------------------------------------------------------------------------------------------

    Public Shared Function GetDateFromString(ByVal stringDate As String) As DateTime

        Dim datetimeResult As DateTime

        Try
            'Two-digit years below 80 are read as 20xx, all others as 19xx
            Dim centuryToAdd As Integer = 1900
            If (Convert.ToInt32(stringDate.Substring(0, 2)) < 80) Then
                centuryToAdd = 2000
            End If

            If (stringDate.Length = 6) Then
                'Format is: YYMMDD
                datetimeResult = New DateTime((centuryToAdd + Convert.ToInt32(stringDate.Substring(0, 2))), Convert.ToInt32(stringDate.Substring(2, 2)), Convert.ToInt32(stringDate.Substring(4, 2)), 0, 0, 0)
                Return datetimeResult
            End If

            If (stringDate.Length = 8) Then
                If (stringDate.IndexOf("-") > 0) Then
                    'Format is: YY-MM-DD
                    datetimeResult = New DateTime((centuryToAdd + Convert.ToInt32(stringDate.Substring(0, 2))), Convert.ToInt32(stringDate.Substring(3, 2)), Convert.ToInt32(stringDate.Substring(6, 2)), 0, 0, 0)
                    Return datetimeResult
                End If

                'Format is: YYYYMMDD
                datetimeResult = New DateTime(Convert.ToInt32(stringDate.Substring(0, 4)), Convert.ToInt32(stringDate.Substring(4, 2)), Convert.ToInt32(stringDate.Substring(6, 2)), 0, 0, 0)
                Return datetimeResult
            End If

            If (stringDate.Length = 10) Then
                'Format is: YYYY-MM-DD
                datetimeResult = New DateTime(Convert.ToInt32(stringDate.Substring(0, 4)), Convert.ToInt32(stringDate.Substring(5, 2)), Convert.ToInt32(stringDate.Substring(8, 2)), 0, 0, 0)
                Return datetimeResult
            End If

            'Any other length: let .NET try to parse the string
            Return Convert.ToDateTime(stringDate)

        Catch e As Exception
            'Conversion failed: fall through and return the default date below
        End Try

        'No date format found: Return unknown(1/1/1900)
        datetimeResult = New DateTime(1900, 1, 1, 0, 0, 0)
        Return datetimeResult

    End Function


    ---------------------------------------------------------------------------------------------------------------------------

    If you want to convert a String SSIS variable and load it into a DateTime SSIS variable, use the following code in your Script Task:

    Dts.Variables("someDateTimeVariable").Value = GetDateFromString(Dts.Variables("someStringVariable").Value.ToString)
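
    For readers who want to test the format rules outside SSIS, here is a minimal Python sketch of the same idea. It is not a port of the Script Task code: the name get_date_from_string and the constant UNKNOWN_DATE are my own, and unlike the VB function it does not fall back to a general date parser for other lengths. It only covers the four formats listed above and returns 1/1/1900 for everything else.

    ```python
    from datetime import datetime

    UNKNOWN_DATE = datetime(1900, 1, 1)  # same "unknown" value the VB function returns


    def get_date_from_string(text: str) -> datetime:
        """Parse YYMMDD, YY-MM-DD, YYYYMMDD or YYYY-MM-DD; return 1/1/1900 otherwise."""
        try:
            digits = text.replace("-", "")
            if len(text) in (6, 8) and len(digits) == 6:
                # Two-digit year: below 80 is read as 20xx, otherwise as 19xx
                year = int(digits[0:2])
                year += 2000 if year < 80 else 1900
                return datetime(year, int(digits[2:4]), int(digits[4:6]))
            if len(text) in (8, 10) and len(digits) == 8:
                # Four-digit year, with or without dashes
                return datetime(int(digits[0:4]), int(digits[4:6]), int(digits[6:8]))
        except ValueError:
            pass  # non-numeric text or an impossible date: use the default below
        return UNKNOWN_DATE
    ```

    Calling get_date_from_string with a value such as "95-12-31" applies the same century rule as the Script Task function, so the year becomes 1995.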

     
  • MCITP - I passed the 70-446 “PRO: Designing a Business Intelligence Infrastructure by Using Microsoft SQL Server 2005” exam!

    Today I passed the Microsoft Certified IT Professional (MCITP) 70-446 “PRO: Designing a Business Intelligence Infrastructure by Using Microsoft SQL Server 2005” exam.
    This makes me one of the 337 (January 2008) Microsoft Certified IT Professionals worldwide! (Nr. of MCP's worldwide)



    To take the 70-446 exam, it’s required to earn the 70-445 MCTS certification first. I passed the 70-445 exam in November last year.

    The first conclusion I can make is that, strangely, I found the 70-445 exam more difficult than the 70-446 exam. I scored 70% on the 70-445 exam and managed to score 93% on the 70-446 exam. Partially this is because I gained more work experience in the last months and partially it’s because the 70-446 exam contains easier question types.

    The exam topics of exam 70-446 (full list: Microsoft Learning)
    The following list includes the topic areas covered on this exam. The percentage indicates the portion of the exam that addresses a particular skill.
    ·         Planning BI Solutions (15 percent)
    ·         Designing SSIS Solutions (21 percent)
    ·         Designing SSRS Solutions (14 percent)
    ·         Designing SSAS Solutions (22 percent)
    ·         Deploying and Optimizing SSAS Solutions (15 percent)
    ·         Designing Data Mining Solutions (13 percent)

    If we compare this with exam 70-445 we see that data mining is still quite important. Further we see that SSRS gets less attention in exam 70-446.

    70-445 vs 70-446
                            70-445   70-446
    SSIS                    27%      21%
    SSAS                    30%      37%
    SSRS                    32%      14%
    Data mining             12%      13%
    Planning BI Solutions   -        15%


    Data mining 
    Just like with exam 70-445 data mining is an important topic of the exam! It’s important to know when to use the different algorithms that are shipped with SSAS.

    Training Kit
    Unfortunately there is no training kit available for this exam. You can find the book on some online bookstores but all with the following message: not available. After some research I found out that this book will never come out. I think MS Press will wait until the 70-446 exam for SQL Server 2008 is available.

    Number of questions and completing time
    The exam contains 6 case studies, each with 9 or 10 questions. You have 30 minutes for each case. Any time left over from a case does not carry over to the next one, so just use your 30 minutes. The 30 minutes were enough for me. I did not study the entire case study in detail before I looked at the questions. I think this is the best approach: quickly scan the case study, then look at the questions, and then scan the case study again for the information each question requires.
    Just like the 70-445 exam, you need a minimum score of 70% to pass the exam.

    Question types
    I already mentioned that the 70-446 exam question types were easier than the 70-445 question types. In this exam I only found the question type below, where just one answer can be selected. 70-445 had another 3 question types that were a lot harder than the one below.

    QType

    Some useful tips
    ·         Make sure you know exactly how many cube processing types there are and how they work.
    ·         You need to be able to read database schemas.
    ·         Know the different OLAP types; check this blog post of mine for more information.
    ·         Data mining is important!

    What should you study
    The lack of a training kit and other study materials designed for this exam makes your studying difficult. What should you study and where should you start? 

    Microsoft does have a preparation guide for this exam on the Microsoft learning website. On this page you’ll find the following interesting topics:

    Preparation tools and resources
    This section contains links to classroom courses, Microsoft E-learning resources, Microsoft Press books and practice tests.
    I did all the classroom courses and they helped me prepare for the exam! If you have the possibility to take those courses you definitely should!
    I personally don’t advise you to buy all the Microsoft Press books to study. It’s just too much information and it will cost a lot of study time.
    Practice tests are very useful as preparation. I would advise everybody to purchase a set of preparation questions.

    Skills being measured
    This list contains all the possible exam topics. If you want to prepare yourself efficiently you should just study every item on this list. Search on Google and books online and you will find enough information! I think this way of preparing for the exam is the best way!

    How did I study
    I studied all the topics of the Skills being measured list that you will find in the preparation guide. Next to studying the Skills being measured, I followed all Microsoft Official Courses that are recommended as preparation for this exam:

    Course 2794: Designing a Business Intelligence Solution Architecture for the Enterprise Using Microsoft SQL Server 2005 (two days)

    Course 2795: Designing an ETL Solution Architecture Using Microsoft SQL Server 2005 Integration Services (two days)

    Course 2796: Designing an Analysis Solution Architecture Using Microsoft SQL Server 2005 Analysis Services (three days)

    Course 2797: Designing a Reporting Solution Architecture Using Microsoft SQL Server 2005 Reporting Services (two days)

    Links
    Below some links to other useful sites/weblogs about the 70-446 exam:

    If you have any questions, leave them as a comment and I will answer them, if I can. Also, if you have any information regarding the 70-446 MCITP exam, please leave a comment.

    Good Luck!

  • SSAS – MOLAP, ROLAP and HOLAP storage types

    A big advantage of a BI solution is the existence of a cube. Data and aggregations are stored in an optimized format to offer very fast query performance.
    Sometimes, a big disadvantage of storing data and aggregations in a cube is the latency that it implies. SSAS processes data from the underlying relational database into the cube. After this is done the cube is no longer connected to the relational database, so changes to this database will not be reflected in the cube. The data in the cube is refreshed only when the cube is processed again.

    SSAS 2005 gives you the possibility to choose different storage types for the following objects:

    • Cubes
    • Partitions
    • Dimensions

    MOLAP (Multidimensional Online Analytical Processing)
    MOLAP is the most used storage type. It’s designed to offer maximum query performance to the users. Data AND aggregations are stored in an optimized format in the cube. The data inside the cube will refresh only when the cube is processed, so latency is high.

    ROLAP (Relational Online Analytical Processing)
    ROLAP does not have the high latency disadvantage of MOLAP. With ROLAP, the data and aggregations are stored in relational format. This means that there will be zero latency between the relational source database and the cube.
    The disadvantage of this mode is performance: this type gives the poorest query performance because no objects benefit from multidimensional storage.
     

    HOLAP (Hybrid Online Analytical Processing)
    HOLAP is a storage type between MOLAP and ROLAP. Data will be stored in relational format (ROLAP), so there will also be zero latency with this storage type.
    Aggregations, on the other hand, are stored in multidimensional format (MOLAP) in the cube to give better query performance. SSAS listens for notifications from the source relational database. When changes are made, SSAS gets a notification and processes the aggregations again.
    With this mode it’s possible to offer zero latency to the users, with query performance that sits between MOLAP and ROLAP.


    The different storage types of SSAS:

             Data storage          Aggregations storage   Query performance   Latency
    MOLAP    Cube                  Cube                   High                High
    HOLAP    Relational database   Cube                   Medium              Low (none)
    ROLAP    Relational database   Relational database    Low                 Low (none)



    Conclusion
    SSAS offers three storage types that give you all the flexibility you need. You can choose between high performance and high latency on one side(MOLAP) and lower performance but low latency(ROLAP) on the other side. There is also a possibility to choose a way in between(HOLAP). 
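
    To make the trade-off concrete, here is a small Python sketch that encodes the comparison above as a lookup table. The names STORAGE_TYPES and candidates are made up for this illustration and have nothing to do with any SSAS API; the values are simply the ones from the table in this post.

    ```python
    # (data storage, aggregations storage, query performance, latency), copied from this post
    STORAGE_TYPES = {
        "MOLAP": ("Cube", "Cube", "High", "High"),
        "HOLAP": ("Relational database", "Cube", "Medium", "Low (none)"),
        "ROLAP": ("Relational database", "Relational database", "Low", "Low (none)"),
    }


    def candidates(need_low_latency):
        """Return storage types that fit, ordered from best to poorest query performance."""
        result = []
        for name, (_data, _aggregations, _performance, latency) in STORAGE_TYPES.items():
            if need_low_latency and latency == "High":
                continue  # MOLAP only refreshes when the cube is processed
            result.append(name)
        return result
    ```

    If low latency is a hard requirement, only HOLAP and ROLAP remain; otherwise MOLAP comes first because it offers the best query performance.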

     

  • SSIS – Non-blocking, Semi-blocking and Fully-blocking components

    How can you recognize these three component types, what is their inner working and do they acquire new buffers and/or threads?

    Synchronous vs Asynchronous

    The SSIS dataflow contains three types of transformations. They can be non-blocking, semi-blocking or fully-blocking. Before I explain how you can recognize these types and what their properties are, it’s important to know that all dataflow components can be categorized as either synchronous or asynchronous.

    ·         Synchronous components
    The output of a synchronous component uses the same buffer as the input. Reusing the input buffer is possible because the output of a synchronous component always contains exactly the same number of records as the input. Number of records IN == Number of records OUT.

    ·         Asynchronous components
    The output of an asynchronous component uses a new buffer. It’s not possible to reuse the input buffer because an asynchronous component can have more or fewer output records than input records.

    The only thing you need to remember is that synchronous components reuse buffers and are therefore generally faster than asynchronous components, which need a new buffer.

    All source adapters are asynchronous. They create two buffers: one for the success output and one for the error output. All destination adapters, on the other hand, are synchronous.
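
    The buffer behaviour can be imitated in a few lines of Python. This is only an analogy under my own assumptions: a plain list stands in for an SSIS buffer, and the functions derived_column and aggregate are named after the transformations they mimic but are not SSIS code.

    ```python
    def derived_column(buffer):
        """Synchronous: one output row per input row, so the input buffer is reused."""
        for row in buffer:
            row["name"] = row["name"].upper()
        return buffer  # the very same list object travels downstream


    def aggregate(buffer):
        """Asynchronous: the number of rows changes, so the output needs a new buffer."""
        totals = {}
        for row in buffer:
            totals[row["name"]] = totals.get(row["name"], 0) + row["amount"]
        return [{"name": name, "amount": amount} for name, amount in totals.items()]


    rows = [
        {"name": "a", "amount": 1},
        {"name": "a", "amount": 2},
        {"name": "b", "amount": 5},
    ]
    same_buffer = derived_column(rows)  # still 3 rows, same object as rows
    new_buffer = aggregate(rows)        # 2 rows, a different object
    ```

    The first function never allocates a second list, which is the reason synchronous components are generally cheaper.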


    Non-blocking, Semi-blocking and Fully-blocking

    In the table below the differences between the three transformation types are summarized. As you can see it’s not that hard to identify the three types.
    There are a lot of large and complicated articles about this subject on the internet, but I think it’s enough to look at the core differences between the three types to understand how they work and what their (dis)advantages are:

     

                                                 Non-blocking   Semi-blocking   Fully-blocking
    Synchronous or asynchronous                  Synchronous    Asynchronous    Asynchronous
    Number of rows in == number of rows out      True           Usually False   Usually False
    Must read all input before they can output   False          False           True
    New buffer created?                          False          True            True
    New thread created?                          False          Usually True    True
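
    The "must read all input before they can output" row is the easiest one to demonstrate. Below is a rough Python analogy using generators; it says nothing about how SSIS is implemented internally. All names (source, non_blocking, semi_blocking, fully_blocking, rows_read_for_first_output) are hypothetical, and heapq.merge and sorted merely stand in for a Merge and a Sort transformation.

    ```python
    import heapq

    consumed = []  # records which source rows have been read so far


    def source(values):
        for value in values:
            consumed.append(value)
            yield value


    def non_blocking(rows):
        # Row by row: every row is passed on as soon as it arrives
        for row in rows:
            yield row * 2


    def semi_blocking(left, right):
        # Like a Merge: needs a row from each sorted input, then emits while reading
        yield from heapq.merge(left, right)


    def fully_blocking(rows):
        # Like a Sort: the first output row is only known after all input is read
        yield from sorted(rows)


    def rows_read_for_first_output(transformation):
        """Return (first output row, number of source rows read to produce it)."""
        consumed.clear()
        first = next(transformation)
        return first, len(consumed)


    non = rows_read_for_first_output(non_blocking(source([3, 1, 2])))
    semi = rows_read_for_first_output(semi_blocking(source([1, 4]), source([2, 3])))
    full = rows_read_for_first_output(fully_blocking(source([3, 1, 2])))
    ```

    The non-blocking generator produces its first row after reading a single source row, the merge needs some rows from both inputs but not all of them, and the sort has to read everything first.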



    All SSIS 2005 transformations categorized:

    Non-blocking transformations   Semi-blocking transformations   Blocking transformations
    Audit                          Data Mining Query               Aggregate
    Character Map                  Merge                           Fuzzy Grouping
    Conditional Split              Merge Join                      Fuzzy Lookup
    Copy Column                    Pivot                           Row Sampling
    Data Conversion                Unpivot                         Sort
    Derived Column                 Term Lookup                     Term Extraction
    Lookup                         Union All
    Multicast
    Percent Sampling
    Row Count
    Script Component
    Export Column
    Import Column
    Slowly Changing Dimension
    OLE DB Command

     

     

  • SSIS – Lookup Transformation is case sensitive

    A while ago I figured out that the lookup transformation is case sensitive.
    I used a lookup to find dimension table members for my fact table records. This was done on a String business key like ‘AA12BB’. I attached a table to the error output, and after running the package I found one record in this table.

    This record had a business key like ‘Aa12BB’. I searched the dimension table for this missing record and, to my surprise, it DID exist, but with the following business key: ‘AA12BB’. It seemed the lookup transformation is case sensitive.

    Next I tried a T-SQL query in SQL Server 2005 Management Studio. In the WHERE clause I referred to the business key ‘Aa12BB’. The query returned the record with business key ‘AA12BB’. Conclusion: SQL Server (with the collation used here) is not case sensitive, but the SSIS lookup component IS case sensitive… Interesting.


    Solution:
    After some research I found a few solutions for this interesting  feature of the lookup transformation. Before I explain these solutions you must know something about the inner working of the lookup component.

    A lookup transformation uses full caching by default. This means that the first thing it does on execution, is loading all the lookup data in its cache. When this is done it works as expected, but with case sensitivity.

    The solution is to set the CacheType property of the lookup transformation to Partial or None. The lookup comparisons will then be done by SQL Server and not by the SSIS lookup component.
    Another solution is to bring the data to the same case before you do the lookup. You can do this with the T-SQL LOWER() or UPPER() functions in a query, or with the LOWER() or UPPER() expression functions in, for example, a derived column SSIS component.
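
    The effect is easy to reproduce outside SSIS. The Python sketch below uses a dictionary as a stand-in for the fully cached lookup data; the sample keys and the function names lookup and lookup_normalized are invented for this illustration.

    ```python
    # Business key -> surrogate key (made-up sample data standing in for the lookup cache)
    dimension = {"AA12BB": 1, "CC34DD": 2}

    # Second solution from the text: bring both sides to the same case
    dimension_upper = {key.upper(): value for key, value in dimension.items()}


    def lookup(key):
        """Exact, case-sensitive match, like the fully cached lookup."""
        return dimension.get(key)


    def lookup_normalized(key):
        """Match after upper-casing the incoming key."""
        return dimension_upper.get(key.upper())


    missed = lookup("Aa12BB")            # no match: the case differs
    found = lookup_normalized("Aa12BB")  # matches 'AA12BB'
    ```

    The first call finds nothing, just like the record that ended up in my error output, while the normalized lookup returns the dimension member.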

     

  • SSRS – Static column headers in a Matrix

    How do you create a static column header centered above your dynamic columns? One way to try to achieve this is to place a textbox above your dynamic columns. One thing is for sure: the textbox will never be perfectly centered, and what if the number of dynamic columns grows or shrinks?
     
    What you need to do is create a static column group. You do this by adding a new column group to the matrix and giving it a static expression, for example: ="static"
    Now make it the top group by clicking Up for the static column group on the Groups tab of the matrix’s properties. You can also achieve this by just dragging the column group up in the layout view.



    The result: a centered and perfectly aligned column header with the text “YTD” above some dynamic columns containing years:


     

  • SSRS - Custom expressions for subtotals in a matrix

    If you want custom expressions for your subtotals in a matrix, for example to calculate an average instead of the default sum, you need to use the InScope() and Iif() functions in your data field…


    When you create a matrix with SSRS 2005 you get the following default groups:
    A row group named:               matrix1_RowGroup1
    A column group named:          matrix1_ColumnGroup1

    With the normal functionalities you can’t change much on the behavior of your subtotals in your matrix. When you create a subtotal it calculates a subtotal and that’s about it ;-)


    If you use the following expression in the data field of your matrix you can take full control on the behavior of all your subtotals:

    =Iif(InScope("matrix1_ColumnGroup1"),
         Iif(InScope("matrix1_RowGroup1"),
             "In Cell",
             "In Subtotal of RowGroup1"),
         Iif(InScope("matrix1_RowGroup1"),
             "In Subtotal of ColumnGroup1",
             "In Subtotal of entire matrix"))

               

         

    Replace "In Cell", "In Subtotal of RowGroup1", "In Subtotal of ColumnGroup1" and/or "In Subtotal of entire matrix" with the expressions or fields that you want.

     


    For example, if you want to calculate an average:

    Replace "In Cell" with Sum(Fields!Amount.Value)

    Replace "In Subtotal of RowGroup1" with Avg(Fields!Amount.Value)
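
    If the nested Iif() calls are hard to read, the same decision can be written out as an ordinary function. This Python sketch only mirrors the logic of the expression; matrix_cell_label is a hypothetical name, and the two booleans stand for the results of InScope("matrix1_ColumnGroup1") and InScope("matrix1_RowGroup1").

    ```python
    def matrix_cell_label(in_column_group, in_row_group):
        """Mirror of the nested Iif(InScope(...)) expression for one matrix cell."""
        if in_column_group:
            if in_row_group:
                return "In Cell"
            return "In Subtotal of RowGroup1"
        if in_row_group:
            return "In Subtotal of ColumnGroup1"
        return "In Subtotal of entire matrix"
    ```

    Each of the four return values corresponds to one of the four placeholders you replace with your own expression.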

     

    More information about the InScope() function on MSDN

  • SSRS – Matrix that adds a new column each time 5 rows are filled with data

    What if you want a dynamic list of values in a matrix, but with a maximum of 5 rows? How do you create a matrix like this? I thought this would be an easy job, but it turned out not to be that simple. I tried to create a matrix like this for a dynamic list of countries. In this post I will explain how you can achieve this in a few simple steps.

    1.       Create an MDX query (I used an SSAS data source) that returns the list of countries with a numbering:

    2.       Next, create a matrix:

    3.       Finally, do the following:

    •       Use the following expression for the row group: =(Fields!Country_Number.Value - 1) Mod 5
    •       Use the following expression for the column group: =Floor((Fields!Country_Number.Value - 1) / 5)
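
    To see why these two expressions fill the matrix column by column, here is the same arithmetic as a small Python sketch. The function name matrix_position and the placeholder country names are my own; only the Mod 5 and Floor(... / 5) logic comes from the expressions above.

    ```python
    def matrix_position(country_number, rows_per_column=5):
        """Return (row group value, column group value) for a 1-based country number."""
        row = (country_number - 1) % rows_per_column      # the Mod 5 expression
        column = (country_number - 1) // rows_per_column  # the Floor(... / 5) expression
        return row, column


    # Twelve placeholder countries, numbered 1..12 like the MDX result
    countries = ["Country %d" % number for number in range(1, 13)]

    grid = {}
    for number, name in enumerate(countries, start=1):
        grid[matrix_position(number)] = name
    ```

    Countries 1 to 5 get column value 0 and row values 0 to 4, country 6 starts a new column at row 0, and so on, so no column ever holds more than 5 rows.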


    Result:

  • SSRS – Invalid row heights, BUG?

    Because I got a few reactions regarding the screenshots below (people thought something was wrong with them), please note the following:

    Some of the screenshots below look awful because I selected all the text in the matrices with CTRL-A. I did this to make the differences in row heights clear to see. I also made the numbers in the matrix unreadable.


    Problem:
    As you can see in the screenshot below the row height of rows that contain empty cells