February 2008 - Posts

I was talking with David Hobbs-Mallyon (SQL Server Product Manager here in the UK - a good guy to know for swag and believe me Simon Sabin and I whipped a load away from yesterday's launch event for give aways at SQLBits (thanks Ellie too!), anyway I digress), I was chatting with David about how BI and Data Mining is a growing area of the product, seems apt and goes with one of the messages from the launch presentation I'd heard before talking to David; anyway - this event is nearly full so I though I'd bring it to your attention in case you'd missed it.

http://msevents.microsoft.com/CUI/EventDetail.aspx?EventID=1032363487&Culture=en-GB

This event is brought to you by Microsoft and Hitachi Consulting...

Event Overview

Most of us have heard of the magic of Data Mining. No doubt you know that it is part of Business Intelligence platform, offered by Microsoft SQL Server 2008 and 2005, and Microsoft Office System 2007 amongst others. We all know of companies that made fortunes by having carefully extracted intelligence from mountains of data–with Data Mining. Let us demystify this technology in our seminar. In four easy-to-understand yet packed with practical information sessions you will learn about what Data Mining and Business Intelligence can do for you, how to deploy and manage it, how to use it, and how to make it available to other parts of your IT environment. While in the past it may have taken a university degree in Statistics to make use of Data Mining, Microsoft has taken the technology to a new level, making it accessible to all IT Professionals and, with your help, to all of your users. We promise that after attending this seminar Data Mining will no longer seem like black magic to you. Perhaps, we may even help you embark on a new path in your career towards becoming someone akin to a Keeper of Enterprise Intelligence. Let us share our enthusiasm with you.

08:50- 09:20 Registration

09:20-09:40 Opening and Introductions

After introductions, we begin the day with an overview of the agenda highlighting the key topics to be covered and the logistics of the event.

 

09:40-11:00 Introduction to Data Mining

To commence, we will discuss the concepts and the terminology used in the discipline of Data Mining. To make this session as practical as possible, we will then review the common scenarios and applications for the use of Data Mining. We will also look at the “bigger picture” of the discipline of Business Intelligence and see how Data Mining is a part of it. Also in this session we will introduce the fundamental process for data mining, looking at the concepts of data assets and their preparedness. This session will end with a look at the technology product roadmap showing you relationships between Data Mining and technologies of Microsoft SQL Server 2008 & 2005, Microsoft Office 2007, and other systems. At the end of this session you should have a good understanding of applications of Data Mining.

 

11:00-11:15 Break

 

11:15-12:30 Working with Data Mining

You are ready to mine data–what are the steps and what is the recommended order? This session covers the already introduced Data Mining Process in detail. We will study its main steps: model preparation, model training, testing and evaluation of the built model, deployment, and ongoing model maintenance. We will spend time looking at possible exceptions and problems that you may be faced with in this process, such as missing or inconsistent data, or even data that seems fine but produces strange results. In the end, we want to make sure that the intelligence you are gathering is of quality that you expected. At the end of this session you will know how to use data mining well.

 

12:30-13:30 Lunch

 

13:30-14:45 Using Data Mining in Your IT Systems (Part 1)


14:45-15:00 Break

 

15:00-16:15 Using Data Mining in Your IT Systems (Part 2)

The two afternoon sessions will apply your knowledge of Data Mining to practical situations and cases that you are likely to encounter in your professional life. We will look at a number of canonical applications of data mining from the perspective of a practical scenario in order to show you how to correctly select the best features of Data Mining technologies. For example, when we look at the scenario of customer segmentation in a consumer-oriented company, we will help you chose the best of the data mining algorithms available and configure its parameters correctly. We will pay attention to common issues that may arise, such as the seasonality of data over a sales year, and we will help you decide when it is correct to rely on sampled data and when you may want to use the entire database instead. By following this pattern a few times we hope to cover all the necessary technicalities of data mining toolkit so that you are ready to use this powerful technology straight away. Before we close this session we will present you with uses of data mining that benefit your own, day-to-day, tasks as an IT Professional, System Administrator, or, perhaps, a Security Officer. For instance, we will show you how you could use data mining to better understand your infrastructure performance characteristics, to build new, higher-level data sources, or, perhaps, to discover insecure chains of infrastructure events that could lead to fraud. We hope that at the end of these two sessions we will have enabled you to work with the intelligence that previously was hidden, inaccessible, or just unknown across your IT systems.

 

16:15-16:45 Questions and Answers

For those attendees wishing to ask more questions or to discuss additional issues that were not covered during the day we plan a short Q&A session. Please prepare any questions in advance, if possible.

 

Speaker

In his role as Strategic Consultant at Project Botticelli Ltd, Rafal is responsible for analyzing and forecasting trends in the field of Information Technology. Rafal works closely with teams of up to 150 software developers, as well as with investors and their boards of directors, to practice the best principles of the Microsoft Solutions Framework and the Microsoft Operations Framework. He has helped to build and restructure software development houses and IT consultancies to promote best efficiencies and productivity while stimulating excellent team spirit across organisations. He specializes in several areas: IT architecture models, statistics and more recently data mining, security and cryptography, and management of solution delivery. Rafal is a frequent and popular speaker at Microsoft events having presented at prestigious Bill Gates and Steve Ballmer conferences across Europe, Middle East and Africa. His uniquely energetic speaking style should keep you engaged no matter how complex the subject.

I was lucky enough to receive an invite to the official UK launch of Windows 2008, VS 2008 and SQL 2008 yesterday in London. Some very good case studies demo'd from a number of folk doing some really great things - to name but two EasyJet and Cambridge University - EasyJet demo'd a very impressive proof of concept for their website which I really do hope launches later this year - a fantastic bringing together of SilverLight, SQL spatial stuff, Ajax and Virtual Earth.

But my absolute favourite because I can relate to it thus is the power and flexibility of SQL Server was the demo from Cambrige University - basically they've got all the plant information Darwin's teacher collected in a database, apparently he labelled every sample ever taken with date and location - they've used the spatial stuff in SQL Server 2008 to determine the answer to a number of questions - I think the most impressive bit was they could trace the guy's route on a holiday he took up into Scotland - crazy; oh - and they did the app and data load/analysis in less that 5 days.

There seems to be a real drive for Business Intelligence, but it's not the BI I've come to know as BI; the entry bar is a lot lower now - rather than BI it's more a PI (Personal Intelligence) because it's at that level; giving me the user the tools to do my own analysis.

Around 5.30pm we hooked up via a live Satellite link to the launch with Steve Ballmer; like Simon says I've absolutely no idea what that first guy was talking about [or on] but Steve came out fighting - a very inspirational speaker that never fails to impress and motivate.

Exciting times! [again]

Looking forward to SQL Server 2008 real launch later this year; from my perspective I'll be working on getting some dates booked so we can do some real launch events August onwards.

Now, back to work and the reality I've got to finish my presentation for SQL Bits.

Tony.

Hi all, it's been a crazy busy year so the user group took a bit of a back burner for a while - and what with SQLBits roaring success it's been difficult.

Both Simon Sabin and Martin Bell are now doing a lot to help me steer the group forward again, Martin is doing some great work with meetings up in Scotland - the next one is 10th May in Glasgow.

The new sqlserverfaq.com (user group home) site is nearly complete - should release a couple of weeks after SQLBits - I was hoping it would be ready for launch at SQLBits but alas - client commitments etc... have prevented that (if some ASP.NET skilled dev fancies helping and getting some exposure for their talents then email me tonyrogerson@torver.net, the site design is done - it's just hooking stuff up....)

21st Feb London
17th April, London
19th June, London
21st Aug, London

20th March, TVP
17th July, TVP
18th Sept, TVP
20th Nov, TVP
15th Jan 2009

Fancy presenting? Anything from 15 - 60 minutes? Give me a bell on 0796 816 0362 or email me tonyrogerson@torver.net

 

The guys have now just posted their 52nd show; some realy good stuff in the mix and not technical specific.

http://www.nxtgenug.net/Podcasts.aspx?PodcastID=52

 

Derived tables really are a wonderful tool in our tool chest when it comes to writing highly scalable SQL but only when they are used correctly. I'm currently writing my presentation for SQLBits and whilst digging around on the Derived Tables demos it jogged my memory of this problem but on 2005 I could no longer repro it - I thought I was going mad, but no - no madness they've just improved the 2005 implementation and remove the odd problem.

In SQL Server 2000 if you use a UDF in derived table and later refernce that column, the chances are the UDF will be executed again rather than using the value returned in the derived table.

Best seen as an example...

drop function dbo.fn_strip_time

go

 

--  Because we can't use CURRENT_TIMESTAMP in a UDF

create view vw_NOW as select NOW = current_timestamp

go

 

create function dbo.fn_strip_time(

    @value datetime

) returns datetime

as

begin

    declare @val2 datetime

    set @val2 = cast( convert( char(8), @value, 112 ) as datetime )

 

    --  force a delay of 3 seconds

    declare @now datetime

    select @now = now from vw_now

 

    while ( select now from vw_now ) <= dateadd( second, 3, @now )

        set @val2 = @val2

 

    return( @val2 )

 

end

go

 

--  Yes, this takes 3 seconds

print dbo.fn_strip_time( current_timestamp )

go

 

create table #tb(

    somedate    datetime not null

    )

 

insert #tb values( current_timestamp )

 

select * into #tb2 from #tb

 

select dateadd( day, 1, stripped_date_col )

from (

    select dbo.fn_strip_time( somedate ) as stripped_date_col

    from #tb

    ) as dt

    inner join #tb2 t on t.somedate >= dt.stripped_date_col and t.somedate < current_timestamp

Notice that final SQL statement takes 6 seconds on SQL 2000 build 2187, on 2005 SP2 build 3159 it takes just 3 seconds which is what we want.

 

Last year we promised more usergroup meetings and this year we will deliver.

We've arranged to have a user group meeting every 3rd Thursday of the month. We will be starting off in London on the 21st of February and then alternating between Reading and London every month.

As mentioned the first one is a week on Thursday at Microsofts new London office you can find more details of the event here http://www.sqlserverfaq.com/?eid=108.

As we get the agendas sorted for the following events we will post those on the site as well. There is so much going on this year there is plenty of content to talk about.

If you want to speak at a meeting feel free to contact Tony or Simon

 

There are often times where we need to hold history in the base table, for instance a commission rates table. Here is an example of checking the validity of the window without resorting to a trigger – I basically use a UDF (User Defined Function) on the CHECK constraint...

UDF needs to be created first or the CREATE TABLE will fail...

create function dbo.fn_check_sales_commission_window (

        @sales_commission_id int,

        @individual_id   int,

        @comm_start_date smalldatetime,

        @comm_end_date   smalldatetime )

returns varchar(3)

as

begin

    declare @status varchar(3)

 

    if exists (

        select *

        from sales_commission

        where individual_id = @individual_id

          and id <> @sales_commission_id

          and ( @comm_start_date between comm_start_date and coalesce( comm_end_date, @comm_start_date  )

             or @comm_end_date between comm_start_date and coalesce( comm_end_date, @comm_end_date ) )

        )

        set @status = 'BAD'

 

    else

        set @status = 'OK'

 

    return @status

 

end

Now create the table...

create table sales_commission (

    id  int not null identity constraint pk_sales_commission primary key clustered,

 

    individual_id  int not null,

 

    comm_rate decimal( 5, 2 ) not null,

 

    comm_start_date smalldatetime not null check( comm_start_date = cast( convert( char(8), comm_start_date, 112 ) as smalldatetime ) ),

    comm_end_date   smalldatetime null  check( comm_end_date is null or comm_end_date = cast( convert( char(8), comm_end_date, 112 ) as smalldatetime ) ),

        constraint uk_sales_commission unique ( individual_id, comm_start_date ),   

        constraint ck_sales_commission_window check( dbo.fn_check_sales_commission_window( id, individual_id, comm_start_date, comm_end_date ) = 'OK' )

)

go

Onto testing...

insert sales_commission( individual_id, comm_rate, comm_start_date, comm_end_date ) values ( 1, 20, '20080101', null )

insert sales_commission( individual_id, comm_rate, comm_start_date, comm_end_date ) values ( 1, 20, '20080511', null )  -- Fails

 

update sales_commission set comm_end_date = '20080510' where individual_id = 1      -- OK

 

insert sales_commission( individual_id, comm_rate, comm_start_date, comm_end_date ) values ( 1, 20, '20080511', null )  -- Now works

 

insert sales_commission( individual_id, comm_rate, comm_start_date, comm_end_date ) values ( 1, 20, '20070101', '20080201' ) -- Fails, End Date end up in range

 

insert sales_commission( individual_id, comm_rate, comm_start_date, comm_end_date ) values ( 1, 20, '20070101', '20071231' ) -- Works


Does it work when you are inserting a set?

truncate table sales_commission

 

 

insert sales_commission( individual_id, comm_rate, comm_start_date, comm_end_date )

    select 1, 20, '20080101', null

    union all

    select 1, 20, '20080511', null  -- This one is wrong and should cause a Failure


It does! Cool.

I'm not going to get into maintenance of the CHECK CONSTRAINT because that will spoil the fun, anyway if you try and update the UDF you rightly get this error...

Msg 3729, Level 16, State 3, Procedure fn_check_sales_commission_window, Line 24

Cannot ALTER 'dbo.fn_check_sales_commission_window' because it is being referenced by object 'ck_sales_commission_window'.


To get round this you need to drop the contraint first and use ALTER TABLE to put it back on.

Be warned that the UDF is executed row by row (for each row inserted or updated as they get updated/inserted) so an inconsistency may occur if you are aggregating rather than just doing existance checking like in my specific business case - heres an example that shows the update is applied row by row and that the CHECK is done row by row, so the update is not complete...

create table checks (

    flag char(1) not null

)

 

insert checks values( 'A' )

insert checks values( 'A' )

insert checks values( 'A' )

insert checks values( 'A' )

insert checks values( 'A' )

 

go

 

create function dbo.fn_check ( @flag char(1) )

    returns varchar(3)

as

begin

    declare @status varchar(3)

 

    if exists ( select * from checks where flag <> @flag )

        set @status = 'BAD'

    else

        set @status = 'OK'

 

    return @status

 

end

go

 

alter table checks add constraint checks_udf_lookup check ( dbo.fn_check( flag ) = 'OK' )

go

 

If we had transaction consistency then this should work fine because all rows would be updated to 'B' at once and then the check constraint executed; alas, it doesn't; because what is happening is that sql updates row 1, checks it, updates row 2, checks it etc...

 

update checks

    set flag = 'B'

 

One last point, you notice I made it a table level CHECK constraint - it needs to be table level so all the columns are available to put in the parameter list.

Tony.

An update to my SMTP Email VB.NET CLR stored procedure, realised it only accepts 4000 bytes max, to fix this use SqlFacet eg...

<SqlFacet(MaxSize:=-1)> ByVal Body As SqlString 

Example usage....

Imports System

Imports System.Data

Imports System.Data.SqlClient

Imports System.Data.SqlTypes

Imports Microsoft.SqlServer.Server

Imports System.Runtime.InteropServices

Imports System.Net.Mail

 

 

Partial Public Class StoredProcedures

    <Microsoft.SqlServer.Server.SqlProcedure()> _

    Public Shared Function SMTPFullEmail(ByVal SMTPServer As SqlString, _

                                         ByVal SenderEmail As SqlString, _

                                         ByVal FromEmail As SqlString, _

                                         ByVal FromName As SqlString, _

                                         ByVal ToEmail As SqlString, _

                                         ByVal ReplyTo As SqlString, _

                                         ByVal Subject As SqlString, _

                                         <SqlFacet(MaxSize:=-1)> ByVal Body As SqlString, _

                                         ByVal AttachmentFileName As SqlString, _

                                         <Out()> ByRef SendStatus As SqlString) As SqlInt32

 

        Dim Message As New MailMessage

 

        '   Set up the message

        Message.Sender = New MailAddress(SenderEmail.ToString)

        Message.From = New MailAddress(FromEmail.ToString, FromName.ToString)

 

        Message.To.Add(ToEmail.ToString)

        Message.Subject = Subject.ToString

        Message.Body = Body.ToString

        Message.IsBodyHtml = True

        If AttachmentFileName.ToString <> "" Then Message.Attachments.Add(New Attachment(AttachmentFileName.ToString))

 

        Dim smtp As New SmtpClient

 

        '   Send it

        smtp.Host = SMTPServer.ToString

 

        Try

            smtp.Send(Message)

 

            SendStatus = CType("OK", SqlString)

            SMTPFullEmail = 0

 

        Catch ex As Exception

            SendStatus = CType(ex.Message, SqlString)

            SMTPFullEmail = 1

 

        End Try

 

    End Function

 

End Class

 

Making the leap into Advanced SQL
There are three major features in SQL Server that are usually under used or not used optimally; these are - a) Derived Tables, b) Common Table Expressions and c) The CASE expression. In this session I look at the in's and out's of each feature from basics right through to advanced techniques and considerations.

Fancy a trip to Glasgow or are you going to miss my session at SQLBits in March at Birmingham?

Go and join the folks at Scottish Developers at their event on 10th May in Glasgow.