One of the pioneers of the Relational Database, Chris Date is giving a 2 day seminar in Edinburgh (13th and 14th May 2010) based around his new book "SQL and Relational Theory - How to Write Accurate SQL Code" which if you don't already have I'd say is a must buy.

When I first saw this and what he will cover I thought, oh yer - this is going to cost the earth, well it doesn't - its £750 for the two days and there are discounts available for multiple bookings, being a member of the UK SQL Server User Group or members of the academic community.

More information here: http://www.justsql.co.uk/chris_date/cjd_edin_may_2010.htm.

The person you need to contact is Peter Robson (rde@justsql.com), if you are a member of the UK SQL Server User Group just say so and mention my name.

If you are interested - talk to Peter this week because he needs to formulate numbers by the end of March, you may miss out otherwise.

The detail of what Chris Date will cover is shown below, note this will be technology neutral, what no T-SQL you ask; don't worry - SQL Server uses ANSI 92 SQL so the majority of syntax he shows will work, but frankly this is more about thinking Relationally than being a code monkey.

1. Setting the scene

  • Codd's relational model
    Model vs. implementation
    Properties of relations
    Base vs. derived relations
    Relations vs. relvars
    Values vs. variables


2. Types and domains

  • Domains are types
    Types and operators
    System vs. user defined types
    Scalar vs. nonscalar types
    Scalar types in SQL
    SQL type checking and coercion
    SQL collations
    SQL row and table types


3. Tuples and relations, rows and tables

  • What's a tuple?
    Rows in SQL
    What's a relation?
    Relations are n-dimensional
    Relational comparisons
    TABLE_DUM and TABLE_DEE
    Tables in SQL
    SQL column naming


4. No duplicates, no nulls

  • What's wrong with duplicates?
    Avoiding duplicates in SQL
    What's wrong with nulls?
    Avoiding nulls in SQL
    A remark on outer join


5. Base relvars, base tables

  • Data definition
    INSERT, DELETE, UPDATE
    Updating is set level
    Relational assignment
    Candidate and foreign keys
    Tables and predicates
    Tables vs. types


6. SQL and relational algebra I:
The original operators

  • Result types
    Restriction, projection, join
    Union, intersection, difference
    WITH and complex expressions
    What expressions mean
    Evaluating SQL expressions
    Optimization
    Importance of column names


7. SQL and relational algebra II:
Additional operators

  • Semijoin and semidifference
    Extend
    Image relations
    Divide
    Aggregation and summarization
    Grouping and ungrouping
    "What if" queries
    What about ORDER BY?


8. SQL and constraints

  • Type constraints
    Type constraints in SQL
    Database constraints
    Database constraints in SQL
    The role of transactions
    Immediate vs. deferred checking
    Multiple assignment
    Constraints vs. predicates
    The Golden Rule
    Correctness vs. consistency


9. SQL and views

  • Views are relvars
    The Principle of Interchangeability
    Views and predicates
    Retrieval operations
    Views and constraints
    Updating operations
    What are views really for?
    Views and snapshots


10. SQL and logic I:
Relational calculus

  • Propositions and predicates
    Quantification: EXISTS, FORALL, UNIQUE
    Range variables and correlation
    Calculus expressions
    SQL support
    Transforming expressions


11. SQL and logic II:
Using logic to write SQL code

  • Important identities
    SQL and implication
    SQL and FORALL
    Correlated subqueries
    Naming subexpressions
    Dealing with ambiguity
    Using COUNT
    ALL or ANY comparisons
    GROUP BY and HAVING


12. Further SQL topics

  • Explicit tables
    Range variables
    Table, row, and scalar subqueries
    "Possibly nondeterministic" expressions
    Cursor operations
    Empty set issues
    A BNF grammar


13. The relational model

  • The relational model vs. others
    The relational model defined
    What remains to be done?
    The future of SQL


14. Database design theory

  • The place of design theory
    FDs and BCNF
    JDs and 5NF
    6NF
    Normalization is not a panacea
    But don't denormalize!
    The Principle of Orthogonal Design
    Remarks on physical design

 

I started writing an article for my blog on surrogate keys drawing in the original research by E F Codd and C J Date, its getting a bit big :) so I'm going to chop it up into a number of posts over the coming weeks depending on my time.

I'm interested in your thoughts and if you disagree please let me know but more importantly give me references so I can go back and read what you've read and either modify my position or not :), either way it will improve the article.

Hope it makes sense.  

Surrogate keys

There are two factions in the world of Database Design that enact wars against the use of surrogate keys, there is a lot of misunderstanding around what a surrogate key actually is and the need for them.

This article intends to draw a line under that battle and defend once and for all the use of surrogate keys, to do that we need to look at what problems they solve, the theory and rationale behind them and how to implement them in your chosen product.

I'm going to cover a lot:

  • Rules for a Surrogate key
  • Problems they solve
  • The "Theory" - we will look at the research done primarily by E. F. Codd and C. J. Date but will also look at other authors and their take on the subject.
  • How to implement a surrogate key
    • o Database Design (includes a discussion on Normalisation and when you'd introduce a surrogate)
    • o Techniques for generating the surrogate key value (IDENTITY, NEWID(), NEWSEQUENTIALID, Hashing and do it yourself)
    • o Considerations around Referential Integrity
    • o Considerations around External Verification of the surrogate key value and why even the notion is invalid
    • o Locking Considerations - effect on concurrency
  • Example Tricks using the surrogate key for example Table Partitioning

Rules for a Surrogate Key

  1. They must never be related (directly or indirectly) to the physical hardware or storage - to use the Relational phrase - they are not tuple-id's (row/record locators).
  2. Once a value has been used - it must never be reused by something else - basically once the value "5" has been used, then the surrogate key generator must never generate the number "5" again even if the original row has been deleted.
  3. They are read only - once generated that value must never be changed.
  4. The value must be atomic, that means no sets - just a constant for instance "5".
  5. The value should never be exposed outside the limits of the application as a whole (the database is not a black box) - I explain more later but basically the User of your application for instance the Call Centre staff entering client details, the external machine calling your web service should never see the surrogate key value (they should see one of the candidate key values instead) - that is because the surrogate key cannot be used for verification purposes but the candidate key can.
  6. A surrogate key can be the sole key for a table in a situation where there are no natural occurring candidate keys for example a Message Board. When used in this scenario arguably it’s no longer a surrogate key but an artificial one but I would still follow rules 1 – 4. Having said that you need to consider verification for instance if the rows were re-inserted and the artificial key re-generated then the same rows (entities) should get the same artificial key value; there are hashing solutions to provide that facility but the obvious problem of duplicate rows come into play – that discussion is outside the scope of surrogate keys.

The problem they solve

Surrogate keys form an abstraction from the candidate keys on a table, it can be argued that creating a surrogate key on a table where a single column candidate key exists makes no sense and that their real use is for making composite keys simpler to use throughout the application - a fair point but there is more to consider.

I'd say the real benefit of a surrogate key is in the abstraction it provides thus removing concurrency problems caused should the candidate key values you've used in foreign key clauses in related tables change - a surrogate key value once generated can never change and its value can never be reused even if the rows have been deleted (Codd and Date - see later).

I've summarised below some of the main points I see surrogates provide us:

  • Joins between tables are done on a single expression unlike say a join using a composite key for instance instead of writing: ON Manufacturers.Maker = Registrations.Maker AND Manufacturers.Model = Registrations.Model you would write ON Manufacturers.ManufacturerID = Registrations.ManufacturerID.
  • Reduce concurrency problems caused when values of existing candidate key being used in foreign key references change.
  • Removes any possibility of inconsistencies in data type, length and collation creeping into the schema, for instance in the Manufacturers table the column Maker is defined as varchar(250) with a collation of Latin CI, in the Registrations table the column Maker is defined varchar(200) with a collation of Spanish.
  • Join performance, this is really debatable; with today's hardware and improved code efficiency within the Optimiser and Storage Engine I don't think this really matters (that much) anymore.
  • The surrogate key can provide a method for partitioning of your data - I'll show that later with an example of using partitioned views (as opposed to the Enterprise Edition feature Partitioning). The effect of partitioning your data helps reduce locking contention both within the storage engine (allocating new extents) and for ACID in terms of consistency - less blocking!

The Theory

E. F. Codd and C. J. Date are seen as the parents of the Relational Model, their initial and continued research are the underpinnings of Database Design and why Relational Databases exist.

Codd on Surrogate Keys

In the paper ACM Transactions on Database Systems, Vol. 4, No. 4, December 1979 pages 409 - 410 Codd introduces the concept of a surrogate key, the key points of his text are as follows:

{snip}

"There are three difficulties in employing user-controlled keys as permanent surrogates for entities.

(1) The actual values of user-controlled keys are determined by users and must therefore be subject to change by them (e.g., if two companies merge, the two employee databases might be combined with the result that some or all of the serial numbers might be changed).

(2) Two relations may have user-controlled keys defined on distinct domains (e.g., one uses social security, while the other uses employee serial number) and yet the entities denoted are the same.

(3) It may be necessary to carry information about an entity either before it has been assigned a user-controlled key value or after it has ceased to have one (e.g., an applicant for a job and a retiree).

These difficulties have the important consequence that an equi-join on common key values may not yield the same result as a join on common entities. A solution-proposed in part in [4] and more fully in [14]-is to introduce entity domains which contain system-assigned surrogates. Database users may cause the system to generate or delete a surrogate, but they have no control over its value, nor is its value ever displayed to them.

{snip}

Users will often need entity identifiers (such as part serial numbers) that are totally under their control, although they are no longer compelled to invent a user-controlled key if they do not wish to.

They will have to remember, however, that it is now the surrogate that is the primary key and provides truly permanent identification of each entity. The capability of making equi-joins on surrogates implies that users see the headings of such columns but not the specific values in those columns."

What Codd means by "User of the database" is open to debate in respect of whether the surrogate value should be displayed to them, C J Date makes the point that if the value is not displayed it breaks Codd's own Information Principal. I believe the User of the database is not the Developer or Database Administrators - they are not users of the Database - they build systems that have users a user being a person using the application or a machine using a web service for instance.

His point about "The capability of making equi-joins on surrogates implies that users see the headings of such columns" means that we do see the columns - we'd have to in order to join our tables together! Ok, some products do have this ability - they internally have their own surrogate keys that are not exposed and are used in joins under the covers, but that is not a requirement by Codd just a nice feature of those products - in SQL Server land we unfortunately do not have that feature.

Date on Surrogate Keys

From C. J. Date's book "An Introduction to Database Systems, C J Date, 8th Edition":

Page 434, reference 14.21 - he refers to P. Hall, J. Owlett and S J P Todd "Relations and Entities" and says:

"Surrogate Keys are keys in the usual Relational sense but have the following specific properties:

They always involve exactly one attribute.

Their values serve solely as surrogate (hence the name) for the entities they stand for ......

When a new entity is inserted into the database, it is given a surrogate key value that has never been used before and will never be used again, even if the entity in question is subsequently deleted.

Ideally surrogate keys value would be system-generated.

Surrogates must not be concealed from the user because of the Information Principal - tuple ID's should be however."


He finishes:

"In a nutshell: Surrogates are a model concept; tuple IDs are an implementation concept"

C J Date makes the distinction between a tuple ID (basically a row ID) and a surrogate key; the row ID changes as rows get removed from the table - that should never be exposed from the database even for developers or Database Administrators, the SQL language has the ROW_NUMBER() function to provide us with a numbering system for a result.

 

Granularity defines “the lowest level of detail”; but what is meant by “the lowest level of detail”?

Consider the Transactions table below:

create table Transactions (

    TransactionID       int not null     primary key clustered,

    TransactionDate     date not null,

    ClientID            int not null,

    StockID             int not null,

    TransactionAmount   decimal( 28, 2 ) not null,

    CommissionAmount    decimal( 28, 5 ) not null

)

 

A Client can Trade in one or many Stocks on any date – there is no uniqueness to ClientID, Stock and TransactionDate.

The level of granularity on the table is TransactionID because that is our lowest level of detail, TransactionID uniquely identifies the row – there are no other candidate keys, the columns ClientID, StockID and TransactionDate form a context to the TransactionID.

Consider now that ClientID, StockID and TransactionDate are unique and enforced by a unique constraint (a Client can trade multiple stocks but only once in a given day) – what would the level of granularity now be?

The TransactionID in this case would be acting as a surrogate for the candidate key (ClientID, StockID, TransactionDate) therefore the granularity can be said to be TransactionID or all of the columns ClientID, StockID and TransactionDate and those columns are in no specific order when specified.

When using a separate key specific for the business “Transaction” i.e TransactionID the ClientID, StockID and TransactionDate do not form part of the granularity at all, in fact they provide a context to the Transaction, and the context itself is hierarchical because Clients trade in Stocks on a given Date (see below):

The above really talks about non-aggregated data, data you tend to find in the OLTP system.

When looking at aggregated data you would take all columns as the granularity because all the columns together determine the level of detail you are looking at. Aggregated data can be thought of as dimensional, unfortunately SQL does not give us operators to work in dimensional instead we’d use MDX. But it’s important to understand that an aggregated value sits in time and space so to speak and you access it through any member columns of the granular key.

If we had aggregated on ClientID, StockID, TransactionDate (see below) then CommissionAmountTotal has granularity of all the columns (ClientID, StockID and TransactionDate) – remember the columns have no ordering – the granularity key is all of them.

select ClientID,

       StockID,

       TransactionDate, sum( CommissionAmount ) as CommissionAmountTotal

from Transactions

group by ClientID, StockID, TransactionDate

 

This can be proved because you can rewrite the group by clause within any combination and still get the same answer.

When working upon aggregated data if you miss one of the columns out from the granular key then you are implying “All” so effectively the granularity has not changed on the query, but if aggregating a level up then the result of that instance of the query execution will have a different granularity:

 The query below the source granularity is TransactionID, the result granularity is ClientID.

select ClientID, sum( CommissionAmount ) as CommissionAmountTotal

from Transactions

group by ClientID

The query below the source granularity is still TransactionID within the derived table, the outer query the source granularity is {ClientID, StockID, TransactionDate} and the resulting granularity is ClientID.

select ClientID, SUM( CommissionAmount ) as CommissionAmountTotal

from (

    select ClientID,

           StockID,

           TransactionDate, sum( CommissionAmount ) as CommissionAmountTotal

    from Transactions

    group by ClientID, StockID, TransactionDate

 

    ) as agg( ClientID, StockID, TransactionDate, CommissionAmountTotal )

group by ClientID


In summary granularity has a lot to do with perspective and how the lowest level of data is identified, the level of granularity can change – it has a source and result when aggregating. The level of granularity is important because it helps us understand what aggregations can be performed on what data to stop double counting for instance.

 

Ahhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh

I am so "unhappy" {toned down} that this is not more clear and perhaps even something built into the dam product.

I got this error on build:

Error 1 'Request for the permission of type 'System.Web.AspNetHostingPermission, System, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089' failed.'

The fix is CASPOL but finding the right arguments, anyway this blog did the trick:

http://www.sellsbrothers.com/news/showTopic.aspx?ixTopic=1519

 

Get in quick because this will fill up: http://www.microsoft.com/uk/techdays/Registration.aspx

Event Overview

This is the UK launch event for SQL Server 2008 R2.  The R2 release showcases Microsoft’s continued commitment to business intelligence and mission-critical workloads.  This includes:
•        A trusted and scalable platform
•        IT and developer efficiency
•        Managed self-service business intelligence
The event will have sessions highlighting these capabilities, as well as diving into specific topics, such as consolidating SQL Server databases, and tips and techniques for Performance Monitoring and Tuning.  We will also take a look at our newly released Cloud platform SQL Azure.

Already into February and after a nice long break we are ramping up the user group meetings again - physical and offline. There is a lot coming up in 2010 including the R2 release in May which should be good; lots of new tools on the market especially in the BI and HA space. Do you want to get involved? We are always looking for venues, speakers and organisers to help keep the show on the road - if you are interested in getting involved just email.

Anyway, I've listed out some of the stuff coming up in the next few weeks. The user group team have been busy talking about new ideas and how we can all engage and push our skill set forward and help us do our day jobs better - more in the coming weeks.

Cardiff - Feb, 25th: 18:45 to 21:00

BI Fundementals: Building an Enterprise ETL Framework in SSIS - Adam Morton

SQL Server Upgrade: notes from the field and best practices - Satya Shyam K Jayanty

Registration and more info


Edinburgh - Feb, 25th: 18:00 to 21:00

End to End SQL Server Performance Troubleshooting - Iain Kick

Second session to be confirmed soon.

Registration and more info


London - March, 17th: 18:00 to 21:00

We are sorting the sessions out this week - it will be a Business Intelligence focused event.

Registration and more info


Reading - April, 21st: 17:30 to 21:00

We are sorting the sessions out this week - it will be a Developer, High Availability / DBA focused event.

Registration and more info


Manchester - April/May: 17:30 to 21:00

Just finalising a venue and sessions, if you are interested do let us know suggested dates and topics - what night suits best?

Registration and more info


Live Meetings (Weekly Online Events starting March)

We are starting the weekly online short meetings again; we'll be starting with some introductary sessions aimed at novice feature users.

If you have specific topics then reply to this email with ideas.

I'm planning one on Referential Integrity.

Social Events

I'm a good few weeks into my MSc in BI now and loving it - 17 of us on the course, Mike Whitehorn is a great tutor and doing it remotely is working which I was worried about. Lots of work - an assignment per week, its eating up around 10 hours per week of my time but I love it.

Anyway, why mention this? Well, about 10 of us based around London have started meeting up and talking geek - its great meeting up with folks that you can literally just talk technical with - the amount of stuff we talk about - and have a laugh too you'd be amazed.

Ok, the MSc meetings are special so I'm not opening those up, but the concept works - so I'm going to start planning some pub based gatherings - "SELECT * INTO #pub WHERE love_databases in ( 'yes', 'sort of', 'well - I like beer' )" - if you want to do one in your area just send me an email and I'll help get you sorted.

Many thanks,
Tony Rogerson, SQL Server MVP
http://sqlserverfaq.com
http://sqlblogcasts.com/blogs/tonyrogerson
http://twitter.com/tonyrogerson

It's beyond me to create a one to one relationship in SQL Server if you are inserting directly into the related tables; in this article I discuss just how you go about actually implementing a workable one to one relationship. Please feel free to comment or send me email to tonyrogerson@sqlserverfaq.com to discuss.

 

First of all let’s define what we mean by a “one to one relationship”.

A relationship is the definition of two “relations” (tables) having some logical data relationship or dependency such as one to one, one to zero or many, one to many etc.

The one to one relationship means that there must be at least one row in the left table and one corresponding row in the right table, a good definition can be found here: http://folkworm.ceri.memphis.edu/ew/SCHEMA_DOC/comparison/erd.htm (Understanding Entity Relationship Diagrams – section “relationships”).


In SQL Server terms that would be modelled something like below...


create
table Customers (


 

       CustomerSurrogateID  int    not null      identity     

              constraint pk_Customers primary key clustered,

      

       CustomerID           char(5)       not null

              constraint uk_Customer_CustomerID unique,

             

       Name   varchar(100)

       )

      

      

create table CustomersDetail (


 

       CustomerSurrogateID  int    not null

              constraint pk_CustomersDetail primary key clustered

              constraint fk_CustomersDetail_Customers foreign key references Customers( CustomerSurrogateID ),

             

       InitialOrderDate     datetime not null

       )

go


 

If you tried to create a foreign key reference between Customers and CustomersDetail actually in the CREATE TABLE statement you will have noticed that it fails because you haven’t created the CustomersDetail table – this eludes to the problem we are facing when it comes to inserting the data.


 

--     Can only add the foreign key reference here...

alter table Customers add constraint fk_Customers_CustomersDetail foreign key ( CustomerSurrogateID ) references CustomersDetail( CustomerSurrogateID )

go


In the schema above I use a surrogate key on Customers, a surrogate key being an automatically generated value that is part of the logical model and an attribute on the table but has no meaning within the business, it can be exposed to the application developer because it is stable and never changes, I mention that because often there is confusion between a surrogate key and a tuple id (row id), the difference is that the surrogate key never changes whereas the tuple id will change as deletes and inserts occur in the table, anyway, you can’t get access to the tuple-id in SQL Server.


Ok, we have our schema so what happens when we try and insert into the two tables?


We start our transaction so we keep consistency on the inserts into the two tables:



begin
tran


We now do the insert:


insert
Customers ( CustomerID, Name ) values( 'abcde', 'Funky Parts Ltd' )


What happened? Well, we get the error message below:


Msg 547, Level 16, State 0, Line 1

The INSERT statement conflicted with the FOREIGN KEY constraint "fk_Customers_CustomersDetail". The conflict occurred in database "MScBI2010", table "dbo.CustomersDetail", column 'CustomerSurrogateID'.

The statement has been terminated.


 

Why? SQL Server is right to barf; its barf’d because when it checks for a corresponding row in the CustomersDetail table there isn’t one. Now think – is it a simple case of inserting the row in the CustomersDetail table first? Well no, that’s because it’s exactly the same circumstance – there would need to be a corresponding row in the Customers table – basically you are stuck and our use of the FOREIGN KEY constraint for RI has failed us, well – failed us in the practical sense – but it did do its job!


Moving on, why do we have the one to one relationship in the first place? Why not just merge the two sets of columns into a single table and be done with it? Logically the split may make sense, also from an implementation point of view you may want to do it to reduce locking contention thus increasing concurrency.


Worse still, what happens when we execute the above in a transaction? Would you expect the constraint violation to cause the transaction to rollback?



begin
tran


 

insert Customers ( CustomerID, Name ) values( 'abcde', 'Funky Parts Ltd' )


 

select [count(*)]       = ( select COUNT(*) from Customers )

select [scope_identity] = scope_identity()             --     what is the surrogate value?

select [@@trancount]    = @@TRANCOUNT                  --     did it rollback?


 

rollback


 

Msg 547, Level 16, State 0, Line 3

The INSERT statement conflicted with the FOREIGN KEY constraint "fk_Customers_CustomersDetail". The conflict occurred in database "MScBI2010", table "dbo.CustomersDetail", column 'CustomerSurrogateID'.

The statement has been terminated.


 

count(*)

-----------

0


 

scope_identity

---------------------------------------

NULL


 

@@trancount

-----------

1


Two things have happened; firstly the row did not get inserted – the clue there is the result “the statement has been terminated”, remember “statement”; but the @@trancount proves we are still in the original outer transaction.


Realistically (<sarcasm>and we all do this don’t we</>) we should use BEGIN TRY / CATCH to trap the error – example below:



begin
tran


 

begin try


 

       insert Customers ( CustomerID, Name ) values( 'abcde', 'Funky Parts Ltd' )


 

end try

begin catch


 

       rollback tran

       print 'rolled back'


 

end catch


 

if @@TRANCOUNT > 0

begin

       commit tran

       print 'committed'

end



Moving forward I see the two tables logically connected and as such I think we can model a view across the pair of them in the Physical Model, the view is below:



create
view vw_CustomersCustomerDetails

       with schemabinding

as

       select cCustomerID = c.CustomerID, cCustomerSurrogateID = c.CustomerSurrogateID, cName = c.Name,

                 cdCUstomerSurrogateID = cd.CustomerSurrogateID,

                 cdDummyInsert = cd.DummyInsert,

                 cdInitialOrderDate = cd.InitialOrderDate,

                 cdSomeUniqueColReqd = cd.SomeUniqueColReqd

       from dbo.Customers c

              inner join dbo.CustomersDetail cd on cd.CustomerSurrogateID = c.CustomerSurrogateID

go


 

We pull all the columns from both tables and use an INNER JOIN to form the relationship. The naming of the columns needs thinking about and is a personal choice, from my experience I’d say that you ought to name them firstly prefixing with the source table then source column name for example Customers_CustomerID.


We can now insert into the view:



insert
vw_CustomersCustomerDetails ( cCustomerID ) values ( 1 )


 

(1 row(s) affected)


Let’s look at our row:

select * from vw_CustomersCustomerDetails


cCustomerID cCustomerSurrogateID cName                                                                                                cdCUstomerSurrogateID cdDummyInsert cdInitialOrderDate      cdSomeUniqueColReqd

----------- -------------------- ---------------------------------------------------------------------------------------------------- --------------------- ------------- ----------------------- -------------------


 

(0 row(s) affected)


 

Interestingly although we did the insert, got one row affected we do not have in rows returned.


Try the insert again...


insert
vw_CustomersCustomerDetails ( cCustomerID ) values ( 1 )


Msg 2627, Level 14, State 1, Line 1

Violation of UNIQUE KEY constraint 'uk_Customer_CustomerID'. Cannot insert duplicate key in object 'dbo.Customers'.

The statement has been terminated.


Lol, we have 0 rows returned from querying vw_Customer sCustomerDetails yet we get a violation of the unique key on Customers.


Ok, something really daft is going on here and it’s breaking Codds Rule 6 (
http://www.databaseanswers.org/codds_rules.htm); the insert should not work because the View is not updating both tables; this is simply a side effect of the way that the query optimiser expands the View SQL into the main body of the query which I’ve covered on my blog here: http://sqlblogcasts.com/blogs/tonyrogerson/archive/2008/01/03/views-they-offer-no-optimisation-benefits-they-are-simply-inline-macros-use-sparingly.aspx.


We can prevent the engine from doing this silliness by using what is called an INSTEAD OF trigger, there are two types of trigger an INSTEAD OF which is before the update/insert/delete takes place and puts you in control of what will happen because that is the nature of that feature – you are proving logic “instead of” the original dml statement and the after trigger which is post update/insert/delete.



create
trigger trg_CustomersCustomerDetails_instof

       on vw_CustomersCustomerDetails instead of insert

as

begin


 

       --     First insert the Customer, the trigger on Customer

       --     will take care of inserting the CustomerDetail dummy record.

       insert Customers ( CustomerID, Name )

              select cCustomerID, cName

              from inserted

             

       --     Now update the dummy record with the CustomersDetail

       update CustomersDetail

              set DummyInsert = NULL,           --     we now have real data

                     InitialOrderDate = i.cdInitialOrderDate,

                     SomeUniqueColReqd = i.cdSomeUniqueColReqd

       from inserted i

              inner join Customers c on c.CustomerSurrogateID = i.cdCustomerSurrogateID


 

end

go


The above creates our INSTEAD OF trigger thus intercepting the INSERT on the view; we must do the “Insert” processing ourselves, the rows inserted into the view are contained within the system generated “inserted” table.


We are little out of sequence so let’s get back to the solution; we are basically going to use a dummy row in the CustomersDetail table, I can hear people drawing breath already in terms of RI – how can we enforce uniqueness etc. I’ve thought about that and will show you later on.


We need to modify our “right” table and make all the columns except our key Nullable. We also implement our CHECK constraints with slightly difference logic as you can see on the CustomersDetail table and the InitialOrderDate. The CHECK CONSTRAINT must be a table level CHECK CONSTRAINT otherwise bets are off eg.



create
table Customers (


 

       CustomerSurrogateID  int    not null      identity     

              constraint pk_Customers primary key clustered,

      

       CustomerID           char(5)       not null

              constraint uk_Customer_CustomerID unique,

             

       Name   varchar(100)

       )

      

create table CustomersDetail (


 

       CustomerSurrogateID  int    not null

              constraint pk_CustomersDetail primary key clustered

              constraint fk_CustomersDetail_Customers foreign key references Customers( CustomerSurrogateID ),

      

       DummyInsert                char(1) null

              constraint uk_CustomersDetail check( coalesce( DummyInsert, 'N' ) in ( 'Y', 'N' ) ),

             

       InitialOrderDate     datetime null,

      

       SomeUniqueColReqd    int null,

      

              check( ISNULL( DummyInsert, 'N' ) = 'Y' or ( DummyInsert IS NULL and InitialOrderDate IS NOT NULL ) )


 

       )

go


 

If you’ve blatted the tables remember to recreate the trigger:



create
trigger trg_CustomersCustomerDetails_instof

       on vw_CustomersCustomerDetails instead of insert

as

begin


 

       --     First insert the Customer, the trigger on Customer

       --     will take care of inserting the CustomerDetail dummy record.

       insert Customers ( CustomerID, Name )

              select cCustomerID, cName

              from inserted

             

       --     Now update the dummy record with the CustomersDetail

       update CustomersDetail

              set DummyInsert = NULL,           --     we now have real data

                     InitialOrderDate = i.cdInitialOrderDate,

                     SomeUniqueColReqd = i.cdSomeUniqueColReqd

       from inserted i

              inner join Customers c on c.CustomerSurrogateID = i.cdCustomerSurrogateID


 

end

go


The above is not enough for our solution to work properly; inserting into the view is fine now but we need to have trigger logic on both the Customers and CustomersDetail tables.


 

create trigger trg_Customers_ins on Customers for insert

as

begin


 

       --     Make sure there is a row in the table we have a one to one with.

       --     It cannot contain data.

       insert CustomersDetail ( CustomerSurrogateID, DummyInsert )

              select CustomerSurrogateID, 'Y'

              from inserted


 

end

go


 

create trigger trg_CustomersDetail_del on CustomersDetail for delete

as

begin

       --     Need to remove the corresponding Customers row to keep consistency

      

       delete Customers

       where CustomerSurrogateID in ( select CustomerSurrogateID

                                      from deleted )


 

end

go


The solution basically inserts a dummy row into the CustomersDetail table – horrible I know, but we incorporate a DummyInsert column that is set to Y whenever that row is a Dummy; that then allows us to still use CHECK constraints; UNIQUE constraints are implemented using a unique filtered index such as below:



create
unique index fltd_CustomersDetail_InitialOrderDate

       on CustomersDetail( SomeUniqueColReqd )

       where DummyInsert IS NULL         --     Ignore dummy inserts because of one to one relationship

go


In summary; implementing a true one to one relationship in SQL Server is far from easy but I’ve shown you “a way” of doing it. The method uses the dummy row approach which is the only way that would work, but ordinarily if you just use a trigger on the left table to insert the dummy row there is nothing forcing the application developer to remember to go on and update the columns in the right table – inserting through the view forces that; lastly – the use of the DummyInsert column enables you to properly enforce constraints on columns that really do allow Nulls. On the columns that are logically NOT NULL you can do a CHECK CONSTRAINT similar to CHECK ( ( DummyInsert IS NULL AND {your col} IS NOT NULL) OR DummyInsert IS NULL).

UPDATE: 2010-01-23T14:20:00

Been thinking further on this and there are couple of holes that need to be plugged, one of them is updating the surrogate key on the CustomerOrders table - that needs "something" on there to make it insert only and not updateable - any suggestions, hint: something in an update trigger :)

UPDATE: 2010-01-28T14:39:00

Talking in the private groups with Adam Machanic he rightly points out we need deferrable constraints that operate when the final transaction commits which is right - never thought about that, anyway checking the ISO standard its in there....

"The checking of a constraint depends on its constraint mode within the current SQL-transaction. If the constraint mode is immediate, then the constraint is effectively checked at the end of each SQL-statement. If the constraint mode is deferred, then the constraint is effectively checked when the constraint mode is changed to immediate either explicitly by execution of a <set constraints mode statement>, or implicitly at the end of the current SQL-transaction."

We don't have deferrable constraints in SQL Server unfortunetly.

Allan Mitchell and I are starting our Business Intelligence Masters this week at the University of Dundee, as we are up here anyway it seemed a good idea to put on a user group meeting, so, details below - it's this Wednesday in the Queen Mothers Building from 6pm - 9pm.

You received this email because you are a member of the UK SQL Server user group, if you no longer want to receive these emails then just reply to me with the word remove in the subject - it's a manual task so please be patient if I missed you last time.

Please check the full and modified agenda here: http://sqlserverfaq.com?eid=211.

We are currently all talking about when and where to hold meetings, this will get formalised by the end of this month so watch this site :).

We are in the Queen Mothers Building; any problems my mobile is 0796 816 0362.

For directions on how to get their look at the map of campus found here: http://www.computing.dundee.ac.uk/travel.asp, it's literally a 10 minute walk from Dundee station.

Agenda

18:00 - 18:15 Meet & Greet

Meet up and socialise with your friends, meet new people, find out what other people are doing with SQL Server.

18:15 - 18:30 Round Table Discussion - Q & A from and to the audience

Bring your SQL problems and hopefully somebody within the room will be able to fix or advise you. There is always a wealth of experienced people at the user group meetings.

18:30 - 18:45 LiveLock: Always use IDENTITY for your keys

Two sides of the audience, one will argue FOR the statement and one will argue AGAINST.

18:45 - 19:30 SQL Server Tools

Tony Rogerson, SQL Server MVP

You've been tasked with looking after or developing against SQL Server - what next? How can I make sure my SQL is going to work in production and not grind the entire system to a halt?

I will take you through Management Studio and Profiler so you fully understand what is in there and how to use it; when showing query plans I will explain from an entry level how to read them and what some of the access paths and join types mean.

19:30 - 19:50 Break for Pizza

More time to network with your peers, meet new friends and get your questions answered.

19:50 - 21:00 Introduction to SQL Server Integration Services 2008

Allan Mitchell, SQL Server MVP

ETL? You've heard the word (or perhaps not); anyway - you want to know what this really powerful feature that comes with the SQL Server product set is.

Allan will show you some of the killer features in SSIS 2008, this will be an interactive session so bring your questions!

Many thanks,

Tony Rogerson, SQL Server MVP
http://sqlserverfaq.com
http://sqlblogcasts.com/blogs/tonyrogerson
http://twitter.com/tonyrogerson

Yes, copying between word and email always puts an extra line between code lines which makes the code hard to read.

The answer is simple; in Management Studio go into Find and Replace, in "Find Options" check "Use" and select "Regular Expressions".

Now, in the Find what: box enter the text ^\n

In the Replace with: box enter nothing, please don't take me literally on that and actually type the word nothing, I mean leave the text box empty.

Click Replace All.

Da dah!

You are registered for tonights London UG, if you are not coming then please unregister now so I don't send you a bill for £20! The agenda tonight is cracking - so much going on, so much in fact we have to start early tonight - don't worry though, just get there when you can make it.

This session is not available remotely - sorry, you need to come.

Please check the full and modified agenda here: http://sqlserverfaq.com?eid=193.

In summary we've got.... The usual Round table - bring your problem discussion

First ever LiveLock discussion which will figure on "All tables should have a clustered index" - please do some research before tonight and bring some "for the statement" and "against the statment" - not sure how it will go, but hey - should be a bit of fun :)

Dave Ballantyne will do a SQL nugget for us

Martin Cairney will be doing a session on "An introduction to the Power of Policies"

BakBone will be doing a product demonstration on their tools

Siddharth Mehta will be doing a session on "Microsoft Business Intelligence Project Boosters"

Tony (assuming I can stop coughing by tonight) will be going through some of the Internals for the beginner

A great line up - we've still room so tell your friends, again if you aren't coming please tell me.

Remember we start the meeting at 5.30pm.

Location: Microsoft London, Cardinal Place, 100 Victoria Street, London, SW1E 5JL

Many thanks

Tony Rogerson
SQL Server MVP
http://sqlserverfaq.com
http://sqlblogcasts.com/blogs/tonyrogerson

Table Variables are not Transactional, that is to say BEGIN TRAN....ROLLBACK has no effect on them; we can use that to our advantage.

Have you ever been in the situation where you use a log file within your application to log progress, unfortunetly because your process is in a transaction you lose what has happened, well - not anymore - just use a table variable.... 

create table my_log_file (

    msg varchar(1024)

)

declare @t table ( msg varchar(1024) )

begin tran

--  some processing

--  ...

--  ...

--  logging as we go....

insert my_log_file ( msg ) values( '1' )
insert my_log_file ( msg ) values( '2' )
insert my_log_file ( msg ) values( '3' )
insert my_log_file ( msg ) values( '4' )
insert @t ( msg ) select msg from my_log_file

rollback

select * from my_log_file
select * from @t

This Thursday 26th is the London SQL User Group and also the Edinburgh UG; in London we have Tony Rogerson doing a session on Internals Basics aimed at people who aren't experts but will likely benefit as a reminder for those who are, we have a Nugget from David Ballantyne composed from some of his blog posts on ranking problems and udf problems, we also have Siddharth Mehta who will talk about Microsoft Business Intelligence Project Boosters. The Edinburgh group sees Martin Bell talk through some of the items in the Newsgroups as well as a ton of SQL nuggets.

We are also starting our LiveLock discussions, ever watched Argumental? Well, same principal - we have a specific topic, the first one will be "Every table must have a clustered index" and we have a group who argue for and group against, we summarise by pulling it back to a best practice advice to end the stalemate - should be fun!

If you don't want these email blasts then just reply to me with the word remove on the subject.

To register please visit http://sqlserverfaq.com?eid=193.
Make sure you reserve your place sooner because this session will be popular.Come and socialise and learn from your peers; these physical meetings are great places to expand your network, get answers and find out how other people are using SQL Server and what is going on. This is the last meeting this year for London and for those who can we should do drinks after the meeting to continue the SQL chat in an even more informal environment.If you want to twitter please make sure you use the tag #uksqlug so it is shown on the site.Agenda

17:30 - 18:00 Meet & Greet

Meet up and socialise with your friends, meet new people, find out what other people are doing with SQL Server.


18:00 - 18:15 Round Table discussion and Nuggets

Take stock and get the latest news in the SQL Server field. This is also a great opportunity to ask any burning questions you have - perhaps a work problem, general guidance etc..

18:15 - 18:30 LiveLock: Every table should have a Clustered IndexTwo sides of the audience, one will argue FOR the statement and one will argue AGAINST.18:30 - 18:45 Dave Ballantyne SQL Nugget from his blog posts...http://sqlblogcasts.com/blogs/sqlandthelike/archive/2009/09/08/bug-use-of-ranking-functions-result-in-an-inefficient-query-plan.aspxhttp://sqlblogcasts.com/blogs/sqlandthelike/archive/2009/10/15/udf-overhead-a-simple-example.aspx
( with an offshoot to http://sqlblogcasts.com/blogs/sqlandthelike/archive/2009/11/24/the-observer-effect-in-action.aspx)


18:45 - 19:15 SQL Server Internals for the Beginner
Tony Rogerson, SQL Server MVP
Tony will cover SQL Server memory, Database Structure (how they are composed) including why the transaction log is so important and how to fix the amount of space it uses if it grows too big, we will also cover some best practice for instance maintanence plans and finally we will cover Index internals and fragmentation and what causes it.

This really will be from beginners in, but don't be put off - I will try and go as deep as people want and time permits.
Its a good open session so if you want anything specific covering then let me know in advance.19:15 - 19:35 Break with Pizza

19:35 - 20:15 TBC

To be confirmed shortly
20:15 - 21:00 Microsoft Business Intelligence Project Boosters
Siddharth Mehta

BIDS and SSMS are not always sufficient for any MS BI Project. Right from the documentation that might be required just for compliance to tools that are able to fill up the gaps that SSMS or BIDS leave behind, a lot of accessories are required for a project for a leveraged development and delivery.
 
MS BI Project Booster is a kind of kit that consists of a collection of tool & utilities (freewares) and douments and/or documenting methods, that is good to have installed or available in advance, that can help any MS BI Team to tackle issues or cater requirements in a speedier manner at various stages of a project development life cycle. It's a resource kit, and can be used at various stages of the project and by the use of the same, I believe that a project can have the benefit of an Agile BI Development from an implementation perspective.
My Blog: http://siddhumehta.blogspot.com
 
About me: http://beyondrelational.com/blogs/siddharthmehta/about.aspx

Many thanks

Tony Rogerson
SQL Server MVP
http://sqlserverfaq.com
http://sqlblogcasts.com/blogs/tonyrogerson

 

SQL Server 2008 SP1 Cumulative Update (CU5) link: http://support.microsoft.com/kb/975977/LN/

Details on how they have fully fixed the OPTION( RECOMPILE ) Parameter Embedding Optimisation : http://support.microsoft.com/kb/976603/

What is Parameter Embedding Optimisation, well its basically you can do this...

WHERE email = coalesce( @email, email )
      AND name = coalesce( @name, name )
OPTION ( RECOMPILE )

Basically, if @email is NULL then the optimisers optimises out that clause in the query thus giving better access to index choices and negating the need for dynamic SQL.

A very cool feature that came in with RTM, was disabled in CU4 because of the bug and fixed and renabled in CU5.

 

Andrew Fryer has done a number of blog posts on Virtualisation for DBA's for SQL Server - very good and informative. 

http://blogs.technet.com/andrew/archive/2009/10/16/virtualisation-for-the-dba-part-1-are-you-bovvered.aspx

http://blogs.technet.com/andrew/archive/2009/10/12/virtualisation-for-the-dba-part-2-sql-server-management.aspx

http://blogs.technet.com/andrew/archive/2009/10/20/virtualisation-for-the-dba-part-3-sql-server-performance.aspx

http://blogs.technet.com/andrew/archive/2009/10/21/virtualisation-for-the-dba-part-4-licensing-and-support.aspx

 

This Thursday 29th is the London SQL User Group at Microsoft where we have Jamie Thomson talking about "Deploying SSRS Reports using MSBuild", we have one of our meeting sponsors Steve Hitchman from WhereScape demonstrating their Data Warehouse Development Workbench for SQL Server and we also have Andrew Sadler talking about "BI 2.0 - I'm searching.... give me a minute".

If you do not want't these emails then please just reply with remove in the subject and I'll update my list; you've got it because you are registered and still subscribed.

While I'm emailing you, don't forget the next Reading user group meeting is on the 11th Nov and I'll be talking about High Availability (Partitioning, File Groups (use of, backup/restore)) and an Introduction to SQL Server architecture and I'm currently looking for another one or two speakers, so if you want to come to that then get yourself registered -
http://sqlserverfaq.com?eid=171. Please make sure you are registered if you are coming so we know how much pizza we need! If you are already registered then this is also your confirmation.

Register and more details:
http://sqlserverfaq.com?eid=192.

This meeting is being sponsored by Microsoft and WhereScape.

The User Group meetings are very informal, very interactive and a really great method of expanding your knowledge, meeting other SQL Server professionals and getting answers to your problems.

17:30 - 18:00 Meet & Greet

18:00 - 18:15 Round Table discussion and Nuggets

Take stock and get the latest news in the SQL Server field. This is also a great opportunity to ask any burning questions you have - perhaps a work problem, general guidance etc..

18:15 - 19:15 Deploying SSRS reports using MSBuild - Jamie Thomson, SQL Server MVP

Come and see how deployment of Reporting Services can be automated and integrated into msbuild.

19:15 - 19:30 Break and Pizza

19:30 - 20:00 Steve Hitchman - Demonstration of Data Warehouse Development Workbench for SQL Server

20:00 - 21:00 ‘I’m searching .... give me a minute.’ BI 2.0 - Andrew Sadler, Hitachi Consulting

Search functionality is all pervasive. Outlook, explorer, intranet, internet, its everywhere. Why shouldn’t BI be the same?

As the boundaries between structured and non-structured data get blurred, end users are expecting to see and get more context around their BI as well as being lead into their BI in an intuitive user paradigm, like search.

This session highlights some simple yet quite powerful ways to expose some of search functionality that Sharepoint offers, in relation to Mirosoft’s BI offering, using several real world use-cases.

Many thanks

Tony Rogerson
SQL Server MVP
http://sqlserverfaq.com
http://sqlblogcasts.com/blogs/tonyrogerson

More Posts Next page »