05 March 2010 15:02 tonyrogerson

What is Granularity?

Granularity defines “the lowest level of detail”; but what is meant by “the lowest level of detail”?

Consider the Transactions table below:

create table Transactions (

    TransactionID       int not null     primary key clustered,

    TransactionDate     date not null,

    ClientID            int not null,

    StockID             int not null,

    TransactionAmount   decimal( 28, 2 ) not null,

    CommissionAmount    decimal( 28, 5 ) not null

)

 

A Client can Trade in one or many Stocks on any date – there is no uniqueness to ClientID, Stock and TransactionDate.

The level of granularity on the table is TransactionID because that is our lowest level of detail, TransactionID uniquely identifies the row – there are no other candidate keys, the columns ClientID, StockID and TransactionDate form a context to the TransactionID.

Consider now that ClientID, StockID and TransactionDate are unique and enforced by a unique constraint (a Client can trade multiple stocks but only once in a given day) – what would the level of granularity now be?

The TransactionID in this case would be acting as a surrogate for the candidate key (ClientID, StockID, TransactionDate) therefore the granularity can be said to be TransactionID or all of the columns ClientID, StockID and TransactionDate and those columns are in no specific order when specified.

When using a separate key specific for the business “Transaction” i.e TransactionID the ClientID, StockID and TransactionDate do not form part of the granularity at all, in fact they provide a context to the Transaction, and the context itself is hierarchical because Clients trade in Stocks on a given Date (see below):

The above really talks about non-aggregated data, data you tend to find in the OLTP system.

When looking at aggregated data you would take all columns as the granularity because all the columns together determine the level of detail you are looking at. Aggregated data can be thought of as dimensional, unfortunately SQL does not give us operators to work in dimensional instead we’d use MDX. But it’s important to understand that an aggregated value sits in time and space so to speak and you access it through any member columns of the granular key.

If we had aggregated on ClientID, StockID, TransactionDate (see below) then CommissionAmountTotal has granularity of all the columns (ClientID, StockID and TransactionDate) – remember the columns have no ordering – the granularity key is all of them.

select ClientID,

       StockID,

       TransactionDate, sum( CommissionAmount ) as CommissionAmountTotal

from Transactions

group by ClientID, StockID, TransactionDate

 

This can be proved because you can rewrite the group by clause within any combination and still get the same answer.

When working upon aggregated data if you miss one of the columns out from the granular key then you are implying “All” so effectively the granularity has not changed on the query, but if aggregating a level up then the result of that instance of the query execution will have a different granularity:

 The query below the source granularity is TransactionID, the result granularity is ClientID.

select ClientID, sum( CommissionAmount ) as CommissionAmountTotal

from Transactions

group by ClientID

The query below the source granularity is still TransactionID within the derived table, the outer query the source granularity is {ClientID, StockID, TransactionDate} and the resulting granularity is ClientID.

select ClientID, SUM( CommissionAmount ) as CommissionAmountTotal

from (

    select ClientID,

           StockID,

           TransactionDate, sum( CommissionAmount ) as CommissionAmountTotal

    from Transactions

    group by ClientID, StockID, TransactionDate

 

    ) as agg( ClientID, StockID, TransactionDate, CommissionAmountTotal )

group by ClientID


In summary granularity has a lot to do with perspective and how the lowest level of data is identified, the level of granularity can change – it has a source and result when aggregating. The level of granularity is important because it helps us understand what aggregations can be performed on what data to stop double counting for instance.

 

Filed under:

Comments

# Is there a way to give a non-domain admin certain administrative priviledges? | Domain hosting

Pingback from  Is there a way to give a non-domain admin certain administrative priviledges? | Domain hosting

# Relasi antara Measure Groups dan Dimensi.. « My Notes..

Pingback from  Relasi antara Measure Groups dan Dimensi.. « My Notes..