05 March 2010 15:02
tonyrogerson
What is Granularity?
Granularity defines “the lowest level of detail”; but what is meant by “the lowest level of detail”?
Consider the Transactions table below:
create table Transactions (
TransactionID int not null primary key clustered,
TransactionDate date not null,
ClientID int not null,
StockID int not null,
TransactionAmount decimal( 28, 2 ) not null,
CommissionAmount decimal( 28, 5 ) not null
)
A Client can Trade in one or many Stocks on any date – there is no uniqueness to ClientID, Stock and TransactionDate.
The level of granularity on the table is TransactionID because that is our lowest level of detail, TransactionID uniquely identifies the row – there are no other candidate keys, the columns ClientID, StockID and TransactionDate form a context to the TransactionID.
Consider now that ClientID, StockID and TransactionDate are unique and enforced by a unique constraint (a Client can trade multiple stocks but only once in a given day) – what would the level of granularity now be?
The TransactionID in this case would be acting as a surrogate for the candidate key (ClientID, StockID, TransactionDate) therefore the granularity can be said to be TransactionID or all of the columns ClientID, StockID and TransactionDate and those columns are in no specific order when specified.
When using a separate key specific for the business “Transaction” i.e TransactionID the ClientID, StockID and TransactionDate do not form part of the granularity at all, in fact they provide a context to the Transaction, and the context itself is hierarchical because Clients trade in Stocks on a given Date (see below):
The above really talks about non-aggregated data, data you tend to find in the OLTP system.
When looking at aggregated data you would take all columns as the granularity because all the columns together determine the level of detail you are looking at. Aggregated data can be thought of as dimensional, unfortunately SQL does not give us operators to work in dimensional instead we’d use MDX. But it’s important to understand that an aggregated value sits in time and space so to speak and you access it through any member columns of the granular key.
If we had aggregated on ClientID, StockID, TransactionDate (see below) then CommissionAmountTotal has granularity of all the columns (ClientID, StockID and TransactionDate) – remember the columns have no ordering – the granularity key is all of them.

select ClientID,
StockID,
TransactionDate, sum( CommissionAmount ) as CommissionAmountTotal
from Transactions
group by ClientID, StockID, TransactionDate
This can be proved because you can rewrite the group by clause within any combination and still get the same answer.
When working upon aggregated data if you miss one of the columns out from the granular key then you are implying “All” so effectively the granularity has not changed on the query, but if aggregating a level up then the result of that instance of the query execution will have a different granularity:
The query below the source granularity is TransactionID, the result granularity is ClientID.
select ClientID, sum( CommissionAmount ) as CommissionAmountTotal
from Transactions
group by ClientID
The query below the source granularity is still TransactionID within the derived table, the outer query the source granularity is {ClientID, StockID, TransactionDate} and the resulting granularity is ClientID.
select ClientID, SUM( CommissionAmount ) as CommissionAmountTotal
from (
select ClientID,
StockID,
TransactionDate, sum( CommissionAmount ) as CommissionAmountTotal
from Transactions
group by ClientID, StockID, TransactionDate
) as agg( ClientID, StockID, TransactionDate, CommissionAmountTotal )
group by ClientID
In summary granularity has a lot to do with perspective and how the lowest level of data is identified, the level of granularity can change – it has a source and result when aggregating. The level of granularity is important because it helps us understand what aggregations can be performed on what data to stop double counting for instance.
Filed under: Data Theory