SQL and the like

Dave Ballantyne's blog. Freelance SQL Server database designer and developer at Clear Sky SQL

August 2012 - Posts

Reporting on common code smells : A POC

Over the past few blog entries, I’ve been looking at parsing TSQL scripts in a variety of ways for a variety of tasks.  In my last entry ‘How to prevent ‘Select *’ : The elegant way’, I looked at parsing SQL to report upon uses of SELECT *.  The obvious question leading on from this is, “Great, what about other code smells ?”  Well, using the language service parser to do that was turning out to be a bit of a hard job,  sure I was getting tokens but no real context.  I wasn't even being told when an end of statement had been reached.

One of the other parsing options available from Microsoft is exposed in the assembly ‘Microsoft.SqlServer.TransactSql.ScriptDom’,  this is ,I believe, installed with the client development tools with SQLServer.  It is much more feature rich than the original parser I had used and breaks a TSQL script into intuitive classes for analysis.

So, what sort of smells can I now find using it ?  Well, for an opening gambit quite a nice little list.

  • Use of NOLOCK
  • Set of READ UNCOMMITTED
  • Use of SELECT *
  • Insert without column references
  • Explicit datatype conversion on Sargs
  • Cross server selects
  • Non use of two-part naming convention
  • Table and Query hint usage
  • Changes in set options
  • Use of single line comments
  • Use of ordinal column positions in ORDER BY clause

Now, lets not argue the point that “It depends” as smells on some of these, but as an academic exercise it is quite interesting. 

The code is available on codeplex http://tsqlsmells.codeplex.com/ 

All the usual disclaimers apply to this code, I cannot be held responsible for anything ranging from mild annoyance through to universe destruction due to the use of this code or examples.

The zip file contains a powershell script and my test cases.  The assembly used requires .Net 4 to run, which means that you will need powershell 3 ( though im running through PowerGUI and all works ok ) . 

The code searches for all .sql files in the folder hierarchy for the workingpath,  you can override this if you want by simply changing the $Folder variable, and processes each in turn for the smells.  Feedback is not great at the moment, all it does is output to an xml file (Smells.xml) the offset position and a description of the smell found.

Right now, I am interested in your feedback.  What do you think ?  Is this (or should it be) more than an academic exercise ?  Can tooling such as this be used as some form of code quality measure ?  Does it Work ? Do you have a case listed above which is not being reported ? Do you have a case that you would love to be reported ?

Let me know , please Smile mailto: parser@clearskysql.co.uk.

Thanks

How to prevent ‘Select *’ : The elegant way

UPDATE 2012-09-12 For my latest adventures with TSQL Parsers please see this post.

I’ve been doing a lot of work with the “Microsoft SQL Server 2012 Transact-SQL Language Service” recently, see my post here and article here for more details on its use and some uses.

An obvious use is to interrogate sql scripts to enforce our coding standards.  In the SQL world a no-brainer is SELECT *,  all apologies must now be given to Jorge Segarra and his post “How To Prevent SELECT * The Evil Way” as this is a blatant rip-off Smile

IMO, the only true way to check for this particular evilness is to parse the SQL as if we were SQL Server itself.  The parser mentioned above is ,pretty much, the best tool for doing this.  So without further ado lets have a look at a powershell script that does exactly that :

cls
#Load the assembly
[System.Reflection.Assembly]::LoadWithPartialName("Microsoft.SqlServer.Management.SqlParser") | Out-Null
$ParseOptions = New-Object Microsoft.SqlServer.Management.SqlParser.Parser.ParseOptions
$ParseOptions.BatchSeparator = 'GO'
#Create the object
$Parser = new-object Microsoft.SqlServer.Management.SqlParser.Parser.Scanner($ParseOptions)
$SqlArr = Get-Content "C:\scripts\myscript.sql"
$Sql = ""
foreach($Line in $SqlArr){
    $Sql+=$Line
    $Sql+="`r`n"
}
$Parser.SetSource($Sql,0)
$Token=[Microsoft.SqlServer.Management.SqlParser.Parser.Tokens]::TOKEN_SET
$IsEndOfBatch = $false
$IsMatched = $false
$IsExecAutoParamHelp = $false
$Batch = ""
$BatchStart =0
$Start=0
$End=0
$State=0
$SelectColumns=@();
$InSelect = $false
$InWith = $false;
while(($Token = $Parser.GetNext([ref]$State ,[ref]$Start, [ref]$End, [ref]$IsMatched, [ref]$IsExecAutoParamHelp ))-ne [Microsoft.SqlServer.Management.SqlParser.Parser.Tokens]::EOF) {
    $Str = $Sql.Substring($Start,($End-$Start)+1) 
    try{
        ($TokenPrs =[Microsoft.SqlServer.Management.SqlParser.Parser.Tokens]$Token) | Out-Null
        #Write-Host $TokenPrs
        if($TokenPrs -eq [Microsoft.SqlServer.Management.SqlParser.Parser.Tokens]::TOKEN_SELECT){
            $InSelect =$true
            $SelectColumns+=""
        }    
        if($TokenPrs -eq [Microsoft.SqlServer.Management.SqlParser.Parser.Tokens]::TOKEN_FROM){
            $InSelect =$false
            #Write-Host $SelectColumns -BackgroundColor Red
            
            foreach($Col in $SelectColumns){
                if($Col.EndsWith("*")){
                    Write-Host "select * is not allowed"
                    exit
                
                }
            
            }
            $SelectColumns =@()
        }
        
    }catch{
        #$Error
        $TokenPrs = $null
        
    }    
    if($InSelect -and $TokenPrs -ne [Microsoft.SqlServer.Management.SqlParser.Parser.Tokens]::TOKEN_SELECT){
        if($Str -eq ","){
            $SelectColumns+=""
        }else{
            $SelectColumns[$SelectColumns.Length-1]+=$Str
        }    
    }
}

OK, im not going to pretend that its the prettiest of powershell scripts,  but if our parsed script file “C:\Scripts\MyScript.SQL” contains SELECT * then “select * is not allowed” will be written to the host. 

So, where can this go wrong ?  It cant ,or at least shouldn’t Smile , go wrong, but it is lacking in functionality.  IMO, Select * should be allowed in CTEs, views and Inline table valued functions at least and as it stands they will be reported upon.

Anyway, it is a start and is more reliable that other methods.