February 2011 - Posts

Common sense, courtesy and giving a damn?
17 February 11 11:16 PM | MartinBell | with no comments

One of the questions I ask when being interviewed is “Have you read my blog?” And I have been surprised by the high number of times the answer is “No”. Sometimes this is accompanied by some excuse, usually quite lame. If that excuse is that they haven’t had time then (IMO) they are just digging themselves into a bigger hole.

As a candidate I may be expected to prepare for an interview, for example, showing some basic knowledge of what the company does, for instance, if the company has developed an application what the application is designed to do. So why shouldn’t I expect the same from the interviewer? If my CV says that I blog, is it not common sense that the person doing the interviewing should take a look, after all a blog could tell you more about me than any CV could do. Frequency of posts, technical and other content could reveal a huge amount. The contents of the blog could be used to shape the way the interview is conducted, break the ice or phrase a question. In my previous post I stated how I dislike interviews that are just a set of technical questions read off a piece of paper. They can feel like the Spanish Inquisition, adding pressure and not getting the best out of the candidate.

If you don’t find out as much as you can about a candidate before an interview can you say that you are actually performing the best service for your company? When I interview people, I make sure I read the CV, then go through the CV a second time highlighting any points I think are worth following up on and also make a note of any requirements that are not mentioned on the CV so I can ask the candidate.

I’ve covered common sense and giving a damn, that leave courtesy. Courtesy is something that often seems to have died out in this day, so here's the definition from Merriam-Webster.

Filed under:
Signing Powershell scripts
16 February 11 10:08 PM | MartinBell | 2 comment(s)

I wanted to write a post on signing Powershell scripts, but doing a few internet searches found this article. Which is great if you have a certificate store, and this is the recommended method, but using a certificate store is not always an available option. There are several alternate methods to create a certificate and I found this method that used makecert. Makecert should be available if you have installed the Windows SDK, or it can be downloaded with the SDK at http://msdn.microsoft.com/en-us/windowsserver/bb980924.aspx. I noticed a couple of issues with the code snippet; it has the wrong algorithm (sh1 instead of sha1) and the certificate file should be specified as the –ic  option and not –c. This was reported and hopefully it will be corrected. Another issue running the script is that it assumes that makecert.exe is on the PATH or in the current directory. I came up with the following script that will search for an instance of makecert.exe and then run it.

$pvk = "C:\powershell\root.pvk"
$cer = "C:\powershell\root.cer"
Remove-Item $pvk, $cer -ErrorAction:SilentlyContinue
$file = @(get-psdrive -p "FileSystem" | % {get-childitem $_.Root -include makecert.exe -r -ErrorAction:SilentlyContinue })[0]
& $file.fullName -n "CN=MyRoot" -a sha1 -eku 1.3.6.1.5.5.7.3.3 -r -sv $pvk $cer -ss Root -sr CURRENTUSER
& $file.fullName -pe -n "CN=MyCertificate" -ss MY -a sha1 -eku 1.3.6.1.5.5.7.3.3 -iv $pvk -ic $cer

Another change I made was to the Subject's certificate store location (-sr option) from localhost to CURRENTUSER. Without this change I got permissions errors that gave the error:

Error: Save encoded certificate to store failed => 0x5 (5)
Failed

when the command was run. The certificate will still be installed, but it won’t be trusted and so can’t be used to sign scripts.

When the script works correctly you will be prompted with three dialogs for various passwords. You will also be asked to confirm that you wish to install the certificate as the certification authority can not be confirmed: 

 

To see if the certificate installed correctly use the command:

get-childitem cert:\CurrentUser\My -codesigning

You should see something like:

Directory: Microsoft.PowerShell.Security\Certificate::CurrentUser\My

Thumbprint                                                        Subject
----------                                                              -------
2950019745D88A46C94B46A06571EA3E4522F4B4 CN=MyCertificate 

You can also view the certificate with certmgr




By ckicking on the certificate you can view the details:



To sign the script you can run something like:

$cert = @(gci cert:\currentuser\my -codesigning)[0]
$script = "C:\powershell\script.ps1"
Set-AuthenticodeSignature $script $cert

When this script is run the output will be something like:

    Directory: C:\powershell


SignerCertificate                                                   Status Path
-----------------                                                        ------    ----
3DE67752A34AB7980FAF4AE58A2788DE511C8801  Valid   script.ps1

If the look at script.ps1 file you will see the signature appended to the file. To test the script change the execution policy to allsigned. To check the execution policy use the command:

Get-ExecutionPolicy

If the policy needs setting use the command:

Set-ExecutionPolicy –executionpolicy AllSigned –Force

Scripts will then only execute if they are signed (so do this AFTER after signing the script!). If you don’t sign the script you will get an error message like:

File C:\powershell\script.ps1 cannot be loaded. The file C:\powershell\script.ps1 is not digitally signed. The script will not exec
ute on the system. Please see "get-help about_signing" for more details..
At line:1 char:25
+ C:\powershell\script.ps1 <<<<
    + CategoryInfo          : NotSpecified: (:) [], PSSecurityException
    + FullyQualifiedErrorId : RuntimeException

Because this is a self-signed certificate you will be prompted when the script is run as to whether it is trustworthy e.g.



or for Powershell ISE



If you reply to Always run scripts with this certificate, then any script that is signed using this certificate will run without re-prompting the user.

Filed under:
Heap Inserts and Replication
14 February 11 04:09 PM | MartinBell | 1 comment(s)

In my previous blog post I talked about how rows are inserted into a heap may lead to space in a page not being used. This is most likely to occur when single rows are inserted, and therefore it occurred to me that transactional replication may be one of the situations where this could lead to data file bloating. This is because transactional replication will convert DML statements on the publisher into singleton commands on the subscriber.

I wrote the following script to test this theory. When I inserted 4 rows into the published table in single statement. You can see from the output of DBCC IND that four rows are inserted into a single data page:




The DBCC PAGE output for this page is similar to the example in the previous post and can be found here.

If I check the subscriber DBCC IND shows two data pages for the replicated table.




The DBCC PAGE output for the subscriber table and shows three slots in use on one page and one slot used on the second page.

This could be a problem which leads to bloated data files on the subscriber, although I suspect it isn’t as common as some may think. A replicated table requires a Primary Key, in the majority of cases I think this is likely to be defined with a clustered index. When this is not the case it is probably because there is an alternate clustered index. This probably means that heaps are not often replicated.

Filed under: ,
Heap insert behaviour
10 February 11 07:28 PM | MartinBell | with no comments

As part of his preparation for the “A trip into Index Internals” talk that David Betteridge is giving at the Leeds and Manchester User Groups next month he came upon the following behaviour.

Using the following table:

CREATE TABLE dbo.Heap
(
      KeyField          CHAR(100)         NOT NULL,
      Filler            CHAR(1900)        NOT NULL
)
GO

If you insert 4 rows in a single statement (full script here) e.g.

INSERT INTO dbo.Heap (KeyField, Filler)
VALUES ('1', REPLICATE('A',1900)), ('2', REPLICATE('B',1900)),
('3', REPLICATE('C',1900)), ('3', REPLICATE('D',1900)) ;
GO

and then look at the data pages allocated using DBCC IND

DECLARE @db_id INT = DB_ID() ;
DBCC IND (@db_id, 'dbo.Heap', -1)
GO

You will see that they are all contained on a single data page.


If you look at the DBCC PAGE using the following SQL:

DBCC TRACEON(3604)
DECLARE @db_id INT = DB_ID() ;
DBCC PAGE ( @db_id, 1, 78, 1 )
GO

You can see from the DBCC PAGE output that the PFS indicates that the page is 100_PCT_FULL. The page header for page 78 gives 4 slots in use,
freeCnt indicates 60 bytes are not being used and record size is 2007.




The output from DBCC PAGE can be found here.

But if you insert 4 rows individually (full script here)

DECLARE @i int = 1
WHILE @i < 5
BEGIN
      INSERT INTO dbo.Heap (KeyField, Filler) 
      VALUES (CAST(@i as CHAR(100)), REPLICATE(CHAR(64+@i),1900)) ;
      SET @i+=1 ;
END
GO

Then the forth row inserted will be on a second data page

The DBCC PAGE output for page 78 is now as follows:



You can see from this the PFS indicates that the page is 80_PCT_FULL, and the page header shows 3 slots are beng used and 2069 bytes are free. The DBCC PAGE for page 80 is as follows:



This indicates that 1 slot is in use. The output from DBCC PAGE can be found here.

Looking further into this I found that inserting rows in pairs would use a single page, but inserting three rows and then a single row you get two data pages. Searching for details about this, I found no information that clearly explains why this occurs. In the subject Managing Extent Allocations and Free Space in Books Online it talks about how the PFS pages track the free space:

"The PFS has one byte for each page, recording whether the page is allocated, and if so, whether it is empty, 1 to 50 percent full, 51 to 80 percent full, 81 to 95 percent full, or 96 to 100 percent full.

After an extent has been allocated to an object, the Database Engine uses the PFS pages to record which pages in the extent are allocated or free. This information is used when the Database Engine has to allocate a new page.  The amount of free space in a page is only maintained for heap and ext/Image pages. It is used when the Database Engine has to find a page with free space available to hold a newly inserted row. Indexes do not require that the page free space be tracked, because the point at which to insert a new row is set by the index key values."

Paul Randal has more information about the structure of the PFS on his blog post

In the Book Online topic “Estimating the Size of a Heap”  you can estimate how many rows will be used. In this case:

Fixed_Data_Size = 2000

Variable_Data_Size = 0

Null_Bitmap = 2 + ((Num_Cols + 7) / 8) = 1

Row_Size = Fixed_Data_Size + Variable_Data_Size + Null_Bitmap + 4 = 2007

Rows_Per_Page = 8096 / (Row_Size + 2) = 4

This estimates four rows per page and yet in the second example we only have 3 rows?

The infomation in Managing Extent Allocations and Free Space says the Database engine uses the percentage free information from the PFS page to find out if page has enough space. The value used is the upper bound for the range indicated by the bit setting, even if the page itself has more bytes free. This means that for an 8K page that is 80% full, 1638 bytes is the value used for the amount of bytes free, a page that is 95% full then 409 bytes are deemed to be free. So, for the second example where the row is 2007 bytes when the page that is 51-80% full (or more), SQL Server decides there is not enough room available and therefore a new page is allocated.

The Rows_Per_Page equation also has some limitations. It is based on the FixedVar row format and whilst it does account for the following:

  • Page header (96 bytes i.e. 8096 = 8192 – 96 ) 
  • Row Offset Array  (2 bytes)


It doesn’t take into account the following:

  • Version information if snapshot isolation is turned on on for the database (14 bytes)
  • Reserved spaces on the page that is used if running transactions where to roll back and need to put the original row back on the page
  • Vardecimal columns 
  • Sparse columns
  • Data Compression and rows in Column Descriptor format

To remove the unused extra space at the end of the page, you can use the ALTER TABLE… REBUILD statement available in SQL Server 2008.

ALTER TABLE dbo.Heap REBUILD ;
GO

When I do this on the above table I get the following information returned from DBCC IND.



This is still showing 2 data pages for the heap, but DBCC PAGE shows that page 89 is not being used. This bug is fixed in Denali.
For as script including the rebuild see here.

To get the best space usage for a heap you can do the following:

  1. insert rows in single statement
  2. use bulk insert
  3. avoid lots of updates / deletes
  4. maintain the heap with ALTER TABLE.. REBUILD

Many thanks to Marcel van der Holst who made this post possible.

Filed under:

This Blog

SQL Blogs

Syndication