SQL Server Integration Services - think about Transactions and Checkpointing?
Transactions and Checkpoints are important factors for data consistency, when it comes for ETL processes that too on SSIS what is your stand?
I'm covering this important task for the sake of newbie & junior DBAs, also the similar question was raised by the users during a user group meeting. SQL Server Integration Services has this sophisticated method in evolving a failed package to execute from the point of failure, instead of rerunning the whole package. If you ever developed sophisticated DTS package, and there is another error in one of the task, usually you have to (re)execute the package from beginning. Making this point of execution for the complex packages, is a real problem and having big data that get involved in ETL process, it will consume considerably amount of time wasted. This is where the CHECKPOINT process kicks in SSIS services where you have option not to start package from beginning in case the package gets error.
Similar to this point you have to ensure the packages are configured to use checkpoints, information about package execution is written to a checkpoint file. When the failed package is rerun, the checkpoint file is used to restart the package from the point of failure. If the package runs successfully, the checkpoint file is deleted, and then re-created the next time the package is run. The package execution is logged to a file, when the task in a package gets failure to execute. When you correct the task, and re-execute the package, the beginning of execution will start at the failed task not the first task of a package. The package will read execution in the file, and resume executing the task and the rest. Once it is successfully executed, the package will remove the file. Nice feature that help you a lot when you develop a complex calculation package. There is another downtrend using this checkpoint and transactions method, in the same package. This could cause unexpected results, for instance when a package fails and restarts from a checkpoint, the package might repeat a transaction that has already been successfully committed.
Tip: To use this within the migrated packages from SSIS using dtexec utility, you can opt to run /CheckPointing on option of dtexec is equivalent to setting the SaveCheckpoints property of the package to True, and the CheckpointUsage property to Always.
To use this approach in SSIS you can use BIDS (BI Development Studio) by enabling checkpointing of a package, where you can define package’s properties in Checkpoints section with a 'Checkpointfilename', set CheckPointUsage to 'IfExists' or 'always', and set to true for SaveCheckpoint. Then you set FailPackageOnFailure property of a task to true, otherwise the package will not create checkpoint file if it fails. To see whether it works or not you can create a package with sample execute SQL Task package, set second package property name ForceExecutionResult to failure, and run the package. If you notice the output window, package status will have failure status, then you will find checkpoint file that you specify on Checkpointfilename property. Secondly you can perform the correct task, by keeping 'set ForceExecutionResult' property of second task from 'Failure to success'. Run the package again, this time you notice that package will be executed from the second task (failed task) rather than first task. Once it is successfully executed, the checkpoint file will no longer exist, because the package deletes the checkpoint file that no longer useful.
In addition to this you have to be aware about types of transactions, such as Distributed Transactions and Native Transaction (SQL Engine). Both of these types are suppoted where the initial one uses the 2 phase commit type with MSDTC service, so you have to ensure that DTC service is up and running in order to execute the package successfully. On top of this within the package you need not specify BEGIN TRAN.... COMMIT/ROLLBACK... ENDTRAN statements, as they are taken care by DTC service itself. Secondly the native approach of Transaction commit/rollback method by SQL engine you have to use the explicit specification of BEGIN TRAN... END TRAN statements under 'Execute SQLTask' component of SSIS, also ensure to setup the connection manager to keep the 'RetainSameConnection' property to TRUE.
To wrapup you can take more help from local documentation of SQL Server aka BOL, that too updated one. How-To topics in BOL refers:
To configure a package to restart
In Business Intelligence Development Studio, open the Integration Services project that contains the package you want to configure.
In Solution Explorer, double-click the package to open it.
Click the Control Flow tab.
Right-click anywhere in the background of the control flow design surface, and then click Properties.
Set the SaveCheckpoints property to True.
Type the name of the checkpoint file in the CheckpointFileName property.
Set the CheckpointUsage property to one of two values:
- Select Always to always restart the package from the checkpoint.
- Select IfExists to restart the package only if the checkpoint file is available.