XML Validation in SQL Server Intergration Services
I've been workig with XML in SQL Server Integration services this week and to say
the XML task in SQL Server Integration services sucks is an understatement. Its buggy, the documentation
is woefully inadequate in some cases wrong. I believe this was one of the first
components written back for SQL 2005 so you can imagine it was written along
time ago, XML wasn't as standardised.
So what if you want to validate an XML document against a XSD Schema. Well in
true SQL Server Integration services mode the answer is to do it yourself with a script task.
There are two ways of doing it, the first uses an XmlDocument object and the
other is the XmlReader. If you were on my XML course at SQLBits you will
remember that I pointed out the importance of straeming when dealing with large
data and this is where these two differ. The XmlDocument loads the document into
memory, whereas the XmlReader is a streaming interface to the XML document you
have. So if you have a large document you should use the XmlReader.
In my case however I wanted to pass the XML to a stored procedure after it
had been validated and so I used the XmlDocument.
The first step is to create a XmlSchemaSet and put your schema into it.
string Filepath = Dts.Variables["User::FullNewFilename"].Value.ToString();
string Schema= Dts.Variables["User::SchemaFilepath"].Value.ToString();
// Create a schema set
XmlSchemaSet schemaset =
new XmlSchemaSet();
// Add the schema to the
collection.
schemaset.Add(XmlSchema.Read(XmlReader.Create (Schema),null));
Here Schema is a variable to the path for the schema and Filepath is the
variable to the Xml file to be validated.
After that you have to either use the XmlDocument or XmlReader. For the
XmlDocument we are passing back the contents of the file using the InnerXml
property.
//clear our errors string
builder
ValidationErrors
= new StringBuilder();
//Use this if you need to read the document for use
elsewhere
XmlDocument doc = new XmlDocument();
doc.Schemas
= schemaset ;
doc.Load(Filepath
);
doc.Validate(new ValidationEventHandler(ValidationCallBack));
//Load the contents of file into the variable for later
processing.
Dts.Variables["ImportFileContents"].Value =
doc.InnerXml;
//clear our
errors string builder
or use the XmlReader, this requires another object the XmlReaderSettings
object
//set the validation settings.
XmlReaderSettings settings = new XmlReaderSettings();
settings.ValidationType
= ValidationType.Schema;
settings.Schemas
= schemaset ;
settings.ValidationEventHandler
+= new ValidationEventHandler(ValidationCallBack);
using (XmlReader
xr = XmlReader.Create(Filepath,settings
))
{
//This reads the whole document through the reader and does
the validation
while (xr.Read()) ;
//Close the reader
xr.Close();
}
In both you will see that they have references to the ValidationCallBack. The
validation model is a callback model and so you need this extra function. This
is why we are using a global variable ValidationErrors to store the errors and
then be able to access them after the validation succeeds. The validation
callback function is described here
// Display
any validation errors.
private void
ValidationCallBack(object sender, ValidationEventArgs e)
{
ErrorOccurred = ErrorOccurred || e.Severity == XmlSeverityType.Error;
Warning= Warning||
e.Severity == XmlSeverityType.Error;
if (ValidationErrors.Length >
0)
{
ValidationErrors.AppendLine();
}
ValidationErrors.Append(e.Message);
}
Complete Code
So putting it all together we get
StringBuilder
ValidationErrors;
bool ErrorOccurred= false;
bool Warning = false;
public void
Main()
{
string Filepath =
Dts.Variables["User::FullNewFilename"].Value.ToString();
string Schema= Dts.Variables["User::SchemaFilepath"].Value.ToString();
// Create a schema set
XmlSchemaSet schemaset = new XmlSchemaSet();
// Add the schema to the
collection.
schemaset.Add(XmlSchema.Read(XmlReader.Create (Schema),null));
//clear our errors string
builder
ValidationErrors =
new StringBuilder();
//Use this if you need to read the document for use
elsewhere
XmlDocument doc = new XmlDocument();
doc.Schemas = schemaset
;
doc.Load(Filepath
);
doc.Validate(new ValidationEventHandler(ValidationCallBack));
//Load the contents of file into the variable for later
processing.
Dts.Variables["ImportFileContents"].Value =
doc.InnerXml;
/* //Otherwise
use the following
// Set the validation
settings.
XmlReaderSettings settings = new XmlReaderSettings();
settings.ValidationType
= ValidationType.Schema;
settings.Schemas =
schemaset ;
settings.ValidationEventHandler
+= new ValidationEventHandler(ValidationCallBack);
using (XmlReader xr =
XmlReader.Create(Filepath,settings ))
{
//This reads the whole document through the reader and does the
validation
while
(xr.Read()) ;
//Close
the reader
xr.Close();
}*/
if (ValidationErrors.Length >
0)
Dts.Variables["XMLError"].Value =
ValidationErrors.ToString();
Dts.Variables["Validation"].Value = ErrorOccurred?"Error":(Warning?"Warning":"");
}
// Display any validation
errors.
private void
ValidationCallBack(object sender, ValidationEventArgs e)
{
ErrorOccurred = ErrorOccurred || e.Severity == XmlSeverityType.Error;
Warning= Warning||
e.Severity == XmlSeverityType.Error;
if (ValidationErrors.Length >
0)
{
ValidationErrors.AppendLine();
}
ValidationErrors.Append(e.Message);
}
Enjoy.