I recently needed to load several thousand XML files into a SQL database, so I created an SSIS package structured as follows:
A Foreach Loop container loops through a directory and loads each file path into a variable; the “Import XML” data flow then loads each XML file into a SQL table.
Running this, it took approximately one second to load each file, which seemed an enormous amount of time just to parse the XML and load the data. Speaking to my colleague Martin Croft, he suggested using T-SQL BULK INSERT with OPENROWSET, so we adjusted the package as follows:
The same Foreach Loop container was used, but instead the following SQL command was executed (built as an SSIS expression):
"INSERT INTO MyTable(FileDate)
SELECT CAST(bulkcolumn AS XML)
FROM OPENROWSET( BULK
'" + @[User::CurrentFile] + "',
SINGLE_BLOB ) AS x"
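To make the expression easier to follow, here is roughly what it evaluates to for a single iteration of the loop. The file path and table layout are illustrative assumptions, not taken from a real environment:

```sql
-- Hypothetical result of the SSIS expression for one file.
-- 'C:\XmlFiles\file1.xml' stands in for whatever @[User::CurrentFile] holds.
INSERT INTO MyTable (FileDate)
SELECT CAST(x.bulkcolumn AS XML)
FROM OPENROWSET(
    BULK 'C:\XmlFiles\file1.xml',  -- path substituted per iteration by the Foreach Loop
    SINGLE_BLOB                    -- read the entire file as a single varbinary(max) value
) AS x;
```

SINGLE_BLOB is the safest option here because it avoids the codepage conversions that SINGLE_CLOB/SINGLE_NCLOB can apply, letting the CAST to XML honour the file's own encoding declaration.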
Using this method we managed to load approximately 20 records per second: much faster, at least for the raw data load!
For what we wanted to achieve this was perfect, but I'll leave you with the following points to weigh up when deciding which solution to choose.
T-SQL BULK INSERT / OPENROWSET:
- Much faster to get the data into SQL
- You’ll need to parse or create a view over the XML data to make it more usable (another post on this!)
- Not able to apply validation/transformation against the data when loading it
- The SQL Server service account will need permission to read the files
- No schema validation when loading files
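On the point about needing to parse the XML: a view over the loaded column is one way to do it. This is a minimal sketch, assuming a hypothetical document shape of `<Orders><Order Id="..."><Customer>...</Customer></Order></Orders>`; the view, element, and attribute names are all illustrative:

```sql
-- Sketch only: shred the stored XML into relational columns via nodes()/value().
-- Assumes MyTable.FileDate holds XML shaped like /Orders/Order.
CREATE VIEW dbo.vOrders
AS
SELECT
    o.value('@Id', 'int')                    AS OrderId,   -- Order/@Id attribute
    o.value('(Customer)[1]', 'varchar(100)') AS Customer   -- first Customer element
FROM MyTable AS t
CROSS APPLY t.FileDate.nodes('/Orders/Order') AS n(o);
```

Querying the view then looks like any other table (`SELECT * FROM dbo.vOrders`), with the shredding cost paid at query time rather than load time.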
SSIS data flow:
- Slower (in our case)
- Schema validation
- Allows you to apply transformations/joins to the data
- Permissions should be less of a problem
- Data can be loaded into the final form through the package
- When using schema validation, errors can fail the package (I’ll do another post on this)
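It's worth noting that the bulk-insert route doesn't have to go entirely without schema validation: SQL Server can enforce an XSD on the column itself via an XML schema collection. A minimal sketch with hypothetical names, matching the assumed `/Orders/Order` shape above:

```sql
-- Sketch only: type the XML column against a schema collection so that
-- invalid documents are rejected at insert time. All names are illustrative.
CREATE XML SCHEMA COLLECTION dbo.OrderSchema AS
N'<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="Orders">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="Order" maxOccurs="unbounded">
            <xs:complexType>
              <xs:sequence>
                <xs:element name="Customer" type="xs:string"/>
              </xs:sequence>
              <xs:attribute name="Id" type="xs:int"/>
            </xs:complexType>
          </xs:element>
        </xs:sequence>
      </xs:complexType>
    </xs:element>
  </xs:schema>';

-- Bind the column to the collection; subsequent inserts of non-conforming XML fail.
ALTER TABLE MyTable
ALTER COLUMN FileDate XML(dbo.OrderSchema);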