SSIS Event Log Business Intelligence

Last Thursday I presented at the San Diego SQL Server User Group (www.sdsqlug.org) a codeplex project I developed based on the SSIS Event Logs. Below are links to the codeplex project as well as the slides I presented.

The inspiration for this project comes from a former position I held managing a team of database developers in which we were responsible for all the Customer Data Integration (CDI) processes as well as the Extract Transform and Load (ETL) processes for our Enterprise Data Warehouse (EDW).

While in that role I found that some of the CDI proceses were running slow to the point that it was affecting business operations. Poking around the native SSIS Event Log data it was really difficult to get any clue as to what part of the process was the bottleneck.

Now I knew that Microsoft had previously released some report packs (link) for the SSIS Event Log data that contained the logic I needed to extract this data. So I took these reports and reverse engineered them into an ETL process to load a new Data Mart (DM) based on the SSIS Event Log data.

Add an SSAS cube and some basic reports and I was now able to analyze the performance of my CDI and ETL processes easily and determine where bottlenecks exist and improvements can be made. Another interesting outcome of this is the ability to measure the improvements of changes to the CDI and ETL processes over time.

This project is designed to be generic so that anyone using SSIS can take advantage of this and gain some insight into their process performance. If you are interested in obtaining a copy of the source code please create a codeplex user account and send me a request on the project site (below).

Hope this helps!
Ben

SSIS Event Log Business Intelligence Codeplex Home – http://ssisbi.codeplex.com
Codeplex Registration – https://www.codeplex.com/site/register
San Diego SQL Server User Group – http://www.sdsqlug.org
SSIS on MSDN – http://msdn.microsoft.com/en-us/library/ms141026.aspx

Using Excel 2007 files as a Source in SSIS 2005

Thanks to “jaegd” for his post here: http://forums.microsoft.com/… on how to accomplish this. Below please find some more details along w/ a example which can be downloaded here (ssis-2005-excel-2007-source-example).

Steps

  1. Create new OLEDB Connection by right clicking on the Connection Manager tray
  2. New OLEDB Connection in SSIS

    New OLEDB Connection in SSIS

  3. Chose any valid type or value for the connection and click okay
  4. OLEDB Connection Manager

    OLEDB Connection Manager

  5. Rename the Connection Manager to something that makes sense, like “Excel 2007″
  6. Now that the the connection exists right click on it and select properties
  7. Edit the Connection String property IN THE PROPERTIES WINDOW to match below, updating with your Excel File Location
    • Data Source=C:\MyExcelFile.xlsx;Provider=Microsoft.ACE.OLEDB.12.0;Extended Properties=”Excel 12.0;HDR=YES;”
  8. manually-edit-connection-string

    manually-edit-connection-string

You can now use this connection by referencing it with any task/object that would normally connect to an OLEDB source. To query it use [Sheet1$] for the table name (eg. SELECT * FROM [Sheet1$])

Do you have anything to add? Please add your comments below!

Cheers,
Ben

Adding Row Numbers Excel Sources in SSIS

This is an interesting challenge that I recently explored during a forum discussion on SQLServerCentral.com. The situation in which the user presented wasn’t exactly ideal so I will first present the other two options I proposed besides the final one that appears to work for him in his environment.

Challenge: When extracting data from an Excel source I would like to add a row number.

Solutions:

  1. Include an identity column in the staging table which would add the row number for future data flows that consume the data.
  2. Write the Excel data to a transient table (which has an identity column) that gets truncated at the beginning of the package (resetting the identity column).

Both of these options are expensive I/O operations that would slow things down. Another option presented by Jack Corbet was to use a script task to increment a package variable row by row and adding that to the output.

The final solution I proposed seems to be suitable for this case and I may look to implement this more in my future SSIS requirements. Here goes:

  1. Added an OLE DB Source to my data flow.
  2. Selected a SQL 2005 DB Server for my connection manager.
  3. Chose ‘SQL Command’ as my Data access mode.
  4. Used the RowNumber() function combined with the OPENROWSET() connection method to compose a query which pulled all the data in from the Excel sheet adding in the RowNumber.

In order for this to work your SQL 2005 Surface Area Configuration must allow the OPENROWSET() connection method and you’ll most like want to define this SQL Statement dynamically through a package configuration or some other method.

Here is the query:

SELECT
                ROW_NUMBER() OVER (order by [VendorAccountNo]) AS [RowNumber]
        ,       [VendorAccountNo]
        ,       [LocalAccountNo]
        ,       [CompanyCd]
        ,       [CUSIP]
FROM
        OPENROWSET(
                'Microsoft.Jet.OLEDB.4.0',
                'Excel 8.0;Database=C:\temp\MySheet.xls;HDR=0; IMEX=1;',
                'SELECT * FROM [Sheet1$]'
        )

I’ll probably write more on this later as to which method provides the best performance and maintainability but for now feel free to comment on what you think…

Cheers!,
Ben