fbpx

Adding Row Numbers Excel Sources in SSIS

This is an interesting challenge that I recently explored during a forum discussion on SQLServerCentral.com. The situation in which the user presented wasn't exactly ideal so I will first present the other two options I proposed besides the final one that appears to work for him in his environment.

Challenge: When extracting data from an Excel source I would like to add a row number.

Solutions:

  1. Include an identity column in the staging table which would add the row number for future data flows that consume the data.
  2. Write the Excel data to a transient table (which has an identity column) that gets truncated at the beginning of the package (resetting the identity column).

Both of these options are expensive I/O operations that would slow things down. Another option presented by Jack Corbet was to use a script task to increment a package variable row by row and adding that to the output.

The final solution I proposed seems to be suitable for this case and I may look to implement this more in my future SSIS requirements. Here goes:

  1. Added an OLE DB Source to my data flow.
  2. Selected a SQL 2005 DB Server for my connection manager.
  3. Chose ‘SQL Command' as my Data access mode.
  4. Used the RowNumber() function combined with the OPENROWSET() connection method to compose a query which pulled all the data in from the Excel sheet adding in the RowNumber.

In order for this to work your SQL 2005 Surface Area Configuration must allow the OPENROWSET() connection method and you'll most like want to define this SQL Statement dynamically through a package configuration or some other method.

Here is the query:

SELECT
                ROW_NUMBER() OVER (order by [VendorAccountNo]) AS [RowNumber]
        ,       [VendorAccountNo]
        ,       [LocalAccountNo]
        ,       [CompanyCd]
        ,       [CUSIP]
FROM
        OPENROWSET(
                'Microsoft.Jet.OLEDB.4.0',
                'Excel 8.0;Database=C:tempMySheet.xls;HDR=0; IMEX=1;',
                'SELECT * FROM [Sheet1$]'
        )

I'll probably write more on this later as to which method provides the best performance and maintainability but for now feel free to comment on what you think…

Cheers!,
Ben

8 Comments

  1. Can you tell me if there is any solution for same problem using SQL Server 2000? I think row_number() wont work in 2000.

    Sagar

  2. Hi Sagar,

    You are correct, the row_number() function is not supported in SQL Server 2000. In this case I would suggest inserting the data into a temp table with an identity field. Hope this helps!

    -Ben

  3. Thanks for the helpful hints, and for presenting 3 solutions in one post. I used solution 2/, and it works great. I only have a small Excel Table and don’t care about performance.

  4. But you have an order by clause on VendorAccountNo, maybe the Excel file is not ordered by VendorAccountNo…

  5. I want to add new column in the excel file with the column name “file_[current month]” using SSIS script task. The column should be added only at the 1st of every month. How can I do it? Please help me.

    Thanks,
    Ashwin.

  6. I hope you rarely halt! This is certainly among the list of most effective weblogs Ive ever before examine. Youve obtained some mad ability below, man. I just hope which you dont drop your design simply because youre undeniably on the list of coolest bloggers in existence. Please retain it up since the online expectations a person such as you spreading the term.

  7. I simply wanted to inform you how much my spouse and i appreciate all you’ve provided to help increase the value of the lives of folks in this subject matter. Through your articles, we’ve gone via just a beginner to an expert in the area. It’s truly a homage to your good work. Thanks

Comments are closed.