An Introduction To SSIS ETL Frameworks Part 2

This much-delayed part 2 of my introduction to SSIS ETL frameworks covers the latest iteration of my framework, which I am making freely available to anyone who wishes to use it.

Enhancements

Since I wrote the original blog post I have made some enhancements to my framework, namely:

  • Application packages are now executed using SSIS catalog stored procedures, which means that application packages no longer have to reside in the same project as the framework itself.
  • The level of SSIS catalog logging can now be controlled by metadata.
  • The 32- or 64-bit runtime can be specified for each application child package via metadata.
  • A very simple form of metadata-controlled precedence has been built into the framework, which means that you can control whether a package is executed based on whether a previously executed package has inserted, updated or deleted rows.
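For context, this is roughly what executing a child package through the SSIS catalog stored procedures looks like. This is a sketch only, with illustrative folder, project and package names, showing where the Use32BitRunTime and CatalogLoggingLevel metadata values end up:

DECLARE @execution_id bigint;

EXEC [SSISDB].[catalog].[create_execution]
         @folder_name     = N'ETL Framework'
        ,@project_name    = N'Example'        -- illustrative application project
        ,@package_name    = N'PackageA.dtsx'
        ,@use32bitruntime = 0                 -- from the Use32BitRunTime metadata
        ,@execution_id    = @execution_id OUTPUT;

-- LOGGING_LEVEL is a catalog system parameter:
-- 0 = None, 1 = Basic, 2 = Performance, 3 = Verbose
EXEC [SSISDB].[catalog].[set_execution_parameter_value]
         @execution_id
        ,@object_type     = 50                -- 50 = system parameter
        ,@parameter_name  = N'LOGGING_LEVEL'
        ,@parameter_value = 3;                -- from the CatalogLoggingLevel metadata

EXEC [SSISDB].[catalog].[start_execution] @execution_id;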

A Recap On What The Framework Includes

  • The ETL framework project: this embodies the framework runtime and includes a template package ( TaskPackageTemplate ) on which all application packages should be based.
  • The metadata database: EtlFrameworkMetadata contains all the tables necessary for specifying ETL job child package composition, running order and package configuration.
  • The logging database: EtlFrameworkLogging

Pre-Requisites

  • You must use Integration Services with SQL Server 2012 onwards.
  • Any application packages developed should use the project deployment model and be deployed to a separate project from the framework.

How To Use The Framework

  1. Deploy the framework to the SSIS catalog.
  2. Create your application project and deploy it to the same catalog as the framework.
  3. All packages in the application project should be based on TaskPackageTemplate.dtsx. When developing your packages, I would implore you to observe these ‘Holy’ commandments of SSIS package development:

#1 Packages should be as simple as possible
Each package should contain one data flow or one task at most. Do not boil the ocean in a single package: you would not write an entire program in a single C# class, so why would you write an entire ETL application in a single package? Writing monolithic modules in any language is the worst possible thing you can do for reusability.

#2 Avoid burying ETL logic in scripting tasks if at all possible
The whole point of Integration Services is to furnish an ETL toolbox so that you do not have to write swathes of code in a 3GL. Yes, there might be legitimate circumstances in which you need to do this, but it should be the exception rather than the rule.

#3 All packages should be instrumented
Integration Services provides this out of the box for packages deployed to the catalog, and the framework provides its own logging infrastructure on top of this. I cannot overstate this point, as I hate seeing developers open packages on production servers and execute them in debug mode in order to see what the packages are doing; and when I say I hate this, I hate it with an absolute passion.

#4 Use A ‘Pattern’ Based Approach To Development
This is a topic which Andy Leonard champions. Essentially, when you are developing anything (within reason), the chances are that someone else has already done it, and for standard(ish) problems there are solutions that have been widely used, proven to work and documented. The generic term for such a solution is a ‘pattern’. To illustrate this, here are two SSIS patterns:

The package template contains the following variables:

[Screenshot: the template package’s variables]

When developing your packages, 99.9% of the time they will affect a table in a database in terms of inserting, updating or deleting rows; your package should set these variables appropriately, and the logging element of the framework will then look after the rest.

Once you have written all of your packages, the next step is to create the metadata to ‘wire’ them together. This is the metadata schema:

[Diagram: the metadata schema]

4. The Project table is the first metadata table which should be populated. To illustrate using the framework I have created a project called “Example”; this is deployed to the “ETL Framework” folder:

[Screenshot: the project deployed to the “ETL Framework” folder]

INSERT INTO [dbo].[Project] (
         [Name]
        ,[Folder])
VALUES ( 'Example'
        ,'ETL Framework')
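Assuming Id is an identity column on the Project table (the subsequent steps reference it as ProjectId), the generated value can be picked up with something like:

SELECT [Id]
FROM   [dbo].[Project]
WHERE  [Name] = 'Example';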

5. Create a row in the Package table for each application child package. In this very simple example I’m going to create three child packages: PackageA and PackageB will be executed in parallel, and the execution of the third, PackageC, will be contingent on PackageB having updated rows in the database:

INSERT INTO [dbo].[Package] (
        [Name]
       ,[Enabled]
       ,[ProjectId]
       ,[Use32BitRunTime]
       ,[CatalogLoggingLevel])
VALUES  ( 'PackageA.dtsx'
         ,1
         ,1
         ,0
         ,3 )
       ,( 'PackageB.dtsx'
         ,1
         ,1
         ,0
         ,3 )
       ,( 'PackageC.dtsx'
         ,1
         ,1
         ,0
         ,3 )

6. The next step is to create a ‘Job’ by inserting a row into the Job table. Assuming the Id generated for this row is ‘1’ (it is referenced as the JobId in the next step), this is what the insert statement looks like:

INSERT INTO [dbo].[Job] (
         [Name]
        ,[Enabled])
VALUES ( 'Example Job'
        ,1 )

7. One of the most powerful features of the framework is the ability it provides to ‘wire’ packages together via metadata, and this is the step in which that is done. The JobPackageSchedule table provides the means of wiring packages to jobs and allows packages to be reused across jobs; this is one of the reasons why I stress that packages should be as simple as possible, in order to foster reusability. The JobPackageSchedule table enables up to eight packages to be executed in parallel via ‘channels’: the columns suffixed with ‘1’ are for packages executed on channel 1, the columns suffixed with ‘2’ are for packages executed on channel 2, and so forth.


INSERT INTO [dbo].[JobPackageSchedule] (
         [JobId]
        ,[PackageId_Channel1]
        ,[PackageId_Channel2]
        ,[ExecutionOrder])
VALUES ( 1
        ,1
        ,2
        ,1)

INSERT INTO [dbo].[JobPackageSchedule] (
         [JobId]
        ,[PackageId_Channel1]
        ,[ExecutionOrder])
VALUES ( 1
        ,3
        ,2)
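To sanity-check the wiring, a query along these lines (assuming Id columns on the Job and Package tables, matching the ids used above) shows what will run on each channel and in what order:

SELECT    j.[Name] AS JobName
         ,s.[ExecutionOrder]
         ,p1.[Name] AS Channel1Package
         ,p2.[Name] AS Channel2Package
FROM      [dbo].[JobPackageSchedule] s
JOIN      [dbo].[Job]     j  ON j.[Id]  = s.[JobId]
LEFT JOIN [dbo].[Package] p1 ON p1.[Id] = s.[PackageId_Channel1]
LEFT JOIN [dbo].[Package] p2 ON p2.[Id] = s.[PackageId_Channel2]
ORDER BY  s.[ExecutionOrder];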

8. The final thing I am going to do, and this is completely optional, is create a precedence constraint between one of the packages executed in parallel and the final package to be executed; this is done via the PrecedenceConstraint table. I’m going to create a row in this table such that package 3 will only execute if package 2 has updated one or more rows:

INSERT INTO [dbo].[PrecedenceConstraint]
       ( [PackageId]
        ,[LoadedData]
        ,[UpdatedData]
        ,[DeletedData]
        ,[RejectedData]
        ,[DependantPackageId])
VALUES ( 3
        ,0
        ,1
        ,0
        ,0
        ,2)
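With the metadata in place, a job is started by executing the framework’s runtime package (EtlFrameworkRuntime.dtsx) from the catalog and passing it the job name. A sketch, assuming the framework project is deployed as ‘EtlFramework’ and that JobName is a package-scoped parameter:

DECLARE @execution_id bigint;

EXEC [SSISDB].[catalog].[create_execution]
         @folder_name  = N'ETL Framework'
        ,@project_name = N'EtlFramework'      -- assumed framework project name
        ,@package_name = N'EtlFrameworkRuntime.dtsx'
        ,@execution_id = @execution_id OUTPUT;

EXEC [SSISDB].[catalog].[set_execution_parameter_value]
         @execution_id
        ,@object_type     = 30                -- 30 = package-scoped parameter
        ,@parameter_name  = N'JobName'
        ,@parameter_value = N'Example Job';

EXEC [SSISDB].[catalog].[start_execution] @execution_id;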

With the caveat that the initial release of the framework has not been exhaustively tested, and as such may contain bugs (for which I accept no liability, nor make any commitment to fixing them), readers of this post are welcome to download the framework from GitHub. There are three GitHub repositories:

Feedback on this is most welcome, enjoy!

30 thoughts on “An Introduction To SSIS ETL Frameworks Part 2”

  1. Thanks for this, looks great.

    Quick question about the ‘Wire’… could this be used to group different sets of packages together into individual tracks and run through just them? For example, say you have a group of fruit packages that are wired together, and a group of vegetable packages that are wired together. Each group would have its own precedence constraints and would only run within its own wire… so apple would complete and run orange, then pear, etc. Then in parallel you could have the vegetable wire running its own set of packages, so carrot completing then running potato, etc. Or is that something that would need to be added to the framework?

    Thanks
    Jay

    • Hi Jay,
      Yes, the scenario you have described can be accommodated by the framework. Just to clarify one thing: is your requirement for multiple ‘Wires’ or threads of execution to run within the same invocation of the same job?

      Chris

      • Hi Chris

        We would have completely separate packages in each wire, called by separate SQL jobs… The only commonality would be that they are running on the same ETL framework.
        Currently we have an in-house job engine which groups jobs into tracks (or wires). Each track has a list of jobs which run according to precedence constraints. Each job can only run once per track reset. We’re looking at replacing this with an ETL framework… Your example looks like an excellent place to start.
        Cheers
        Jay

      • I think my framework would work for your problem as described: there is a package table and a job table (a ‘job’ being the highest level of abstraction), and an intermediate table joining the two which effectively provides the wiring.

  2. Awesome !!
    I’m looking for such framework.
    “All packages in the application project should be based on TaskPackageTemplate.dtsx”: I do not have the knowledge to use your framework.
    I am brand new to SSIS development. I want to use your framework. Can you please explain the steps I need to follow to use your framework for my package development?

    Your help would be much appreciated.

    regards,
    SS

    • Thanks for reading my blog; all the instructions for using the framework should be in the post. However, before using it, some familiarity with the project deployment model and SSIS basics in general is required. If what you intend to do is simplistic then you may not need a framework; however, if you have lots of projects which all require complex SSIS-based ETL, and this is an ongoing requirement, a framework is the way to go. The broad steps are to deploy the framework project to a catalog, create the logging and metadata databases, create your own project (note that you need to be working with SSIS in SQL Server 2012 and above), base all of your own packages on the framework template as mentioned, and then populate the metadata database. If you are just cutting your teeth with SSIS, I highly recommend the Stairway to SSIS on SQL Server Central.

  3. Hi there

    Thanks for this, it looks very useful.

    I am getting a logging error when trying to run a package based on the TaskPackageTemplate.dtsx.

    EXECUTE [Logging].[usp_LogPackageExecutionCompleti…” failed with the following error: “Parameter name is unrecognized.”

    I’ve deployed a new project to the same framework as the Main Project, and all the connections work. Also the rest of the package succeeds.

    I’m going to carry on troubleshooting but have posted this in case I’ve missed something obvious…

    Thanks

    • Hi James, As per the comment I’ve just made to Tony, the framework was not tested as thoroughly as I would have liked, I will try to investigate this issue in the next week or so.

  4. Hi there

    Quick question… are we required to manually set the variables in each of the child packages?

    When I try to run the job the errors are always related to missing parameters or insert issue in the logging tables… The variables in the child SSIS package don’t have any default values…

    Much appreciated!

    • Tony, Yes you do need to update the variables, I will have to give the framework a thorough testing, as per the disclaimer it was not tested as thoroughly as I would have wished due to other demands on my time.

  5. Hi Again

    Unfortunately have been unable to get the framework working… was unsure where to set the variables.

    Steps I followed…

    * Created new Project in same ETL Framework as Main.
    * Created a SSIS package based on the TaskPackageTemplate.dtsx
    * Created two jobs… Job B calls the new SSIS Package. Job A calls EtlFrameworkRuntime.dtsx, with the JobName parameter pointing to Job B.
    * Run Job B

    Look forward to you posting further.

    Thanks

  6. Hi Chris

    I checked out your framework and it is quite simple but POWERFUL. I tested it within our company and I am happy to say it handled our complex ETL process quite well.

    You did a fantastic job. Keep it up

    Regards

    Jude

  7. Hi Chris,

    Great piece of software, and I managed to get it working, testing 3 packages. However, looking at the logging DB there are some tables that don’t get populated. What did you have in mind for them?

    select * from [Logging].[ProcessedDataDestination]
    select * from [Logging].[ProcessedDataSource]
    select * from [Logging].[DataSource]
    select * from [Logging].[SourceType]

    Thanks
    Stephen

  8. Hi, Chris, it’s a great article.
    Suppose one of the child packages throws an error; how do we recover that package?
    Is this also handled by the framework?

    • Hi Sunil,

      No, unfortunately not; you need to construct your packages such that when you re-run an SSIS job, it can recover from any prior failures.

      Chris

  9. In your example, does ‘PackageC.dtsx’ wait for successful completion of ‘PackageA.dtsx’ and ‘PackageB.dtsx’?
    I can’t see how this is achieved; it seems like the loop will just fire the packages off.

  10. Hi Chris,

    This article is very helpful, and the framework quite useful.

    After removing some hard-coded values, and fixing some parameter names, most of it works quite well.

    One thing I cannot get working is the Inserted/Updated/Deleted/Rejected and SqlStatement rows getting updated.
    I set the database name, schema name and the table name correctly.

    I cannot find the logic that updates those rows. Can you point me in the right direction?

    The Script task in the package template doesn’t seem to have any code in it, so I am guessing it is just a placeholder that can be replaced with a DFT?

    Thanks!

    • To be brutally honest, it’s a long, long time since I played around with the ETL framework. I cannot make any promises, but if I get the chance I will install it from scratch and try to work out what kinks need ironing out.

  11. Hi Chris, great work!
    Quick question: when a job completes, how do I go about removing it to avoid the same job being run again the next time the app starts? It looks like there are no checks for a job being completed? Would it be the responsibility of whatever added it to the JobPackageSchedule table in the first instance to remove it after completion?
    Thanks
