Using the SSIS Merge Join

The Merge Join Transform in SSIS is a great way to load a Data warehouse quickly and an easy way to join two data sources together. There are a few requirements to join these two data sources. The data sources must be sorted and there must be a key that you can join them with. This can all be done in the data flow of the SSIS package.
Here are some examples of a Merge Join with some different situations so you can see the outputs. The two images below show two tables. The first table contains the first names of people and the second table contains there last names. Each table contains an ID column that will be used in the join.
Table one:
Table two:
These are OLEDB sources. The queries in the sources are calling the data and sorting it with an order by clause. The source does not know the data is sorted so you have to manually tell the source it is sorted. You do this by right clicking on the source and selecting Show Advanced Editor.
Then click on the Input Output Properties Tab. Click on the OLE DB Source Output and Set the IsSorted Property to true.
Then click on the column that is the used to sort the data, in this case the ID column, then set the SortKeyPosition to 1. If you are sorting by more than one column, place a 2 on the next and so on.

Then you connect both sources to the Merge Join. When it asks which input you set the First name table as the left input. You can always swap the inputs later by clicking the swap input button in the Merge Join Transform.

In the Merge Join, Map the Id together and select the columns you want to pass through to the output. In this example the Join type is set to Left Outer Join. The Results will look like the below image.
Notice the Null in ID 5 under last name. There was no match for ID 5, but since it was set to Left Outer Join we still get to keep the ID 5. If you set the Merge Join to Inner Join the ID 5 row would be dropped.
One of the other scenarios you will encounter is when the right side of the join has more than one match for some of the ID columns. For example, let’s add another row to the Last name column.

This new table has an extra row with the ID 1 and the last name Davis 2. If you run this through the Merge Join with the Join set to Left Outer Join the results will match the image below.

If you change the Join to Inner Join the Results will be in the below image. Notice the ID 1 is still on the output twice and ID 5 is still in the output.

Now let’s add another row to the Last name table.

When you run this table through the Merge Join with the Join set to Full Outer Join the results will match the figure below.
Here you can see the Null Id. That is because we passed the ID through from the First name table. The Last name table ID was not passed through. If you want to ensure you get an Id then pass through each Id and use a derived column to check for nulls.
I hope this clarifies how to use the Merge Join Transform. This is a very powerful transform and can make joining data very fast.

This entry was posted in SSIS, Syndication and tagged , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s