SSIS XML Task

The XML Task in SSIS allows you to parse through an XML file and read the nodes in the XML. In a previous blog I showed how to use a web service task to get the city and state when entering a zip code.

The results we got back from the web service were in XML format and we saved them into a variable named Results. This image shows the value in the variable

clip_image002

We need to get the city and the state out of this XML and save them each into a variable. You will need to create an XML task in the Control Flow of a package. Set the operation type to XPATH. The source is going to be the results variable. The destination will be the city variable. The second operand will be the node you want to be read, “//CITY” in this case. Last we set the XPATH Operation to values because you want the value of the city node.

clip_image004

The XML Task that retrieves the State value will be identical except for the Second Operand and the Destination variable.

clip_image006

After all of this is complete the package with the web service task will come before the XML Tasks. I placed a break point on the last XML Task and I am showing the results of the variables in the watch window. You can see that the results contain the XML, the City, and State variables contain the values from their respective nodes.

clip_image008

Posted in SSIS, Syndication | Tagged | Leave a comment

SSIS Web Service Task

The Web Service task in SSIS can be used to call a web service command to perform a needed operation in your package. The results of most web services will be in XML Format. You can save these results into a variable or into a file. Either way, you can then parse the XML to find the node values you need. This can be done with the XML task in SSIS. I have a blog on the XML task also.

The Web Service I am using is a free demo service. It allows you to enter a zip code and returns the city and state associated with that zip code. This first thing you need on the package is an HTTP connection. The Server URL for this connection is http://www.webservicex.net/uszip.asmx?op=GetInfoByZIP. The rest of the settings for the connection are default.

clip_image002

Then drag a Web Service Task into the control flow. Set the connection to the HTTP connection you just created. You will need the WSDL file for the web service. This can be downloaded from the website where the web service is hosted. This file will need to be saved locally. Set the WSDL File to the location where you saved the WSDL file in the Web Service task in SSIS.

clip_image004

Click on the input node on the left window pane and set the input properties as shown below. These are drop down menus that are populated automatically by the WSDL file. Create a package variable with a data type of string to hold the zip code. Map that in the fields below as shown.

clip_image006

Under the output node set the Output type to Variable and create a results variable with a datatype of string.

clip_image008

If you enter the zip code of 32065 you will receive back this xml list.

clip_image010

This shows us the proper city and state associated with the zip code we entered. Now we can parse through the XML with an XML task and use this data. I show you how to do this in the next blog here.

Posted in SSIS, Syndication | Tagged | Leave a comment

Using Variables and Parameters SSIS

When creating an SSIS package it is always a best practice to use variables, and parameters in 2012,  to avoid hard coding values into any part of your package. But there are some best practices involved with creating those variables/parameters also. The rest of this article will refer to variables. But each one could be a parameter except for the variables with expressions. This is because an expression will over write any value passed in.

Let’s look at a common situation in an SSIS package. A common need is to save a file with a date appended to it. This can be done with an expression and a variable. The first example is going to be using one variable without the best practices applied.

Of course we are going to need a file system task to save the file for us. One of the nice features of the file system task is the fact that the rename function can rename and move a file. You simply set the source and destination to a different location. You should never need two file system tasks back to back, with one doing the move and the other doing the rename.

The first thing we are going to do is create a file connection in the connection manager of a package. The file location will be c:\test\test.txt. We are going to back this file up in the folder c:\test\backup and rename the file to test_YYYYMMDD.txt. This will give us today’s date at the end of each file we back up.

clip_image002

This could be a flat file connection also.

The source in the File system task will be set to this connection. We could use a variable to pass this name in if we needed. In this example the destination is the important part. The destination will use a variable so we set the property IsDestinationPathVariable to True.

We will need a variable on the package also. I will create a variable called strFileName. It is always a best practice to create variables with the first few letters of the name describing the data type and camel case the rest of the variable name. Here is a list of some data types and the extensions to use.

str

String

dt

DateTime

int

Integer

bol

Boolean

obj

Object

chr

Char

Now we set the properties of the variable. Set the property EvaluateAsExpression to true. Then click on the expression ellipse and set the expression to:

“c:\\test\\backup\\test”+ (DT_WSTR, 10) (DT_DBDATE) GETDATE() +”.txt”

clip_image004

Notice the double slashes. These are necessary due to escape characters. For example “\n” is new line. Also notie the double quotes around the literals. Then we simple use three functions. Getdate gives us the date, DT_DBDate trims the time from the date, and DT_WSTR converts the date to a string with 10 characters.

clip_image006

This is an easy way to save a file with the date. But this is not using best practices in SSIS. What if the folder you want to back the file up into changes? What if the file extension changes? Using configuration files or tables you can update these values from outside the package. But if the variable contains an expression then configuration file value would be over written when the expression is evaluated. Therefore it is necessary and a best practice to use multiple variables.

Now I will show the same example using variable best practices in SSIS. We are going to keep the same connection in the connection manager and the source in the file system task will still be set to this connection. But instead of creating one variable for the backup file location we will create several variables. Here they are:

strBackupFolder – The backup location

strFileName – The name of the file without the folder or extension

strFileExtension – The extension of the file

strFullFileName – All the other variables combine in this one

strDate – The date represented as a string

strBackupFolder will be set to “c:\test\backup\”

strFileName will be set to “test”

strFileExtension will be set to “.txt”

strFullFileName will be set to an expression

strDate will be set to an expression

Now for the two expressions, the strDate will be just like the expression from the previous example. We use the same three functions to get a date string: (DT_WSTR, 10) (DT_DBDATE) GETDATE(). If you want the date to be in a different format then you will need to use the function Year, Day, and Month to control the format of the date.

The strFullFileName will simply combine all the other variables together to form a complete file name.

clip_image008

Now each of the variables that do not contain expressions can be updated from outside the package with configuration file or configuration tables and they still play nicely with the expressions in the other variables. In any packages you create try to ask the questions like “What if something changes?” This will save you tons of headaches in the future from maintenance. Always try to combine multiple variables with expressions, do not code everything into one variable.

Posted in SSIS, Syndication | Tagged | Leave a comment

Using Configuration Files in SSIS

Now in SQL 2012 we have parameters that make it easy, but configuration files are still an option and I still see a lot of my clients using them even on 2012 due to several reasons, but mostly because of the work to convert over.

SSIS packages are great ETL tools and can do just about anything you need in terms of ETL. Most organizations start out creating SSIS package one by one until they have dozens, hundreds, or even thousands of packages. I have worked with one client that ran over 4,000 packages. This can be a nightmare to maintain. You can save yourself a lot of work by deciding upfront how to configure your packages using configuration files or tables. We are going to discuss configuration files in this article.

We are going to look at a simple example of passing information to a package with a configuration file. Then we will go over using configuration files on multiple packages. Imagine running dozens of packages that point to a server and the server name changes. If you have a configuration file that is feeding this server name to every package you can make a single change to the configuration file and all the packages are updated. This can reduce your maintenance time significantly.

Here is a simple package example:

1. Drag in a script task into a blank SSIS package.

2. Create a string variable on the package named strData

3. Set the value of the variable to “Package”

4. Double click on the script task.

5. Add the strData variable to the read only variables.

6. Click Edit Script

7. Under the main function add the code MsgBox(Dts.Variables(“strData”).Value)

8. Click save and then close the window

9. Close the script editor by clicking ok

10. Run the package

clip_image002

When you run the package a popup box appears show the work Package. This is the value of the variable saved in the package. Now we will set up a configuration file on the package to give us the ability to change the value of the variable from outside the package.

1. Close the popup box and stop the package.

2. Right click in the control flow and select Package Configurations.

3. Place a check in Enable Package Configurations.

4. Click Add.

5. Click Next in the in the welcome screen if it appears.

6. Click on browse and select a location you have rights to write to.

7. Name the file FirstConfig

8. Click save and then click next

9. Click the plus next to variables >strData > Properties

10. Place a check next to value, notice the value on the right

11. Click Next > Finish > Close

clip_image004

We now have a configuration file on the package but the value is still the value from the package. Now we will open the configuration file and change the value. The configuration file is an XML file and I like to use XML notepad (Free from Microsoft) to open it. You will look for the configured value in this file. This configured value is the value passed to the package.

1. Open the folder containing the configuration file

2. Open the configuration file by right clicking and select open with

3. Select a program you can use to edit the file (Example: Notepad, Wordpad, XML notepad)

4. Check the configured value from Package to Config

5. Save and Close the File

6. Return to the package and run it

clip_image006

You should see a popup showing Config. This is the value from the configuration file. The value saved in the package is overwritten at run time.

clip_image008

This is just one small example of using configuration files. A popular way to use configuration files is on connections. When you have a connection on a package the properties of this connection show in the configuration manager. You can place a check next to the connection string property or you can place a check next to the individual elements that make up the connection string, initial catalog, server name, user name, and password. The user name and password are not needed when using windows authentication.

clip_image010

The password is not stored in the configuration file automatically even if you select it in the configuration manager. This is done by design for security. Microsoft did not want you saving your configuration file in plain text without knowing it. So you will have to open the configuration file and add the password. If you selected the connection string the password will go right after the user name. You must type in “Password =####;”, (#### Represents your password). Don’t forget the semicolon after the password. Now this configuration file can be used in any other package using this connection.

There is an issue when using a configuration file in multiple packages. The package that is using the configuration files will try to load every connection in the configuration file. It the package does not contain that connection it will fail validation and the package will not run. This causes an issue when trying to share a configuration file with many packages. There are three methods for handling these issues. You can create a configuration file for each package or create a configuration file for each connection or a combination of both.

The first method of a configuration file for each package works well if you do not have a lot of packages. If you have a thousand connections and fifty packages, a per-package solution is the obvious choice. If every package has a different set of connections this is almost necessary.

The second method of a configuration file for each connection works well if you have a lot of packages and fewer connections. If you have fifty connection and a thousand packages it will be much easier to maintain a per connection solution. In this situation a package with ten connections would have ten configuration files, each with one connection.

The third option is to combine the first two options in some form. For example, if you have one connection that is used by every package use this configuration file in every package. The other connections can have a package level configuration file. This is harder to maintain and you need document which packages are using which configuration files.

With all the options of configuration files it is important to plan out how you will use them in your environment before you create thousands of packages and create a maintenance nightmare. Planning your SSIS package configuration architecture is important and should not be over looked. It is easy to put it off when you only have a couple of packages. Most environments have their packages grow in number faster than anticipated. Planning your configuration files will save you a huge retro fit project in the future.

Posted in SSIS, Syndication | Tagged | 2 Comments

Two Digit Dates in SQL – MM/DD/YYYY

When creating dates or numbers as strings it is sometimes required to have two digit numbers.

For example, instead of this: 3/1/2011

You want to see this: 03/01/2011

There is an easy way to do this using the Right() function and adding a string zero to the front of a number, and then take the right two characters.

Here are some examples:

Let’s say the Date is December 6, 2010, so the day is a single digit “6″

Running this query:

1

Select Convert(varchar(10), Day(GETDATE())) as OneDigit

Results = “6″

Add the right function with a string zero looks like this query:

1

Select Right(’0′+Convert(varchar(10), Day(GETDATE())),2) as TwoDigit

Results = “06”

Another example without the date.

select COLUMN_NAME,

Right(’0′+ Convert(varchar(3), ROW_NUMBER() over(Order by COLUMN_NAME)),2) as ColNumber

from INFORMATION_SCHEMA.COLUMNS

where TABLE_NAME = ‘Product’

Order by COLUMN_NAME

Posted in Syndication, T-SQL | Tagged | Leave a comment

SSRS Top N and Bottom N reporting with Duplicates

Previously I wrote a blog on Top N and Bottom N reporting. There were a couple of gotcha I did not cover in that blog so I thought I would cover those here to answer some of the questions I have received. The other blog can be found here.

On a report you are trying to group the top N number of rows together and all others should be in a separate group. Let’s say you want the top 5 sales people based on amount.

In my other blog I showed how to use the Top N and Bottom N filters on groups. The major problem comes in when you have duplicate amounts at the cut off. For example if you want the top 5 sales employees to be in one group based on their sales amount, and three employees tie for the number 5 slot. This would cause more than 5 people to show in the top N group and those same people would show in the bottom N using the technique I showed in the previous blog.

So how do you fix this? You will need a different method to do this. You need to find the cut off amount and spilt the top from the bottom based on that amount.

Here is the table we will use on the report. Notice that positions 4, 5, and 6 have the same amount. If you set the filter on the report to top 5 you will get 6 rows in the top. The Bottom N method I showed in the previous blog will fail because the bottom group will not know about the extra rows in the top group. This will cause row 6 to show in both groups.

clip_image001

A better way to do this Top N and Bottom N grouping would be to find the cut off amount and filter the groups based on this.

Here is the main query pulling the data to show on the table.

SELECT RowNum, Name, Amount

FROM TopN

Now you need to create a new data set and the query will look like below

With O as(

Select top (@N) t.RowNum, t.Amount, t.Name

From TopN t

Order by Amount desc

)

Select Top 1 Amount

From O

Order by Amount

I named this dataset MinTopAmount. This query is using a CTE to get the top N amounts. N is a parameter on the report, you can hard code this if desired. Then the outside query gets the lowest amount from that query. This gives you the lowest amount to show in the top N group. This query will only return single row and column with the cut off amount.

Now you need to set up the filters on the groups to filter based on the amount instead of Top or Bottom. These are two adjacent groups on a single table.

clip_image002

The Sum of the Amount value form the new dataset, which is named MinTopAmount in this example, should not sum anything because the query only returns one row due to the top 1 in the query with the CTE. The bottom group will have the exact same filter except it will use less than instead of greater than or equal to.

With the N set to 5 it will divide based on the amount at the fifth spot. If there is a tie, it won’t matter.

In this image you can see the bottom group in blue. Notice even though we selected Top 5 we get 6 rows in the top due to the tie, everyone else is in the bottom group.

clip_image003

If you have any issues let me know.

Posted in SSRS, Syndication | Tagged | Leave a comment

Standard Deviations with CTE in SQL

I was working with a client recently getting the number of standard deviations a student was off the average number of suspensions. I accomplished this with two Common Table Expressions, (CTE).

The first CTE gets the standard deviation and the average for the entire database. The Database holds all of the schools in the district and this CTE gives use the overall average.

The second CTE gets the number of suspensions summed up and group by student.

In the final query we bring the two CTE’s together by selected each column and performing the calculation to determine the number of standard deviations the student is off from the average (Mean).

That calculation is: (Student Count – Overall Average)/Standard Deviation

There are other ways to do this same thing. But by breaking it up into two CTE’s it make the maintenance and testing very easy. You can run the CTE’s separately to make sure each is returning the correct data before combining them.

Here is the query:

With

School as( — This CTE gets the Standard Deviation and Average of the Districts ISS

Select convert(numeric(18,3),stdev(f.InSchoolSuspension)) StdISS, AVG(convert(numeric(18,3),f.InSchoolSuspension)) AvgISS

from FactSchoolPerson f

),

Student as( — This CTE gets the ISS summed up by student

Select f.DimDWPersonKey, Sum(f.InSchoolSuspension) ISS

from FactSchoolPerson f

Group by f.DimDWPersonKey

)–Combine the two CTEs in this query and calculate the number of StdDevs Off the Mean

Select st.DimDWPersonKey, st.ISS, s.AvgISS , s.StdISS ,

((st.ISS – s.AvgISS)/ s.StdISS) StdOffISS

from student st Cross Join

School s

Order by st.DimDWPersonKey

Posted in Syndication, T-SQL | Tagged | Leave a comment