Data Profiling using SSIS





Applies to: SQL Server 2012 Standard Edition.





Data profiling is the process of examining the data to obtain statistics about it and use those statistics to better understand the data, to identify problems with the data, and to help properly design ETL processes.

To profile data using SSIS, please perform the following steps:



1. Create an Integration Services Project with SQL Server Data Tools.





2. Create a new ADO .NET Connection to the data source you would like to profile. Top create this connection you will need to provide the server name, the authentication type (Windows Authentication/SQL Authentication), and the database where the table(s) you will be profiling is(are) located.







































3. Double click the Data Profiling Task on the Common section of the SSIS Toolbox or drag it to Control Flow surface.



4. Configure the Data Profiling Task.

4.1 Specify the XML files where de results of the data profiling will be stored.



Let's edit the Data Profiling task and create a new file connection.

































4.2 Specify the profile type that will be performed and the table use as data source.



Now on the Profile Requests tab of the Data Profiling Task Editor, let's select the "Column Value Distribution Profile Request" from the Profile Type dropdown list. Take a look at all available profile types. For a detailed description of each profile type, please read the Data Profiling Task article on TechNet.













Next, let's provide the connection, table and column on that table that will be analyzed.

















5. On the Debug menu, click "Start Debugging" to start analyzing the data.













6. Finally, let's use Data Profile Viewer, to see the statistics returned by the profiling process.





Double click the Data Profiling Task.









Make a click on the Open Profile Viewer button.





















