Research & Development projects
Nowadays, one of the biggest challenges enterprises have to face is to derive the greatest possible business benefits from increasing data. Database Management Systems (DBMS) play a crucial role in these areas allowing storing and retrieving user’s data and data security management. In this article, I will focus on presenting the basic capabilities of the SQL Server Integration Services (SSIS). I will show this tool is not only for database administrators and analytics, but also other IT professionals might find it helpful in their everyday work.
Microsoft SQL Server is a relational database management system (RDBMS) and one of the most market-leading technologies for managing multiple databases. It helps organize and manage huge business databases, supports Business Intelligence and data analysis application in the company IT environment.
Microsoft SQL Server is built on top of SQL (Structured Query Language) – a standardized programming language for database administrators and other IT professionals. SQL Server comes with its own implementation of the SQL language (Structured Query Language) – T-SQL (Transact-SQL) that allows to manage databases and query the data they contain.
SSIS (SQL Server Integration Services) is the graphical ETL tool from Microsoft that has been included in Microsoft SQL Server since version 2005.
We can write a lot about the capabilities of SSIS. This tool is not just a graphical “flip” of data or a T-SQL code. Using the SSIS tool can bring many benefits, among which we can mention:
What’s essential, SSIS eliminates the need of hardcore programmers, as it contains a GUI that helps users to transform data quickly rather than writing large programs.
Thanks to SSIS, we can connect to various data sources, not only MS SQL databases but also Oracle databases, ERP systems, or various types of flat files, and load data into different tables, saving time significantly. What’s more, we can control each flow, for example, by adding an email notification when the data is not loaded entirely or with problematic records, and it will not prevent further steps of the flow.
In this article, I will focus on presenting the beginning of the road with SSIS, i.e., downloading, validating, and loading data into database – a process called ETL (Extract-Transform-Load).
The installation of the tool is very simple, and the step-by-step instruction is available on the official Microsoft website. See here.
After completing the installation, we can see a clear view when opening the Visual Studio tool.
At first glance, our attention is drawn to Control Flow and Data Flow that are the most important functions at the beginning of our adventure with SSIS.
Now let’s try to load the data from the excel file into our newly created database. If we have many sheets in one excel file, we don’t have to build separate flows. We can use the Foreach Loop Container option and loop our loading cycle sheet by sheet.
For simplicity, let’s imagine the task: we have an excel file provided with the data format: Name, Surname, Birthday (Day-Month-Year). We have three columns, but the client expects this data in a Year / Month / Day format.
There are many ways to cope with it. Below you can see the most simple use of SSIS.
1. Create a table
CREATE TABLE [dbo].[test](
[name] [varchar](30) NULL,
[surname] [varchar](30) NULL,
[year] [int] NULL,
[month] [int] NULL,
[day] [int] NULL)
2. In Visual Studio set Data Flow
We start by indicating the source and destination where the raw data are Excel records, and the final table is our newly created dbo.test
It gives us an outline of the data import.
We can see information about errors or steps that we have skipped. Let’s check what these errors mean and fix them.
3. Use the option ‘Derived Column”
As we can see the mapping cannot be complete because we have different input and target columns. In this case, we can use the option – ‘Derived Column’.
We were able to reformat the data from 1 source column to 3 columns – but we still see an error message about a different data type for the Name / Last Name column.
Should we change the data type? No – if the target data type is [varchar], select Data Conversion.
After data conversion and code page change, the process is successfully completed.
This simple exercise demonstrates a small fraction of what you can do with SSIS packages. The possibilities are enormous when we delve into the types of transformation, aggregation, the possibility of using other coding languages starting with C # or PowerShell and ending with the capabilities of SCD (Slowly Changing Dimension) or Package Protection Level.
If you are interested in the topic of SSIS, I encourage you to delve into the available materials – it’s worth starting with e-book: Microsoft SQL Server 2012 Integration Services – ebook.