Archiving your input files in Data Services

Most of your ETL processes probably rely on databases. But sometimes you need to integrate flat files or Excel documents into your extraction. And you may have the obligation to keep these files stored in some place, for instance, due to company policies or audit. Unfortunately Data Services lacks of file explorer capabilities. You cannot directly rename, copy or delete a file on your local hard drive. So what options are there left for you in order to manage those files you want to archive?

Using dynamic file names

The easiest solution is to use dynamic file names – by this I mean using a source file whose name will vary over time. For example "2011-01.xls", "2011-02.xls", etc. In this specific case there is no need to follow any further steps in order for you to archive your files: just leave them where they are. The question remains: how do you manage a dynamic file name in Data Services? The answer is quite simple: use a variable for your file name (view the image below).

Data services 1
Data services 1

Remember that you have to both, declare the variable in the "Variables and Parameters" window (view image below) as well as initialize it in a script so that it matches your actual file name.

Data services 2
Data services 2

I’d like to share with you an interesting tip about using a variable name for a file name: if you want to create an input file in Data Services based on an actual file, first of all select the very document you want to archive and import its schema (columns names), after which replace the file name with your variable. Do remember that you can check beforehand whether or not a file already exists by using the "file_exists()" function in a Data Services script.

Using fixed file names

Then again, you may also find yourself in the situation where you cannot force the users (or the programs) to give the input files different names. So how do you handle a situation where every day/month/quarter and every single 'Data.xls' file is created within a directory?

Ideally, you would be driven to extract the data from the file, and from there rename the file with the current day/month/quarter, and possibly move it to an archiving folder. But as I have already mentioned, there is no built-in function in Data Services that allows you to rename a file. This is why we have to create a small executable script (a .bat batch file) that recalls to another script from within Data Services.

The Data Services script should look like this:

exec ('C:Source_file_directoryarchiving.bat', $G_new_file_name , 0);

The first parameter is a string path to the executable file, the second parameter is a string representing a list of parameters to be sent to the executable (here we use a variable in order to send the new file name), and the third parameter is a flag which determines how Data Services will act upon an error (you can find the different values in the Data Services Reference Guide according to your version).

What shall we put in the executable file? A simple line will do:

move "C:Source_file_directoryData.xls" "C:Archiving_directory%1.xls"

As you may have guessed already, the '%1' is a placeholder for the first parameter transmitted (here $G_new_file_name).

If you have authorization problems (typically because of a 50306 error), you may want to check the user of your DS service, and to whether it has rights to the accessed folders. You may also want to try the following trick: replace "C:" with \server_nameC$.

As we have seen here, managing archived input files is possible whatever your file name configuration (fixed or dynamic) is. If you have questions to this regard, or if you want to share your experience or tips, leave a comment!