""

"Data Services"

Managing ETL dependencies with BusinessObjects Data Services (Part 1)

Are you satisfied with the way you currently manage the dependencies in your ETL? Dependencies between jobs (or parts of jobs) are an important aspect of ETL management. They raise questions like: do you want to execute job B if job A failed? Imagine that you have a job C with sub-job 1 (usual runtime: 3 hours) and sub-job 2 (usual runtime: 2 minutes). If sub-job 1 was successful and sub-job 2 failed, can you gracefully restart job C without sub-job 1 running all over again?

As soon as you have more than one simple job, you have to manage your dependencies. In this article (part 1 of a series about ETL dependency management) I’ll first list some of the characteristics I’m looking for in an ideal dependency management system. I will then have a look at some of the possibilities offered by SAP Data Services 4. In part 2 (my next post), I will propose the architecture of a possible dependency management system. In part 3, I will go into the details of the implementation in Data Services. I’ll finish with part 4 by telling you how the implementation went and whether further improvements are possible.

The ideal dependency management system

In this post I will use the word “process” to designate a series of ETL operations that belong together: for example, extracting a source table, creating a dimension, or updating a fact table. The objective here is to manage the dependencies between the processes: updating a fact table should probably only be allowed if updating the corresponding dimensions was successful.

A dependency management system should ideally have at least the following characteristics:

  • Run a process only if its prerequisites ran correctly
  • After a failure, offer the option to re-run all the processes or only the processes which failed
  • Trace the outcome of each process (ran successfully, failed, did not run)
  • Run dependent processes dynamically (triggered by the completion of their prerequisites) rather than statically (at fixed dates/times)

The possibilities

Let’s enumerate some of the possibilities offered by Data Services, with their respective pros and cons.

1) One job with all processes inside. This is very easy to implement, dynamic in terms of run times, but it doesn’t allow for concurrent runs. Most importantly, it means that failures have to be managed so that the failure of one process does not stop the whole job.

2) One process per job, with jobs scheduled at specific times. This is very easy to implement, allows concurrent runs, but is not dynamic enough. If the process durations increase with the months/years, jobs may overlap.

3) One main job calling other jobs (for example with execution commands or Web Services).

4) One process per job, all jobs being scheduled at specific times, but each job first checks in a control table whether its prerequisites ran fine; if not, it sleeps for some time before checking again.

5) Use the BOE Scheduler to manage jobs based on events (how-to is well described on the SCN). I’ve not tested it yet, but I like this approach.

By default, the first two possibilities only manage the “flow” side of dependency management (after A, do B); they do not manage the conditional side (do B only if A was successful). In both cases, a control table updated by SQL scripts would allow the ETL to check whether the prerequisite processes ran correctly.
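To make that control-table idea a bit more concrete, here is a minimal sketch in SQL. The table and column names (ETL_PROCESS_LOG, PROCESS_NAME, STATUS and so on), the status values and the example process names are all assumptions for the sake of illustration, and the exact DDL will depend on your database:

    -- Hypothetical control table: one row per process and per ETL run (batch).
    CREATE TABLE ETL_PROCESS_LOG (
        PROCESS_NAME  VARCHAR(100) NOT NULL,  -- e.g. 'LOAD_DIM_CUSTOMER'
        BATCH_ID      INTEGER      NOT NULL,  -- identifies one ETL run
        STATUS        VARCHAR(20)  NOT NULL,  -- 'NOT_STARTED', 'RUNNING', 'SUCCESS' or 'FAILED'
        START_TIME    TIMESTAMP,
        END_TIME      TIMESTAMP,
        PRIMARY KEY (PROCESS_NAME, BATCH_ID)
    );

    -- Before starting a process, check that all its prerequisites succeeded
    -- in the current batch. Here LOAD_FACT_SALES is assumed to require its
    -- two dimensions to be loaded first.
    SELECT COUNT(*)
    FROM   ETL_PROCESS_LOG
    WHERE  BATCH_ID = 42                      -- current batch
    AND    PROCESS_NAME IN ('LOAD_DIM_CUSTOMER', 'LOAD_DIM_PRODUCT')
    AND    STATUS = 'SUCCESS';
    -- If the count equals the number of prerequisites (2 here), run the
    -- process; otherwise skip it or, as in option 4, sleep and check again.

In Data Services itself, such a check could be executed from a script with the sql() function and the result stored in a variable that drives a conditional or a while loop.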

What I don’t really like about solutions 2 to 5 is that it is difficult to get an overview of what’s going on; you cannot easily navigate the whole ETL. Solution 1 gives you this overview, but at the cost of a potentially huge job (and without the possibility of processes running concurrently).

Also note that the solutions with multiple jobs will need to manage the initialization of the global variables.

What I miss in all these solutions is an optimal restart of the ETL. If 10 of my 50 processes failed and I want to restart only those 10, do I really have to start them manually?
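Sticking with the hypothetical control table sketched above, a restart routine could identify what actually needs to run again with a query along these lines (again, the names and status values are assumptions):

    -- Assuming each batch starts by inserting one row per process with
    -- STATUS = 'NOT_STARTED', a restart only needs the processes of the
    -- current batch that did not end in success.
    SELECT PROCESS_NAME, STATUS
    FROM   ETL_PROCESS_LOG
    WHERE  BATCH_ID = 42              -- batch being restarted
    AND    STATUS <> 'SUCCESS';       -- 'FAILED' or 'NOT_STARTED'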

In my next blog post I’ll propose an architecture that addresses this optimal restart.

Until then, please let me know your thoughts about how you manage your ETL dependencies. One of the 5 solutions mentioned above? A mix? Something else? And how well does it work for you?

Working with Data Services 4.0 repositories

In my previous blog, Installing Data Services 4.0 in a distributed environment, I mentioned that there is an important step to carry out after installing Data Services in a distributed environment: configuring the repositories. As promised, I will now walk you through this process. Data Services 4.0 can now be managed through the BI Platform for security and administration. This puts all of the security for Data Services in one place, instead of being fragmented across the various repositories: the repositories are managed through the Central Management Console (CMC), and you can set rights on individual repositories just like you would on any other object in the SAP BusinessObjects platform.

Data Services Application in CMC

Now you can log into Data Services Designer or Management Console with your SAP BusinessObjects user ID instead of needing to enter database credentials. Once you log into Designer, you are presented with a simple list of Data Services repositories to choose from.

I advise anyone who is beginning to experiment with Data Services to use the repositories properly from the start! At first it might seem really tough, but it will prove to be very useful. SAP BusinessObjects Data Services solutions are built on three different types of metadata repositories: central, local and profiler repositories. In this article I am going to show you how to configure and use the central and the local repository.

The local repositories are used by the individual ETL developers to store the metadata pertaining to their ETL code, while the central repository is used to “check in” the individual work and maintain a single version of the truth for the configuration items. This “check in” action gives you a version history from which you can recover older versions in case you need them.

Now let’s see how to configure the local and central repositories in order to finish the Data Services installation.

So, let’s start with the local repository. First of all, go to the Start Menu and launch the Data Services Repository Manager tool.

Choose “Local” in the repository type combo box. Then enter the information to configure the metadata of the local repository.

Configuring Local repository

For the central repository, choose “Central” as the repository type and then enter the information to configure the metadata of the central repository. Check the “Enable Security” check box.

Configuring Central repository

After filling in the information, press Get Version to check whether a connection to the database can be established. If the connection works, press Create.

Once you have created the repositories, you have to register them in the Data Services Management Console. To do that, log in to the Management Console. You will see the screen below.

DS Management Console Error

This error tells you that you have to register the repositories. The next step is to click on Administrator to register them.

In the left pane, click the Management list item, then click Repositories; you will see that no repositories are registered yet. In order to register them, click the Add button.

Finally, enter the repository information and click Test to check the connection. If the connection is successful, click Apply; you will then see the screen below, which lists the repositories that you have created. Do not forget to register the central repository as well.

DS Management console

After that, the next logical step is to log into Data Services Designer using one of the local repositories. Once inside the Designer, try to activate the central repository that you have created. Data Services will display an error telling you that you do not have enough privileges to activate it.

To be able to activate the central repository, you have to assign security to it. To do that, go to the Data Services Management Console. In the left pane, go to “Central Repository” and click on “Users and Groups”.

Central repository - Users and groups

Once you are inside Users and Groups, click “Add” and create a new group. When the group is created, select it and click the “Users” tab, which is to the right of the Group tab (see the image below).

Add and Create new group

Now you can add the names of the users who may activate and use the central repository in the Designer tool. As you can see in the image below, the only user added in this example is “Administrator”.

Add user names

The last task is to configure the Job Server.

As always, go to the Start Menu and launch the Server Manager tool. Once the Server Manager starts, the window below appears.

Server Manager tool

Press Configuration Editor and a new window will open. Press Add to create a new Job Server.

Configuration editor

Keep the default port, and choose a name for the new Job Server or keep the default one. Then press Add to associate the repositories with the Job Server you just created.

Once you press Add, you can fill in the “Repository Information” section in the right part of the window. Enter the information of the local repositories that you created in the previous steps.

At the end you will see two local repositories associated with the Job Server.

Add Repository Information

With this step the repository configuration is finished. You can now manage security from the CMC, use regular BusinessObjects users to log into the Designer, and secure the Data Services repositories just like every other BusinessObjects application.

If you have any questions or other tips, share them with us by leaving a comment below.

Problem Uninstalling Data Services

I recently faced a problem while uninstalling Data Services and I wanted to share the resolution, just in case you run into the same issue. I was trying to upgrade a Data Services machine following the SAP procedure (copy the configuration files, uninstall, then install the new version; not very sophisticated, as you can see). This was not as simple as I first thought.

Problems started after uninstalling the software: the new version refused to install, stating that I should first uninstall the previous version. I uninstalled the software again… but Data Services was still there, so I uninstalled again, but this time the process failed (which makes sense, as the software was already uninstalled). So I kept trying… reboot… uninstall… reboot… rename the old installation path… reboot… you see where this is going…

 

So, how did I finally solve this?

  1. Start Registry Editor (type regedit at a command prompt or in the Run dialog).
  2. Take a backup of the current Registry content. To do this, select the top node of the registry (Computer), go to File -> Export and choose a name for the backup file.
  3. Delete the key HKEY_LOCAL_MACHINE\SOFTWARE\Business Objects\Suite 12.0\EIM (Suite XX.X may vary). NOTE: You may want to write down the key HKEY_LOCAL_MACHINE\SOFTWARE\Business Objects\Suite 12.0\EIM\Keycode first, as it contains the license code.
  4. To remove the entry for the software from the Windows uninstall dialog, go to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall, look for the key whose DisplayName property is “BusinessObjects Data Services” and delete it.
  5. Finally, delete the content of the installation directory (typically C:\Program Files\Business Objects\Business Objects Data Services).

Now you can launch the installer and it should work.

I hope this helps if you are experiencing the same issue. If you have any doubts or have ever faced the same problem, leave a comment below.