Recently, we upgraded Tableau from version 8.0 (32 Bits) to 8.1.7 (64 Bits) along with Automatic Login (Single Sign On) and PostgreSQL configuration (Access to the Tableau system Database). After the successful upgrade of this application and configuration of the named features, I would like to share my experience as they may be useful in case you face a similar situation.
Based on Clariba’s specific expertise and successful completion of all requirements of the SAP® Recognized Expertise Program, Clariba Consulting Middle East has been granted the SAP Recognized Expertise designation in SAP HANA® and Business Intelligence.
These days almost every company store data such as users, emails, roles, printers, etc. in what it is call a Directory Service, providing a centralized access to company-related data that can be use across all its systems. One of the most widely used solution is Active Directory from Microsoft, since its tightly integrated with Windows OS systems. The other one LDAP, which stands for Lightweight Directory Access Protocol, has a broad range of users too since the de facto standard in Linux-based systems.
Given the fierce competition and the ever-soaring demands of customers, small and medium-sized enterprises (SMEs) today need to invest in having BI capabilities to stay relevant and meet these demands. Yet, with scarce resources available, the alignment of business and IT is vital to ensure success, and new options for hosted BI solutions freeing up cash flow are attractive options for SMEs.
SAP BusinessObjects (BO) BI 4 Platform has come with many improvements that have made the system more complex. Rep-Up-GraphWe, as BI 4 administrators, are required to know how this application works in order to apply the proper optimization improvements to the platform, when it is required. In this blog article, we will cover the most important concepts that need to be considered as a first step to increase the performance of the BI platform. It is very important for a good designer to understand the conceptual tiers in the BI Platform: Application Tier, Intelligence tier and Processing tier. We will refer to each one of them and we will nail down some optimization tips.
Don’t you just hate it when you see a report with DIV/0 errors? Thankfully, Web Intelligence provides us with a function to handle these types of errors, allowing us to provide a more meaningful message.
Let's look at IsError:
First impression when using Microsoft SQL Server Integration Services (SSIS), the developer’s first impression is the tool provides powerful features in addition is simple to use. Although this is true, there are some common errors that can make us waste our time. The purpose of this blog article is to cover some of these very common issues and save some time when starting to develop a solution with Microsoft SSIS.
BI and data analytics are hitting the top trend and prediction lists in 2014 and beyond. We have highlighted our top 4 analytics trends from these lists.
A recent article in the Harvard Business Review by Thomas Davenport, professor and fellow at MIT, caught my attention. Davenport refers to the beginnings of a new era: “After Big Data” in his article Analytics 3.0. While organizations built around massive data volumes — such as Facebook, LinkedIn, eBay, Amazon and others — may be ready to move to the next wave of analytics, the full spectrum of big data capabilities is yet to be realized in the traditional organizations we work with on a daily basis.
Are you satisfied with the way you currently manage the dependencies in your ETL? Dependencies between jobs (or parts of jobs) are an important aspect of the ETL management. It pertains to questions like: Do you want to execute job B if job A failed? Imagine that you have a job C with sub-job 1 (usual runtime: 3 hours) and sub-job 2 (usual runtime: 2 minutes). If sub-job 1 was successful and sub-job 2 failed, can you gracefully restart job C without the sub-job 1 being restarted again?
As soon as you have more than 1 simple job, you have to manage your dependencies. In this article (part 1 of a series of articles about ETL Dependencies Management) I’ll first list some of the characteristics I’m looking for in an ideal dependency management system. I will then have a look at some of the possibilities offered by SAP Data Services 4. In part 2 (my next post), I will propose the architecture of a possible dependency management system. In part 3, I will go into the details of the implementation in Data Services. I’ll finish with part 4 by telling you about how the implementation went, and if some improvements are possible.
The ideal dependency management system
In this post I will use the word “process” to design a series of ETL operations that have a meaning together. Example: extract a source table, create a dimension, or update a fact table. The objective here is to manage the dependencies between the processes: updating a fact table should probably only be allowed if updating the corresponding dimensions was successful.
A dependency management system should ideally have at least the following characteristics:
- Run a process only if its prerequisites ran correctly
- After a failure, offer the option to re-run all the processes or only the processes which failed
- Trace the outcome of each process (ran successfully, failed, did not run)
- Run dependent processes dynamically (rather than statically, i.e. based on date/time)
Let’s enumerate some of the possibilities offered by Data Services, with their respective pros and cons.
1) One job with all processes inside. This is very easy to implement, dynamic in terms of run times, but it doesn’t allow for concurrent runs. Most importantly, it means that failures have to be managed so that the failure of one process does not stop the whole job.
2) One process per job, with jobs scheduled at specific times. This is very easy to implement, allows concurrent runs, but is not dynamic enough. If the process durations increase with the months/years, jobs may overlap.
3) One main job calling other jobs (for example with execution commands or Web Services).
4) One process per job, all the jobs being scheduled at specific times, but checking in a control table if the pre-requisites ran fine. Otherwise they just sleep for some time before checking again.
5) Use the BOE Scheduler to manage jobs based on events (how-to is well described on the SCN). I’ve not tested it yet, but I like this approach.
By default, the first two possibilities only manage the “flow” side of the dependency management (after A, do B). But they do not manage the conditional side of the dependency management (do B only if A was successful). In both cases, a control table updated by SQL scripts would allow the ETL to check if the prerequisite processes have been run correctly.
What I don’t really like in the solutions 2 to 5 is the fact that it’s difficult to have an overview of what’s going on. You cannot really navigate within the whole ETL easily. The solution 1 gives you this overview, but at the cost of having a potentially huge job (without the possibility of processes running concurrently).
Also note that the solutions with multiple jobs will need to manage the initialization of the global variables.
What I miss in all these solutions is an optimal re-start of the ETL. If 10 of my 50 processes failed, and I want to restart these 10 only, do I really have to start them manually?
In my next blog post I’ll propose an architecture that addresses this optimal restart.
Until then, please let me know your thoughts about how you manage your ETL dependencies. Any of the 5 solutions mentioned before? A mix? Something else? And how well does it work for you.