ETL job flows tuning

Job flows

Most of  data warehouses include some sort of ETL jobs to refresh its contents. Typically those jobs can be divided into two categories: jobs that need to be executed in a specific order and jobs that may run in parallel independently from each other. There are also other constraints involved that make the job flows more complex.

Developing a job flow is an iterative process. First we add just a few jobs to the flow. We run it, then we add a few more jobs, and so on. The job flow keeps growing. When all the jobs are ready, the flow usually is far from being optimal. The tuning is needed.  Unless you followed a good design from the very beginning, the flow will have to be restructured in order to optimize its execution time and load on the machine it runs at. Continue reading

Materialized views for multi-level aggregated data

Too many materialized views

Query rewrite is a very nice feature of Oracle Database EE. It was first introduced into Oracle8i. It allows significantly to speed up a lot of data aggregating queries. However the more queries we want to tune with query rewrite, the more of materialized views (MVs) we need. And usually we end up with much too many of them. Let me show you some example. I have a simple model with 2 dimensions D_Time and D_Product joined to a F_Sales fact table. This is a regular star schema data model.

multidimensional model
Continue reading