ETL job flows tuning

Job flows

Most of  data warehouses include some sort of ETL jobs to refresh its contents. Typically those jobs can be divided into two categories: jobs that need to be executed in a specific order and jobs that may run in parallel independently from each other. There are also other constraints involved that make the job flows more complex.

Developing a job flow is an iterative process. First we add just a few jobs to the flow. We run it, then we add a few more jobs, and so on. The job flow keeps growing. When all the jobs are ready, the flow usually is far from being optimal. The tuning is needed.  Unless you followed a good design from the very beginning, the flow will have to be restructured in order to optimize its execution time and load on the machine it runs at. Continue reading