Materialized views for multi-level aggregated data

Too many materialized views

Query rewrite is a very nice feature of Oracle Database EE. It was first introduced into Oracle8i. It allows significantly to speed up a lot of data aggregating queries. However the more queries we want to tune with query rewrite, the more of materialized views (MVs) we need. And usually we end up with much too many of them. Let me show you some example. I have a simple model with 2 dimensions D_Time and D_Product joined to a F_Sales fact table. This is a regular star schema data model.

My multidimensional model is based on the following 3 relational tables: Sales, Time and Product.

Each dimension has 3 levels. In the standard scenario, if I would like to keep one MV for each possible dimensions combination, then we would have to create 64 MVs in order to avoid any on-the-fly aggregation. Of course, we rarely create all MVs for each level combination. We apply rather some skip-level approach or choose only the most often used level combinations. Anyway, still we need to create many MVs.

One materialized view for all queries

There is a way that allows to create one MV for many level combinations of your choice. Of course, the data has to be hierarchical. With the star schema data model we nearly always have hierarchical data. First, I prepare an aggregating query that will produce level combinations I need. I use CUBE or ROLLUP extensions OF GROUP BY clause. They will produce level combinations I need (see Oracle documentation for details). The use of GROUPING function will allow to distinguish rows of different groupings (level combinations).

This could be a good query example using the CUBE extension:

SELECT
  PRODUCT_ID,
  PRODUCT_SUBCATEGORY_ID,
  PRODUCT_CATEGORY_ID,
  DAY_ID,
  MONTH_ID,
  YEAR_ID,
  GROUPING_ID (YEAR_ID,
               MONTH_ID, DAY_ID,
               PRODUCT_CATEGORY_ID,
               PRODUCT_SUBCATEGORY_ID,
               PRODUCT_ID) GID,
  SUM(SALES_AMOUNT) SALES_AMOUNT,
  SUM(SALES_QUANTITY) SALES_QUANTITY
FROM F_SALES s
  JOIN D_PRODUCT p ON (s.PRODUCT_PRODUCT_ID=p.PRODUCT_ID)
  JOIN D_TIME t ON (s.TIME_DAY_ID=t.DAY_ID)
GROUP BY CUBE (YEAR_ID,
               MONTH_ID, DAY_ID,
               PRODUCT_CATEGORY_ID,
               PRODUCT_SUBCATEGORY_ID,
               PRODUCT_ID);

I will use it in my scenario. The query will produce the whole cube and cover all 64 possible level combinations. In real life we need usually much fewer combinations. Once I have the query, I can create a MV on top of it:

CREATE MATERIALIZED VIEW f_sales_cube_mv
PARTITION BY RANGE(GID) INTERVAL (1)
  (PARTITION p0 VALUES LESS THAN (1))
ENABLE QUERY REWRITE
AS
  ... here goes the above query

As you must have noticed, I used interval partitioning so Oracle is creating automatically partitions for me. The partitioning key is based on the grouping ID, so I will have one partition for each grouping. This will prevent database from full table scans when searching for rows corresponding to the specific grouping. It will be just single partition quick access. I would like to underline that including this GROUPING_ID function in the query is very important for query re-write mechanisms. If it is not added to the MV then it will lose its re-writing capabilities.

Materialized view in action

The MV has to be tested to confirm that the query rewrite is occurring. I created the MV for all possible groupings so I do not expect that my query would do any aggregation on-the-fly. It should return data directly from the MV. The explain plan will tell me that. For the quick testing I will choose 2 sample queries and check if they are properly re-written by Oracle optimiser to use my MV and additionally I will check if the single partition is used and no aggregation on-the-fly is applied. This will be my first test query:

SELECT
    PRODUCT_SUBCATEGORY_ID,
    YEAR_ID,
    SUM(SALES_AMOUNT) SALES_AMOUNT,
    SUM(SALES_QUANTITY) SALES_QUANTITY
  FROM F_SALES s
    JOIN D_PRODUCT p ON (s.PRODUCT_PRODUCT_ID=p.PRODUCT_ID)
    JOIN D_TIME t ON (s.TIME_DAY_ID=t.DAY_ID)
  GROUP BY YEAR_ID,PRODUCT_SUBCATEGORY_ID;

The explain plan for that query is:

SELECT *
  FROM TABLE(DBMS_XPLAN.DISPLAY(NULL, NULL, 'basic +predicate +partition'));

-------------------------------------------------------------------------
| Id  | Operation                     | Name            | Pstart| Pstop |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                 |       |       |
|   1 |  PARTITION RANGE SINGLE       |                 |    30 |    30 |
|*  2 |   MAT_VIEW REWRITE ACCESS FULL| F_SALES_CUBE_MV |    30 |    30 |
-------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - filter("F_SALES_CUBE_MV"."GID"=29)

So far, so good. The plan shows single partition access to the MV and no aggregation on-the-fly is applied. Please note the predicate at the bottom. Oracle optimiser added the filter GID = 29 to choose only appropriate data grouping.

My second test query is as following:

SELECT
    PRODUCT_CATEGORY_ID,
    DAY_ID,
    SUM(SALES_AMOUNT) SALES_AMOUNT,
    SUM(SALES_QUANTITY) SALES_QUANTITY
  FROM F_SALES s
    JOIN D_PRODUCT p ON (s.PRODUCT_PRODUCT_ID=p.PRODUCT_ID)
    JOIN D_TIME t ON (s.TIME_DAY_ID=t.DAY_ID)
  GROUP BY DAY_ID,PRODUCT_CATEGORY_ID;

and its explain plan:

-------------------------------------------------------------------------
| Id  | Operation                     | Name            | Pstart| Pstop |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                 |       |       |
|   1 |  PARTITION RANGE SINGLE       |                 |    52 |    52 |
|*  2 |   MAT_VIEW REWRITE ACCESS FULL| F_SALES_CUBE_MV |    52 |    52 |
-------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - filter("F_SALES_CUBE_MV"."GID"=51)

In the both test cases the queries were re-written to MV. The queries used single partition access and no additional aggregation was applied before returning rows.

Data Warehousing Snippets

Just another Oracle technology blog

Materialized views for multi-level aggregated data

Too many materialized views

One materialized view for all queries

Materialized view in action

Leave a Reply Cancel reply