Data warehouses quite often need to query remote data over database link. These types of queries may involve significant volume of data. When a data warehouse is loaded incrementally, then usually a fraction of the remote table is needed. So, it is natural to add a WHERE clause to the query. However, what shall we do if with a query, for example like this one:
SELECT * FROM clients@otherdb a WHERE "ClientNo" = '5562541333';
is getting a very disappointing plan like below:
------------------------------------------------------------ | Id | Operation | Name | Rows | Inst |IN-OUT| ------------------------------------------------------------ | 0 | SELECT STATEMENT | | 20 | | | |* 1 | FILTER | | 20 | | | | 2 | REMOTE | CLIENTS | | OTHER~ | R->S | ------------------------------------------------------------
The whole remote table CLIENTS is pulled to the local database and only then the result is filtered. Of course, this is far from being optimal. There might be many reasons why Oracle optimizer decides not to send the WHERE clause over database link, especially when the remote database is not Oracle. One of the most common reasons is mismatch of data types.
In my example I have this WHERE clause:
"ClientNo" = '5562541333'
If Oracle has to match data types to execute the comparison, then the conversion has to be done on either of the sides. If the conversion occurs on the left side, i.e. CONVERSION_FUNCTION(“ClientNo”), then it will have to pull the whole table into the local database. We can easily prevent this from happening by explicitly converting the right side of the comparison to the data type of the left side, for example:
SELECT * FROM clients@otherdb a WHERE "ClientNo" = CAST('5562541333' AS NVARCHAR2(10));
Let’s see the new execution plan after applying this change
------------------------------------------------------------ | Id | Operation | Name | Rows | Inst |IN-OUT| ------------------------------------------------------------ | 0 | SELECT STATEMENT | | 20 | | | | 1 | REMOTE | CLIENTS | 20 | OTHER~ | R->S | ------------------------------------------------------------
This plan looks much better. The WHERE clause is sent to the database link and executed remotely. The new performance will be much better.
I played around using different ways of providing the filtering to the remote query run on MS SQL Server and found that the WHERE clause was not sent when I used:
- join – unfortunately DRIVING_SITE hint was constantly ignored
- subquery with IN operator, even if the PRECOMPUTE_SUBQUERY hint changed subquery result into literals
- PL/SQL function
and it worked fine when when I used:
- literal with explicit conversion
- IN operator with literals (explicitly converted)
- bind variable (explicitly converted)
I will be glad to hear your experiences about querying remote database over database link.