Friday, November 15, 2013

Multi variate queries and you

Hi all,

A long time since my last post.I apologize, I have no explanation for the same. I was busy in other stuff. 

Today we will be discussing queries that involve multiple columns and their performance. Consider queries like:

SELECT * FROM table1 WHERE table1.a=1 AND b=1;

Let's get into a bit of details about what happens inside the database when such a query like this is executed.

In traditional relational databases, the query planner has no knowledge of any correlation between any two columns present in a join.If the columns are not correlated, the query performance can be good. However, if the columns are correlated, the performance of the query can be bad.

The planner keeps an arbitrary value as the selectivity estimate for the join in some cases. For these cases, the performance can be really bad.

The main problem is that the database planner does not take into consideration multi variable correlation. The query essentially queries per column and then unifies the results.

If we treat each query as a search keeping all the query column values in mind, this can lead to a single search throughout the search space (which can be larger than the traditional case,though).

This idea has to be explored, and I think I have some leads here. Sounds interesting? Think,think. Wait for the next blog post for the results :)

Until next time,

Peace 

Atri