When Join Order Matters
by Jacob Mastel
It's a popular belief that the order of a SQL query's join order doesn't matter so long as the joins are done as an inner join . In a simple world, this is true. Unfortunately, life is rarely so simple.
A common question among new SQL query users is "Does the order of my inner joins matter?". The answer to that question, like many things in life, tends to be more complicated than one might originally think.
First we should look at the mathematical point of view. The symmetric property tells us that if $a = b$, then $b = a$. In plain English, that means the results of our query won't change, regardless of how we input our join order. From a logical standpoint, the order of the join won't effect the results. Simple, right?
The question gets more complicated in how the query plays out in the RDBMS's optimizer. Most of my work is done with Oracle's RDBMS, so I will focus on their implementation and documentation. Regardless, the information should apply to MySQL, Microsoft's SQL Server, PostgreSQL and any other RDBMS you can think of.
The optimizer, in case you're unaware, is the part of the RDBMS that is responsible for interpreting your query and building an execution plan. Most of the optimizer's work is done through estimations. As a result, the plan given is what the RDBMS believes to be the best plan available. What the RDBMS believes and what is true however, aren't necessarily the same.
The plan generator explores various plans for a query block by trying out different access paths, join methods, and join orders. The number of possible plans for a query block is proportional to the number of join items in the FROM clause. This number rises exponentially with the number of join items.
Oracle Database Performance Tuning Guide 22.214.171.124 (Emphasis mine)
The second bolded section is the most important aspect in relation to this article. A short query joining two tables isn't likely to give the optimizer much trouble. Longer queries with more complex relationships, like joining complex views, can create more possible join orders than the optimizer has time to examine. If this happens, you might get stuck with a sub optimial execution plan.
Fortunately, the optimizer is good at taking hints from the query writer. It turns out that the order in which you write your joins acts as a kind of hint in of itself. If you specify a join order in a particular way, the optimizer will try that order first.
I wrote a query relatively recently where the performance was completely unusable. My mistake was that I placed one of my database's largest views, Address Current under the driving table on my query's join tree. Worse yet, the next table up the join tree was Person Detail. That's also one of the database's largest views.
As the image demonstrates, the Good Join contains all the same views as the Bad Join. The key difference is that my driving table, the CTE, is at the bottom of the good join. In the Bad Join the database pulls and joins all the records in Person Detail and Address Current and only starts to filter them later when the CTE is applied.
How Much Does it Matter?
Well, the Bad Join takes so long to run that it might as well not. The Good Join on the other hand, returns fairly quickly. Looking at the explain plan, the difference is pretty stark. The Bad Join has a cost of 13,500, but the Good Join has a cost of 9,727. That's an estimated 37% increase just based on the order I typed in the joins!
Finally, looking the autotrace we see that the performance of the two queries diverge even further. The cost returned from the autotrace lists the Bad Join at 28,429 and the query took 1701.556 seconds to return. The Good Join however returned in just 1.782 seconds with a cost of 18,178%. That's a 67% increase in cost and a 95,385% increase in execution time!
Logically, your join order may not matter, but if you want your query to return in a reasonable amount of time, you need to pay attention to how you're building your query. Make sure that your driving tables are at the bottom of your join tree, and focus on building the join tree taller as opposed to wider.