Improving Cardinality Estimation: Answers & Discussion

Your challenge for this week was to improve estimation on a query. Stack Overflow has a dashboard that shows the top-ranking users in their most popular location. The Stack Overflow database even has an index to support it:
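(What follows is my sketch of the shape of it, not the post’s exact code: I’m assuming a users table with id, displayname, location, and reputation columns, which may not match your copy of the database.)

  -- A supporting index with location as the leading column, so we
  -- can quickly find everyone in one place, sorted by reputation:
  CREATE INDEX users_location_reputation
    ON users (location, reputation DESC);

  -- The dashboard query: top users in the most popular location.
  SELECT u.id, u.displayname, u.reputation
  FROM users u
  WHERE u.location = (
    SELECT location
    FROM users
    WHERE location <> ''
    GROUP BY location
    ORDER BY COUNT(*) DESC
    LIMIT 1
  )
  ORDER BY u.reputation DESC
  LIMIT 100;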

But the problem we were having was that the query took several seconds to run, and the query plan had tons of estimation problems. For example, when you hovered your mouse over the index scan operator, Postgres guessed that only 1,174 rows would come out for our most popular location – when in reality, 108,433 came out.

The Core of the Problem

The main problem is that when we run a statement (like a SELECT), Postgres:

  1. Designs the execution plan for the entire statement all at once, without executing anything, then
  2. Executes the entire statement all at once, with no plan changes allowed

Think of it as two different departments, neither of which is allowed to do the job of the other. The design department sketches out the query plan, but it has no idea what this part of the query will produce: the subquery that finds the most popular location.
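Here’s a sketch of that part, using the same hypothetical schema as above:

  -- The design department has to plan the whole statement before
  -- it knows what this returns:
  SELECT location
  FROM users
  WHERE location <> ''
  GROUP BY location
  ORDER BY COUNT(*) DESC
  LIMIT 1;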

Just for reference, the top location happens to be India.

The Core of the Solution

If the problem is that the database server:

  1. Designs the execution plan for the entire statement all at once, without executing anything, then
  2. Executes the entire statement all at once, with no plan changes allowed

Then the solution is to break the statement up into multiple statements. In pseudocode:

  1. Find the top location, then
  2. Find the users who live in that location, after we know what the location is

Because if I run the second query with India hard-coded directly in it:
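(Again, a sketch with my assumed column names rather than the post’s exact query.)

  SELECT u.id, u.displayname, u.reputation
  FROM users u
  WHERE u.location = 'India'  -- hard-coded result of step 1
  ORDER BY u.reputation DESC
  LIMIT 100;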

Then the execution plan has much more accurate estimates.

Weirdly, the sort operation is now complaining about OVERestimation, hahaha.

In my example, I broke the statement up, running the second query separately with a hard-coded India value. That kind of thing is more easily done on the client side, where your application can run the first query, grab the result, and plug it into the second.

If you try to do it with a temporary function, like this:
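(The function name pg_temp.top_users_in and its body below are my guesses at the shape of the code, not the post’s exact version.)

  -- A temporary SQL-language function that returns the top users
  -- for a given location...
  CREATE FUNCTION pg_temp.top_users_in(loc text)
  RETURNS SETOF users
  LANGUAGE sql STABLE
  AS $$
    SELECT *
    FROM users
    WHERE location = loc
    ORDER BY reputation DESC
    LIMIT 100;
  $$;

  -- ...called with the top-location query feeding its argument:
  WITH top_location AS (
    SELECT location
    FROM users
    WHERE location <> ''
    GROUP BY location
    ORDER BY COUNT(*) DESC
    LIMIT 1
  )
  SELECT f.*
  FROM top_location t
  CROSS JOIN LATERAL pg_temp.top_users_in(t.location) f;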

Then we’re still facing the same single-statement optimization problem: Postgres can inline a simple SQL-language function like this into the calling query, so the WITH query is optimized all at once, including the function’s contents. The explain plan has the same bad estimate on the number of people who live in the top location.

On the other hand, if you call the temporary function with the input value clearly defined:
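For example, with my hypothetical function from above:

  -- The planner sees the literal value at plan time.
  SELECT * FROM pg_temp.top_users_in('India');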

Then the plan’s estimates are bang on, and the query is fast.

Another approach: you could try to do it entirely in Postgres with a temp table, like this:
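(Another sketch, same hypothetical schema as before.)

  -- Step 1: stash the top location in a temp table.
  CREATE TEMP TABLE top_location AS
    SELECT location
    FROM users
    WHERE location <> ''
    GROUP BY location
    ORDER BY COUNT(*) DESC
    LIMIT 1;

  -- Step 2: join the temp table back to users.
  SELECT u.id, u.displayname, u.reputation
  FROM users u
  JOIN top_location t ON t.location = u.location
  ORDER BY u.reputation DESC
  LIMIT 100;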

It doesn’t work: the explain plan shows the same wild under-estimation. Even though the top location is now sitting in a real table, the planner still doesn’t know which value that table holds when it plans the join against users, so it falls back to a generic guess.

(sigh) Bummer!
