Using Materialized Views to Make Queries Run Faster

4 Comments

Say you’ve got a reporting query that takes a long time to run. I’ll use the Stack Overflow database as an example:

That query gets the top 100 users who have the most accepted answers. On my server, that takes about a minute to run – unacceptably long for a public-facing web site, for example.

Does that data need to be up to the second? Of course not. The leaderboard of the top 100 answerers isn’t going to change all that quickly. People who have successfully answered hundreds of questions aren’t going to suddenly disappear, nor is someone else suddenly going to rocket to the top.

For report queries like this, Postgres offers materialized views: a view that’s written to disk, and then updated only when you want to update it. Here’s how to create one:

(The unique index isn’t strictly required, but more on that later.)

We have to change our query to point to the new materialized view. (Just as a side note, this is one of the cool things you don’t have to do with Microsoft SQL Server Enterprise Edition. It automatically takes the original query, figures out that there’s an indexed view it can use, and automatically rewrites the query plan to use the materialized view.)

The query runs instantaneously, and the resulting explain plan shows that the query only hits the materialized view:

We can then update that materialized view on a scheduled basis, like overnight when workloads are lighter. If we have a unique index on the view, we can even update the materialized view’s contents while other people are running queries:

This is important because the more complex & resource-intensive our materialized views are, the more performance impact we’ll have on the server while we refresh the view. In our case, the query at the top of the post only took about 1 minute to run, but building or refreshing the materialized view – even with just 100 rows – took about 5 minutes to run. Even if no data has changed, the rebuild still takes time because Postgres doesn’t track which rows in the underlying tables changed. It rebuilds the view from scratch.

This particular indexed view is really small because it only has 100 rows. We can see the size with check_indexes:

It’s just 8KB for the materialized view and 16KB for the b-tree index. Of course, if we didn’t have the LIMIT 100 on building the view, all tens of millions of users would be in the view, and it would take longer to build and refresh.

It’s up to us as to when we want to refresh our materialized view – daily, weekly, hourly, etc – and it’s up to us to build the mechanism to refresh that view. We could do it via a cron job, or as part of a nightly set of commands called by our own code.

I wish I could get check_indexes to alert us when a materialized view is out of date, either by checking the number of modified rows in the underlying tables, or their quantities, or the date since the last refresh. Unfortunately, I’m not aware of a reliable way to do that, that doesn’t involve adding extensions to Postgres. If you’ve got ideas, I’d love to hear them in the comments.

Previous Post
Announcing Fundamentals of Vacuum Early Access
Next Post
Smart Postgres News and Techniques 2024 Vol 14

4 Comments. Leave new

  • I’d love to learn of a technique to combine materialised views with the latest data to get a combination of performance and newest data sets. I know timescale is able to do this via a mix of partitioning and using dates to update an older result set, but the details are fuzzy to me.

    Another thing that really bothers me about materialised views is that you can’t selectively update it (for example, if a user logs in, I only meed to update the last_login field on that specific user). You can always create a completely separate table, but then it quickly becomes an unmaintainable code spaghetti with all the triggers.

    Reply
    • About the first question – from a high level, you would need to combine 3 things:

      1. A materialized view with all data with SalesDate < 2024-11-01.
      2. A view of the underlying tables, not materialized, with a where clause filtering it to all data >= 2024-11-01.
      3. A view doing a union all across both of those, showing all data from both.

      Then have your query hit that outer view, and it won’t have to be modified each time you update views #1 and #2 with new dates.

      Reply
      • …sometimes, the answers to things are right in front of you, staring you in the face

        Makes sense, cheers!

        Reply
  • […] Brent Ozar builds a view but a special one: […]

    Reply

Leave a Reply to Gabriel Cancel reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.
You need to agree with the terms to proceed