Query Exercise Answer: Here’s Why The Index Gets Smaller with Updates.

2 Comments

In last week’s Query Exercise, I asked you why updating a table reduced its index sizes. Our table had a date column called last_update_date, and when we set that date on about 25% of the rows, then vacuumed the table, we noticed that the size of the index on last_update_date got smaller.

The key (no pun intended) is in the values we used to originally populate the last_updated_date column:

Check out the contents of the last_updated_date column:

Sample recipes

They’re all relatively unique. Uniqueness isn’t guaranteed here, but as you can see, it’s pretty doggone unique. So, let’s think about the contents of an index on last_access_date – it would look like this:

The results visualize the contents of the index:

Index contents after initial load

The index will really have each last_updated_date along with its matching ctids. A true visualization would show that first last_updated_date (22.543), followed by its two ctids (10403,81 and 10403,82).

The last_updated_dates aren’t completely unique – there are some duplicates, but a lot of the values are distinct. We can count how many distinct last_updated_dates there are:

About 871k distinct last_updated_dates

Then, when our update runs, it sets about 25% of the rows to have the same last_updated_date:

Which means there are way less distinct last_updated_dates after the update:

Distinct last updated dates after the first update

When a single last_updated_date has a lot of matching rows (ctids), then Postgres is able to compress that index because it only has to store that last_updated_date once, followed by the list of ctids that match that last_updated_date.

To see an extreme version of that, let’s set all of the rows to have exactly the same last_updated_date, then vacuum the table and check the size of the index:

The results:

Index size when all rows have the same value

The index drops from 22MB, to 18MB, all the way down to 7MB when all the rows in the index have the same value.

The takeaway: the more similar a column’s content is, the smaller indexes will be on that column. The more unique a column’s values are, the larger its indexes will become.

I found this really interesting because Microsoft SQL Server does not share that same behavior! In SQL Server, a b-tree index’s size is the same whether all the values are unique or all identical.

Previous Post
Query Exercise Challenge: Why Is This Index Getting Smaller?
Next Post
How to Read the Buffers Numbers in Explain Plans

2 Comments. Leave new

  • Really interesting post. I appreciate these as I am new to PostgreSQL. I did want to give you a heads-up that the link to the blog post on SQL Server’s indexing behavior seems to be broken.

    Reply

Leave a Reply to Brent Ozar Cancel reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.
You need to agree with the terms to proceed