In my first Postgres Surprise post, we talked about the fact that Postgres’s tables are heaps: unsorted piles of data, as opposed to many other databases that use sorted (aka clustered) indexes, which keep the data continuously sorted in real time.
We said the heap approach makes sense for Postgres because it has used multi-version concurrency control (MVCC, aka RCSI in SQL Server terms) from the start. Every time someone updates a tuple (aka row), Postgres adds another version of that tuple inside the same data pages, intermingled with the rest of the live tuples.
Which leads to a question…
What cleans out the old versions?
It’s not like we can stamp an expiration date/time on a version when we add it. We don’t know when another transaction is going to come along and update that same tuple. Heck, even when we write a new version of an existing tuple, we can’t determine the expiration date of the old one – because our transaction might never complete, or it might be rolled back. The only way we know that a tuple’s version expired is when its transaction is over (and even that might not be the whole story, because other transactions might still need to see the old version!)
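You can actually watch these version markers yourself. Every Postgres tuple carries hidden system columns: xmin (the transaction id that created that version) and xmax (the transaction that expired it, or 0 if it’s still live). A quick peek – the users table here is just a stand-in, point it at any table you like:

```sql
-- xmin = the transaction id that created this tuple version,
-- xmax = the transaction that expired it (0 means it's still live),
-- ctid = the tuple's physical location (page, item) in the heap.
SELECT xmin, xmax, ctid, *
FROM users
LIMIT 5;
```

Update a row in another session without committing, and you’ll see the new version appear with its own xmin while the old one sticks around.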
Enter the vacuum.
Exit the dog.
(Get it? Dogs are afraid of vacuum cleaners. Well, I mean, that’s what I hear about most dogs, but mine actually likes to sit still and get vacuumed by hand. My dog doesn’t even get out of the Roomba’s way. Also, my first floor Roomba’s name is J. Lo, and I bet you can’t guess why. Give up? Here’s why. Also, my second floor Roomba’s name is Jennifer Garner, and I really bet you can’t guess why that is. I’ll let you guess in the comments.)
Postgres’s vacuum is the equivalent of database garbage collection, and one of its tasks is to mark the dead tuples’ space as available to reuse. After vacuum runs, other tuple versions can be written over the top of where the old versions were.
By default, the freshly vacuumed tables & indexes don’t get smaller. After all, we’re probably still going to have more inserts and updates on that object, so we might as well leave the space already allocated to that table, available for reuse. If something went haywire on the table and you’ll never need that space again, you can do a vacuum full to rewrite the object down to its minimal size and return the freed space to the operating system – just beware that vacuum full locks the table while it runs.
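In command form, the two flavors look like this (users is a stand-in table name, and VERBOSE is optional but shows what got cleaned):

```sql
-- Plain vacuum: marks dead tuples' space as reusable inside the
-- existing file, but doesn't shrink the file on disk.
VACUUM (VERBOSE) users;

-- Vacuum full: rewrites the whole table into a new, minimal-size file
-- and gives the leftover space back to the operating system. Careful:
-- it takes an ACCESS EXCLUSIVE lock, blocking reads and writes until
-- it finishes.
VACUUM FULL users;
```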
You *DO* vacuum your house, right?
I’m sure you frequently vacuum your abode to cut down on allergens, lift up your carpet to extend its life (we all need to be lifted up from time to time), and to get rid of the evidence.
I don’t, and I’m being sarcastic about you, because there’s a reason we geeks all buy robotic vacuums.
Similarly, Postgres has an autovacuum feature that’s on by default. It tries to keep up with your tables, vacuuming out the dead versions, just like your Roomba tries to keep up with your Doritos habit. Years ago, when you were living alone and you went off to work every day, your Roomba had plenty of time to vacuum, and you weren’t cranking out that much trash.
However, over time, as you got married and had kids, and the kids started dumping their food all over the house, that one Roomba couldn’t keep up with the disaster known as your carpet.
It’s the same way with Postgres:
- The more tables you have, and
- The higher their rate of change, and
- The more concurrent transactions you have…
The more likely it is that you’re going to have to monitor autovacuum, and you’re going to need to tweak:
- autovacuum_max_workers
- autovacuum_naptime
- autovacuum_vacuum_threshold
And the more likely it is that you’re going to need to run your own vacuum commands based on your app’s workloads and your customers’ maintenance windows.
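For reference, here’s roughly what checking and changing those settings looks like – the values below are just placeholders, not recommendations:

```sql
-- See the current values.
SHOW autovacuum_max_workers;
SHOW autovacuum_naptime;
SHOW autovacuum_vacuum_threshold;

-- Change them server-wide. autovacuum_naptime takes effect on a config
-- reload, but autovacuum_max_workers needs a full server restart.
ALTER SYSTEM SET autovacuum_naptime = '15s';
SELECT pg_reload_conf();

-- Or tune a single hot table instead of the whole server.
ALTER TABLE users SET (autovacuum_vacuum_threshold = 1000,
                       autovacuum_vacuum_scale_factor = 0.05);
```

The per-table route is usually the gentler first move: it lets autovacuum hit your busiest tables sooner without making it more aggressive everywhere.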
The classic sign that autovacuum isn’t keeping up is that your queries – especially big reporting queries that do scans – take longer and longer, out of proportion to the tables’ actual growth. That’s because they’re having to read more data pages in the object, since too many dead tuple versions are bloating those tables.
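One rough way to spot that is to ask the statistics views which tables have piled up the most dead tuples, and when autovacuum last got to them. (The counts are approximate – they come from the stats collector, not a table scan.)

```sql
-- Tables with the most dead tuples, plus the last manual and
-- automatic vacuum times for each.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_vacuum,
       last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```

A big n_dead_tup with a stale (or null) last_autovacuum on a busy table is your cue to start tuning.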
That’s exactly why I wrote the free check_indexes script – because in my own environments, vacuum wasn’t keeping up, and I bet in some of your environments, you’re facing the same problems.


6 Comments
Great article! If the vacuum process can’t keep up, then the table has bloated, much like the body when one eats too much and exercises too little. Then you would have to do a vacuum full, which locks the table. I have found a way not to lock the table: if you are on AWS RDS, you can install the pg_repack extension to get the space back. I can’t have a really large table locked for a long period of time.
As far as the second floor Roomba being called Jennifer Garner, is it because the link to J. Lo and Jennifer Garner is they were both married to Ben Affleck?
All you need now is a Scooba named Ben Affleck.
I know iRobot stopped calling their mopping robot a Scooba, but I’ll never let that name go!
You just about got it! My first one was J. Lo, and then I bought a Scooba and named it Ben Affleck, and then when we moved to a 2-story house, I got a 2nd Roomba who’s up on the 2nd floor alone (J. Lo and Ben are on the 1st floor), so I named the 2nd floor one Jennifer Garner. 😀
Now you just need to get a 3 floor house, and put one up in the attic and name it Marc Anthony. 🙂
Wouldn’t Marc Anthony go outside shirtless and beat the rugs on the clothesline like a real man?
HAHAHA