Postgres Surprise #1: There’s No Clustered Index.


I’ve worked with Microsoft SQL Server since the late 1990s. In that world, almost every table has what’s called a clustered index, which we think of as the data itself. The table’s data is stored in 8KB pages, and by creating a clustered index, you define how the table itself is sorted.

For example, let’s take the Users table from the Stack Overflow database. In SQL Server, a simplified version of the table’s creation statement looks something like this (the real schema has more columns):
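```sql
-- A trimmed-down sketch of the Users table. The PRIMARY KEY CLUSTERED
-- constraint is what makes the table's data a b-tree sorted by Id.
CREATE TABLE dbo.Users (
    Id INT IDENTITY(1,1) NOT NULL,
    DisplayName NVARCHAR(40) NOT NULL,
    CreationDate DATETIME NOT NULL,
    Reputation INT NOT NULL,
    CONSTRAINT PK_Users_Id PRIMARY KEY CLUSTERED (Id)
);
```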

The resulting data structure is a b-tree set of 8KB pages where the Users are sorted by Id. This has a few interesting advantages:

  • If you know a particular user’s Id, you can dive directly into that row, reading as few pages as possible. It’s like having an index on Id – but without the extra overhead of another index, or the overhead of reading extra pages to get columns beyond Id. (A point lookup of that shape is sketched just after this list.)
  • When you insert a row, it’ll have the highest Id, so you can dive into the last page and add it there.
  • Because inserts are always focused on the tail end of the table, it’s very likely that the tail end of the table is already cached in memory, making inserts fast.
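For instance, the first point above boils down to a point lookup like this one (the Id value here is arbitrary):

```sql
-- With the clustered index on Id, SQL Server seeks down the b-tree
-- straight to the single 8KB page holding this row.
SELECT Id, DisplayName, Reputation
FROM dbo.Users
WHERE Id = 22656;
```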

However, this also comes with some disadvantages:

  • If for some reason you need to insert a row somewhere other than the highest Id, there probably won’t be enough space on the earlier pages. Say our current top Id is 100,000,000, but we want to insert a new row with Id 500 – the 8KB page where that row belongs is probably already full. SQL Server has to do a page split: allocate an additional page for the index, move some of the rows over to that new 8KB page, insert the new row, and update the b-tree pages. You’d think this would be a rare scenario, but…
  • This also happens when you need to update a variable-width column (or a nullable column) to have wider contents. Say we have a user whose DisplayName was just “Brent”, and they chose to update it to their full name. There might not be enough empty space on the 8KB page to accommodate that wider data, so SQL Server would have to allocate a new page and do a page split.
  • This also happens if we need to add another version of a row. This wasn’t a big deal in the early days of SQL Server because the database engine just didn’t do row versioning for multi-version concurrency control (MVCC). Microsoft didn’t add that (under the name of Read Committed Snapshot Isolation, aka RCSI) until 2005, and even today, it isn’t the default for new databases. Database developers have to enable it for each database they create (the command is shown just after this list) – and most don’t, but the ones who do experience issues with page splits and fragmentation.
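For reference, the RCSI opt-in is a one-line, per-database switch (the database name here is just an example):

```sql
-- Turns on SQL Server's optimistic MVCC for one database.
-- Still off by default in current versions.
ALTER DATABASE StackOverflow SET READ_COMMITTED_SNAPSHOT ON;
```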

To deal with this, especially in the days of rusty spinning frisbees, Microsoft database administrators became intimately familiar with index rebuilds and fill factor. These days, the topic of defragmentation is rather hotly contested amongst DBAs.
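A sketch of what that maintenance looks like, reusing the index name from the earlier table sketch – rebuild the clustered index, leaving free space on each page so near-future inserts and updates don’t immediately split pages:

```sql
-- Rebuild the clustered index with 20% free space per 8KB page.
ALTER INDEX PK_Users_Id ON dbo.Users
REBUILD WITH (FILLFACTOR = 80);
```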

Not all tables used clustered indexes. Typical exceptions included:

  • Staging tables – in data warehouses, it’s common to pull data from a source nightly, load that into a staging table, do some processing on it, load it into its final destination, and then delete/truncate it. In that case, a heap made sense because any query that worked on the staging table accessed all of the rows, not needing to dive into any particular row (a sketch of one follows this list).
  • Data warehouse fact tables – Microsoft had a reference architecture called the Fast Track Data Warehouse Reference Architecture in which all fact tables were heaps, and query performance just came down to how fast you could scan the entire fact table. Those fact tables didn’t have any indexes on them at all because we could never predict what data warehouse users would search for, or what order they’d need the data in. Later on, as Microsoft’s columnstore implementation improved, data warehouses began to migrate over to clustered columnstore instead.
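Here’s a hypothetical staging table along those lines – no primary key, no clustered index, so it’s a heap:

```sql
-- A heap: no clustered index at all. That's fine here, because every
-- query against a staging table scans all of the rows anyway.
CREATE TABLE dbo.Staging_Users (
    Id INT NOT NULL,
    DisplayName NVARCHAR(40) NOT NULL,
    Reputation INT NOT NULL
);
```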

Meanwhile, in the Postgres world…

Postgres starts with a very different assumption: it uses multi-version concurrency control (MVCC, or what SQL Server folks call RCSI) right from the get-go. When we update rows, Postgres automatically makes new versions of them, storing the new versions inside the same table. There’s no need to enable anything – it’s on for all databases, by default.
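You can watch that happen through the ctid and xmin system columns (the table and column names here just mirror the earlier sketch, translated to Postgres naming):

```sql
-- ctid is the row version's physical location in the heap;
-- xmin is the transaction that created that version.
SELECT ctid, xmin, display_name FROM users WHERE id = 500;

UPDATE users SET display_name = 'Brent' WHERE id = 500;

-- Same logical row, but a brand-new version in a new heap location.
-- The old version sticks around until vacuum cleans it up.
SELECT ctid, xmin, display_name FROM users WHERE id = 500;
```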

So because Postgres is constantly creating new versions of rows, it doesn’t really make sense to store the table itself in some kind of order.

Every table is a heap.

You can create indexes, primary keys, and foreign keys, but there’s no such thing as a clustered index. It’s just not an option in Postgres.
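For contrast, here’s roughly what the earlier Users table might look like in Postgres – the primary key creates an ordinary b-tree index alongside the heap, and there’s simply no CLUSTERED keyword to add:

```sql
CREATE TABLE users (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    display_name text NOT NULL,
    creation_date timestamptz NOT NULL,
    reputation int NOT NULL
);
-- The rows live in an unordered heap; the primary key is just a
-- separate b-tree index pointing into it.
```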

However, you can cluster a table.

But that statement is deceptive. The CLUSTER command’s syntax looks like this:
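```sql
CLUSTER users USING ix_users_id;
```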

Where ix_users_id is an index you created on the users.id column. That command rewrites the users heap, physically ordering its rows to match the ix_users_id index. However, it’s a one-time operation, done at the moment you execute the CLUSTER command – the heap’s order is not maintained going forward.

You can rerun CLUSTER users later, without specifying the index name again, and it’ll automatically re-sort the data using the same index it used last time.
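If you forget which index a table was last clustered on, the system catalogs remember – a minimal sketch:

```sql
-- Re-sort the heap using whatever index CLUSTER used last time.
CLUSTER users;

-- Which index was that? pg_index records it in indisclustered.
SELECT c.relname AS index_name
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
WHERE i.indrelid = 'users'::regclass
  AND i.indisclustered;
```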

You can kinda think of that as index maintenance that a database administrator would do, but it’s not really the main index maintenance task we have to worry about in Postgres. No, Postgres has its own interesting index maintenance challenge, and we’ll leave that for the next Postgres Surprise post.
