I’m in Vancouver this week at PGConf.dev, a conference aimed at people who are either hacking on Postgres source code, building extensions, or growing the community. I don’t really fit into any of those categories, but I’m here because I wanted to observe how Postgres is built.
Here were the sessions I attended on day 1, and what I took away from each of ’em. Each session title links to the presentation material if the presenter uploaded it.
Welcome Session by Jonathan Katz and Melanie Plageman
The overall PGConf.dev conference was explained as:
- Presentations about things the attendees want to build in the next year
- Meeting other people who want to build the same thing
- On Friday, having in-person unconference sessions to help brainstorm & plan work for the next year
I love this concept, although lemme be clear that I’m not a developer, nor am I here to try to sway development in any way. I’m here out of professional curiosity, and I’m writing about my experiences here because I bet some of y’all are curious about how Postgres gets built too.
Intro to Hacking on Postgres by Tomas Vondra
This session explained the development process & workflows for Postgres. I love that this was arranged first in the agenda given that it’s necessary to onboard developers into the ecosystem.
Tomas explained that the ~30 committers (people who have permission to actually push code) come to PGConf.dev to meet hackers and network with them. Attendees shouldn’t be afraid to talk to a committer at the conference.
Annual releases come out in September, with a feature freeze in April. Along the way, Postgres runs commitfests: one month of work, one month off. You can see the current & future commitfests online. If you’re looking for a place to start, like a patch that would be useful for your workloads, search through the patches for the next commitfest and help review them. Why? Because there’s an expectation that if you’re going to contribute patches, you should also be reviewing the patches of others.
After picking a patch to test, Tomas covered how to download the Postgres source code and compile it from scratch on your laptop with the right debugging configuration to make testing easier. He introduced the repo’s regression tests, isolation tests, and the more advanced check-world multi-server tests (like replication testing).
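That build-and-test workflow sketches out roughly like this. To be clear, these are my own notes on the standard process, not Tomas’s exact commands, and the flags shown are common choices rather than the only ones:

```shell
# Grab the Postgres source from the canonical repo
git clone https://git.postgresql.org/git/postgresql.git
cd postgresql

# Configure a debug-friendly build: debug symbols on, assertions on,
# optimizations off so stepping through code in a debugger makes sense
./configure --enable-debug --enable-cassert CFLAGS="-O0 -g"
make -j8

# Run the core regression tests, then the full check-world suite,
# which spins up multiple servers to exercise things like replication
make check
make check-world
```

The `--enable-cassert` flag is the one hackers care about most: it turns on internal sanity checks that catch broken patches early, at the cost of a slower server.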
I only sat in on the first 70 minutes of this session because after that, the next part got much more technically detailed, and it was going to soar over my head. That first hour was the perfect intro for me to remember how hard it is to do open source development. I switched over to the “regular” sessions, in which speakers talk about big-picture things that could be done in future versions of Postgres, and what work would be entailed.
Saving Space in Various Subsystems by Matthias van de Meent
Matthias’s focus is reducing storage required when working with millions of databases. (Makes sense, given that his employer is Neon, a company that hosts serverless Postgres databases and has a free 0.5GB database tier.)
Matthias started by discussing more appropriate sizes for system catalogs, then moved on to more interesting scenarios with user tables, such as logical column order to keep related bits together, compressing page-level data, and the fixed 24-byte overhead per WAL record.
The session covered one of the big problems with using relational databases to store JSON data. If you update a tiny part of the JSON, Postgres copies the entire new JSON document (not just the tiny changed part) to the write-ahead log, plus builds a whole new copy of the TOASTed value.
I liked the interactivity in the session: some of the attendees piped right up during the presentation, talking about which space savings would be easier or harder to achieve.
Adaptive Query Optimization by Alena Rybakina
This session was about how the AQO extension gives the query planner memory to remember how selective various query parameters were, then reuses that memory to predict the cardinality of similar future incoming queries.
It’s like Microsoft SQL Server’s Query Store in the sense that it stores all known queries, their hashes, query texts, their query plans, and the selectivity and number of rows returned from each node.
However, it’s even more ambitious than Microsoft since it feeds details back into the query planner to correct plan issues based on the actual number of rows that came out of plan nodes. (Later at the conference, I happened to overhear the AQO developers discussing with various people how to identify similar queries in an effort to improve their estimates as well – that’s pretty ambitious.)
I loved the test methodology that involved very complex join conditions from this IMDB data set. Those queries look horrific.
The audience immediately asked the right question, the one that caused the most problems with SQL Server’s implementation: how do you intelligently cap how much data is kept around for learning and planning? (Just for reference: SQL Server’s implementation uses space in the user database, whereas this project just uses RAM.)
How Postgres is Misused and Abused by Karen Jex
Karen started by explaining that she’s not a developer, but she was here talking at a Postgres hacker conference because she has so much real-world field experience. She works with end users doing oddball things, and she came here to explain why users choose to do those oddball things. Bonus points for using a lot of @in_otternews photos to illustrate points.
Some of her examples included companies insisting on running specific old major versions because they’re afraid of upgrade complexity, only doing pg_dump full “backups” once a day, using once-a-day storage snapshots as database backups, deleting WAL files to free up space, thinking they could do synchronous replication with no overhead, and much more.
Listening to this, I kept thinking, “Platform-as-a-service databases solve all of these.” However, a lot of companies aren’t moving to PaaS, especially companies that hire consulting DBAs like Karen, so to her, these seem like big problems. Database servers are just too hard for many (but not all) companies to run on their own.
Could developers fix these problems? Yes, but the database would have to:
- Monitor for these problems
- Surface the warnings to the end user (how does a database engine do that?)
- Explain to the user what they should do to fix the problem, or what other behaviors to do instead
I don’t think database engines can feasibly do the latter two – database front end tools do these kinds of things, whether they’re management tools or monitoring tools. However, I’m curious what effect this session might have on future PostgreSQL development. We’ll see if any unconference sessions pop up on this for Friday.
One of the attendees who’s a Postgres user (rather than a developer) asked a question that illustrated this well. He basically said, “There are an overwhelming number of tools and choices to fix some of these problems, and I don’t know which one to run with, so I just use the first one I find. I wish there was an officially blessed list of ways to solve these problems.” It reminded me of the classic XKCD on standards.
Postgres and Artificial Intelligence by Bruce Momjian
I did wanna see Collations from A to Z by Jeff Davis and Jeremy Schneider in this same time slot, but I couldn’t pass up the chance to see Bruce speak in person.
I figured that since this session was at a Postgres developer conference, it’d be targeted at Postgres developers. I thought it’d explain realistic ways that deep learning could actually be accomplished inside the engine, and what it might be used for. (Especially since I’ve heard about people working on GPU usage inside Postgres in the past.)
This session was not that.
For example, the demo was a Perl function (ah yes, the language of AI?) to guess whether a number has a non-leading zero. The function was wildly, wildly inaccurate – the function was 22% confident that the number 100 had a non-leading zero, and 68% confident that the number 487234987 had a non-leading zero.
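For context on why those confidence numbers are so funny: whether a number has a non-leading zero is a deterministic one-liner, no model required. Here’s my own illustration in shell (this is not Bruce’s demo, which was in Perl):

```shell
# Deterministic check: does the number contain a zero anywhere
# past its first digit?
has_nonleading_zero() {
  case "${1:1}" in    # ${1:1} strips the leading digit
    *0*) echo yes ;;
    *)   echo no ;;
  esac
}

has_nonleading_zero 100        # yes: both trailing digits are zeros
has_nonleading_zero 487234987  # no: there are no zeros at all
```

So the demo’s “22% confident” and “68% confident” answers were both confidently pointing the wrong way on questions with exact answers.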
I didn’t get a lot of value out of this session, and I’m curious if any of the Postgres developers did. I grabbed a coffee after the session and then overheard a couple of attendees talking to Bruce later, and their questions (specific to Postgres extension & cloud vendor internal implementation of AI) were much more interesting and relevant. I wish those could have been a session.
Building Petabyte-Scale PostgreSQL Deployments by Christopher Travers
Adjust is an app measurement company that kept just 30 days’ worth of data in Elasticsearch, and they were hitting 1PB. That’s 33 terabytes of inserts per day, or 385 megabytes per second on average! With that many inserts, even the slightest hiccup (like garbage collection or a bad user query) would make their entire Elasticsearch cluster unusable.
Adjust decided to build a solution atop Postgres, and he discussed their design goals. I had to chuckle when he said, “We’re never going to back it up. Backups would take too long, and restores would take too long. Instead, we just do every write to 2 different databases on 2 different servers.” I love wild edge cases like this. Their solution eventually grew to 10PB.
Adjust didn’t open source their platform, but several employees who left Adjust decided to build their own open source system to solve similar problems. This session was about the ongoing work to build out this behemoth.
This session was amusing because of its edge case requirements. I was just a little let down because the abstract’s 3 goals included “Why and How We Patched PostgreSQL to support our endeavor,” and that wasn’t covered in this session. (The attendee questions kinda pointed at that too, asking for Postgres-specific details since none were covered in the session.)
Keynote: When Hardware and Databases Collide by Margo Seltzer
I really wanted to stick around for this, but I was brain dead at this point of the day, so I went back to the hotel to drop my laptop off and get some food & drink. The sessions were recorded, so hopefully I’ll be able to catch the video.
Overall impressions from day 1
I’m really glad that I experienced this one time, although I don’t know that I personally need to come back next year because I’m not the target audience. It’s not like I need to be here every year to help myself plan my own work for the next year.
If time & money were no object, though, I’d be here again next year just to satisfy my intellectual curiosity. It’s fun seeing sharp minds thinking about database problems.
You should consider attending PGConf.dev if you’re a reader who:
- Wants to get involved in submitting patches
- Works at a company with a large investment in Postgres, and has things you want to see achieved
- Works at a software company that makes a living with Postgres