Querying Our Read-Only Database

We run a free public read-only Postgres server with the Stack Overflow database:

Our Secret Labs
  • Server: query.smartpostgres.com
  • Username: readonly
  • Password: 511e0479-4d35-49ab-98b1-c3a9d69796f4

(In spring 2024, we’ll be setting up automated per-person user accounts, so expect this username/password to stop working at that point.)
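If you'd rather connect from code than from a GUI client, here's a minimal Python sketch using the settings above. A few things in it are assumptions rather than facts from this page: the port (5432, the Postgres default), the choice of the third-party psycopg2 driver, and the helper names build_dsn and run_sample_query, which are made up for illustration. The database name, stackoverflow, is the one mentioned further down this page.

```python
from urllib.parse import quote

# Connection settings from the list above.
HOST = "query.smartpostgres.com"
USER = "readonly"
PASSWORD = "511e0479-4d35-49ab-98b1-c3a9d69796f4"
DATABASE = "stackoverflow"
PORT = 5432  # assumption: the default Postgres port


def build_dsn(host=HOST, user=USER, password=PASSWORD,
              database=DATABASE, port=PORT):
    """Build a libpq-style connection URI, escaping special characters."""
    return "postgresql://{}:{}@{}:{}/{}".format(
        quote(user, safe=""), quote(password, safe=""), host, port, database
    )


def run_sample_query():
    """Connect and return the server version string.

    Needs network access and the psycopg2 package, so it is only
    defined here, not called.
    """
    import psycopg2  # third-party driver, imported lazily

    conn = psycopg2.connect(build_dsn())
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT version()")
            return cur.fetchone()[0]
    finally:
        conn.close()
```

Calling run_sample_query() should hand back the server's version string if the connection works. The URI from build_dsn() can also be pasted into psql or any other client that accepts a Postgres connection URI.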

Building Your Own Database

If you want to create indexes or test queries that change data, you’ll want to build your own Stack Overflow database. To do that, check out Francesco Tisiot’s tutorial on loading the Stack Overflow data. The exact queries are at the bottom of his post.

That method requires downloading the original XML files. The ones we use on our read-only server are the big ones for StackOverflow.com itself:

Note that those files are big, and when they’re unzipped with 7zip, they’re even bigger – hundreds of gigabytes. You don’t have to use the same Stack Overflow data that we use – you can pick the data for another, smaller site. Each smaller site is distributed as a single 7z file containing all of that site’s files. For example, here are some of ’em:

All of the sites use the exact same file names & formats – for example, anime.stackexchange.com and stackoverflow.com both have users.xml, posts.xml, comments.xml, etc.
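Because the files share one layout, a single reader can handle any site's dump. Here's a rough Python sketch that streams rows out of one of the XML files without loading the whole thing into memory, which matters at these sizes. It assumes each file is a single root element wrapping one self-closing row element per record, with the column values stored as attributes. The iter_rows helper and the two sample rows are invented for illustration, so check them against a real file before relying on this.

```python
import io
import xml.etree.ElementTree as ET


def iter_rows(source):
    """Yield each <row> element's attributes as a dict, one at a time.

    `source` is a file path or a binary file object. Parsed rows are
    dropped from the tree as we go, so memory use stays flat.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == "row":
            yield dict(elem.attrib)
            root.clear()  # discard rows that have already been handed out


# Tiny made-up sample in the assumed layout (not real dump data).
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<users>
  <row Id="1" DisplayName="Alice" Reputation="101" />
  <row Id="2" DisplayName="Bob" Reputation="1" />
</users>"""

rows = list(iter_rows(io.BytesIO(SAMPLE)))
for row in rows:
    print(row["Id"], row["DisplayName"])
```

To try it on a real dump, pass a path instead of the in-memory sample, for example iter_rows("users.xml"). Every value comes back as a string, so convert numbers and dates yourself before loading them anywhere.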

Avoid the meta sites – they hold a different kind of data, discussion about the site itself, and they tend to be extremely small.

Learning More About the Data

To learn more about the tables, columns, and their relationships to each other:

The stackoverflow database on that server is imported from the Stack Overflow data dump. This data, like Stack’s data dump, is provided by Stack Overflow under the CC BY-SA 4.0 license. That means you are free to share this database and adapt it for any purpose, even commercially, but you must attribute it to Stack Overflow, the original authors (not Smart Postgres).

For questions, check out the FAQ.