SQLAI.ai Review: Using AI to Generate & Tune Queries

SQLAI.ai's tagline is that you can hook your database up to ChatGPT, and let it generate, explain and optimize queries.

To see how it works, I hooked it up to our read-only copy of the Stack Overflow database. That database is kind of a best-case scenario for this kind of tool: it’s publicly accessible over the internet and there’s a freely available read-only login. The SQLAI.ai configuration process was delightfully simple for me, but of course in a real-world situation where you can’t connect it directly to your database, you’re going to have additional hoops to jump through. Set that thought aside for a second, and let’s run a few tests anyway.

(Update 2025-12-05: the SQLAI.ai author updated their app and posted new results on their blog.)

Test 1: Generating an Easy Query

After configuring SQLAI to generate PostgreSQL with ChatGPT 4 Turbo, I gave it this simple instruction:

Find the top-scoring posts from 2023

It generated this query:
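(The query appeared as a screenshot in the original post, which hasn’t survived here. Based on the notes that follow – the posts table, the creationdate column, and no row limit – it presumably looked something like this sketch, with column and filter details assumed rather than taken verbatim from the tool:)

```sql
-- Reconstruction, not SQLAI's verbatim output:
SELECT *
FROM posts
WHERE creationdate >= '2023-01-01'
  AND creationdate < '2024-01-01'
ORDER BY score DESC;
```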

The good news: it figured out that the table is called posts (not too hard), and that the creationdate column was the one for date filtering.

The bad news: there’s no limit on the number of posts it brings back, which would be over 50 million. Yowza. Let’s try refining our query:

Find the top 250 scoring posts from 2023, and show the users that created the post

The resulting query:
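(Again, the screenshot is gone; judging from the description, the refined query would have resembled this sketch – the users column names are assumptions based on the Stack Overflow schema, not the tool’s exact output:)

```sql
-- Reconstruction: top 250 by score, joined to users via owneruserid
SELECT p.*, u.displayname
FROM posts p
JOIN users u ON p.owneruserid = u.id
WHERE p.creationdate >= '2023-01-01'
  AND p.creationdate < '2024-01-01'
ORDER BY p.score DESC
LIMIT 250;
```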

Alright! It figured out that owneruserid was the right join column for users. Pretty good! Let’s move on to something harder.

Test 2: Writing a Harder Query

In a recent Query Exercise, I asked you to find posts with the wrong commentcount. Let’s see how SQLAI does:

Find posts whose commentcount does not match the number of comments that actually exist for that post

The result:
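(The screenshot isn’t reproduced here. Going by the critique that follows – correct for posts that have comment rows, but blind to posts with none – the first attempt was presumably an inner-join shape along these lines, sketched rather than quoted:)

```sql
-- Reconstruction: the inner join means posts with ZERO comment rows
-- never appear, even if posts.commentcount is > 0
SELECT p.id, p.commentcount, COUNT(c.id) AS actual_comment_count
FROM posts p
JOIN comments c ON c.postid = p.id
GROUP BY p.id, p.commentcount
HAVING p.commentcount <> COUNT(c.id);
```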

That’s a pretty good start! It looks correct, but unfortunately it hits the same problem we discussed in the Query Exercise answers: it doesn’t handle situations where there are no rows in the comments table, but posts.commentcount is > 0.

To help fix that, SQLAI.ai has a “Refine” link right above each generated query:

Click the Refine link, and you can provide additional instructions, like this:

The newly generated query:
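(Screenshot again missing; the fix described would typically swap in a LEFT JOIN so zero-comment posts survive the join. A sketch of what the refined query likely resembled, with aliases matching the later refinement instruction but otherwise assumed:)

```sql
-- Reconstruction: LEFT JOIN keeps posts with no comment rows,
-- where COUNT(c.id) evaluates to 0
SELECT p.id,
       p.commentcount AS reported_comment_count,
       COUNT(c.id)    AS actual_comment_count
FROM posts p
LEFT JOIN comments c ON c.postid = p.id
GROUP BY p.id, p.commentcount
HAVING p.commentcount <> COUNT(c.id);
```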

That’s great in the sense that it works, but it’s bad in the sense that we had to know the query had a problem in order to give it that instruction. If someone’s using a tool like this, and they don’t have that level of database knowledge, then they’re just going to generate incorrect queries without realizing it. If the user does have the database knowledge to mentally test the query, then… they might not pay for a tool like this.

At any rate, let’s add one more refining instruction:

Show the variance between reported_comment_count and actual_comment_count, and only show the top 100 rows by that variance in descending order.

The status said “AI generation” for about 30 seconds, and then I got an error:

The browser console:

Hmm. Not thrilled with this one. Later, the site’s author reported that he resubmitted the request, and it generated this. He’s going to work on the timeout error handling.

Let’s try another test.

Test 3: Optimize a Query

In another recent Query Exercise, your mission was to find tagged questions faster. Here’s the query to tune:
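(The query itself was shown as an image. From the discussion below – a LIKE pattern that starts with a wildcard against the tags column – it was along these lines, sketched from context rather than copied:)

```sql
-- Reconstruction: leading wildcard, so a plain B-tree index
-- on tags can't help with this predicate
SELECT *
FROM posts
WHERE tags LIKE '%postgresql%';
```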

Here was SQLAI’s response:


Performance suggestions:

  • Indexing: Create an index on the tags column to speed up the LIKE query. Since the LIKE pattern does not start with a wildcard, a B-tree index could be used effectively.
  • Partial Index: If the dataset is large and only a small subset of posts are tagged with ‘postgresql’, consider creating a partial index that only includes rows where the tags column contains ‘postgresql’.
  • Text Search: If the tags column often uses LIKE queries for searching, consider using PostgreSQL’s full-text search capabilities, which can be more efficient than LIKE with wildcards.
  • Limit Columns: Instead of using SELECT *, specify only the columns that are needed. This reduces the amount of data that needs to be fetched and sent over the network.
  • Caching: If the same query is run frequently with similar results, consider caching the results at the application level to avoid hitting the database each time.
  • Query Analysis: Use EXPLAIN ANALYZE to get detailed information about the query execution plan. This can help identify potential bottlenecks and areas for improvement.
  • Maintenance: Regularly run VACUUM and ANALYZE to maintain table statistics and optimize query planning.
  • Partitioning: If the posts table is very large, consider partitioning it based on certain criteria (e.g., date ranges or post types) to improve query performance.
  • Hardware: Ensure that the database server has sufficient memory and CPU resources to handle the workload. Properly configure PostgreSQL memory settings like work_mem and shared_buffers to optimize performance.

Hmm, that’s… not good.

The first indexing suggestion is just flat out wrong: it says the LIKE pattern doesn’t start with a wildcard, but it clearly does.

The Partial Index suggestion is great!
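(As an illustration – this is my sketch, not SQLAI’s verbatim suggestion – a PostgreSQL partial index here could look like the following. The indexed column is an arbitrary choice for the example; the win is that only the small postgresql-tagged subset of a ~50 million row table gets indexed, and the planner can use it whenever a query repeats the same WHERE predicate:)

```sql
-- Hypothetical partial index: only postgresql-tagged rows are stored,
-- keeping the index tiny relative to the full posts table
CREATE INDEX posts_postgresql_tagged
    ON posts (creationdate)
    WHERE tags LIKE '%postgresql%';
```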

The Text Search recommendation doesn’t work well because of the kinds of tags that are in use. There are all kinds of wacko tags like c++builder-xe2 that need to be matched exactly.

The rest of the optimization suggestions are just generic stuff that could be copy/pasted into any optimization recommendations. Sure, caching, query analysis, maintenance, and hardware can help anything.

SQLAI.ai Review Summary: Not There Yet.

I really enjoyed using SQLAI.ai. The site is simple, clear, and easy to use.

The problem isn’t the site: it’s ChatGPT. Even ChatGPT 4 Turbo, hooked up directly to your database, still isn’t savvy enough yet.

  • Users WITHOUT database knowledge will get simple working queries out of it – only to be surprised later when the data isn’t quite right.
  • Users WITH database knowledge will quickly hit its limitations.

The good news is that as ChatGPT matures, SQLAI.ai is well-poised to make the querying & tuning experience easier. Now, it’s just a waiting game for ChatGPT to continue to improve its database knowledge. As of early 2024, we’re just on the cusp of retrieval-augmented generation (RAG) tools that update LLMs with knowledge about additional topics – specifically, your database schema. I’m excited to see where that goes.

