SQLAI.ai's tagline is that you can hook your database up to ChatGPT and let it generate, explain, and optimize queries.
To see how it works, I hooked it up to our read-only copy of the Stack Overflow database. That database is a best-case scenario for this kind of tool: it's publicly accessible over the internet, and there's a freely available read-only login. The SQLAI.ai configuration process was delightfully simple for me, but of course in a real-world situation where you can't connect the tool directly to your database, you're going to have additional hoops to jump through. Set that thought aside for a second, and let's run a few tests anyway.
(Update 2025-12-05: the SQLAI.ai author updated their app and posted new results on their blog.)
Test 1: Generating an Easy Query
After configuring SQLAI to generate PostgreSQL with ChatGPT 4 Turbo, I gave it this simple instruction:
Find the top-scoring posts from 2023
It generated this query:
```sql
SELECT id, title, score
FROM posts
WHERE creationdate >= '2023-01-01'
  AND creationdate < '2024-01-01'
ORDER BY score DESC;
```
The good news: it figured out that the table is called posts (not too hard), and that the creationdate column was the one for date filtering.
The bad news: there's no limit on the number of posts it brings back, which would be over 50 million rows. Yowza. Let's try refining our query:
Find the top 250 scoring posts from 2023, and show the users that created the post
The resulting query:
```sql
SELECT p.id AS post_id,
       p.title,
       p.score,
       u.id AS user_id,
       u.displayname AS user_displayname
FROM posts p
JOIN users u ON p.owneruserid = u.id
WHERE p.creationdate >= '2023-01-01'
  AND p.creationdate < '2024-01-01'
ORDER BY p.score DESC
LIMIT 250;
```
Alright! It figured out that owneruserid was the right column for joining to users. Pretty good! Let's move on to something harder.
Test 2: Writing a Harder Query
In a recent Query Exercise, I asked you to find posts with the wrong commentcount. Let’s see how SQLAI does:
Find posts whose commentcount does not match the number of comments that actually exist for that post
The result:
```sql
SELECT p.id AS post_id,
       p.commentcount AS reported_comment_count,
       COUNT(c.id) AS actual_comment_count
FROM posts p
LEFT JOIN comments c ON p.id = c.postid
GROUP BY p.id
HAVING p.commentcount != COUNT(c.id);
```
That’s a pretty good start! It looks correct, but unfortunately it hits the same problem we discussed in the Query Exercise answers: it doesn’t handle situations where there are no rows in the comments table, but posts.commentcount is > 0.
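For comparison, one query shape that handles that edge case from the start is a correlated subquery, which returns 0 (never NULL) for posts with no matching comments. Here's a sketch run on a tiny SQLite fixture with hypothetical data, not SQLAI's output; the column names follow the Stack Overflow schema used above:

```python
import sqlite3

# Tiny in-memory fixture (hypothetical data) mimicking the schema above:
# posts.commentcount is supposed to match the number of rows in comments.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, commentcount INTEGER);
CREATE TABLE comments (id INTEGER PRIMARY KEY, postid INTEGER);
INSERT INTO posts VALUES (1, 2), (2, 3), (3, 0);
INSERT INTO comments VALUES (10, 1), (11, 1);
""")

# The correlated subquery counts 0 for posts with no comment rows at all,
# so post 2 (claims 3 comments, has none) gets flagged without COALESCE.
rows = conn.execute("""
SELECT p.id,
       p.commentcount AS reported_comment_count,
       (SELECT COUNT(*) FROM comments c WHERE c.postid = p.id) AS actual_comment_count
FROM posts p
WHERE p.commentcount <>
      (SELECT COUNT(*) FROM comments c WHERE c.postid = p.id);
""").fetchall()
print(rows)  # [(2, 3, 0)]
```

On the real 50-million-row posts table you'd want to check the plan before running something like this, but it shows the shape of a query that doesn't need a refinement pass.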
To help fix that, SQLAI.ai has a “Refine” link right above each generated query:

Click the Refine link, and you can provide additional instructions, like this:

The newly generated query:
```sql
SELECT p.id AS post_id,
       p.commentcount AS reported_comment_count,
       COALESCE(COUNT(c.id), 0) AS actual_comment_count
FROM posts p
LEFT JOIN comments c ON p.id = c.postid
GROUP BY p.id
HAVING p.commentcount != COALESCE(COUNT(c.id), 0);
```
That’s great in the sense that it works, but it’s bad in the sense that we had to know the query had a problem in order to give it that instruction. If someone’s using a tool like this, and they don’t have that level of database knowledge, then they’re just going to generate incorrect queries without realizing it. If the user does have the database knowledge to mentally test the query, then… they might not pay for a tool like this.
At any rate, let’s add one more refining instruction:
Show the variance between reported_comment_count and actual_comment_count, and only show the top 100 rows by that variance in descending order.
The status said “AI generation” for about 30 seconds, and then I got an error:

The browser console:

Hmm. Not thrilled with this one. Later, the site’s author reported that he resubmitted the request, and it generated this. He’s going to work on the timeout error handling.
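For reference, a hand-written sketch of what that last refinement could look like (my guess, not SQLAI's output), demonstrated on a tiny SQLite fixture with hypothetical data:

```python
import sqlite3

# Hypothetical fixture: post 1 claims 5 comments but has none, so its
# variance (reported minus actual) should be 5; posts 2 and 3 are correct.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, commentcount INTEGER);
CREATE TABLE comments (id INTEGER PRIMARY KEY, postid INTEGER);
INSERT INTO posts VALUES (1, 5), (2, 1), (3, 2);
INSERT INTO comments VALUES (10, 2), (11, 3), (12, 3);
""")

# The earlier query plus a variance column, ordered descending, top 100.
rows = conn.execute("""
SELECT p.id AS post_id,
       p.commentcount AS reported_comment_count,
       COUNT(c.id) AS actual_comment_count,
       p.commentcount - COUNT(c.id) AS variance
FROM posts p
LEFT JOIN comments c ON p.id = c.postid
GROUP BY p.id
HAVING p.commentcount != COUNT(c.id)
ORDER BY variance DESC
LIMIT 100;
""").fetchall()
print(rows)  # [(1, 5, 0, 5)]
```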
Let’s try another test.
Test 3: Optimizing a Query
In another recent Query Exercise, your mission was to find tagged questions faster. Here’s the query to tune:
```sql
SELECT *
FROM posts
WHERE tags LIKE '%<postgresql>%'
ORDER BY score DESC
LIMIT 100;
```
Here was SQLAI’s response:
Performance suggestions:

- Indexing: Create an index on the `tags` column to speed up the `LIKE` query. Since the `LIKE` pattern does not start with a wildcard, a B-tree index could be used effectively.

```sql
CREATE INDEX idx_posts_tags ON posts USING gin (tags gin_trgm_ops);
```

- Partial Index: If the dataset is large and only a small subset of posts are tagged with 'postgresql', consider creating a partial index that only includes rows where the `tags` column contains '<postgresql>'.

```sql
CREATE INDEX idx_posts_tags_postgresql ON posts (score)
WHERE tags LIKE '%<postgresql>%';
```

- Text Search: If the `tags` column often uses `LIKE` queries for searching, consider using PostgreSQL's full-text search capabilities, which can be more efficient than `LIKE` with wildcards.
- Limit Columns: Instead of using `SELECT *`, specify only the columns that are needed. This reduces the amount of data that needs to be fetched and sent over the network.
- Caching: If the same query is run frequently with similar results, consider caching the results at the application level to avoid hitting the database each time.
- Query Analysis: Use `EXPLAIN ANALYZE` to get detailed information about the query execution plan. This can help identify potential bottlenecks and areas for improvement.
- Maintenance: Regularly run `VACUUM` and `ANALYZE` to maintain table statistics and optimize query planning.
- Partitioning: If the `posts` table is very large, consider partitioning it based on certain criteria (e.g., date ranges or post types) to improve query performance.
- Hardware: Ensure that the database server has sufficient memory and CPU resources to handle the workload. Properly configure PostgreSQL memory settings like `work_mem` and `shared_buffers` to optimize performance.
Hmm, that’s… not good.
The first indexing suggestion is flat-out wrong: it claims the `LIKE` pattern doesn't start with a wildcard, but it clearly does.
The Partial Index suggestion is great!
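That suggestion is easy to sanity-check locally: SQLite also supports partial indexes, so here's a small demo on hypothetical data (on Postgres you'd confirm index usage with EXPLAIN ANALYZE instead):

```python
import sqlite3

# Hypothetical fixture: only posts 1 and 3 carry the <postgresql> tag.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, tags TEXT, score INTEGER);
INSERT INTO posts VALUES
  (1, '<postgresql><performance>', 50),
  (2, '<sql-server>',              90),
  (3, '<postgresql>',              10);
CREATE INDEX idx_posts_tags_postgresql ON posts (score)
WHERE tags LIKE '%<postgresql>%';
""")

# The exercise's query: its WHERE clause matches the index's predicate,
# so the planner can answer it from the small partial index, pre-sorted
# by score, instead of scanning every row.
rows = conn.execute("""
SELECT id, score
FROM posts
WHERE tags LIKE '%<postgresql>%'
ORDER BY score DESC
LIMIT 100;
""").fetchall()
print(rows)  # [(1, 50), (3, 10)]

# Plan output varies by SQLite version, so just print it for inspection.
print(conn.execute("""
EXPLAIN QUERY PLAN
SELECT id, score FROM posts
WHERE tags LIKE '%<postgresql>%'
ORDER BY score DESC LIMIT 100;
""").fetchall())
```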
The Text Search recommendation doesn't work well because of the kinds of tags in use. There are all kinds of wacko tags, like `c++builder-xe2`, that need to be matched exactly.

The rest of the optimization suggestions are generic advice that could be copy/pasted into any set of recommendations: sure, caching, query analysis, maintenance, and hardware can help anything.
SQLAI.ai Review Summary: Not There Yet.
I really enjoyed using SQLAI.ai. The site is simple, clear, and easy to use.
The problem isn't the site: it's ChatGPT. Even ChatGPT 4 Turbo, hooked up directly to your database, still isn't savvy enough yet.
- Users WITHOUT database knowledge will get simple working queries out of it – only to be surprised later when the data isn’t quite right.
- Users WITH database knowledge will quickly hit its limitations.
The good news is that as ChatGPT matures, SQLAI.ai is well-positioned to make the querying and tuning experience easier. For now, it's a waiting game for ChatGPT to improve its database knowledge. As of early 2024, we're just on the cusp of retrieval-augmented generation (RAG) tools that update LLMs with knowledge about additional topics – specifically, your database schema. I'm excited to see where that goes.
