Text2SQL (or Chat2SQL) tools convert natural-language questions into SQL queries. Imagine having ChatGPT write beautiful, correct and useful SQL queries for you!
These tools started out as a way to bridge the gap between non-technical users and databases. They let people interact with databases in natural language and lower the barrier to accessing and analyzing data. With the advance of AI models, these tools now also support more advanced features, such as handling complex queries, joining multiple tables, and even holding natural-language conversations.
They can also improve productivity by automating SQL query generation, saving time and effort.
In this edition of Star History monthly, we have compiled a collection of open-source Text2SQL tools.
Chat2DB aims to be a general-purpose SQL client and reporting tool with AI capabilities built in from the start. It supports connecting to a range of databases, including MySQL, Postgres, Oracle, SQL Server, SQLite, ClickHouse and more.
There was a bit of drama involving Chat2DB a while ago. We won't get into the details here, but I'm curious to know what you think.
SQL Chat is a chat-based SQL client that lets you use natural language to operate on your database: querying, modifying, adding, and even deleting (!) data.
It currently supports MySQL, Postgres, SQL Server and TiDB serverless.
It's open-sourced by Bytebase, a database migration tool for teams.
Vanna is a Python framework for training a RAG (retrieval-augmented generation) model on your database's queries, DDL, and documentation.
You can use Vanna as is, or build your own custom UI with an existing tool (e.g. Streamlit, Slack).
It was open-sourced in July 2023 and got really popular this past January.
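To make the RAG idea more concrete, here is a toy Python sketch of the retrieval step: given a question, pick the most relevant DDL, documentation, and example queries, then put them into the prompt. It does not use Vanna's API. A real setup would typically use an embedding model and a vector store, whereas this sketch swaps in simple word-overlap scoring; the corpus entries and prompt template are made up for illustration.

```python
import re
from collections import Counter

def tokenize(text: str) -> Counter:
    """Lowercase word counts; a crude stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z_]+", text.lower()))

def similarity(a: Counter, b: Counter) -> int:
    """Number of shared word occurrences (bag-of-words overlap)."""
    return sum((a & b).values())

def retrieve(question: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k corpus entries that overlap most with the question."""
    q = tokenize(question)
    ranked = sorted(corpus, key=lambda doc: similarity(q, tokenize(doc)), reverse=True)
    return ranked[:k]

# Hypothetical "training" data: DDL, documentation and a known-good example query.
corpus = [
    "CREATE TABLE customers (id INTEGER, name TEXT, region TEXT)",
    "CREATE TABLE invoices (id INTEGER, customer_id INTEGER, amount REAL, issued_on DATE)",
    "Documentation: revenue means the sum of invoices.amount",
    "Example: total invoices per region -> SELECT c.region, SUM(i.amount) FROM invoices i "
    "JOIN customers c ON c.id = i.customer_id GROUP BY c.region",
]

question = "What is the total revenue per region?"
context = retrieve(question, corpus)
# The retrieved context is prepended to the question before calling the LLM (not shown).
prompt = "Use this context:\n" + "\n".join(context) + f"\n\nQuestion: {question}\nSQL:"
print(prompt)
```

For this question, the example query and the definition of "revenue" outrank the raw DDL, which is the point of retrieval: the model sees the context most likely to help with this particular question rather than the whole database.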
DuckDB-NSQL is a Text2SQL LLM for local DuckDB SQL analytics tasks, built by MotherDuck and Numbers Station. It can help users leverage the full power of DuckDB and its analytical potential without having to go back and forth between the DuckDB documentation and the SQL shell.
With LangChain, you can build a Q&A chain and agent over an SQL database yourself.
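At its core, such a chain reads the schema, puts it in a prompt together with the question, and sends that prompt to an LLM. Below is a minimal sketch of the first two steps using Python's built-in `sqlite3` module. The prompt template is a hypothetical example, not the one LangChain uses, and the LLM call itself is left out.

```python
import sqlite3

def describe_schema(conn: sqlite3.Connection) -> str:
    """Return the CREATE TABLE statements for every user table in a SQLite database."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    return "\n".join(row[0] + ";" for row in rows)

def build_prompt(schema: str, question: str) -> str:
    """Assemble a Text2SQL prompt (hypothetical template for illustration)."""
    return (
        "You are a SQLite expert. Given the schema below, write one SQL query "
        "that answers the question. Return only SQL.\n\n"
        f"Schema:\n{schema}\n\n"
        f"Question: {question}\nSQL:"
    )

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, country TEXT)")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL)")
    # The resulting string is what would be sent to the model.
    print(build_prompt(describe_schema(conn), "How many users are from Canada?"))
```

Giving the model the exact DDL, rather than a loose description, is what lets it pick real table and column names.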
LangChain also has an SQL Agent that you can add to the chain. It can not only answer questions based on the database's schema and content, but also recover from errors: it runs a generated query, catches the traceback, and regenerates the query correctly.
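The error-recovery loop can be sketched in a few lines of Python. This is not LangChain's implementation: `generate_sql` is a hypothetical callback standing in for the LLM, and `fake_llm` simply returns a broken query first and a fixed one once it has seen the error message.

```python
import sqlite3
from typing import Callable, Optional

def run_with_retries(
    conn: sqlite3.Connection,
    question: str,
    generate_sql: Callable[[str, Optional[str]], str],
    max_attempts: int = 3,
) -> list[tuple]:
    """Run generated SQL; on failure, feed the error back and ask for a new query."""
    error: Optional[str] = None
    for _ in range(max_attempts):
        sql = generate_sql(question, error)
        try:
            return conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            # Keep the error text and the failing SQL so the next attempt can correct it.
            error = f"{type(exc).__name__}: {exc}\nFailed SQL: {sql}"
    raise RuntimeError(f"Gave up after {max_attempts} attempts. Last error: {error}")

def fake_llm(question: str, error: Optional[str]) -> str:
    """Hypothetical stand-in for an LLM: wrong column first, fixed after seeing the error."""
    if error is None:
        return "SELECT COUNT(*) FROM users WHERE nation = 'CA'"
    return "SELECT COUNT(*) FROM users WHERE country = 'CA'"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, country TEXT)")
conn.executemany("INSERT INTO users (country) VALUES (?)", [("CA",), ("US",), ("CA",)])
print(run_with_retries(conn, "How many users are in Canada?", fake_llm))  # [(2,)]
```

Passing the failed SQL along with the error text gives the model what it needs to fix the query, and capping the attempts keeps a confused model from looping forever.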
Awesome Text2SQL is a curated collection of tutorials and resources for LLMs, Text2SQL, Text2DSL, Text2API, Text2Vis, and more. Most of the listed models are LLM+Text2SQL, and each one comes with links to its paper, code, and dataset. If you want to dive deep into Text2SQL, take a look!
LLM or not, you should still be extra careful when executing model-generated SQL queries. Some ways to minimize risk include describing your database schema and data to the model, constraining the size of the output, and validating and reviewing the generated SQL queries before executing them.
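As a minimal sketch of the last two points, assuming a SQLite database file, the Python code below rejects anything that does not look like a single SELECT (or WITH) statement, opens the database with SQLite's read-only `mode=ro` URI parameter as a second line of defense, and returns at most `MAX_ROWS` rows. The cap of 100 and the keyword check are arbitrary choices for illustration. A string check alone is not a SQL validator (SQLite accepts `WITH ... DELETE`, for example), which is why the read-only connection is there.

```python
import os
import sqlite3
import tempfile
from pathlib import Path

MAX_ROWS = 100  # arbitrary cap on how many rows we hand back

def looks_like_select(sql: str) -> bool:
    """Cheap pre-check: one statement whose first keyword is SELECT or WITH."""
    stripped = sql.strip().rstrip(";").strip()
    if not stripped or ";" in stripped:
        return False  # empty, or more than one statement
    return stripped.split(None, 1)[0].upper() in {"SELECT", "WITH"}

def run_generated_sql(db_path: str, sql: str, max_rows: int = MAX_ROWS) -> list[tuple]:
    """Run model-generated SQL on a read-only connection and return at most max_rows rows."""
    if not looks_like_select(sql):
        raise ValueError(f"Refusing to run statement: {sql!r}")
    # mode=ro makes SQLite itself reject writes, e.g. a "WITH ... DELETE" that passed the pre-check.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        return conn.execute(sql).fetchmany(max_rows)
    finally:
        conn.close()

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        setup = sqlite3.connect(path)
        setup.execute("CREATE TABLE t (x INTEGER)")
        setup.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(500)])
        setup.commit()
        setup.close()
        print(len(run_generated_sql(path, "SELECT x FROM t")))  # 100
```

Reviewing the SQL by hand is still the strongest safeguard; checks like these mainly limit the damage when a bad query slips through.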
If you want more AI content, check out earlier editions of the Star History Open-source Monthly: