This Python application uses Retrieval-Augmented Generation (RAG) to let users ask questions of their data in plain English. It uses OpenAI's GPT-3.5 model to convert the question (the prompt) into SQL, runs that SQL against the SQLAlchemy-backed database that stores the data, and returns the answer together with the SQL statement that produced it.
Because the answers are easy to check for correctness, this application is a practical use of Large Language Models (LLMs) in data science. Business users who are unfamiliar with SQL can ask business questions in a Gradio application and get answers from the data in seconds.
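The question-to-SQL flow described above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: it uses the standard-library `sqlite3` module instead of SQLAlchemy, reduces retrieval to reading the table schema, and replaces the GPT-3.5 call with a hypothetical `fake_llm` function so that it runs offline. The `contracts` table and all function names are assumptions made for the example.

```python
import sqlite3
from typing import Callable, List, Tuple


def get_schema(conn: sqlite3.Connection) -> str:
    """Return the CREATE TABLE statements for every table in the database."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND sql IS NOT NULL"
    ).fetchall()
    return "\n".join(row[0] for row in rows)


def build_prompt(schema: str, question: str) -> str:
    """Combine the schema and the plain-English question into one prompt."""
    return (
        "Given this SQLite schema:\n"
        f"{schema}\n"
        "Write one SQL query that answers the question. Return only SQL.\n"
        f"Question: {question}"
    )


def answer_question(
    conn: sqlite3.Connection,
    question: str,
    generate_sql: Callable[[str], str],
) -> Tuple[str, List[tuple]]:
    """Generate SQL for the question, run it, and return (sql, rows)."""
    prompt = build_prompt(get_schema(conn), question)
    sql = generate_sql(prompt).strip()
    rows = conn.execute(sql).fetchall()
    return sql, rows


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for the model call; a real app would send
    # `prompt` to the LLM and return the SQL text from its response.
    return (
        "SELECT agency, SUM(amount) FROM contracts "
        "GROUP BY agency ORDER BY agency"
    )


# Small in-memory table (made-up rows) to exercise the flow end to end.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contracts (agency TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO contracts VALUES (?, ?)",
    [("DOD", 100.0), ("NASA", 50.0), ("DOD", 25.0)],
)

sql, rows = answer_question(
    conn, "What is the total contract amount per agency?", fake_llm
)
print(sql)   # the generated statement, shown to the user alongside the answer
print(rows)
```

Returning the SQL next to the rows is what makes the answer checkable: a reader who knows SQL can confirm that the statement matches the question.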
https://www.loom.com/share/f292263472ae4e9cbfa813655bc7c654?sid=c3a5bf89-f80f-4d69-bae0-79beee641cbe
git clone https://github.com/LNshuti/usgov-contracts-rag.git
conda env create --file=environment.yaml
conda activate gov-data
pip install -r requirements.txt
cd data
cp <your_data> .
CSV_FILE_PATH = 'data/your_data.csv'
DB_FILE_PATH = 'gov-contracts.db'
TABLE_NAME = 'your_table_name'
python connect_db.py
datasette serve gov-contracts.db
python app/app.py
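For orientation, the loading step driven by `CSV_FILE_PATH`, `DB_FILE_PATH`, and `TABLE_NAME` might look something like the sketch below. It is an assumption about what a CSV-to-SQLite loader does in general, not the contents of the repository's `connect_db.py`. The function name `load_csv_to_sqlite` is hypothetical, and the demo writes a small made-up CSV into a temporary directory so that it runs without any real data.

```python
import csv
import os
import sqlite3
import tempfile


def load_csv_to_sqlite(csv_path: str, db_path: str, table_name: str) -> int:
    """Create `table_name` in the SQLite file and fill it from the CSV.

    The first CSV row is used as column names. Columns are created without
    declared types, so every value is stored as text. Returns the row count.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)

    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)

    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(f'DROP TABLE IF EXISTS "{table_name}"')
            conn.execute(f'CREATE TABLE "{table_name}" ({columns})')
            conn.executemany(
                f'INSERT INTO "{table_name}" VALUES ({placeholders})', rows
            )
    finally:
        conn.close()
    return len(rows)


# Demo with placeholder file names inside a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "your_data.csv")
    db_path = os.path.join(tmp, "gov-contracts.db")
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(
            [["agency", "amount"], ["DOD", "100"], ["NASA", "50"]]
        )

    loaded = load_csv_to_sqlite(csv_path, db_path, "your_table_name")

    conn = sqlite3.connect(db_path)
    count = conn.execute('SELECT COUNT(*) FROM "your_table_name"').fetchone()[0]
    conn.close()

print(loaded, count)
```

Dropping and recreating the table makes the load repeatable, at the cost of discarding whatever the table held before.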
- Harshad Suryawanshi. From Natural Language to SQL(Na2SQL): Extracting Insights from Databases using OPENAI GPT3.5 and LlamaIndex. https://github.com/AI-ANK/Na2SQL
- Ravi Theja. Evaluate RAG with Llamaindex. https://cookbook.openai.com/examples/evaluation/evaluate_rag_with_llamaindex
- Mostafa Ibrahim. A Gentle Introduction to Advanced RAG. https://wandb.ai/mostafaibrahim17/ml-articles/reports/A-Gentle-Introduction-to-Advanced-RAG--Vmlldzo2NjIyNTQw
- Adam Obeng; J.C. Zhong; Charlie Gu. How we built Text-to-SQL at Pinterest. https://medium.com/pinterest-engineering/how-we-built-text-to-sql-at-pinterest-30bad30dabff