- QueryGPT is a natural language data query tool based on OpenAI GPT3.5 and Langchain.
- It is implemented using OpenAI GPT3.5-turbo-0613 and Langchain Function Agent, which is faster, more stable, and more accurate than Pandas Agent and CSV Agent.
- It supports data output in Markdown format, JSON format, and visualization with ECharts and Matplotlib.
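As a rough illustration of the Markdown and JSON output formats mentioned above, the following sketch renders a small query result both ways. The helper names `to_markdown` and `to_json` and the sample rows are hypothetical and are not taken from the QueryGPT code base.

```python
import json


def to_markdown(rows):
    """Render a list of dicts as a Markdown table (columns come from the first row)."""
    if not rows:
        return ""
    columns = list(rows[0].keys())
    header = "| " + " | ".join(columns) + " |"
    divider = "| " + " | ".join("---" for _ in columns) + " |"
    body = [
        "| " + " | ".join(str(row.get(col, "")) for col in columns) + " |"
        for row in rows
    ]
    return "\n".join([header, divider, *body])


def to_json(rows):
    """Serialize the same rows as pretty-printed JSON."""
    return json.dumps(rows, ensure_ascii=False, indent=2)


# Hypothetical query result, used only for illustration.
rows = [
    {"month": "2023-01", "orders": 120},
    {"month": "2023-02", "orders": 150},
]

print(to_markdown(rows))
print(to_json(rows))
```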
- Install Python 3.10 and pip, and create a virtual environment.
git clone git@github.com:sdaaron/QueryGPT.git
cd QueryGPT/server
pip install -r requirements.txt
- Rename .env.example to .env and modify the environment variables.
OPENAI_API_KEY='Your OpenAI Key'
- Start the API
Start the API with the default local data file: python main.py --host xx --port xxx
Start the API with a specific local CSV file: python main.py --host xx --port xxx --csv_path xxx.csv
Start the API with a specific local Excel file: python main.py --host xx --port xxx --excel_path xxx.xlsx
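The flags in the commands above could be parsed along the following lines. This is only a sketch of one plausible arrangement, not the project's actual main.py: the default host and port, the mutually exclusive grouping of the two file flags, and the `describe_source` helper are all assumptions.

```python
import argparse


def build_parser():
    """Build a parser for the flags shown in the commands above."""
    parser = argparse.ArgumentParser(description="Start the API (illustrative sketch)")
    parser.add_argument("--host", default="127.0.0.1")        # assumed default
    parser.add_argument("--port", type=int, default=8000)     # assumed default
    # Assumption: only one data source is given at a time.
    source = parser.add_mutually_exclusive_group()
    source.add_argument("--csv_path", help="path to a local CSV file")
    source.add_argument("--excel_path", help="path to a local Excel file")
    return parser


def describe_source(args):
    """Report which data source the parsed arguments select."""
    if args.csv_path:
        return f"csv:{args.csv_path}"
    if args.excel_path:
        return f"excel:{args.excel_path}"
    return "default"


example = build_parser().parse_args(
    ["--host", "0.0.0.0", "--port", "8080", "--csv_path", "sales.csv"]
)
print(example.host, example.port, describe_source(example))
```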
cd QueryGPT/client
pnpm install
npm run dev
- Make sure the proxy configuration in your vite.config.js matches your API service address.
- Project Overview
  - QueryGPT is a data query tool based on OpenAI. It uses FastAPI as the backend framework and Langchain to implement specific functions. Its advantages include data privacy and validity, because all data operations are performed locally rather than being passed through the API. The tool is also extensible: developers can add custom classes to the Langchain tools to implement specific query functions. In addition, it uses embedding similarity search to search across multiple files, which quickly identifies target columns and reduces token usage.
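The extensibility idea described above, where a developer adds a class to provide a new query function, can be sketched in plain Python. The code below deliberately does not use Langchain's real classes: `QueryTool`, `register`, and `dispatch` are hypothetical stand-ins that only show the shape of a tool registry in which each tool has a name, a description, and a callable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, float]


@dataclass
class QueryTool:
    """Hypothetical stand-in for an agent tool: a name, a description, a callable."""
    name: str
    description: str
    run: Callable[[List[Row], str], float]


TOOLS: Dict[str, QueryTool] = {}


def register(tool: QueryTool) -> None:
    """Make a tool available for dispatch under its name."""
    TOOLS[tool.name] = tool


def average(rows: List[Row], column: str) -> float:
    values = [row[column] for row in rows]
    return sum(values) / len(values)


def total(rows: List[Row], column: str) -> float:
    return sum(row[column] for row in rows)


# Adding a new query function means registering one more tool.
register(QueryTool("average", "Mean of a numeric column", average))
register(QueryTool("total", "Sum of a numeric column", total))


def dispatch(tool_name: str, rows: List[Row], column: str) -> float:
    """Run the named tool locally; in an agent, the model would pick tool_name."""
    return TOOLS[tool_name].run(rows, column)


rows = [{"amount": 10.0}, {"amount": 30.0}]
print(dispatch("average", rows, "amount"))
print(dispatch("total", rows, "amount"))
```

In this arrangement the model only needs the tool names and descriptions to choose a tool, while the rows themselves stay in the local process.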
- Features and Characteristics
  - The tool calls OpenAI's API to determine the required information, completes the data query through interface calls, and preprocesses the results so that only relevant information is retained, which helps preserve data privacy and validity.
  - Agent functions are implemented with Langchain tools, and developers can extend the tool's functionality by adding custom classes that implement specific query functions.
  - Embedding similarity search is used to quickly identify target columns and reduce the number of tokens. By converting the data into vector representations and calculating the similarity between vectors, target columns similar to the query conditions can be quickly found. Reducing the number of tokens improves search efficiency and performance.
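The embedding similarity step can be illustrated with a minimal sketch. It assumes a toy embedding (character-bigram counts) so that the example is self-contained; the real tool presumably uses a learned embedding model, and `toy_embed`, `cosine`, `top_columns`, and the column names are all hypothetical.

```python
import math
from collections import Counter


def toy_embed(text):
    """Stand-in embedding: counts of character bigrams (not a learned model)."""
    text = text.lower()
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_columns(question, columns, k=2):
    """Return the k column names most similar to the question."""
    query_vector = toy_embed(question)
    ranked = sorted(
        columns,
        key=lambda column: cosine(query_vector, toy_embed(column)),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical column names from several files.
columns = ["order_date", "customer_name", "order_amount", "shipping_city"]
print(top_columns("total order amount per month", columns, k=2))
```

Only the selected columns would then be described to the model, which is where the token saving comes from.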
- Limitations
  - Simple functionality: The tool cannot flexibly handle complex query tasks, so it may not meet users' requirements for advanced queries and complex data analysis.
  - Limited usability: The tool is currently only suitable for querying structured data (such as CSV files). It applies poorly to other types of data or different query scenarios and does not port or scale well to them.
- Future Development
  To further enhance the functionality and applicability of the tool, the following improvements and developments can be considered:
  - Support more query scenarios: The tool can be extended to support more query scenarios for structured data, such as database queries. This can increase the tool's applicability and meet a wider range of user needs.
  - Implement a universal data repository: To support more query methods and data types, consider implementing a universal data repository. By building a data repository, users can flexibly perform data queries and analysis, and enjoy more features and capabilities.
- Implement data query functionality and result output
  - Implement custom Langchain Function Agent
  - Implement Query Tool to achieve basic data retrieval requirements
  - Implement Plot Tool to enable chart generation
  - Implement various custom calculation tools
  - Calculate average order value, percentage, growth rate
  - Calculate month-over-month and year-over-year growth rate
  - Support more custom calculations
  - Integrate ydata-profiling for simple data description
  - Embedding similarity search
  - Support multiple output formats
  - Output data in Markdown format
  - Output data in JSON format
  - Use Matplotlib to generate clustered bar charts, line charts, pie charts
  - Output Echarts charts
  - Output tables
  - Implement custom Langchain Function Agent
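The custom calculations listed above (average order value, month-over-month and year-over-year growth rate) reduce to a few short formulas. The sketch below shows one way to compute them in plain Python; the function names and sample figures are hypothetical and do not come from the project.

```python
def growth_rate(current, previous):
    """Relative change from previous to current; None when previous is 0."""
    if previous == 0:
        return None
    return (current - previous) / previous


def month_over_month(series):
    """series: list of (YYYY-MM, value) pairs for consecutive months, in order."""
    return {
        month: growth_rate(value, series[i - 1][1])
        for i, (month, value) in enumerate(series)
        if i > 0
    }


def year_over_year(values_by_month):
    """Compare each month with the same month one year earlier, when present."""
    result = {}
    for month, value in values_by_month.items():
        year, month_number = month.split("-")
        prior = f"{int(year) - 1}-{month_number}"
        if prior in values_by_month:
            result[month] = growth_rate(value, values_by_month[prior])
    return result


def average_order_value(total_revenue, order_count):
    """Revenue per order; None when there are no orders."""
    return total_revenue / order_count if order_count else None


# Hypothetical figures, for illustration only.
monthly = [("2023-01", 100.0), ("2023-02", 120.0), ("2023-03", 90.0)]
print(month_over_month(monthly))
print(year_over_year({"2022-01": 80.0, "2023-01": 100.0}))
print(average_order_value(1200.0, 48))
```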
- Optimization of interactivity
  - Improve UI design
  - Optimize chart styles
  - Streamed output
  - Guide users to ask questions correctly and display related questions
  - Display data retrieval process
- Implement more features
  - Enhanced data analysis capabilities: Perform simple analysis on output data
  - Support user-uploadable tabular data
  - Support database connection
  - Support natural language operations on tabular data
  - User login