Skip to content

behroozmirzaie/behviz

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

7 Commits
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

Beh Viz - Natural Language to SQL Visualization

An intelligent AI-powered application that converts natural language questions into SQL queries and automatically generates interactive visualizations from the results.

๐ŸŽฏ Project Overview

Beh Viz is a sophisticated data analytics tool that bridges the gap between business users and database analytics. Users can ask questions in plain English, and the system automatically:

  1. Translates natural language to SQL queries
  2. Validates queries against the database schema
  3. Executes queries safely on PostgreSQL
  4. Generates interactive visualizations using Plotly
  5. Self-corrects common issues automatically

๐Ÿค– Understanding AI Agents

What's an AI Agent?

Think of an AI Agent as a smart digital assistant that can understand what you want and automatically figure out how to do it. Unlike regular software that needs specific instructions, our AI Agent can:

  • ๐Ÿง  Think: Understands your questions in plain English
  • ๐Ÿ” Explore: Investigates your database to understand its structure
  • ๐Ÿ› ๏ธ Execute: Writes and runs SQL queries automatically
  • ๐Ÿ“Š Create: Builds beautiful charts from the results
  • ๐Ÿ”ง Fix: Corrects mistakes on its own

Simple Example: From Question to Answer

You ask: "Which products sell the most?"

Behind the scenes, our AI Agent:

๐Ÿง  Thinks: "User wants top-selling products"
๐Ÿ” Explores: "Found 'products' and 'sales' tables"  
๐Ÿ› ๏ธ Writes: "SELECT product_name, SUM(quantity) FROM..."
โœ… Validates: "Query looks correct, tables exist"
๐Ÿš€ Executes: Runs the query on your database
๐Ÿ“Š Visualizes: Creates a bar chart automatically

You get: Interactive chart showing top products in 15 seconds! โšก

Why This Matters

  • No SQL Knowledge Needed: Just ask questions naturally
  • No Manual Work: Everything happens automatically
  • Always Accurate: Validates against your actual database
  • Self-Healing: Fixes common errors without bothering you

This turns complex data analysis into simple conversations! ๐Ÿ’ฌโžก๏ธ๐Ÿ“Š

๐Ÿš€ Features

  • Natural Language Processing: Ask questions like "Show me the number of users per email domain"
  • Intelligent SQL Generation: Powered by Ollama's Llama 3.2 model
  • Schema Validation: Ensures generated queries are valid against your database structure
  • Auto-correction: Handles dialect differences and column name mismatches
  • Interactive Visualizations: Automatic chart generation (bar, line, scatter, pie charts)
  • Database Schema Explorer: View your database structure within the app
  • Robust Error Handling: Graceful handling of database connection issues

๐Ÿ—๏ธ Architecture

System Architecture

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚   Streamlit     โ”‚    โ”‚    Ollama       โ”‚    โ”‚   PostgreSQL    โ”‚
โ”‚   Frontend      โ”‚โ—„โ”€โ”€โ–บโ”‚   LLM Server    โ”‚    โ”‚   Database      โ”‚
โ”‚                 โ”‚    โ”‚  (Llama 3.2)    โ”‚    โ”‚                 โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
         โ”‚                       โ”‚                       โ”‚
         โ”‚                       โ”‚                       โ”‚
         โ–ผ                       โ–ผ                       โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚     Docker      โ”‚    โ”‚    SQLAlchemy   โ”‚    โ”‚     Plotly      โ”‚
โ”‚   Container     โ”‚    โ”‚   ORM Layer     โ”‚    โ”‚  Visualization  โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

AI Agent Architecture

The core AI agent follows a sophisticated multi-stage pipeline:

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚                           AI Agent Pipeline                             โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚                                                                         โ”‚
โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”                โ”‚
โ”‚  โ”‚   Input     โ”‚    โ”‚   Schema    โ”‚    โ”‚    LLM        โ”‚                โ”‚
โ”‚  โ”‚ Processing  โ”‚โ”€โ”€โ”€โ–บโ”‚  Analysis   โ”‚โ”€โ”€โ”€โ–บโ”‚ SQL Generationโ”‚                โ”‚
โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜                โ”‚
โ”‚         โ”‚                   โ”‚                   โ”‚                       โ”‚
โ”‚         โ–ผ                   โ–ผ                   โ–ผ                       โ”‚
โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”                  โ”‚
โ”‚  โ”‚ Natural     โ”‚    โ”‚ Database    โ”‚    โ”‚ Structured  โ”‚                  โ”‚
โ”‚  โ”‚ Language    โ”‚    โ”‚ Schema      โ”‚    โ”‚ JSON        โ”‚                  โ”‚
โ”‚  โ”‚ Question    โ”‚    โ”‚ Context     โ”‚    โ”‚ Response    โ”‚                  โ”‚
โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜                  โ”‚
โ”‚                                                โ”‚                        โ”‚
โ”‚                                                โ–ผ                        โ”‚
โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”              โ”‚
โ”‚  โ”‚ Visualization โ”‚โ—„โ”€โ”€โ”€โ”‚   Query     โ”‚โ—„โ”€โ”€โ”€โ”‚    SQL        โ”‚              โ”‚
โ”‚  โ”‚ Generation    โ”‚    โ”‚ Execution   โ”‚    โ”‚ Validation.   โ”‚              โ”‚
โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜              โ”‚
โ”‚         โ”‚                   โ”‚                   โ”‚                       โ”‚
โ”‚         โ–ผ                   โ–ผ                   โ–ผ                       โ”‚
โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”                  โ”‚
โ”‚  โ”‚ Interactive โ”‚    โ”‚ Pandas      โ”‚    โ”‚ SQLGlot     โ”‚                  โ”‚
โ”‚  โ”‚ Charts      โ”‚    โ”‚ DataFrame   โ”‚    โ”‚ Parser      โ”‚                  โ”‚
โ”‚  โ”‚ (Plotly)    โ”‚    โ”‚ Results     โ”‚    โ”‚ Validator   โ”‚                  โ”‚
โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜                  โ”‚
โ”‚                                                                         โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

AI Agent Components

1. Natural Language Processor

  • Input: User questions in plain English
  • Processing: Context-aware prompt engineering
  • Output: Structured prompts for the LLM

2. Schema Intelligence Module

  • Function: Dynamically inspects database schema
  • Capabilities:
    • Table and column discovery
    • Relationship inference
    • Constraint validation
  • Caching: Schema information cached for performance

3. LLM Query Generator (Ollama Llama 3.2)

  • Model: Llama 3.2 running locally via Ollama
  • Configuration:
    • Temperature: 0 (deterministic output)
    • Format: JSON (structured responses)
  • Output: JSON containing SQL query and visualization metadata

4. SQL Validation Engine (SQLGlot)

  • Parser: Advanced SQL parsing and validation
  • Dialect Translation: MySQL โ†’ PostgreSQL conversion
  • Schema Compliance: Validates tables, columns, and joins
  • Business Rules: Enforces specific join requirements (e.g., sales.product_id = products.id)

5. Query Execution Layer (SQLAlchemy)

  • Connection Management: Robust PostgreSQL connectivity
  • Retry Logic: Handles database startup delays
  • Error Handling: Graceful failure management
  • Security: Parameterized query execution

6. Visualization Intelligence

  • Chart Selection: Automatic chart type determination
  • Axis Mapping: Smart column-to-axis mapping
  • Self-Correction: Handles column name mismatches
  • Rendering: Interactive Plotly visualizations

7. Self-Healing Mechanisms

  • Column Resolution: Automatically corrects column name issues
  • Retry Logic: Database connection resilience
  • Fallback Options: Default visualization when primary fails
  • Error Recovery: Graceful degradation with user feedback

๐Ÿ› ๏ธ Technology Stack

Backend

  • Python 3.12: Core application runtime
  • Streamlit: Web application framework
  • SQLAlchemy: Database ORM and connection management
  • Pandas: Data manipulation and analysis
  • SQLGlot: SQL parsing, validation, and dialect translation

AI/ML

  • Ollama: Local LLM server
  • Llama 3.2: Large language model for SQL generation
  • LangChain: LLM integration framework

Database

  • PostgreSQL 15: Primary database engine
  • Adminer: Database administration interface

Visualization

  • Plotly Express: Interactive chart generation
  • Streamlit Components: UI components and layout

Infrastructure

  • Docker: Containerization and orchestration
  • Docker Compose: Multi-service deployment

๐Ÿ“‹ Prerequisites

  • Docker and Docker Compose installed
  • Ollama running locally with Llama 3.2 model
  • At least 8GB RAM (recommended for LLM operations)

Setting up Ollama

  1. Install Ollama from https://ollama.ai
  2. Pull the Llama 3.2 model:
    ollama pull llama3.2:latest
  3. Ensure Ollama is running on the default port (11434)

๐Ÿš€ Quick Start

1. Clone the Repository

git clone <repository-url>
cd beh-ai

2. Start the Application

docker-compose up --build

3. Access the Application

4. Database Setup

The PostgreSQL database will be automatically initialized with sample data from init-db/backup.sql.

๐Ÿ’ก Usage Examples

Example Queries

  1. User Distribution: "Show me the number of users per email domain"
  2. Sales Analysis: "What are the top 10 products by total sales?"
  3. Time Series: "Show monthly sales trends for the last year"
  4. Geographic: "Display user distribution by country"

Expected Workflow

  1. Enter your question in natural language
  2. Click "Generate Visualization"
  3. Review the generated SQL query
  4. Examine the query results
  5. Interact with the generated visualization

๐Ÿ”ง Configuration

Environment Variables

  • Database: Configured in docker-compose.yml

    • User: user
    • Password: password
    • Database: mydatabase
    • Port: 5432
  • Ollama:

    • Base URL: http://host.docker.internal:11434
    • Model: llama3.2:latest

Customization

  • Database Schema: Replace init-db/backup.sql with your data
  • LLM Model: Change model in main.py line 98
  • UI Styling: Modify Streamlit configuration in main.py

๐Ÿข Database Schema

The application automatically discovers your database schema and uses it for:

  • Query validation
  • Table relationship inference
  • Column existence checking
  • Join requirement enforcement

๐Ÿ”’ Security Considerations

  • Database credentials are configured for development use
  • SQL queries are validated before execution
  • Parameterized queries prevent SQL injection
  • Local LLM execution ensures data privacy

๐Ÿ› Troubleshooting

Common Issues

  1. Ollama Connection Failed

    • Ensure Ollama is running: ollama serve
    • Check model availability: ollama list
  2. Database Connection Issues

    • Wait for PostgreSQL initialization (up to 30 seconds)
    • Check Docker container status: docker-compose ps
  3. Out of Memory

    • Increase Docker memory allocation
    • Use a smaller LLM model

๐Ÿ”ฎ Future Enhancements

  • Support for multiple database types (MySQL, SQLite)
  • Advanced chart customization options
  • Query result caching
  • User authentication and query history
  • Advanced analytics and insights
  • Natural language explanations of results
  • Integration with cloud LLM providers

๐Ÿค Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

๐Ÿ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

๐Ÿ“ž Support

For support and questions:

  • Create an issue on GitHub
  • Check the troubleshooting section
  • Review Docker logs: docker-compose logs

Beh Viz - Transforming natural language into actionable data insights! ๐Ÿš€๐Ÿ“Š

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors