Skip to content

Latest commit

 

History

History
266 lines (186 loc) · 5.2 KB

File metadata and controls

266 lines (186 loc) · 5.2 KB

🚀 GitSage – AI-Powered GitHub Repository Intelligence

CoC Inheritance 2025
GitSage: Code Confusion? We've Git You Covered

By Team GitSage

Table of Contents

📝 Description

GitSage is an AI-powered repository intelligence system that enables users to:

  • Ask natural language questions about any GitHub repository
  • Automatically generate structured documentation
  • Compare two repositories intelligently

It solves the problem of developer onboarding and repository understanding using:

  • Retrieval-Augmented Generation (RAG)
  • Code + text embeddings
  • Persistent vector database search
  • Large Language Models for reasoning

GitSage transforms raw source code into structured insights.


🔗 Links


🤖 Tech Stack


🏗️ System Architecture

graph LR
    A[User Input] --> B[FastAPI Backend]
    B --> C[Ingestion Pipeline]
    C --> D[Embedding Pipeline]
    D --> E[ChromaDB Vector Store]
    E --> F[Retriever]
    F --> G[LLM - Groq API]
    G --> H[Final Response]

Loading

🌐 Frontend

  • React (Vite)
  • TypeScript
  • Tailwind CSS
  • Lucide Icons
  • Responsive UI

⚙️ Backend

  • FastAPI
  • Python 3.11
  • Async ingestion pipeline
  • RESTful API architecture
  • Modular service design

🧠 AI / ML Layer

  • Retrieval-Augmented Generation (RAG)
  • Code embedding model
  • Sentence embedding model
  • Groq LLM API
  • Prompt engineering with hallucination control

🗄️ Database

  • ChromaDB (Persistent Vector Database)
  • Metadata-based filtering
  • Separate collections for code and text embeddings

📈 Progress


✅ Fully Implemented Features

🔹 Intelligent Q&A System

  • Natural language repository queries
  • Context-aware retrieval
  • Grounded LLM responses
  • Controlled inference without hallucination

🔹 Automatic Documentation Generator

  • Structured documentation generation
  • Overview, architecture, modules
  • Tech stack detection
  • Setup and usage instructions

🔹 Repository Comparison Engine

  • Side-by-side metadata comparison
  • LLM-based architectural analysis
  • Strengths, trade-offs, verdict
  • Feature comparison table

🔹 Version-Aware Ingestion

  • Detects repository updates
  • Avoids redundant embeddings
  • Maintains ingestion consistency

🚧 Work in Progress

  • Advanced tech stack inference
  • AST-based deeper code analysis
  • Performance optimization for large repositories
  • Query caching system

🔮 Future Scope

  • Cloud deployment with scalable vector storage
  • Multi-repository cross-analysis
  • Visual architecture diagram generation
  • Authentication & saved workspaces
  • Enterprise-level CI/CD integration

💸 Applications

  1. Developer Onboarding – Understand unfamiliar codebases quickly
  2. Open Source Exploration – Analyze large repositories before contributing
  3. Code Review Support – Gain instant architectural insights
  4. Academic Learning – Explore algorithm-heavy repositories
  5. Technical Interviews – Evaluate GitHub projects efficiently

🛠 Project Setup

📌 Prerequisites

  • Python 3.11+
  • Node.js 18+ and npm
  • Groq API Key
  • GitHub Personal Access Token (PAT)

Create a .env file inside the backend/ folder:

GROQ_API_KEY=your_groq_api_key
GITHUB_PAT=your_github_pat

1️⃣ Clone Repository

git clone https://github.com/your-username/GitSage.git
cd GitSage

2️⃣ Backend Setup

cd backend
python3.11 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
uvicorn main:app --reload

Backend runs at:

http://127.0.0.1:8000

3️⃣ Frontend Setup

cd frontend
npm install
npm run dev

Frontend runs at:

http://localhost:5173

👨‍💻 Team Members


👨‍🏫 Mentors


💎 Why GitSage?

  • Modular AI architecture
  • Persistent vector search
  • Dual embedding pipeline
  • Structured LLM reasoning
  • Real-world developer problem solving
  • End-to-end full-stack system

⭐ Built with intelligence. Powered by code understanding.