Skip to content

chore: deleteme #450

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 12 commits into from
Closed
Show file tree
Hide file tree
Changes from 9 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
33 changes: 33 additions & 0 deletions .docker/minio/setup.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
#!/bin/sh

# Simple script to set up MinIO bucket and user
# Based on example from MinIO issues

# Format bucket name to ensure compatibility
BUCKET_NAME=$(echo "${S3_BUCKET_NAME}" | tr '[:upper:]' '[:lower:]' | tr '_' '-')

# Configure MinIO client
mc alias set myminio http://minio:9000 ${MINIO_ROOT_USER} ${MINIO_ROOT_PASSWORD}

# Remove bucket if it exists (for clean setup)
mc rm -r --force myminio/${BUCKET_NAME} || true

# Create bucket
mc mb myminio/${BUCKET_NAME}

# Set bucket policy to allow downloads
mc anonymous set download myminio/${BUCKET_NAME}

# Create user with access and secret keys
mc admin user add myminio ${S3_ACCESS_KEY} ${S3_SECRET_KEY} || echo "User already exists"

# Create policy for the bucket
echo '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Action":["s3:*"],"Resource":["arn:aws:s3:::'${BUCKET_NAME}'/*","arn:aws:s3:::'${BUCKET_NAME}'"]}]}' > /tmp/policy.json

# Apply policy
mc admin policy create myminio gitingest-policy /tmp/policy.json || echo "Policy already exists"
mc admin policy attach myminio gitingest-policy --user ${S3_ACCESS_KEY}

echo "MinIO setup completed successfully"
echo "Bucket: ${BUCKET_NAME}"
echo "Access via console: http://localhost:9001"
23 changes: 23 additions & 0 deletions .env.example
Original file line number Diff line number Diff line change
Expand Up @@ -33,3 +33,26 @@ GITINGEST_SENTRY_PROFILE_LIFECYCLE=trace
GITINGEST_SENTRY_SEND_DEFAULT_PII=true
# Environment name for Sentry (default: "")
GITINGEST_SENTRY_ENVIRONMENT=development

# MinIO Configuration (for development)
# Root user credentials for MinIO admin access
MINIO_ROOT_USER=minioadmin
MINIO_ROOT_PASSWORD=minioadmin

# S3 Configuration (for application)
# Set to "true" to enable S3 storage for digests
# S3_ENABLED=true
# Endpoint URL for the S3 service (MinIO in development)
S3_ENDPOINT=http://minio:9000
# Access key for the S3 bucket (created automatically in development)
S3_ACCESS_KEY=gitingest
# Secret key for the S3 bucket (created automatically in development)
S3_SECRET_KEY=gitingest123
# Name of the S3 bucket (created automatically in development)
S3_BUCKET_NAME=gitingest-bucket
# Region for the S3 bucket (default for MinIO)
S3_REGION=us-east-1
# Public URL/CDN for accessing S3 resources
S3_ALIAS_HOST=127.0.0.1:9000/gitingest-bucket
# Optional prefix for S3 file paths (if set, prefixes all S3 paths with this value)
# S3_DIRECTORY_PREFIX=my-prefix
2 changes: 1 addition & 1 deletion .github/workflows/codeql.yml
Original file line number Diff line number Diff line change
Expand Up @@ -35,7 +35,7 @@ jobs:
strategy:
fail-fast: false
matrix:
language: ["javascript", "python"]
language: ["javascript", "python", "actions", "javascript-typescript"]
# CodeQL supports [ $supported-codeql-languages ]
# Learn more about CodeQL language support at https://aka.ms/codeql-docs/language-support

Expand Down
2 changes: 2 additions & 0 deletions .pre-commit-config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -113,6 +113,7 @@ repos:
files: ^src/
additional_dependencies:
[
boto3>=1.28.0,
click>=8.0.0,
'fastapi[standard]>=0.109.1',
httpx,
Expand All @@ -138,6 +139,7 @@ repos:
- --rcfile=tests/.pylintrc
additional_dependencies:
[
boto3>=1.28.0,
click>=8.0.0,
'fastapi[standard]>=0.109.1',
httpx,
Expand Down
18 changes: 13 additions & 5 deletions .vscode/launch.json
Original file line number Diff line number Diff line change
@@ -1,12 +1,20 @@
{
"version": "0.2.0",
"configurations": [
{
"name": "Python Debugger: Module",
"name": "Python: Attach to Docker",
"type": "debugpy",
"request": "launch",
"module": "uvicorn",
"args": ["server.main:app", "--host", "0.0.0.0", "--port", "8000"],
"cwd": "${workspaceFolder}/src"
"request": "attach",
"connect": {
"host": "localhost",
"port": 5678
},
"pathMappings": [
{
"localRoot": "${workspaceFolder}/src",
"remoteRoot": "/app"
}
]
}
]
}
85 changes: 85 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -204,6 +204,8 @@ This is because Jupyter notebooks are asynchronous by default.

## 🐳 Self-host

### Using Docker

1. Build the image:

``` bash
Expand Down Expand Up @@ -239,6 +241,89 @@ The application can be configured using the following environment variables:
- **GITINGEST_SENTRY_PROFILE_SESSION_SAMPLE_RATE**: Sampling rate for profile sessions (default: "1.0", range: 0.0-1.0)
- **GITINGEST_SENTRY_PROFILE_LIFECYCLE**: Profile lifecycle mode (default: "trace")
- **GITINGEST_SENTRY_SEND_DEFAULT_PII**: Send default personally identifiable information (default: "true")
- **S3_ALIAS_HOST**: Public URL/CDN for accessing S3 resources (default: "127.0.0.1:9000/gitingest-bucket")
- **S3_DIRECTORY_PREFIX**: Optional prefix for S3 file paths (if set, prefixes all S3 paths with this value)

### Using Docker Compose

The project includes a `compose.yml` file that allows you to easily run the application in both development and production environments.

#### Compose File Structure

The `compose.yml` file uses YAML anchoring with `&app-base` and `<<: *app-base` to define common configuration that is shared between services:

```yaml
# Common base configuration for all services
x-app-base: &app-base
build:
context: .
dockerfile: Dockerfile
ports:
- "${APP_WEB_BIND:-8000}:8000" # Main application port
- "${GITINGEST_METRICS_HOST:-127.0.0.1}:${GITINGEST_METRICS_PORT:-9090}:9090" # Metrics port
# ... other common configurations
```

#### Services

The file defines three services:

1. **app**: Production service configuration
- Uses the `prod` profile
- Sets the Sentry environment to "production"
- Configured for stable operation with `restart: unless-stopped`

2. **app-dev**: Development service configuration
- Uses the `dev` profile
- Enables debug mode
- Mounts the source code for live development
- Uses hot reloading for faster development

3. **minio**: S3-compatible object storage for development
- Uses the `dev` profile (only available in development mode)
- Provides S3-compatible storage for local development
- Accessible via:
- API: Port 9000 ([localhost:9000](http://localhost:9000))
- Web Console: Port 9001 ([localhost:9001](http://localhost:9001))
- Default admin credentials:
- Username: `minioadmin`
- Password: `minioadmin`
- Configurable via environment variables:
- `MINIO_ROOT_USER`: Custom admin username (default: minioadmin)
- `MINIO_ROOT_PASSWORD`: Custom admin password (default: minioadmin)
- Includes persistent storage via Docker volume
- Auto-creates a bucket and application-specific credentials:
- Bucket name: `gitingest-bucket` (configurable via `S3_BUCKET_NAME`)
- Access key: `gitingest` (configurable via `S3_ACCESS_KEY`)
- Secret key: `gitingest123` (configurable via `S3_SECRET_KEY`)
- These credentials are automatically passed to the app-dev service via environment variables:
- `S3_ENDPOINT`: URL of the MinIO server
- `S3_ACCESS_KEY`: Access key for the S3 bucket
- `S3_SECRET_KEY`: Secret key for the S3 bucket
- `S3_BUCKET_NAME`: Name of the S3 bucket
- `S3_REGION`: Region for the S3 bucket (default: us-east-1)
- `S3_ALIAS_HOST`: Public URL/CDN for accessing S3 resources (default: "127.0.0.1:9000/gitingest-bucket")

#### Usage Examples

To run the application in development mode:

```bash
docker compose --profile dev up
```

To run the application in production mode:

```bash
docker compose --profile prod up -d
```

To build and run the application:

```bash
docker compose --profile prod build
docker compose --profile prod up -d
```

## 🤝 Contributing

Expand Down
110 changes: 110 additions & 0 deletions compose.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,110 @@
# Common base configuration for all services
x-app-base: &app-base
ports:
- "${APP_WEB_BIND:-8000}:8000" # Main application port
- "${GITINGEST_METRICS_HOST:-127.0.0.1}:${GITINGEST_METRICS_PORT:-9090}:9090" # Metrics port
environment:
# Python Configuration
- PYTHONUNBUFFERED=1
- PYTHONDONTWRITEBYTECODE=1
# Host Configuration
- ALLOWED_HOSTS=${ALLOWED_HOSTS:-gitingest.com,*.gitingest.com,localhost,127.0.0.1}
# Metrics Configuration
- GITINGEST_METRICS_ENABLED=${GITINGEST_METRICS_ENABLED:-true}
- GITINGEST_METRICS_HOST=${GITINGEST_METRICS_HOST:-127.0.0.1}
- GITINGEST_METRICS_PORT=${GITINGEST_METRICS_PORT:-9090}
# Sentry Configuration
- GITINGEST_SENTRY_ENABLED=${GITINGEST_SENTRY_ENABLED:-false}
- GITINGEST_SENTRY_DSN=${GITINGEST_SENTRY_DSN:-}
- GITINGEST_SENTRY_TRACES_SAMPLE_RATE=${GITINGEST_SENTRY_TRACES_SAMPLE_RATE:-1.0}
- GITINGEST_SENTRY_PROFILE_SESSION_SAMPLE_RATE=${GITINGEST_SENTRY_PROFILE_SESSION_SAMPLE_RATE:-1.0}
- GITINGEST_SENTRY_PROFILE_LIFECYCLE=${GITINGEST_SENTRY_PROFILE_LIFECYCLE:-trace}
- GITINGEST_SENTRY_SEND_DEFAULT_PII=${GITINGEST_SENTRY_SEND_DEFAULT_PII:-true}
user: "1000:1000"
command: ["python", "-m", "uvicorn", "server.main:app", "--host", "0.0.0.0", "--port", "8000"]

services:
# Production service configuration
app:
<<: *app-base
image: ghcr.io/coderamp-labs/gitingest:latest
profiles:
- prod
environment:
- GITINGEST_SENTRY_ENVIRONMENT=${GITINGEST_SENTRY_ENVIRONMENT:-production}
restart: unless-stopped

# Development service configuration
app-dev:
<<: *app-base
build:
context: .
dockerfile: Dockerfile
profiles:
- dev
environment:
- DEBUG=true
- GITINGEST_SENTRY_ENVIRONMENT=${GITINGEST_SENTRY_ENVIRONMENT:-development}
# S3 Configuration
- S3_ENABLED=true
- S3_ENDPOINT=http://minio:9000
- S3_ACCESS_KEY=${S3_ACCESS_KEY:-gitingest}
- S3_SECRET_KEY=${S3_SECRET_KEY:-gitingest123}
# Use lowercase bucket name to ensure compatibility with MinIO
- S3_BUCKET_NAME=${S3_BUCKET_NAME:-gitingest-bucket}
- S3_REGION=${S3_REGION:-us-east-1}
# Public URL for S3 resources
- S3_ALIAS_HOST=${S3_ALIAS_HOST:-http://127.0.0.1:9000/${S3_BUCKET_NAME:-gitingest-bucket}}
volumes:
# Mount source code for live development
- ./src:/app:ro
# Use --reload flag for hot reloading during development
command: ["python", "-m", "uvicorn", "server.main:app", "--host", "0.0.0.0", "--port", "8000", "--reload"]
depends_on:
minio-setup:
condition: service_completed_successfully

# MinIO S3-compatible object storage for development
minio:
image: minio/minio:latest
profiles:
- dev
ports:
- "9000:9000" # API port
- "9001:9001" # Console port
environment:
- MINIO_ROOT_USER=${MINIO_ROOT_USER:-minioadmin}
- MINIO_ROOT_PASSWORD=${MINIO_ROOT_PASSWORD:-minioadmin}
volumes:
- minio-data:/data
command: server /data --console-address ":9001"
restart: unless-stopped
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
interval: 30s
timeout: 30s
start_period: 30s
start_interval: 1s

# MinIO setup service to create bucket and user
minio-setup:
image: minio/mc
profiles:
- dev
depends_on:
minio:
condition: service_healthy
environment:
- MINIO_ROOT_USER=${MINIO_ROOT_USER:-minioadmin}
- MINIO_ROOT_PASSWORD=${MINIO_ROOT_PASSWORD:-minioadmin}
- S3_ACCESS_KEY=${S3_ACCESS_KEY:-gitingest}
- S3_SECRET_KEY=${S3_SECRET_KEY:-gitingest123}
- S3_BUCKET_NAME=${S3_BUCKET_NAME:-gitingest-bucket}
volumes:
- ./.docker/minio/setup.sh:/setup.sh:ro
entrypoint: sh
command: -c /setup.sh

volumes:
minio-data:
driver: local
2 changes: 1 addition & 1 deletion pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,6 @@ description="CLI tool to analyze and create text dumps of codebases for LLMs"
readme = {file = "README.md", content-type = "text/markdown" }
requires-python = ">= 3.8"
dependencies = [
"click>=8.0.0",
"httpx",
"pathspec>=0.12.1",
"pydantic",
Expand Down Expand Up @@ -44,6 +43,7 @@ dev = [
]

server = [
"boto3>=1.28.0", # AWS SDK for S3 support
"fastapi[standard]>=0.109.1", # Minimum safe release (https://osv.dev/vulnerability/PYSEC-2024-38)
"prometheus-client",
"sentry-sdk[fastapi]",
Expand Down
1 change: 1 addition & 0 deletions requirements.txt
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
boto3>=1.28.0 # AWS SDK for S3 support
click>=8.0.0
fastapi[standard]>=0.109.1 # Vulnerable to https://osv.dev/vulnerability/PYSEC-2024-38
httpx
Expand Down
6 changes: 3 additions & 3 deletions src/gitingest/query_parser.py
Original file line number Diff line number Diff line change
Expand Up @@ -44,9 +44,9 @@ async def parse_remote_repo(source: str, token: str | None = None) -> IngestionQ
host = parsed_url.netloc
user, repo = _get_user_and_repo_from_path(parsed_url.path)

_id = str(uuid.uuid4())
_id = uuid.uuid4()
slug = f"{user}-{repo}"
local_path = TMP_BASE_PATH / _id / slug
local_path = TMP_BASE_PATH / str(_id) / slug
url = f"https://{host}/{user}/{repo}"

query = IngestionQuery(
Expand Down Expand Up @@ -132,7 +132,7 @@ def parse_local_dir_path(path_str: str) -> IngestionQuery:
"""
path_obj = Path(path_str).resolve()
slug = path_obj.name if path_str == "." else path_str.strip("/")
return IngestionQuery(local_path=path_obj, slug=slug, id=str(uuid.uuid4()))
return IngestionQuery(local_path=path_obj, slug=slug, id=uuid.uuid4())


async def _configure_branch_or_tag(
Expand Down
Loading
Loading