# The Annotation Garden Project

🌐 **[View Live Dashboard](https://neuromechanist.github.io/image-annotation)**

An image annotation system for the Natural Scenes Dataset (NSD, 73,000 COCO images), built on multiple Vision-Language Models (VLMs).

## Key Features

- **Multi-model support**: OLLAMA, OpenAI GPT-4V, Anthropic Claude
- **Batch processing**: Handle 25k+ annotations with real-time progress
- **Web dashboard**: Interactive visualization and analysis interface
- **Annotation tools**: Reorder, filter, export, and manipulate annotations
- **Research-ready**: Structured JSON output with comprehensive metrics

## Quick Start

### Prerequisites

- Python 3.11+
- Node.js 18+
- OLLAMA (for local models)
- API keys for OpenAI/Anthropic (optional)

### Installation

```bash
# Clone and set up
git clone https://github.com/neuromechanist/hed-image-annotation.git
cd hed-image-annotation

# Python environment
conda activate torch-312  # or create it: conda create -n torch-312 python=3.12
pip install -e .

# Frontend
cd frontend && npm install
```

### Quick Usage

```bash
# Start OLLAMA (for local models)
ollama serve

# Test VLM service
python -m image_annotation.services.vlm_service

# Run frontend dashboard
cd frontend && npm run dev
# Visit http://localhost:3000

# Configuration
cp config/config.example.json config/config.json
# Edit config.json with API keys and NSD image paths
```
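
As a rough illustration of what `config.json` might contain (the field names below are assumptions for illustration only; the authoritative schema is in `config/config.example.json`):

```json
{
  "models": ["gemma3:4b", "llava:latest"],
  "openai_api_key": "<your OpenAI key>",
  "anthropic_api_key": "<your Anthropic key>",
  "nsd_image_dir": "/path/to/NSD_stimuli/shared1000/"
}
```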

## Architecture

- **Backend**: FastAPI with OLLAMA/OpenAI/Anthropic integration
- **Frontend**: Next.js dashboard with real-time progress tracking
- **Storage**: JSON files with database support for large datasets
- **Processing**: Stateless VLM calls with comprehensive metrics
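
To make "comprehensive metrics" concrete, here is a minimal sketch of the kind of record a stateless VLM call could produce. The field names are illustrative assumptions, not the project's actual schema:

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class AnnotationRecord:
    """One model's annotation of one image, with basic metrics (illustrative schema)."""
    image: str
    model: str
    text: str
    total_tokens: int
    latency_s: float

    @property
    def tokens_per_second(self) -> float:
        # Throughput metric: generated tokens divided by wall-clock latency.
        return self.total_tokens / self.latency_s if self.latency_s > 0 else 0.0

record = AnnotationRecord(
    image="shared1000/nsd_00001.png",
    model="gemma3:4b",
    text="A kitchen with a wooden table.",
    total_tokens=42,
    latency_s=2.0,
)

print(record.tokens_per_second)              # 21.0
print(json.dumps(asdict(record), indent=2))  # self-describing JSON record
```

Storing one such JSON object per call keeps processing stateless: each record is self-describing and can be aggregated later without shared state.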

## Annotation Tools

Python utilities for post-processing annotations:

```python
from image_annotation.utils import reorder_annotations, remove_model, export_to_csv

# Reorder model annotations
reorder_annotations("annotations/", ["best_model", "second_best"])

# Remove underperforming models
remove_model("annotations/", "poor_model")

# Export for analysis
export_to_csv("annotations/", "results.csv", include_metrics=True)
```

## Programmatic Usage

```python
from image_annotation.services.vlm_service import VLMService, VLMPrompt

# Initialize and process
service = VLMService(model="gemma3:4b")
results = service.process_batch(
    image_paths=["path/to/image.jpg"],
    prompts=[VLMPrompt(id="describe", text="Describe this image")],
    models=["gemma3:4b", "llava:latest"],
)

# Results include comprehensive metrics
for result in results:
    print(f"Tokens: {result.token_metrics.total_tokens}")
    print(f"Speed: {result.performance_metrics.tokens_per_second}/sec")
```

## Development

```bash
# Test with real data (no mocks)
pytest tests/ --cov

# Format code
ruff check --fix . && ruff format .
```

## NSD Research Usage

1. **Download NSD images** to `/path/to/NSD_stimuli/shared1000/`
2. **Configure models** in `config/config.json`
3. **Process in batches** using `VLMService.process_batch()`
4. **Post-process** with annotation tools
5. **Export results** to CSV for analysis
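
A minimal sketch of the batching in step 3, using only the standard library (the directory, file pattern, and batch size are placeholder assumptions; the real work would go through `VLMService.process_batch()`):

```python
from pathlib import Path
from typing import Iterator

def batched(paths: list[Path], size: int) -> Iterator[list[Path]]:
    """Yield successive fixed-size batches of image paths."""
    for i in range(0, len(paths), size):
        yield paths[i:i + size]

# Collect NSD stimulus images (directory and pattern are placeholders)
image_dir = Path("/path/to/NSD_stimuli/shared1000")
paths = sorted(image_dir.glob("*.png")) if image_dir.exists() else []

# Hand each batch to the VLM service; 100 images per batch is an arbitrary choice
for batch in batched(paths, size=100):
    pass  # e.g. service.process_batch(image_paths=[str(p) for p in batch], ...)
```

Batching this way bounds memory use and lets the dashboard report progress after each batch rather than only at the end.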

See [CONTRIBUTING.md](CONTRIBUTING.md) for detailed research workflows.

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md) for development setup, standards, and the submission process.

## License

This project is licensed under CC-BY-NC-SA 4.0; see the [LICENSE](LICENSE) file for details.

## Acknowledgments

- Natural Scenes Dataset (NSD) team
- LangChain and OLLAMA communities