Commit 96f5acb

Update README and CONTRIBUTING from source repository
1 parent 2b4f7a3 commit 96f5acb

File tree

2 files changed: +252, -25 lines

CONTRIBUTING.md

Lines changed: 142 additions & 0 deletions
# Contributing to The Annotation Garden Project

Thank you for your interest in contributing to this VLM-based image annotation system for Natural Scene Dataset (NSD) research!

## Getting Started

### Development Setup

1. Fork and clone the repository
2. Set up the development environment:

   ```bash
   conda activate torch-312  # or create: conda create -n torch-312 python=3.12
   pip install -e .
   ```

3. Install pre-commit hooks:

   ```bash
   pip install pre-commit
   pre-commit install
   ```

### Project Structure

- `src/image_annotation/` - Core Python package
- `frontend/` - Next.js web dashboard
- `tests/` - Test suite (real data only, NO MOCKS)
- `.context/` - Development context files
- `.rules/` - Development standards and patterns

## Development Workflow

### Branching Strategy

1. Create feature branches: `git checkout -b feature/short-description`
2. Make atomic commits with descriptive messages (<50 chars, no emojis)
3. Test thoroughly before pushing
4. Submit pull requests to the main branch
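The subject-line rule above (under 50 characters, no emojis) can be sanity-checked mechanically. The helper below is purely illustrative and not a hook shipped with this repository; it approximates "no emojis" as "ASCII only":

```python
# Illustrative check for the commit-message guideline: subject non-empty,
# under 50 characters, ASCII only (a rough stand-in for "no emojis").
# Not part of the repository's tooling.
def is_valid_subject(subject: str) -> bool:
    subject = subject.strip()
    return 0 < len(subject) < 50 and subject.isascii()
```

For example, `is_valid_subject("Fix batch token accounting")` passes, while a 60-character subject or one containing an emoji does not.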
### Code Standards

#### Python

- Follow `.rules/python.md` standards
- Use ruff for formatting: `ruff check --fix . && ruff format .`
- Type hints required for all functions
- Real tests only - absolutely NO MOCKS (see `.rules/testing.md`)

#### Testing Philosophy

- Use real NSD images for testing
- Real OLLAMA/LLM API calls only
- Docker containers for test databases
- Test against actual behavior, not mocked interfaces
#### Documentation

- Examples over explanations
- Keep README concise - details go in separate docs
- Update `.context/` files for development context

## Contribution Types

### Code Contributions

- Bug fixes and feature implementations
- Performance optimizations
- Test coverage improvements
- Documentation updates

### Research Contributions

- New VLM model integrations
- Annotation quality improvements
- NSD processing enhancements
- Performance benchmarking

### Infrastructure

- CI/CD improvements
- Docker configurations
- Deployment optimizations

## Pull Request Process

1. **Check Context**: Review `.context/plan.md` for current priorities
2. **Research**: Update `.context/research.md` if exploring new approaches
3. **Document Failures**: Log attempts in `.context/scratch_history.md`
4. **Test**: Run `pytest tests/ --cov` with real data
5. **Format**: Run `ruff check --fix . && ruff format .`
6. **Commit**: Atomic commits, descriptive messages
7. **PR**: Reference relevant issues and context files

### PR Requirements

- All tests pass with real data
- Code coverage maintained or improved
- Documentation updated where needed
- No breaking changes without discussion
- Performance implications considered

## Issue Guidelines

### Bug Reports

- Include environment details (Python version, OS, dependencies)
- Provide minimal reproduction case with real data
- Include relevant log output
- Reference specific NSD images if applicable

### Feature Requests

- Describe the research use case
- Consider impact on 25k+ annotation processing
- Discuss integration with existing VLM models
- Provide implementation suggestions if possible

## Development Best Practices

### Performance Considerations

- System handles 25k+ annotations efficiently
- Database queries optimized for large datasets
- Memory usage monitored during batch processing
- Token usage tracking for cost management
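The token-tracking idea can be sketched as a small accumulator. The field names and per-1k price below are made up for illustration and are not the repository's actual metrics schema:

```python
# Hypothetical per-batch token accounting for cost management.
# Field names and the price constant are illustrative only.
from dataclasses import dataclass


@dataclass
class TokenTracker:
    total_tokens: int = 0
    price_per_1k_tokens: float = 0.01  # assumed rate, not a real provider price

    def record(self, tokens: int) -> None:
        self.total_tokens += tokens

    @property
    def estimated_cost(self) -> float:
        return self.total_tokens / 1000 * self.price_per_1k_tokens
```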
### Security

- No API keys in code or commits
- Environment variables for sensitive configuration
- Secure handling of research data
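For example, keys can be read from the environment at startup instead of being committed. `OPENAI_API_KEY` is the provider's conventional variable name, not one this repository mandates:

```python
# Read secrets from the environment rather than hard-coding them.
import os


def require_env(name: str) -> str:
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running")
    return value


# e.g. openai_key = require_env("OPENAI_API_KEY")
```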
### Research Ethics

- Respect NSD dataset usage guidelines
- Consider annotation quality and bias
- Document model performance characteristics

## Getting Help

- Check `.context/` files for current development context
- Review `.rules/` directory for detailed standards
- Open issues for questions or discussions
- Reference existing code patterns in the codebase

## Recognition

Contributors will be acknowledged in:

- README.md contributor section
- Release notes for significant contributions
- Research publications where applicable (with permission)

---

By contributing, you agree that your contributions will be licensed under the project's CC-BY-NC-SA 4.0 license.

README.md

Lines changed: 110 additions & 25 deletions
````diff
@@ -1,47 +1,132 @@
-# Image Annotation Tool
+# The Annotation Garden Project
 
-Web-based annotation interface for the Annotation Garden Initiative.
+🌐 **[View Live Dashboard](https://neuromechanist.github.io/image-annotation)**
 
-## Origin
+A VLM-based image annotation system for Natural Scene Dataset (NSD) using multiple Vision-Language Models (VLMs).
 
-This repository is adapted from [neuromechanist/image-annotation](https://github.com/neuromechanist/image-annotation) with AGI branding and enhanced HED integration.
+## Key Features
 
-## Features
+- **Multi-model support**: OLLAMA, OpenAI GPT-4V, Anthropic Claude
+- **Batch processing**: Handle 25k+ annotations with real-time progress
+- **Web dashboard**: Interactive visualization and analysis interface
+- **Annotation tools**: Reorder, filter, export, and manipulate annotations
+- **Research-ready**: Structured JSON output with comprehensive metrics
 
-- Web-based interface for annotating static images
-- HED (Hierarchical Event Descriptors) tag integration
-- BIDS-compliant output format
-- Collaborative annotation workflows
-- Version control for annotations
+## Quick Start
 
-## Design Integration
+### Prerequisites
 
-- AGI logo positioned top-left
-- AGI color theme throughout interface
-- Consistent with annotation.garden website design
+- Python 3.11+
+- Node.js 18+
+- OLLAMA (for local models)
+- API keys for OpenAI/Anthropic (optional)
 
-## Installation
+### Installation
 
-*To be documented after cloning from source repository*
+```bash
+# Clone and setup
+git clone https://github.com/neuromechanist/hed-image-annotation.git
+cd hed-image-annotation
 
-## Usage
+# Python environment
+conda activate torch-312  # or create: conda create -n torch-312 python=3.12
+pip install -e .
 
-*To be documented*
+# Frontend
+cd frontend && npm install
+```
+
+### Quick Usage
+
+```bash
+# Start OLLAMA (for local models)
+ollama serve
+
+# Test VLM service
+python -m image_annotation.services.vlm_service
+
+# Run frontend dashboard
+cd frontend && npm run dev
+# Visit http://localhost:3000
+
+# Configuration
+cp config/config.example.json config/config.json
+# Edit config.json with API keys and NSD image paths
+```
+
+## Architecture
+
+- **Backend**: FastAPI with OLLAMA/OpenAI/Anthropic integration
+- **Frontend**: Next.js dashboard with real-time progress tracking
+- **Storage**: JSON files with database support for large datasets
+- **Processing**: Stateless VLM calls with comprehensive metrics
+
+## Annotation Tools
+
+Powerful CLI tools for post-processing annotations:
+
+```python
+from image_annotation.utils import reorder_annotations, remove_model, export_to_csv
+
+# Reorder model annotations
+reorder_annotations("annotations/", ["best_model", "second_best"])
+
+# Remove underperforming models
+remove_model("annotations/", "poor_model")
+
+# Export for analysis
+export_to_csv("annotations/", "results.csv", include_metrics=True)
+```
+
+## Programmatic Usage
+
+```python
+from image_annotation.services.vlm_service import VLMService, VLMPrompt
+
+# Initialize and process
+service = VLMService(model="gemma3:4b")
+results = service.process_batch(
+    image_paths=["path/to/image.jpg"],
+    prompts=[VLMPrompt(id="describe", text="Describe this image")],
+    models=["gemma3:4b", "llava:latest"]
+)
+
+# Results include comprehensive metrics
+for result in results:
+    print(f"Tokens: {result.token_metrics.total_tokens}")
+    print(f"Speed: {result.performance_metrics.tokens_per_second}/sec")
+```
 
 ## Development
 
-*To be documented*
+```bash
+# Test with real data (no mocks)
+pytest tests/ --cov
 
-## Integration with AGI
+# Format code
+ruff check --fix . && ruff format .
+```
 
-This tool serves as the primary interface for annotating static image datasets in the Annotation Garden ecosystem, starting with:
-- Natural Scenes Dataset (NSD): 73,000 COCO images
-- Other image-based stimulus repositories
+
+## NSD Research Usage
+
+1. **Download NSD images** to `/path/to/NSD_stimuli/shared1000/`
+2. **Configure models** in `config/config.json`
+3. **Process in batches** using `VLMService.process_batch()`
+4. **Post-process** with annotation tools
+5. **Export results** to CSV for analysis
+
+See [CONTRIBUTING.md](CONTRIBUTING.md) for detailed research workflows.
 
 ## Contributing
 
-See [CONTRIBUTING.md](https://github.com/Annotation-Garden/management/blob/main/CONTRIBUTING.md) in the management repository.
+See [CONTRIBUTING.md](CONTRIBUTING.md) for development setup, standards, and submission process.
 
 ## License
 
-*To be determined based on source repository*
+This project is licensed under CC-BY-NC-SA 4.0 - see the [LICENSE](LICENSE) file for details.
+
+## Acknowledgments
+
+- Natural Scene Dataset (NSD) team
+- LangChain and OLLAMA communities
````
