Contributing Guide ‐ Osservatorio

Andrea Bozzo edited this page Jul 20, 2025 · 1 revision

Contributing Guide - Osservatorio ISTAT

🎯 How to Contribute

Welcome! We are glad you want to contribute to the Osservatorio ISTAT project. This guide will help you get started the right way.

🚀 Quick Start for Contributors

1. Initial Setup

# Fork the repository on GitHub,
# then clone your fork
git clone https://github.com/YOUR_USERNAME/Osservatorio.git
cd Osservatorio

# Add the upstream remote
git remote add upstream https://github.com/AndreaBozzo/Osservatorio.git

# Set up the local environment
python -m venv venv
source venv/bin/activate    # Linux/macOS
# venv\Scripts\activate     # Windows (run this instead of the line above)

pip install -r requirements.txt
pip install -r requirements-dev.txt

2. Pre-commit Setup

# Install the pre-commit hooks
pre-commit install

# Test hooks
pre-commit run --all-files

3. Verify Setup

# Check that everything works
pytest tests/unit/test_config.py -v
python src/api/istat_api.py

📋 Types of Contributions

🐛 Bug Reports

Use the Bug Report Template:

  • Clear description of the problem
  • Steps to reproduce
  • Environment details (OS, Python version)
  • Stack trace, if available

✨ Feature Requests

Use the Feature Request Template:

  • Description of the problem it solves
  • Proposed solution
  • Alternatives considered
  • Implementation notes

📖 Documentation

Use the Documentation Template:

  • Improve existing docs
  • Translate to other languages
  • Add examples and tutorials
  • API documentation

🔧 Code Contributions

Follow the Development Workflow.

🔄 Development Workflow

Branch Strategy

# 1. Sync with upstream
git checkout main
git pull upstream main

# 2. Create a feature branch
git checkout -b feature/description
# or bug/description
# or docs/description

# 3. Work on your changes
# ... edit files ...

# 4. Commit using Conventional Commits
git add .
git commit -m "feat: add new data validation feature"

# 5. Push to your fork
git push origin feature/description

# 6. Open a Pull Request on GitHub

Conventional Commits

We use Conventional Commits:

# Format: type(scope): description
feat: add new feature
fix: bug fix
docs: documentation changes
style: formatting changes
refactor: code refactoring
test: adding tests
chore: maintenance tasks

# Examples:
git commit -m "feat(api): add PowerBI dataset validation"
git commit -m "fix(dashboard): resolve memory leak in data loader"
git commit -m "docs(wiki): add troubleshooting guide"
git commit -m "test(converter): add unit tests for XML parsing"
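
To make the format concrete, here is a minimal sketch of a checker for the `type(scope): description` pattern. The function name `is_conventional` and the regular expression are illustrative assumptions, not part of the project's tooling. The allowed types are only the seven listed in this guide.

```python
import re

# Types listed in this guide; the full Conventional Commits spec allows others.
ALLOWED_TYPES = ("feat", "fix", "docs", "style", "refactor", "test", "chore")

# type, optional (scope), optional "!" for breaking changes, then ": description"
_PATTERN = re.compile(
    r"^(?P<type>" + "|".join(ALLOWED_TYPES) + r")"
    r"(\((?P<scope>[a-z0-9_-]+)\))?"
    r"!?: (?P<description>\S.*)$"
)


def is_conventional(message: str) -> bool:
    """Return True if the first line of a commit message matches the format."""
    lines = message.splitlines()
    first_line = lines[0] if lines else ""
    return _PATTERN.match(first_line) is not None
```

A check like this could run in a local `commit-msg` hook, although whether the project enforces the format automatically is not stated in this guide.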

🧪 Testing Requirements

Before Submitting PR

# 1. Run all tests
pytest

# 2. Check coverage (target: 60%+)
pytest --cov=src tests/ --cov-report=term

# 3. Lint and format
isort .
black .
flake8 .

# 4. Security scan
bandit -r src/
safety check

# 5. Pre-commit hooks
pre-commit run --all-files

Writing Tests

# File: tests/unit/test_new_feature.py
import pytest
from src.module import NewFeature

class TestNewFeature:
    def test_basic_functionality(self):
        """Test basic functionality."""
        feature = NewFeature()
        result = feature.process()
        assert result is not None
    
    def test_error_handling(self):
        """Test error handling."""
        feature = NewFeature()
        with pytest.raises(ValueError):
            feature.process(invalid_input=True)
    
    @pytest.mark.parametrize("value,expected", [
        ("test1", "result1"),
        ("test2", "result2"),
    ])
    def test_multiple_inputs(self, value, expected):
        """Test multiple input scenarios."""
        feature = NewFeature()
        result = feature.process(value)
        assert result == expected

📝 Code Style Guidelines

Python Code Style

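# ✅ Good: Follow PEP 8
import logging

import pandas as pd


class DataProcessor:
    """Process ISTAT data for analysis."""

    def __init__(self, config: dict) -> None:
        """Initialize processor with configuration."""
        self.config = config
        # Standard-library logger; swap in the project's logging helper if one exists
        self._logger = logging.getLogger(__name__)

    def process_data(self, data: pd.DataFrame) -> pd.DataFrame:
        """Process the input data and return cleaned version."""
        try:
            cleaned_data = self._clean_data(data)
            return self._validate_data(cleaned_data)
        except Exception as e:
            self._logger.error(f"Data processing failed: {e}")
            raise

    def _clean_data(self, data: pd.DataFrame) -> pd.DataFrame:
        """Private method for data cleaning."""
        return data.dropna()

    def _validate_data(self, data: pd.DataFrame) -> pd.DataFrame:
        """Private method for validation (placeholder check for illustration)."""
        if data.empty:
            raise ValueError("No rows left after cleaning")
        return data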

# ❌ Bad: Poor style
class dataprocessor:  # PascalCase missing
    def __init__(self,config):  # No type hints, spacing
        self.config=config  # No spacing around =
    def process_data(self,data):  # No spacing, type hints
        cleanedData=data.dropna()  # camelCase in Python
        return cleanedData

Documentation Style

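# Imports needed by the type hints below
from pathlib import Path
from typing import Any, Dict, Union

def convert_xml_to_tableau(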
    self, 
    xml_input: Union[str, Path], 
    dataset_id: str, 
    dataset_name: str
) -> Dict[str, Any]:
    """
    Convert ISTAT XML data to Tableau-compatible formats.
    
    Args:
        xml_input: Path to XML file or XML content string
        dataset_id: ISTAT dataset identifier (e.g., 'DCIS_POPRES1')
        dataset_name: Human-readable dataset name
    
    Returns:
        Dictionary containing conversion results with keys:
        - success: bool indicating conversion success
        - files_created: dict with paths to generated files
        - data_quality: dict with quality metrics
        - summary: dict with conversion summary
    
    Raises:
        ValueError: If XML content is invalid
        FileNotFoundError: If XML file path doesn't exist
        SecurityError: If file path validation fails
    
    Example:
        >>> converter = IstatXMLtoTableauConverter()
        >>> result = converter.convert_xml_to_tableau(
        ...     "data/raw/population.xml",
        ...     "DCIS_POPRES1",
        ...     "Popolazione Residente"
        ... )
        >>> print(result['summary']['files_created'])
        3
    """

🔒 Security Guidelines

Secure Coding Practices

# ✅ Good: Use security utilities
from src.utils.secure_path import SecurePathValidator
from src.utils.security_enhanced import security_manager

def process_file(file_path: str) -> None:
    """Process file with security validation."""
    validator = SecurePathValidator()
    safe_path = validator.validate_path(file_path)
    
    with validator.safe_open(safe_path, 'r') as file:
        content = file.read()
    
    # Process content...

# ❌ Bad: Direct file access
def process_file(file_path: str) -> None:
    """Insecure file processing."""
    with open(file_path, 'r') as file:  # No path validation
        content = file.read()
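
This guide does not show how `SecurePathValidator` works internally, so the following is only a standard-library sketch of the general idea behind path validation: resolve the requested path and reject it if it falls outside an allowed base directory. The name `resolve_inside` and the `base_dir` parameter are hypothetical.

```python
from pathlib import Path


def resolve_inside(base_dir: str, user_path: str) -> Path:
    """Resolve user_path against base_dir and reject anything that escapes it."""
    base = Path(base_dir).resolve()
    candidate = (base / user_path).resolve()
    try:
        # relative_to raises ValueError when candidate is not under base
        candidate.relative_to(base)
    except ValueError:
        raise ValueError(f"Path escapes allowed directory: {user_path}") from None
    return candidate
```

Resolving before comparing matters: a path such as `../outside.txt` looks harmless as a string but points outside the base directory once `..` is collapsed.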

Input Validation

# ✅ Good: Validate all inputs
def api_endpoint(user_input: str) -> str:
    """API endpoint with input validation."""
    # Sanitize input
    clean_input = security_manager.sanitize_input(user_input)
    
    # Validate input
    if not clean_input or len(clean_input) > 1000:
        raise ValueError("Invalid input")
    
    return process_input(clean_input)

# ❌ Bad: No validation
def api_endpoint(user_input: str) -> str:
    """Insecure API endpoint."""
    return process_input(user_input)  # Direct use of user input
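
For readers without access to `security_manager`, here is a self-contained sketch of the sanitize-then-validate pattern shown in the good example. The helpers `sanitize_text` and `validate_input` are hypothetical stand-ins and may differ from what the project's `sanitize_input` actually does. The 1000-character limit is taken from the example above.

```python
import unicodedata

MAX_INPUT_LENGTH = 1000  # same limit used in the example above


def sanitize_text(raw: str) -> str:
    """Drop control characters and surrounding whitespace (illustrative only)."""
    # Unicode category "Cc" covers control characters such as NUL, TAB, ESC, newlines
    printable = "".join(ch for ch in raw if unicodedata.category(ch) != "Cc")
    return printable.strip()


def validate_input(raw: str) -> str:
    """Return the cleaned input or raise ValueError if it is empty or too long."""
    clean = sanitize_text(raw)
    if not clean or len(clean) > MAX_INPUT_LENGTH:
        raise ValueError("Invalid input")
    return clean
```

Validation runs on the sanitized value, so an input consisting only of whitespace or control characters is rejected as empty.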

📊 Performance Guidelines

Efficient Data Processing

# ✅ Good: Use generators and chunking
from typing import Iterator

import pandas as pd

def process_large_dataset(file_path: str) -> Iterator[pd.DataFrame]:
    """Process large dataset in chunks."""
    chunk_size = 10000
    
    for chunk in pd.read_csv(file_path, chunksize=chunk_size):
        yield process_chunk(chunk)

# ❌ Bad: Load everything in memory
def process_large_dataset(file_path: str) -> pd.DataFrame:
    """Memory-intensive processing."""
    df = pd.read_csv(file_path)  # Could be huge
    return process_dataframe(df)
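
To show what chunked reading does conceptually, the sketch below reproduces the same pattern with only the standard library. The function name `iter_csv_chunks` is hypothetical, and rows are plain dictionaries rather than DataFrames.

```python
import csv
from itertools import islice
from typing import Dict, Iterator, List


def iter_csv_chunks(
    file_path: str, chunk_size: int = 10000
) -> Iterator[List[Dict[str, str]]]:
    """Yield lists of at most chunk_size rows, reading the file lazily."""
    with open(file_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        while True:
            # islice pulls only the next chunk_size rows from the reader
            chunk = list(islice(reader, chunk_size))
            if not chunk:
                break
            yield chunk
```

Because the function is a generator, only one chunk is held in memory at a time, and the file stays open only while the caller is still iterating.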

Caching Best Practices

# ✅ Good: Use caching for expensive operations
from functools import lru_cache

@lru_cache(maxsize=128)
def expensive_calculation(param: str) -> float:
    """Cached expensive calculation."""
    # Placeholder for the expensive computation
    result = float(len(param))
    return result

# Clear cache when needed
expensive_calculation.cache_clear()
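
The effect of the cache can be observed with `cache_info()`. The following sketch uses a hypothetical `slow_square` function and a call counter to show that a repeated call with the same argument does not run the function body again.

```python
from functools import lru_cache

call_count = 0  # counts how many times the function body actually runs


@lru_cache(maxsize=128)
def slow_square(n: int) -> int:
    """Stand-in for an expensive computation."""
    global call_count
    call_count += 1
    return n * n


slow_square(4)   # computed: body runs
slow_square(4)   # served from the cache: body does not run
info = slow_square.cache_info()
print(info.hits, info.misses, call_count)  # → 1 1 1

slow_square.cache_clear()  # drop cached results, e.g. after underlying data changes
slow_square(4)             # computed again
```

Note that `lru_cache` requires hashable arguments, so it cannot be applied directly to functions that take a `pd.DataFrame` or a `dict`.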

🎯 Pull Request Guidelines

PR Checklist

Before submitting the PR:

  • Clear description of the change
  • Tests added or updated
  • Documentation updated
  • Changelog entry (if needed)
  • All tests pass locally
  • Code coverage maintained or improved
  • Security scan passed
  • Pre-commit hooks passed

PR Template

## Description
Brief description of changes

## Type of Change
- [ ] Bug fix (non-breaking change)
- [ ] New feature (non-breaking change)
- [ ] Breaking change (fix/feature causing existing functionality to not work)
- [ ] Documentation update

## Testing
- [ ] Unit tests added/updated
- [ ] Integration tests added/updated
- [ ] Manual testing performed

## Screenshots (if applicable)
Add screenshots for UI changes

## Additional Notes
Any additional context or notes

Review Process

  1. Automated Checks: CI/CD must pass
  2. Code Review: At least 1 approval required
  3. Manual Testing: For significant changes
  4. Documentation Review: For public API changes
  5. Security Review: For security-related changes

🏷️ Issue Labels

We use labels to categorize issues:

Type Labels

  • bug - Bug reports
  • enhancement - Feature requests
  • documentation - Documentation improvements
  • question - General questions
  • duplicate - Duplicate issues
  • invalid - Invalid issues

Priority Labels

  • priority: critical - Critical issues (security, data loss)
  • priority: high - High priority features/fixes
  • priority: medium - Medium priority items
  • priority: low - Low priority nice-to-haves

Component Labels

  • component: api - API-related issues
  • component: dashboard - Dashboard/UI issues
  • component: converter - Data conversion issues
  • component: security - Security-related issues
  • component: testing - Testing infrastructure
  • component: docs - Documentation issues

Status Labels

  • status: waiting-for-feedback - Waiting for user feedback
  • status: in-progress - Currently being worked on
  • status: ready-for-review - Ready for code review
  • status: blocked - Blocked by external dependencies

🎖️ Recognition

Contributors Hall of Fame

Contributors with significant contributions:

  • Core maintainers
  • Feature contributors
  • Documentation contributors
  • Bug hunters
  • Community helpers

Contribution Types

We recognize several types of contributions:

  • 💻 Code contributions
  • 📖 Documentation
  • 🐛 Bug reports
  • 💡 Ideas & suggestions
  • 🔍 Code reviews
  • 📢 Community building
  • 🌍 Translations

📞 Getting Help

Community Resources

  • GitHub Discussions: For general questions
  • Issues: For bug reports and feature requests
  • Wiki: For detailed documentation
  • IRC/Discord: [Link when available]

Mentorship

  • New contributors welcome!
  • Pair programming sessions available
  • Code review learning opportunities
  • Documentation contribution guidance

Office Hours

  • Weekly office hours: [TBD]
  • Timezone-friendly sessions
  • Open to all contributors
  • Focus on mentoring and Q&A

📝 Changelog Guidelines

Keep a Changelog

We follow Keep a Changelog:

## [Unreleased]

### Added
- New feature X for better data processing

### Changed
- Improved performance of API calls

### Fixed
- Fixed bug in XML parsing

### Security
- Updated dependencies for security patches

## [1.2.0] - 2025-01-20

### Added
- Dashboard live deployment
- Security manager implementation
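
Because the format is regular, it is easy to process with a script. As an illustration only, the sketch below groups the bullet entries of a changelog by release and change type. The function `parse_changelog` is hypothetical and handles just the heading shapes shown in the sample above.

```python
import re
from typing import Dict, List

# Matches headings such as "## [Unreleased]" or "## [1.2.0] - 2025-01-20"
_RELEASE_HEADING = re.compile(r"^## \[(?P<version>[^\]]+)\]")
# Matches headings such as "### Added"
_SECTION_HEADING = re.compile(r"^### (?P<section>\w+)")


def parse_changelog(text: str) -> Dict[str, Dict[str, List[str]]]:
    """Group bullet entries by release and by change type."""
    releases: Dict[str, Dict[str, List[str]]] = {}
    version = section = None
    for line in text.splitlines():
        release_match = _RELEASE_HEADING.match(line)
        section_match = _SECTION_HEADING.match(line)
        if release_match:
            version, section = release_match.group("version"), None
            releases[version] = {}
        elif section_match and version is not None:
            section = section_match.group("section")
            releases[version][section] = []
        elif line.startswith("- ") and version is not None and section is not None:
            releases[version][section].append(line[2:].strip())
    return releases
```

A helper of this kind could, for example, confirm that a PR added an entry under `[Unreleased]` before it is merged.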

⚖️ Code of Conduct

Our Pledge

We pledge to make participation in our community a harassment-free experience for everyone.

Standards

Examples of behavior that contributes to a positive environment:

  • Using welcoming and inclusive language
  • Being respectful of differing viewpoints
  • Gracefully accepting constructive criticism
  • Focusing on community benefit
  • Showing empathy towards community members

Enforcement

Violations can be reported to the project maintainers. All complaints will be reviewed and investigated promptly and fairly.


🎉 Thank You!

Thank you for your interest in contributing to Osservatorio ISTAT! Every contribution, no matter how small, helps make this project better for everyone.

Happy coding! 🚀