Contributing Guide - Osservatorio ISTAT

🎯 How to Contribute

Welcome! We are glad you want to contribute to the Osservatorio ISTAT project. This guide will help you get started the right way.

🚀 Quick Start for Contributors

1. Initial Setup

# Fork the repository on GitHub,
# then clone your fork
git clone https://github.com/YOUR_USERNAME/Osservatorio.git
cd Osservatorio

# Add the upstream remote
git remote add upstream https://github.com/AndreaBozzo/Osservatorio.git

# Set up the local environment
python -m venv venv
source venv/bin/activate    # Linux/Mac
# venv\Scripts\activate     # Windows (run this instead of the line above)

pip install -r requirements.txt
pip install -r requirements-dev.txt

2. Pre-commit Setup

# Install the pre-commit hooks
pre-commit install

# Test hooks
pre-commit run --all-files

3. Verify Setup

# Check that everything works
pytest tests/unit/test_config.py -v
python src/api/istat_api.py

📋 Types of Contributions

🐛 Bug Reports

Use the Bug Report Template:

Clear description of the problem
Steps to reproduce
Environment details (OS, Python version)
Stack trace, if available
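
The environment details requested above can be collected with the standard library. This is a minimal sketch; the function name `collect_environment` is hypothetical and not part of the project.

```python
import platform
import sys


def collect_environment() -> dict:
    """Return basic environment details to paste into a bug report."""
    return {
        "os": platform.system(),                  # e.g. "Linux", "Darwin", "Windows"
        "os_release": platform.release(),
        "python_version": platform.python_version(),
        "executable": sys.executable,             # shows which interpreter (venv or system) is in use
    }


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```

Pasting this output into the report covers the OS and Python version items in one step.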

✨ Feature Requests

Use the Feature Request Template:

Description of the problem it solves
Proposed solution
Alternatives considered
Implementation notes

📖 Documentation

We use the Documentation Template:

Improve existing docs
Translate to other languages
Add examples and tutorials
API documentation

🔧 Code Contributions

Follow the Development Workflow.

🔄 Development Workflow

Branch Strategy

# 1. Sync with upstream
git checkout main
git pull upstream main

# 2. Create a feature branch
git checkout -b feature/description
# or bug/description
# or docs/description

# 3. Make your changes
# ... edit files ...

# 4. Commit using Conventional Commits
git add .
git commit -m "feat: add new data validation feature"

# 5. Push to your fork
git push origin feature/description

# 6. Open a Pull Request on GitHub

Conventional Commits

We use Conventional Commits:

# Format: type(scope): description
feat: add new feature
fix: bug fix
docs: documentation changes
style: formatting changes
refactor: code refactoring
test: adding tests
chore: maintenance tasks

# Examples:
git commit -m "feat(api): add PowerBI dataset validation"
git commit -m "fix(dashboard): resolve memory leak in data loader"
git commit -m "docs(wiki): add troubleshooting guide"
git commit -m "test(converter): add unit tests for XML parsing"
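
The format above can be checked mechanically before committing. The sketch below is one possible validator, assuming only the seven types listed in this guide and an optional lowercase scope; `COMMIT_PATTERN` and `is_conventional` are illustrative names, not project code.

```python
import re

# Types listed in this guide; the scope in parentheses is optional.
COMMIT_TYPES = ("feat", "fix", "docs", "style", "refactor", "test", "chore")
COMMIT_PATTERN = re.compile(
    r"^(?P<type>" + "|".join(COMMIT_TYPES) + r")"
    r"(\((?P<scope>[a-z0-9_-]+)\))?"
    r": (?P<description>\S.*)$"
)


def is_conventional(message: str) -> bool:
    """Check the first line of a commit message against the format above."""
    first_line = message.splitlines()[0] if message else ""
    return COMMIT_PATTERN.match(first_line) is not None


if __name__ == "__main__":
    for msg in ("feat(api): add PowerBI dataset validation", "added stuff"):
        print(msg, "->", is_conventional(msg))
```

Only the first line is checked, since the type and scope live in the commit subject rather than the body.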

🧪 Testing Requirements

Before Submitting PR

# 1. Run all tests
pytest

# 2. Check coverage (target: 60%+)
pytest --cov=src tests/ --cov-report=term

# 3. Lint and format
black .
flake8 .
isort .

# 4. Security scan
bandit -r src/
safety check

# 5. Pre-commit hooks
pre-commit run --all-files
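
Before running the checks above, it can help to confirm that each tool is installed in the active environment. A small sketch using only the standard library; `REQUIRED_TOOLS` and `missing_tools` are hypothetical names, and the tool list is taken from the commands in this section.

```python
import shutil
from typing import Iterable, List

# Command-line tools used by the checks in this section.
REQUIRED_TOOLS = ["pytest", "black", "flake8", "isort", "bandit", "safety", "pre-commit"]


def missing_tools(tools: Iterable[str]) -> List[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All tools found.")
```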

Writing Tests

# File: tests/unit/test_new_feature.py
import pytest
from src.module import NewFeature

class TestNewFeature:
    def test_basic_functionality(self):
        """Test basic functionality."""
        feature = NewFeature()
        result = feature.process()
        assert result is not None
    
    def test_error_handling(self):
        """Test error handling."""
        feature = NewFeature()
        with pytest.raises(ValueError):
            feature.process(invalid_input=True)
    
    @pytest.mark.parametrize("value,expected", [
        ("test1", "result1"),
        ("test2", "result2"),
    ])
    def test_multiple_inputs(self, value, expected):
        """Test multiple input scenarios."""
        feature = NewFeature()
        result = feature.process(value)
        assert result == expected

📝 Code Style Guidelines

Python Code Style

# ✅ Good: Follow PEP 8
import pandas as pd

class DataProcessor:
    """Process ISTAT data for analysis."""
    
    def __init__(self, config: dict) -> None:
        """Initialize processor with configuration."""
        self.config = config
        self._logger = get_logger(__name__)
    
    def process_data(self, data: pd.DataFrame) -> pd.DataFrame:
        """Process the input data and return cleaned version."""
        try:
            cleaned_data = self._clean_data(data)
            return self._validate_data(cleaned_data)
        except Exception as e:
            self._logger.error(f"Data processing failed: {e}")
            raise
    
    def _clean_data(self, data: pd.DataFrame) -> pd.DataFrame:
        """Private method for data cleaning."""
        return data.dropna()

# ❌ Bad: Poor style
class dataprocessor:  # PascalCase missing
    def __init__(self,config):  # No type hints, spacing
        self.config=config  # No spacing around =
    def process_data(self,data):  # No spacing, type hints
        cleanedData=data.dropna()  # camelCase in Python
        return cleanedData

Documentation Style

from pathlib import Path
from typing import Any, Dict, Union


def convert_xml_to_tableau(
    self,
    xml_input: Union[str, Path],
    dataset_id: str,
    dataset_name: str,
) -> Dict[str, Any]:
    """
    Convert ISTAT XML data to Tableau-compatible formats.
    
    Args:
        xml_input: Path to XML file or XML content string
        dataset_id: ISTAT dataset identifier (e.g., 'DCIS_POPRES1')
        dataset_name: Human-readable dataset name
    
    Returns:
        Dictionary containing conversion results with keys:
        - success: bool indicating conversion success
        - files_created: dict with paths to generated files
        - data_quality: dict with quality metrics
        - summary: dict with conversion summary
    
    Raises:
        ValueError: If XML content is invalid
        FileNotFoundError: If XML file path doesn't exist
        SecurityError: If file path validation fails
    
    Example:
        >>> converter = IstatXMLtoTableauConverter()
        >>> result = converter.convert_xml_to_tableau(
        ...     "data/raw/population.xml",
        ...     "DCIS_POPRES1",
        ...     "Popolazione Residente"
        ... )
        >>> print(result['summary']['files_created'])
        3
    """

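An `Example:` section written with `>>>` prompts, as in the docstring above, is valid doctest syntax, so it can be executed as a test. The sketch below shows the same docstring layout on a small function; `normalize_dataset_id` is a hypothetical helper used only for illustration.

```python
import doctest


def normalize_dataset_id(dataset_id: str) -> str:
    """
    Normalize an ISTAT dataset identifier.

    Args:
        dataset_id: Raw identifier, possibly lowercase or padded with spaces

    Returns:
        The identifier stripped of surrounding whitespace and uppercased

    Raises:
        ValueError: If the identifier is empty after stripping

    Example:
        >>> normalize_dataset_id("  dcis_popres1 ")
        'DCIS_POPRES1'
    """
    cleaned = dataset_id.strip().upper()
    if not cleaned:
        raise ValueError("dataset_id must not be empty")
    return cleaned


if __name__ == "__main__":
    # Runs every ">>>" example in this module and reports mismatches.
    results = doctest.testmod()
    print(f"{results.attempted} examples run, {results.failed} failed")
```

Running docstring examples this way keeps them from drifting out of date as the code changes.
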
🔒 Security Guidelines

Secure Coding Practices

# ✅ Good: Use security utilities
from src.utils.secure_path import SecurePathValidator
from src.utils.security_enhanced import security_manager

def process_file(file_path: str) -> None:
    """Process file with security validation."""
    validator = SecurePathValidator()
    safe_path = validator.validate_path(file_path)
    
    with validator.safe_open(safe_path, 'r') as file:
        content = file.read()
    
    # Process content...

# ❌ Bad: Direct file access
def process_file(file_path: str) -> None:
    """Insecure file processing."""
    with open(file_path, 'r') as file:  # No path validation
        content = file.read()
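
To illustrate what path validation typically guards against, here is a standalone sketch of a directory-traversal check. It is not the project's `SecurePathValidator`, whose implementation is not shown in this guide; `resolve_inside` is a hypothetical function built on `pathlib` only.

```python
from pathlib import Path


def resolve_inside(base_dir: str, user_path: str) -> Path:
    """Resolve user_path under base_dir, rejecting paths that escape it."""
    base = Path(base_dir).resolve()
    candidate = (base / user_path).resolve()
    # relative_to raises ValueError when candidate is not under base,
    # which covers "../" traversal as well as absolute paths.
    try:
        candidate.relative_to(base)
    except ValueError:
        raise ValueError(f"Path escapes base directory: {user_path}") from None
    return candidate


if __name__ == "__main__":
    print(resolve_inside("data", "raw/population.xml"))
    try:
        resolve_inside("data", "../secrets.txt")
    except ValueError as exc:
        print("Rejected:", exc)
```

Resolving both paths before comparing them is the important step: a plain string prefix check would be fooled by `..` segments.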

Input Validation

# ✅ Good: Validate all inputs
def api_endpoint(user_input: str) -> str:
    """API endpoint with input validation."""
    # Sanitize input
    clean_input = security_manager.sanitize_input(user_input)
    
    # Validate input
    if not clean_input or len(clean_input) > 1000:
        raise ValueError("Invalid input")
    
    return process_input(clean_input)

# ❌ Bad: No validation
def api_endpoint(user_input: str) -> str:
    """Insecure API endpoint."""
    return process_input(user_input)  # Direct use of user input
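
The sanitize-then-validate pattern above can be sketched without project dependencies. The code below is an assumption about what such a sanitizer might do, not the behavior of `security_manager.sanitize_input`; `sanitize_input`, `validate_input`, and `MAX_INPUT_LENGTH` are illustrative names.

```python
import unicodedata

MAX_INPUT_LENGTH = 1000  # same limit as the example above


def sanitize_input(raw: str) -> str:
    """Drop control characters and surrounding whitespace from raw input."""
    # Unicode category "Cc" covers control characters such as NUL and ESC.
    # Note that tabs and newlines are also in this category and are removed.
    visible = "".join(ch for ch in raw if unicodedata.category(ch) != "Cc")
    return visible.strip()


def validate_input(raw: str) -> str:
    """Return the sanitized input, or raise ValueError if it is unusable."""
    clean = sanitize_input(raw)
    if not clean or len(clean) > MAX_INPUT_LENGTH:
        raise ValueError("Invalid input")
    return clean
```

Validation runs on the sanitized value, so an input made up only of whitespace or control characters is rejected as empty.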

📊 Performance Guidelines

Efficient Data Processing

# ✅ Good: Use generators and chunking
from typing import Iterator

import pandas as pd

def process_large_dataset(file_path: str) -> Iterator[pd.DataFrame]:
    """Process large dataset in chunks."""
    chunk_size = 10000
    
    for chunk in pd.read_csv(file_path, chunksize=chunk_size):
        yield process_chunk(chunk)

# ❌ Bad: Load everything in memory
def process_large_dataset(file_path: str) -> pd.DataFrame:
    """Memory-intensive processing."""
    df = pd.read_csv(file_path)  # Could be huge
    return process_dataframe(df)
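
The same generator-and-chunking idea can be tried without pandas. This sketch uses the standard library `csv` module and `itertools.islice`; `iter_chunks` is a hypothetical name, and the in-memory sample stands in for a large file.

```python
import csv
import io
from itertools import islice
from typing import Iterator, List, TextIO


def iter_chunks(handle: TextIO, chunk_size: int = 10000) -> Iterator[List[dict]]:
    """Yield lists of at most chunk_size rows from a CSV file object."""
    reader = csv.DictReader(handle)
    while True:
        # islice pulls the next chunk_size rows without reading the rest.
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    sample = io.StringIO("id,value\n1,a\n2,b\n3,c\n")
    for chunk in iter_chunks(sample, chunk_size=2):
        print(len(chunk), "rows")
```

Only one chunk is held in memory at a time, which is the property the pandas `chunksize` version relies on as well.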

Caching Best Practices

# ✅ Good: Use caching for expensive operations
from functools import lru_cache

@lru_cache(maxsize=128)
def expensive_calculation(param: str) -> float:
    """Cached expensive calculation."""
    # Expensive computation goes here; a placeholder keeps the example runnable
    result = float(len(param))
    return result

# Clear cache when needed
expensive_calculation.cache_clear()
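
To see that the cache is actually being used, `functools.lru_cache` exposes `cache_info()`. A short sketch with a stand-in function; `slow_square` and `call_count` are illustrative names.

```python
from functools import lru_cache

call_count = 0  # counts how often the function body actually runs


@lru_cache(maxsize=128)
def slow_square(n: int) -> int:
    """Stand-in for an expensive computation."""
    global call_count
    call_count += 1
    return n * n


slow_square(4)  # computed
slow_square(4)  # served from the cache
info = slow_square.cache_info()
print(f"hits={info.hits} misses={info.misses} body_runs={call_count}")
```

Note that `lru_cache` requires hashable arguments, so it cannot be applied directly to functions that take a `pd.DataFrame` or a `dict`.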

🎯 Pull Request Guidelines

PR Checklist

Before submitting the PR:

PR Template

## Description
Brief description of changes

## Type of Change
- [ ] Bug fix (non-breaking change)
- [ ] New feature (non-breaking change)
- [ ] Breaking change (fix/feature causing existing functionality to not work)
- [ ] Documentation update

## Testing
- [ ] Unit tests added/updated
- [ ] Integration tests added/updated
- [ ] Manual testing performed

## Screenshots (if applicable)
Add screenshots for UI changes

## Additional Notes
Any additional context or notes

Review Process

Automated Checks: CI/CD must pass
Code Review: At least 1 approval required
Manual Testing: For significant changes
Documentation Review: For public API changes
Security Review: For security-related changes

🏷️ Issue Labels

We use labels to categorize issues:

Type Labels

bug - Bug reports
enhancement - Feature requests
documentation - Documentation improvements
question - General questions
duplicate - Duplicate issues
invalid - Invalid issues

Priority Labels

priority: critical - Critical issues (security, data loss)
priority: high - High priority features/fixes
priority: medium - Medium priority items
priority: low - Low priority nice-to-haves

Component Labels

component: api - API-related issues
component: dashboard - Dashboard/UI issues
component: converter - Data conversion issues
component: security - Security-related issues
component: testing - Testing infrastructure
component: docs - Documentation issues

Status Labels

status: waiting-for-feedback - Waiting for user feedback
status: in-progress - Currently being worked on
status: ready-for-review - Ready for code review
status: blocked - Blocked by external dependencies

🎖️ Recognition

Contributors Hall of Fame

Contributors with significant contributions:

Core maintainers
Feature contributors
Documentation contributors
Bug hunters
Community helpers

Contribution Types

We recognize several types of contributions:

💻 Code contributions
📖 Documentation
🐛 Bug reports
💡 Ideas & suggestions
🔍 Code reviews
📢 Community building
🌍 Translations

📞 Getting Help

Community Resources

GitHub Discussions: For general questions
Issues: For bug reports and feature requests
Wiki: For detailed documentation
IRC/Discord: [Link when available]

Mentorship

New contributors welcome!
Pair programming sessions available
Code review learning opportunities
Documentation contribution guidance

Office Hours

Weekly office hours: [TBD]
Timezone-friendly sessions
Open to all contributors
Focus on mentoring and Q&A

📝 Changelog Guidelines

Keep a Changelog

We follow Keep a Changelog:

## [Unreleased]

### Added
- New feature X for better data processing

### Changed
- Improved performance of API calls

### Fixed
- Fixed bug in XML parsing

### Security
- Updated dependencies for security patches

## [1.2.0] - 2025-01-20

### Added
- Dashboard live deployment
- Security manager implementation
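
Release headings in this format are regular enough to check with a script. The sketch below extracts them, assuming three-part version numbers and ISO dates as in the example above; `RELEASE_HEADING` and `list_releases` are hypothetical names.

```python
import re
from typing import List, Optional, Tuple

# Matches headings such as "## [Unreleased]" or "## [1.2.0] - 2025-01-20".
RELEASE_HEADING = re.compile(
    r"^## \[(?P<version>Unreleased|\d+\.\d+\.\d+)\]"
    r"(?: - (?P<date>\d{4}-\d{2}-\d{2}))?$"
)


def list_releases(changelog: str) -> List[Tuple[str, Optional[str]]]:
    """Return (version, date) pairs for every release heading found."""
    releases = []
    for line in changelog.splitlines():
        match = RELEASE_HEADING.match(line.strip())
        if match:
            releases.append((match.group("version"), match.group("date")))
    return releases


if __name__ == "__main__":
    text = "## [Unreleased]\n\n### Added\n- New feature X\n\n## [1.2.0] - 2025-01-20\n"
    print(list_releases(text))
```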

⚖️ Code of Conduct

Our Pledge

We pledge to make participation in our community a harassment-free experience for everyone.

Standards

Examples of behavior that contributes to a positive environment:

Using welcoming and inclusive language
Being respectful of differing viewpoints
Gracefully accepting constructive criticism
Focusing on community benefit
Showing empathy towards community members

Enforcement

Violations can be reported to the project maintainers. All complaints will be reviewed and investigated promptly and fairly.

🎉 Thank You!

Thank you for your interest in contributing to Osservatorio ISTAT! Every contribution, no matter how small, helps make this project better for everyone.

Happy coding! 🚀
