Optimize JSON parsing and add caching for performance #126

@cgrindel

Summary

The codebase currently parses some JSON files more than once and has no caching, which can slow gazelle runs, especially for larger projects with many Swift packages. We need to optimize these operations.

Performance Issues Identified

Multiple JSON Parsing Passes

  • spreso package: Parses Package.resolved multiple times for version detection
  • swiftpkg package: Redundant parsing of package description and dump JSON
  • Dependency resolution: Re-parsing the same package files repeatedly

Missing Caching

  • Package parsing: No caching for repeated PackageInfo creation
  • Dependency resolution: No memoization for expensive resolution operations
  • File system operations: No caching for file reads

Memory Allocation

  • Large data structures: No size limits or streaming for very large projects
  • Frequent allocations: Could benefit from object pooling

Proposed Optimizations

JSON Parsing Optimization

// Current: Multiple parsing passes
func detectVersion(data []byte) (int, error) {
    // Parse once to detect version, then parse again for actual data
}

// Better: Single pass with version detection
type VersionedPackageResolved struct {
    Version int
    Data    interface{} // Parsed based on detected version
}
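
One way the single-pass idea could look is sketched below: decode once into a struct that is the union of the layouts, then branch on the version field. This is only a sketch under stated assumptions. The JSON field names reflect my understanding of the Package.resolved formats (v1 nests pins under "object", v2 and later put them at the top level), and `Pin`, `rawPin`, `rawResolved`, and `ParseResolved` are hypothetical names, not types from the spreso package.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Pin is a hypothetical normalized pin, not a type from the repository.
type Pin struct {
	Identity string
	Location string
	Revision string
}

// rawPin is the union of the pin fields assumed for v1 and v2+ files.
type rawPin struct {
	Package       string `json:"package"`       // assumed v1 field
	RepositoryURL string `json:"repositoryURL"` // assumed v1 field
	Identity      string `json:"identity"`      // assumed v2+ field
	Location      string `json:"location"`      // assumed v2+ field
	State         struct {
		Revision string `json:"revision"`
	} `json:"state"`
}

// rawResolved is the union of the assumed top-level layouts.
type rawResolved struct {
	Version int      `json:"version"`
	Pins    []rawPin `json:"pins"` // v2+: pins at the top level
	Object  *struct {
		Pins []rawPin `json:"pins"`
	} `json:"object"` // v1: pins nested under "object"
}

// ParseResolved decodes the file exactly once and normalizes the pins
// according to the version found in that same decode.
func ParseResolved(data []byte) (int, []Pin, error) {
	var raw rawResolved
	if err := json.Unmarshal(data, &raw); err != nil {
		return 0, nil, err
	}
	var src []rawPin
	switch {
	case raw.Version == 1:
		if raw.Object != nil {
			src = raw.Object.Pins
		}
	case raw.Version >= 2:
		src = raw.Pins
	default:
		return 0, nil, fmt.Errorf("unsupported Package.resolved version %d", raw.Version)
	}
	pins := make([]Pin, 0, len(src))
	for _, p := range src {
		pin := Pin{Identity: p.Identity, Location: p.Location, Revision: p.State.Revision}
		if raw.Version == 1 {
			pin.Identity, pin.Location = p.Package, p.RepositoryURL
		}
		pins = append(pins, pin)
	}
	return raw.Version, pins, nil
}
```

The trade-off is that the union struct has to know every supported layout up front. In exchange, each file is tokenized once and no intermediate `interface{}` value needs a second decode.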

Caching Implementation

type CachedPackageInfo struct {
    info      *PackageInfo
    timestamp time.Time
    hash      string
}

type PackageInfoCache struct {
    cache map[string]*CachedPackageInfo
    mutex sync.RWMutex
}

func (resolver *PackageResolver) getCachedPackageInfo(path string) (*PackageInfo, bool) {
    // Check file modification time and hash for cache validity
    // Return cached result if valid, otherwise parse and cache
}

Memory Pool for Frequent Allocations

var packageInfoPool = sync.Pool{
    New: func() interface{} {
        return &PackageInfo{}
    },
}

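func NewPackageInfo() *PackageInfo {
    pkg := packageInfoPool.Get().(*PackageInfo)
    // Reset fields in case the object was returned to the pool without Release
    *pkg = PackageInfo{}
    return pkg
}

// Release returns pkg to the pool. Callers must not use pkg afterwards.
func (pkg *PackageInfo) Release() {
    // Reset to zero values so the pooled object holds no stale references
    *pkg = PackageInfo{}
    packageInfoPool.Put(pkg)
}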

Implementation Areas

High Priority

  1. Package.resolved parsing - Eliminate redundant JSON parsing
  2. PackageInfo caching - Cache parsed package information
  3. Dependency resolution memoization - Cache resolution results
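
For item 3, memoization could be sketched roughly as follows. `ResolutionResult` is a placeholder struct, and `MemoResolver`, `cacheKey`, and the `resolve` hook are hypothetical names. The key is built from sorted copies of both inputs, which assumes the resolution result does not depend on argument order.

```go
package main

import (
	"sort"
	"strings"
	"sync"
)

// ResolutionResult is a placeholder for the repository's type.
type ResolutionResult struct {
	Products []string
}

// MemoResolver wraps a hypothetical resolve function and remembers its results.
type MemoResolver struct {
	mu      sync.Mutex
	results map[string]ResolutionResult
	resolve func(modules, pkgIdentities []string) ResolutionResult
}

func NewMemoResolver(resolve func(modules, pkgIdentities []string) ResolutionResult) *MemoResolver {
	return &MemoResolver{results: make(map[string]ResolutionResult), resolve: resolve}
}

// cacheKey builds an order-independent key without mutating the inputs.
func cacheKey(modules, pkgIdentities []string) string {
	norm := func(in []string) string {
		s := append([]string(nil), in...)
		sort.Strings(s)
		return strings.Join(s, "\x00")
	}
	return norm(modules) + "\x01" + norm(pkgIdentities)
}

// Resolve returns the stored result for inputs it has seen before and
// calls the wrapped function only on the first request for a given key.
func (m *MemoResolver) Resolve(modules, pkgIdentities []string) ResolutionResult {
	k := cacheKey(modules, pkgIdentities)
	m.mu.Lock()
	defer m.mu.Unlock()
	if r, ok := m.results[k]; ok {
		return r
	}
	r := m.resolve(modules, pkgIdentities)
	m.results[k] = r
	return r
}
```

Holding the mutex while resolving serializes concurrent callers, which keeps the sketch short. The stored results would also need to be cleared whenever the underlying dependency index changes.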

Medium Priority

  1. File system caching - Cache file reads with modification time checks
  2. Memory pooling - Reuse frequently allocated objects
  3. Streaming for large data - Process large package sets incrementally

Low Priority

  1. Compression - Compress cached data for memory efficiency
  2. Persistent caching - Cache across gazelle runs
  3. Background preloading - Preload common packages

Performance Monitoring

Add Performance Instrumentation

import (
    "log"
    "time"
)

func (di *DependencyIndex) ResolveModulesToProducts(modules []string, pkgIdentities []string) ResolutionResult {
    start := time.Now()
    defer func() {
        duration := time.Since(start)
        if duration > 5*time.Second {
            log.Printf("WARNING: Slow dependency resolution took %v for %d modules", duration, len(modules))
        }
    }()

    // ... existing implementation
}

Acceptance Criteria

  • JSON parsing passes are minimized (ideally single pass per file)
  • Package information is cached with proper invalidation
  • Dependency resolution uses memoization for repeated operations
  • Performance monitoring identifies slow operations (>5s)
  • Memory usage doesn't grow unbounded for large projects
  • Cache hit rates are measured and reported
  • Performance improvements are benchmarked and documented

Benchmarking Plan

  1. Create performance benchmarks for critical paths
  2. Measure baseline performance before optimizations
  3. Implement optimizations incrementally with measurements
  4. Verify improvements don't break functionality
  5. Document performance characteristics for different project sizes
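
As a starting point for steps 1 and 2, a pair of Go benchmarks comparing a two-pass decode against a one-pass decode might look like the sketch below. The sample document is made up, and `twoPass` and `onePass` are hypothetical stand-ins rather than the repository's functions. In a real package the `Benchmark*` functions would live in a `_test.go` file.

```go
package main

import (
	"encoding/json"
	"testing"
)

// sample is a made-up document used only to drive the benchmarks.
var sample = []byte(`{"pins":[{"identity":"demo","location":"https://example.com/demo.git","state":{"revision":"r1","version":"1.0.0"}}],"version":2}`)

// twoPass mimics the pattern described in this issue: decode once for
// the version, then decode the same bytes again for the data.
func twoPass(data []byte) error {
	var header struct {
		Version int `json:"version"`
	}
	if err := json.Unmarshal(data, &header); err != nil {
		return err
	}
	var body map[string]interface{}
	return json.Unmarshal(data, &body)
}

// onePass decodes the version and the pins in a single call.
func onePass(data []byte) error {
	var doc struct {
		Version int               `json:"version"`
		Pins    []json.RawMessage `json:"pins"`
	}
	return json.Unmarshal(data, &doc)
}

func BenchmarkTwoPass(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		if err := twoPass(sample); err != nil {
			b.Fatal(err)
		}
	}
}

func BenchmarkOnePass(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		if err := onePass(sample); err != nil {
			b.Fatal(err)
		}
	}
}
```

Running `go test -bench=. -benchmem` before and after each optimization would give the baseline and the per-change measurements the plan calls for.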

Expected Benefits

  • Faster gazelle runs, especially for large projects
  • Reduced memory usage through efficient caching and pooling
  • Better scalability for projects with many Swift packages
  • Improved developer experience with faster BUILD file generation

Metadata

Assignees

No one assigned

Labels

chore (Developer chore or clean up)

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
