This guide explains Cicada's performance characteristics, optimization strategies, and best practices for large codebases.
| Operation | Small Project (<1000 files) | Large Project (>1000 files) | Very Large (>10,000 files) |
|---|---|---|---|
| Initial Index | 5-10 seconds | 30-60 seconds | 2-5 minutes |
| Incremental Index | <5 seconds | 5-15 seconds | 15-30 seconds |
| Query Response | <1 second | <1 second | <2 seconds |
| Disk Space | 5-10 MB | 10-50 MB | 50-200 MB |
First-time indexing parses all files in your project:
cd /path/to/project
cicada indexPerformance by Language:
| Language | Approach | Speed | Notes |
|---|---|---|---|
| Elixir | Tree-sitter | Fast | ~200-300 files/sec |
| Erlang | Tree-sitter | Fast | ~200-300 files/sec |
| Python | SCIP | Medium | Requires scip-python external tool |
| TypeScript | SCIP | Medium | Requires scip-typescript |
| Go | SCIP | Medium | Requires scip-go |
| Rust | SCIP | Medium | Requires rust-analyzer |
| Other SCIP | SCIP | Medium | External indexer required |
Factors Affecting Speed:
- File count and size
- Language backend (tree-sitter vs SCIP)
- External tool availability
- CPU and disk I/O
Subsequent indexes only process changed files:
cicada index # Auto-detects changes via SHA-256 hashingHow it Works:
- Hash each file (SHA-256)
- Compare against
hashes.json - Re-index only modified files
- Merge with existing index
Typical Performance: <5 seconds for small changes
Watch Mode: Automatically reindex on file changes:
cicada watchCicada queries are optimized for AI agent workflows:
| Query Type | Typical Response | Token Count |
|---|---|---|
search_module |
<500ms | 200-600 tokens |
search_function |
<500ms | 300-800 tokens |
query (keyword) |
<1s | 150-500 tokens |
git_history |
1-2s | 300-1200 tokens |
All queries use in-memory index for instant access:
- Module lookups: O(1) hash table
- Function searches: O(n) with early termination
- Keyword matching: O(n) with pre-computed scores
MCP tool responses consume tokens in AI conversations. Cicada provides controls to optimize token usage:
Compact Mode (Default):
# Minimal output - identifiers and file:line only
search_module("MyApp.User", verbose=False)
# ~200-400 tokensVerbose Mode:
# Full documentation, specs, examples
search_module("MyApp.User", verbose=True)
# ~600-1200 tokensBrief Format (Recommended for AI agents):
git_history("lib/auth.ex", include_pr_description=False)Output:
## History for lib/auth.ex (5 commits)
- 589000d (2025-11-25) Optimize password hashing (#175)
- 0885638 (2025-11-24) Add 2FA support (#178)
- e68df27 (2025-11-23) Refactor session management (#160)Token count: ~120 tokens
Standard Format:
git_history("lib/auth.ex", include_pr_description=True)Includes:
- PR titles and URLs
- Authors
- PR descriptions (truncated)
Token count: ~400-800 tokens
Verbose Format:
git_history(
"lib/auth.ex",
include_pr_description=True,
include_review_comments=True
)Includes everything plus review comments.
Token count: ~1200-2000 tokens
For large result sets, use pagination to control token usage:
# First 10 results
search_function("create*", head_limit=10)
# Next 10 results
search_function("create*", head_limit=10, offset=10)-
Start with compact output
# Query first query("authentication") # 150-300 tokens # Then get details for specific matches search_function("authenticate/2") # 300-500 tokens
-
Use filters to narrow results
# Instead of search_function("*") # Returns hundreds of functions # Do search_function("*", glob="lib/auth/**") # Focused results
-
Disable unused features
# If you don't need usage examples search_function("authenticate", include_usage_examples=False)
-
Request specific information
# Instead of git_history("lib/auth.ex") # Full history # Do git_history("lib/auth.ex", recent=True, max_results=5) # Last 5 recent changes
All indexes stored in ~/.cicada/projects/<repo_hash>/:
~/.cicada/projects/<hash>/
├── index.json # 2-50 MB (main code index)
├── config.yaml # <1 KB (project config)
├── hashes.json # 100-500 KB (file tracking)
├── pr_index.json # 500 KB-5 MB (PR attribution)
└── index.scip # 5-50 MB (SCIP languages only)
Total Disk Usage per Project:
- Small project: 5-10 MB
- Medium project: 10-50 MB
- Large project: 50-200 MB
Remove stale indexes:
# Remove specific project index
cicada clean # In project directory
# Or manually
rm -rf ~/.cicada/projects/<hash>/-
Exclude unnecessary directories
Edit
~/.cicada/projects/<hash>/config.yaml:exclude_patterns: - "deps/*" - "build/*" - "_build/*" - "node_modules/*" - "vendor/*" - "*.gen.ex" # Generated files
-
Use file extension filters
file_extensions: - ".ex" - ".exs" # Don't index test files if not needed # - ".exs"
-
Disable PR indexing (if not needed)
# Skip PR indexing during cicada index # PR indexing requires gh CLI and takes extra time
Python Optimization: The SCIP converter was optimized from O(n²) to O(n):
- Before: 60 seconds for large projects
- After: <5 seconds for same projects
- 71x faster with recent optimizations
TypeScript Optimization:
Use project-specific tsconfig.json to exclude unnecessary files:
{
"exclude": [
"node_modules",
"dist",
"build"
]
}Check index health:
cicada statsOutput:
Project: /path/to/project
Language: elixir
Total modules: 234
Total functions: 1,842
Index size: 12.4 MB
Last indexed: 2025-01-05 10:23:14
Files tracked: 547
Use the benchmarking script to measure tool call frequency:
# Test specific prompts
python tests/benchmark/benchmark_mcp_tool_calls.py \
--prompt "Show me the User module"
# Run full test suite
python tests/benchmark/benchmark_mcp_tool_calls.py --test-suiteMetrics Tracked:
- Total tool calls per query
- Tool call breakdown
- Response duration
- Token usage estimation
Pros:
- Fast incremental parsing
- No external dependencies
- Real-time indexing (<5s)
Cons:
- Syntactic analysis only
- Manual AST traversal
- Per-language custom extractors
Best for: Languages without SCIP indexers, rapid prototyping
Pros:
- Compiler-accurate results
- Shared converter infrastructure (3x faster to add new languages)
- Type-aware analysis
- Cross-repository references
Cons:
- Requires external indexer
- Slower initial indexing
- Must rebuild on changes (no incremental)
Best for: Mainstream languages with mature tooling
Symptoms: First index takes >5 minutes
Solutions:
- Check excluded patterns - may be indexing unnecessary files
- For SCIP languages, ensure external indexer is optimized
- Check disk I/O - slow drive can impact performance
- Reduce file count by excluding test directories
Symptoms: index.json >100 MB
Solutions:
- Review exclude patterns - likely indexing generated code
- Check for large docstrings or comments
- Consider splitting into multiple smaller projects
- Use
cicada cleanand reindex with tighter exclusions
Symptoms: Queries take >2 seconds
Solutions:
- Check index file size - may need cleanup
- Use more specific queries (avoid wildcards like
*) - Add filters:
glob,path,recent - Reduce
head_limitfor initial exploration
Symptoms: Indexing crashes with memory error
Solutions:
- Exclude large generated files
- Process in batches (not currently supported - file issue)
- Increase available RAM
- Use SCIP for large codebases (more memory-efficient)
| Task | Cicada (Compact) | Cicada (Verbose) | Raw Git Commands |
|---|---|---|---|
| File history (5 PRs) | 300 tokens | 1200 tokens | 800 tokens |
| Module search | 250 tokens | 800 tokens | N/A (no equivalent) |
| Function calls | 400 tokens | 1000 tokens | N/A (no equivalent) |
Key Insight: Compact mode optimizes for AI agents while verbose mode provides human-readable context.
Adding new language support:
| Approach | Effort | Accuracy | Maintenance |
|---|---|---|---|
| Tree-sitter | 12-17 days | Syntactic | Per-language custom code |
| SCIP | 4-6 days | Semantic | Shared converter |
SCIP is 3x faster for languages with existing indexers.
- Architecture - Why Cicada uses both tree-sitter and SCIP
- MCP Tools Reference - Available query tools and parameters
- Installation - Setup and configuration