p81sunshine/rulesd

RSD: Rule-based Speculative Decoding for Code Generation

A rule-based speculative decoding framework that accelerates code generation by predicting common patterns like indentation, brackets, and code structure without requiring a separate draft model.

🚀 Overview

RSD (Rule-based Speculative Decoding) accelerates large language model inference for code generation. Traditional speculative decoding needs a separate draft model to propose tokens; RSD instead proposes them with rule-based heuristics for common code patterns, and the large model accepts or rejects each proposal. The current focus is Python code generation with Llama-3 models.

✨ Key Features

  • Rule-based Draft Generation: Predicts common code patterns without requiring a separate draft model
  • Python Code Optimization: Specifically designed for Python code generation with intelligent indentation prediction
  • Llama-3 Compatibility: Optimized for Llama-3 tokenization and code generation patterns
  • KV Cache Support: Full support for KV cache to accelerate inference
  • Visual Token Analysis: Interactive notebook for analyzing token patterns and rule effectiveness
  • Performance Monitoring: Real-time performance metrics and acceptance rate tracking

🏗️ Architecture

Core Components

  1. SpeculativeDecoding Class: Main implementation with both rule-based and traditional speculative decoding
  2. Rule-based Draft Generator: Analyzes code context to predict indentation and common patterns
  3. Token Visualization: Interactive tools for analyzing token patterns and rule effectiveness
  4. Performance Metrics: Comprehensive benchmarking and analysis tools
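
As an illustration of how a rule-based draft can be checked by the large model, here is a minimal sketch of greedy verification (matching `temperature=0`). It is not taken from `main.py`: `verify_draft` and `target_argmax` are hypothetical names, and `target_argmax` stands in for a single forward pass of the large model.

```python
from typing import Callable, List, Sequence


def verify_draft(
    context: Sequence[int],
    draft: Sequence[int],
    target_argmax: Callable[[Sequence[int]], List[int]],
) -> List[int]:
    """Return the tokens to append after one greedy verification step.

    `target_argmax(tokens)` is a hypothetical stand-in for the large model:
    for every position i it returns the greedy choice for the token that
    follows tokens[: i + 1]. `context` must be non-empty.
    """
    full = list(context) + list(draft)
    preds = target_argmax(full)
    start = len(context) - 1  # index of the prediction for the first draft slot
    accepted: List[int] = []
    for i, tok in enumerate(draft):
        if preds[start + i] != tok:
            # First mismatch: keep the model's own token and stop.
            accepted.append(preds[start + i])
            return accepted
        accepted.append(tok)
    # Every draft token matched; the same pass also yields one extra token.
    accepted.append(preds[start + len(draft)])
    return accepted


# Toy "model" for demonstration only: the next token is always previous + 1.
toy_model = lambda toks: [(t + 1) % 100 for t in toks]
print(verify_draft([1, 2, 3], [4, 5, 9], toy_model))  # [4, 5, 6]
```

With greedy decoding, the accepted tokens are exactly the ones the large model would have produced on its own, so a wrong rule costs only wasted draft positions and does not change the output.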

Rule-based Prediction Strategy

The framework implements several rule-based predictions:

  • Indentation Prediction: After a colon (:) in Python code, predicts the appropriate indentation level
  • Space Token Optimization: Pre-encodes common space patterns for faster lookup
  • Context Analysis: Analyzes current code structure to determine appropriate next tokens
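
The space token optimization can be sketched as a lookup table built once at start-up. This is an assumption about the approach, not the repository's code: `build_space_table` and `draft_for_level` are hypothetical helpers, and `encode` is any function that maps text to token IDs.

```python
from typing import Callable, Dict, List


def build_space_table(
    encode: Callable[[str], List[int]], max_levels: int = 8, width: int = 4
) -> Dict[int, List[int]]:
    """Pre-encode a newline followed by 0..max_levels indentation levels."""
    return {
        level: encode("\n" + " " * (width * level))
        for level in range(max_levels + 1)
    }


def draft_for_level(table: Dict[int, List[int]], level: int) -> List[int]:
    """Look up the draft tokens for an indentation level (empty if unknown)."""
    return table.get(level, [])


# Toy encoder for demonstration only: one "token" per character.
toy_encode = lambda s: [ord(c) for c in s]
table = build_space_table(toy_encode, max_levels=3)
print(draft_for_level(table, 1))  # [10, 32, 32, 32, 32]
```

With a Hugging Face tokenizer, `encode` could plausibly be `lambda s: tokenizer.encode(s, add_special_tokens=False)`. Note that whitespace encoded in isolation may tokenize differently from the same whitespace in context, so the table entries should be checked against real generations.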

📦 Installation

# Clone the repository
git clone <repository-url>
cd rulesd

# Install dependencies
pip install torch transformers human-eval matplotlib ipywidgets

🚀 Quick Start

Basic Usage

from main import SpeculativeDecoding

# Initialize the decoder
decoder = SpeculativeDecoding(
    small_model_name="",  # Not used in rule-based mode
    large_model_name="/path/to/llama-3-8b-instruct",
    gamma=4,
    device="cuda",
    use_rule_based_only=True  # Enable rule-based mode
)

# Generate code
result = decoder.generate_text(
    prompt="def fibonacci(n):",
    max_length=512,
    temperature=0,
    use_speculative=True
)

print(f"Generated {result['total_tokens']} tokens in {result['elapsed_time']:.2f}s")
print(f"Speed: {result['tokens_per_second']:.2f} tokens/s")

Running the Demo

python main.py --use_speculative

This runs the framework on HumanEval examples and reports a performance comparison.

📊 Performance Analysis

Token Visualization

Use the included Jupyter notebook (find_token.ipynb) to analyze token patterns:

from find_token import colorize_tokens

# Visualize token patterns
colorize_tokens(tokenizer, generated_tokens, max_colors=20)

This provides interactive visualization of:

  • Token boundaries and patterns
  • Space and indentation tokens
  • Code structure analysis
  • Rule effectiveness evaluation

Performance Metrics

The framework tracks several key metrics:

  • Tokens per Second: Generation speed
  • Latency: End-to-end generation time
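
Both metrics can be derived from one timed call. The sketch below is illustrative only: `measure` is a hypothetical helper, and the dictionary keys simply mirror the `result` fields used in Basic Usage.

```python
import time
from typing import Callable, Dict, List


def measure(generate: Callable[[], List[int]]) -> Dict[str, float]:
    """Time one generation call and derive latency and throughput."""
    start = time.perf_counter()
    tokens = generate()
    elapsed = time.perf_counter() - start  # end-to-end latency in seconds
    return {
        "total_tokens": len(tokens),
        "elapsed_time": elapsed,
        "tokens_per_second": len(tokens) / elapsed if elapsed > 0 else 0.0,
    }


# Stand-in generator for demonstration only; replace with a real decode call.
stats = measure(lambda: list(range(512)))
print(f"{stats['total_tokens']} tokens in {stats['elapsed_time']:.4f}s")
```

On a GPU, the timed call should include any synchronization needed for the kernels to finish, otherwise the elapsed time will be understated.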

🎯 Rule-based Prediction Details

Indentation Rules

Indentation prediction proceeds in four steps:

  1. Context Analysis: Analyzes the current code structure and indentation level
  2. Colon Detection: Identifies when a colon (:) indicates the need for indentation
  3. Indentation Calculation: Computes the appropriate indentation level based on context
  4. Token Generation: Generates the correct number of space tokens
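
The four steps above can be sketched on plain text as follows. This is a simplified illustration under stated assumptions, not the repository's implementation: `predict_indentation` is a hypothetical name, indentation is assumed to be spaces with a width of 4, and the result is a string that would still have to be converted to tokens.

```python
def predict_indentation(code: str, width: int = 4) -> str:
    """Return the whitespace expected on the next line, or "" if no rule fires."""
    # Step 1: context analysis. Measure the indentation of the last line.
    last_line = code.split("\n")[-1]
    current = len(last_line) - len(last_line.lstrip(" "))

    # Step 2: colon detection. Only a line ending in ":" opens a new block.
    if not last_line.rstrip().endswith(":"):
        return ""

    # Step 3: indentation calculation. The new block is one level deeper.
    target = current + width

    # Step 4: token generation. Here only the whitespace string is produced.
    return " " * target


print(repr(predict_indentation("def fibonacci(n):")))             # '    '
print(repr(predict_indentation("def f(n):\n    if n < 2:")))      # '        '
print(repr(predict_indentation("x = 1")))                         # ''
```

The colon check is deliberately naive: a comment or string that ends in a colon would also trigger the rule. Since the large model verifies every draft, such a false positive costs time but should not change the generated code.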

🔬 Research Applications

This framework is particularly useful for:

  • Code Generation Research: Analyzing token patterns in code generation
  • Speculative Decoding Studies: Comparing rule-based vs. model-based approaches
  • Performance Optimization: Identifying bottlenecks in code generation
  • Tokenization Analysis: Understanding how different models tokenize code

📈 Benchmarks

HumanEval Dataset Performance

We evaluated the framework on 10 HumanEval examples using the current indentation-only rule implementation:

Performance Results

  • Standard Decoding: 37.6652 tokens/s
  • Rule-based Speculation: 38.6796 tokens/s
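
For reference, the relative speedup implied by these two figures can be computed directly:

```python
standard_tps = 37.6652    # Standard Decoding, tokens/s
rule_based_tps = 38.6796  # Rule-based Speculation, tokens/s

speedup = rule_based_tps / standard_tps
print(f"Speedup: {speedup:.3f}x ({(speedup - 1) * 100:.1f}% faster)")
# Speedup: 1.027x (2.7% faster)
```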

Visualization Results

The framework provides real-time visualization of rule-based predictions during generation:

  • Green Text: Whitespace cannot be highlighted directly, so the first token after a successfully predicted indentation is shown in green, indicating that the model accepted the prediction.
  • Normal Text: Standard generation without rule-based prediction
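
One way to produce this kind of highlighting in a terminal is with ANSI color codes. The sketch below is an assumption about how such output could be rendered, not the repository's code: `render` is a hypothetical helper, and each flag marks a piece that directly follows an accepted indentation prediction.

```python
from typing import Iterable

GREEN = "\033[32m"  # ANSI escape: green foreground
RESET = "\033[0m"   # ANSI escape: reset attributes


def render(pieces: Iterable[str], follows_accepted: Iterable[bool]) -> str:
    """Join decoded token strings, coloring the flagged ones green."""
    out = []
    for text, flagged in zip(pieces, follows_accepted):
        out.append(f"{GREEN}{text}{RESET}" if flagged else text)
    return "".join(out)


# "return" follows an accepted indentation draft, so it is shown in green.
print(render(["def f():", "\n    ", "return", " 1"], [False, False, True, False]))
```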

[Figure: Rule-based Prediction Visualization]

🚀 Future Development

We plan to add rules for bracket prediction, common code patterns, and multi-language support. The goal is a 1.2-1.5x speedup over standard decoding once these rules are in place.

🤝 Contributing

We welcome contributions! You can help by:

  • Implementing new rules for code patterns
  • Adding support for more programming languages
  • Improving performance and evaluation tools
  • Contributing to documentation and examples

Feel free to fork the repository and submit pull requests. Together we can build a comprehensive rule-based speculative decoding framework!
