RSD: Rule-based Speculative Decoding for Code Generation

A rule-based speculative decoding framework that accelerates code generation by predicting common patterns from rules instead of a separate draft model. The current implementation covers indentation; bracket and other structural rules are planned.

🚀 Overview

RSD (Rule-based Speculative Decoding) accelerates large language model inference for code generation tasks. Unlike traditional speculative decoding, which requires a separate draft model, RSD uses rule-based heuristics to predict common code patterns. It currently targets Python code generation with Llama-3 models.
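
To make the idea concrete, here is a minimal sketch of how a rule-generated draft could be checked under greedy decoding (temperature 0). It assumes the large model's greedy choice at every draft position is already available from one forward pass; `verify_draft` and its arguments are hypothetical names for illustration, not the API in main.py.

```python
from typing import List, Tuple


def verify_draft(draft: List[int], target_argmax: List[int]) -> Tuple[List[int], int]:
    """Greedy verification of a speculative draft (illustrative only).

    draft:         token ids proposed by the rules (length k).
    target_argmax: the large model's greedy choice at each of the k draft
                   positions plus one extra position (length k + 1).
    Returns the tokens to append and the number of accepted draft tokens.
    """
    if len(target_argmax) != len(draft) + 1:
        raise ValueError("target_argmax must have len(draft) + 1 entries")

    accepted = 0
    for proposed, expected in zip(draft, target_argmax):
        if proposed != expected:
            break
        accepted += 1

    # Keep the accepted prefix, then the model's own token at the first
    # mismatch (or the extra token when the whole draft was accepted).
    output = draft[:accepted] + [target_argmax[accepted]]
    return output, accepted


print(verify_draft([5, 5, 9], [5, 5, 7, 3]))  # ([5, 5, 7], 2)
print(verify_draft([5, 5], [5, 5, 8]))        # ([5, 5, 8], 2)
```

Because every emitted token is one the large model would have chosen anyway, a wrong draft costs only the wasted draft positions and does not change the output.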

✨ Key Features

Rule-based Draft Generation: Predicts common code patterns without requiring a separate draft model
Python Code Optimization: Specifically designed for Python code generation with intelligent indentation prediction
Llama-3 Compatibility: Optimized for Llama-3 tokenization and code generation patterns
KV Cache Support: Full support for KV cache to accelerate inference
Visual Token Analysis: Interactive notebook for analyzing token patterns and rule effectiveness
Performance Monitoring: Real-time performance metrics and acceptance rate tracking

🏗️ Architecture

Core Components

SpeculativeDecoding Class: Main implementation with both rule-based and traditional speculative decoding
Rule-based Draft Generator: Analyzes code context to predict indentation and common patterns
Token Visualization: Interactive tools for analyzing token patterns and rule effectiveness
Performance Metrics: Comprehensive benchmarking and analysis tools

Rule-based Prediction Strategy

The framework implements several rule-based predictions:

Indentation Prediction: After a colon (:) in Python code, predicts the appropriate indentation level
Space Token Optimization: Pre-encodes common space patterns for faster lookup
Context Analysis: Analyzes current code structure to determine appropriate next tokens
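
As a rough illustration of the space-token idea above, the sketch below pre-encodes runs of spaces once so that drafting becomes a dictionary lookup. `build_space_table` and `toy_encode` are hypothetical names, and the toy tokenizer is a stand-in so the example runs without a model; it does not reproduce Llama-3 tokenization.

```python
from typing import Callable, Dict, List


def build_space_table(
    encode: Callable[[str], List[int]], max_width: int = 16
) -> Dict[int, List[int]]:
    """Pre-encode runs of 1..max_width spaces for constant-time lookup."""
    return {width: encode(" " * width) for width in range(1, max_width + 1)}


def toy_encode(text: str) -> List[int]:
    """Stand-in tokenizer: one id (100 + run length) per run of up to 4 spaces.

    Assumes the input consists of spaces only.
    """
    ids = []
    remaining = len(text)
    while remaining > 0:
        run = min(4, remaining)
        ids.append(100 + run)
        remaining -= run
    return ids


space_table = build_space_table(toy_encode, max_width=8)
print(space_table[4])  # [104]
print(space_table[6])  # [104, 102]
```

With a Hugging Face tokenizer, the `encode` argument could be something like `lambda s: tokenizer.encode(s, add_special_tokens=False)`. Note that BPE tokenizers may split whitespace differently depending on the surrounding text, so a table built from isolated space strings is only an approximation of what the model would emit in context.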

📦 Installation

# Clone the repository
git clone <repository-url>
cd rsd

# Install dependencies
pip install torch transformers human-eval matplotlib ipywidgets

🚀 Quick Start

Basic Usage

from main import SpeculativeDecoding

# Initialize the decoder
decoder = SpeculativeDecoding(
    small_model_name="",  # Not used in rule-based mode
    large_model_name="/path/to/llama-3-8b-instruct",
    gamma=4,
    device="cuda",
    use_rule_based_only=True  # Enable rule-based mode
)

# Generate code
result = decoder.generate_text(
    prompt="def fibonacci(n):",
    max_length=512,
    temperature=0,
    use_speculative=True
)

print(f"Generated {result['total_tokens']} tokens in {result['elapsed_time']:.2f}s")
print(f"Speed: {result['tokens_per_second']:.2f} tokens/s")

Running the Demo

python main.py --use_speculative

This runs the framework on HumanEval examples and reports a performance comparison.

📊 Performance Analysis

Token Visualization

Use the included Jupyter notebook (find_token.ipynb) to analyze token patterns:

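# colorize_tokens is defined in find_token.ipynb. Run this cell inside the
# notebook: a plain `import` cannot load a .ipynb file without an import hook.
# `tokenizer` and `generated_tokens` are assumed to be defined in earlier cells.

# Visualize token patterns
colorize_tokens(tokenizer, generated_tokens, max_colors=20)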

This provides interactive visualization of:

Token boundaries and patterns
Space and indentation tokens
Code structure analysis
Rule effectiveness evaluation

Performance Metrics

The framework reports the following metrics:

Tokens per Second: Generation speed
Latency: End-to-end generation time
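
A minimal sketch of how these metrics, plus the acceptance rate mentioned under Key Features, could be derived from raw counters. The field names `total_tokens`, `elapsed_time`, and `tokens_per_second` mirror the result keys used in Basic Usage; the `GenerationStats` class and the `drafted` and `accepted` counters are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class GenerationStats:
    total_tokens: int = 0      # tokens produced by the run
    elapsed_time: float = 0.0  # end-to-end latency in seconds
    drafted: int = 0           # draft tokens proposed by the rules
    accepted: int = 0          # draft tokens confirmed by the large model

    @property
    def tokens_per_second(self) -> float:
        """Generation speed; 0.0 when no time has elapsed."""
        return self.total_tokens / self.elapsed_time if self.elapsed_time > 0 else 0.0

    @property
    def acceptance_rate(self) -> float:
        """Fraction of drafted tokens that were accepted; 0.0 with no drafts."""
        return self.accepted / self.drafted if self.drafted > 0 else 0.0


stats = GenerationStats(total_tokens=120, elapsed_time=3.0, drafted=20, accepted=15)
print(f"{stats.tokens_per_second:.1f} tokens/s")     # 40.0 tokens/s
print(f"{stats.acceptance_rate:.0%} of drafts accepted")  # 75% of drafts accepted
```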

🎯 Rule-based Prediction Details

Indentation Rules

Indentation prediction proceeds in four stages:

Context Analysis: Analyzes the current code structure and indentation level
Colon Detection: Identifies when a colon (:) indicates the need for indentation
Indentation Calculation: Computes the appropriate indentation level based on context
Token Generation: Generates the correct number of space tokens
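
The four stages above can be sketched as a single function. This is a simplified illustration under stated assumptions, not the code in main.py: it assumes a fixed indentation unit of four spaces, and it treats any line ending in a colon as a block opener, so it would misfire on a colon that ends a line inside a multi-line dictionary literal or slice. `predict_indent` and `draft_indent_text` are hypothetical names.

```python
from typing import Optional


def predict_indent(context: str, indent_unit: int = 4) -> Optional[int]:
    """Return the predicted indentation width for the next line, or None.

    A prediction is made only when the text generated so far ends with a
    newline and the line before that newline ends with a colon.
    """
    if not context.endswith("\n"):
        return None

    # Context analysis: look at the last completed line.
    last_line = context.rstrip("\n").split("\n")[-1]

    # Colon detection: only a block opener triggers a prediction.
    if not last_line.rstrip().endswith(":"):
        return None

    # Indentation calculation: current leading spaces plus one unit.
    current = len(last_line) - len(last_line.lstrip(" "))
    return current + indent_unit


def draft_indent_text(context: str) -> str:
    """Token generation input: the whitespace string to encode as the draft."""
    width = predict_indent(context)
    return " " * width if width is not None else ""


print(predict_indent("def fibonacci(n):\n"))                 # 4
print(predict_indent("def fibonacci(n):\n    if n < 2:\n"))  # 8
print(predict_indent("x = 1\n"))                             # None
```

The returned string would then be passed through the tokenizer to obtain the draft token ids that the large model verifies.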

🔬 Research Applications

This framework is particularly useful for:

Code Generation Research: Analyzing token patterns in code generation
Speculative Decoding Studies: Comparing rule-based vs. model-based approaches
Performance Optimization: Identifying bottlenecks in code generation
Tokenization Analysis: Understanding how different models tokenize code

📈 Benchmarks

HumanEval Dataset Performance

We evaluated the framework on 10 HumanEval examples using the current indentation-only rule implementation:

Performance Results

Standard Decoding: 37.6652 tokens/s
Rule-based Speculation: 38.6796 tokens/s
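
For reference, the relative speedup implied by these two figures can be worked out directly. The numbers are the ones reported above; nothing else is assumed.

```python
standard_tps = 37.6652    # Standard Decoding, tokens/s
rule_based_tps = 38.6796  # Rule-based Speculation, tokens/s

speedup = rule_based_tps / standard_tps
gain_percent = (speedup - 1.0) * 100.0

print(f"Speedup: {speedup:.3f}x ({gain_percent:.1f}% more tokens/s)")
# Speedup: 1.027x (2.7% more tokens/s)
```

With only the indentation rule enabled, the gain is therefore about 2.7% on this 10-example sample.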

Visualization Results

The framework provides real-time visualization of rule-based predictions during generation:

Green Text: Whitespace itself cannot be highlighted, so the first token after a successfully predicted indentation is shown in green to indicate that the model accepted the prediction.
Normal Text: Standard generation without rule-based prediction
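
A minimal sketch of this highlighting scheme for a terminal, using ANSI escape codes. It is an illustration only: `render_tokens` is a hypothetical helper, and it assumes the caller already knows which token positions directly follow an accepted indentation draft.

```python
from typing import List, Set

GREEN = "\033[32m"  # ANSI code: green foreground
RESET = "\033[0m"   # ANSI code: reset all attributes


def render_tokens(pieces: List[str], follows_accepted: Set[int]) -> str:
    """Join decoded token strings, coloring the marked positions green.

    follows_accepted holds the indices of tokens that directly follow an
    accepted indentation draft.
    """
    out = []
    for i, piece in enumerate(pieces):
        out.append(f"{GREEN}{piece}{RESET}" if i in follows_accepted else piece)
    return "".join(out)


pieces = ["def f():\n", "    ", "return", " 1"]
print(render_tokens(pieces, follows_accepted={2}))
```

Here the four-space token at index 1 is the accepted draft, so the token at index 2 (`return`) is the one rendered in green.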

🚀 Future Development

We plan to extend the framework with more sophisticated rules including bracket prediction, common code patterns, and multi-language support. The goal is to achieve 1.2-1.5x speedup with comprehensive rule implementations.

🤝 Contributing

We welcome contributions! You can help by:

Implementing new rules for code patterns
Adding support for more programming languages
Improving performance and evaluation tools
Contributing to documentation and examples

Feel free to fork the repository and submit pull requests. Together we can build a comprehensive rule-based speculative decoding framework!
