Skip to content

Latest commit

Β 

History

History
1189 lines (911 loc) Β· 40.3 KB

File metadata and controls

1189 lines (911 loc) Β· 40.3 KB

Merkle Trees 🌳

Introduction

Merkle Trees (also known as Hash Trees) are a fundamental data structure in blockchain technology that enables efficient and secure verification of large data structures. Named after Ralph Merkle who patented the concept in 1979, these trees provide a way to summarize all transactions in a block with a single hash.

What is a Merkle Tree? πŸ€”

A Merkle Tree is a binary tree where:

  • Leaf nodes contain hashes of individual data items (transactions)
  • Internal nodes contain hashes of their child nodes
  • Root node contains a hash that represents the entire tree

Visual Representation

                    Root Hash
                   /         \
              H(AB)             H(CD)
             /    \            /    \
         H(A)      H(B)    H(C)      H(D)
          |         |       |         |
       Tx A      Tx B    Tx C      Tx D

Real-World Analogy πŸ“š

Think of a Merkle Tree like a library catalog system:

  • Each book (transaction) has a unique ID (hash)
  • Books are grouped into sections (leaf pairs)
  • Sections are grouped into floors (internal nodes)
  • The entire library has one master catalog number (root hash)

If any book changes, the master catalog number changes too!

How Merkle Trees Work πŸ”§

Construction Process

import hashlib
import json
from typing import List, Optional

class MerkleNode:
    def __init__(self, hash_value: str, left: Optional['MerkleNode'] = None, 
                 right: Optional['MerkleNode'] = None, data: Optional[str] = None):
        self.hash = hash_value
        self.left = left
        self.right = right
        self.data = data  # Only for leaf nodes

class MerkleTree:
    def __init__(self, transactions: List[str]):
        self.transactions = transactions
        self.root = self._build_tree()
    
    def _hash(self, data: str) -> str:
        """Create SHA-256 hash of input data"""
        return hashlib.sha256(data.encode()).hexdigest()
    
    def _build_tree(self) -> MerkleNode:
        """Build Merkle tree from transactions"""
        if not self.transactions:
            return None
        
        # Create leaf nodes
        nodes = []
        for tx in self.transactions:
            hash_value = self._hash(tx)
            leaf = MerkleNode(hash_value, data=tx)
            nodes.append(leaf)
            print(f"Leaf: {tx[:20]}... -> {hash_value[:16]}...")
        
        # Build tree bottom-up
        level = 1
        while len(nodes) > 1:
            next_level = []
            print(f"\nLevel {level}:")
            
            # Process pairs of nodes
            for i in range(0, len(nodes), 2):
                left = nodes[i]
                
                # Handle odd number of nodes (duplicate last node)
                if i + 1 < len(nodes):
                    right = nodes[i + 1]
                else:
                    right = nodes[i]  # Duplicate for odd number
                
                # Create parent node
                combined = left.hash + right.hash
                parent_hash = self._hash(combined)
                parent = MerkleNode(parent_hash, left, right)
                
                next_level.append(parent)
                print(f"  {left.hash[:16]}... + {right.hash[:16]}... -> {parent_hash[:16]}...")
            
            nodes = next_level
            level += 1
        
        return nodes[0]
    
    def get_root(self) -> str:
        """Get the root hash"""
        return self.root.hash if self.root else ""
    
    def generate_proof(self, transaction: str) -> List[dict]:
        """Generate Merkle proof for a specific transaction"""
        def find_proof(node: MerkleNode, target_hash: str, proof: List[dict]) -> bool:
            if not node:
                return False
            
            # Found the target (leaf node)
            if node.data and self._hash(node.data) == target_hash:
                return True
            
            # Search left subtree
            if node.left and find_proof(node.left, target_hash, proof):
                if node.right:
                    proof.append({
                        'hash': node.right.hash,
                        'direction': 'right'
                    })
                return True
            
            # Search right subtree  
            if node.right and find_proof(node.right, target_hash, proof):
                if node.left:
                    proof.append({
                        'hash': node.left.hash,
                        'direction': 'left'
                    })
                return True
            
            return False
        
        target_hash = self._hash(transaction)
        proof = []
        
        if find_proof(self.root, target_hash, proof):
            return proof
        else:
            return []
    
    def verify_proof(self, transaction: str, proof: List[dict], root_hash: str) -> bool:
        """Verify a Merkle proof"""
        current_hash = self._hash(transaction)
        
        for step in proof:
            if step['direction'] == 'right':
                combined = current_hash + step['hash']
            else:
                combined = step['hash'] + current_hash
            
            current_hash = self._hash(combined)
        
        return current_hash == root_hash

# Demo Merkle Tree construction
print("🌳 Building Merkle Tree")
print("=" * 50)

transactions = [
    "Alice sends 5 BTC to Bob",
    "Bob sends 2 BTC to Carol", 
    "Carol sends 1 BTC to Dave",
    "Dave sends 3 BTC to Eve"
]

merkle_tree = MerkleTree(transactions)
root_hash = merkle_tree.get_root()

print(f"\n🎯 Root Hash: {root_hash}")
print(f"πŸ“Š Tree represents {len(transactions)} transactions with single hash!")

Merkle Proof Verification

# Generate and verify proof
print("\nπŸ” Merkle Proof Verification")
print("=" * 50)

# Generate proof for a specific transaction
target_tx = "Bob sends 2 BTC to Carol"
proof = merkle_tree.generate_proof(target_tx)

print(f"Transaction: {target_tx}")
print(f"Proof steps: {len(proof)}")

for i, step in enumerate(proof):
    print(f"  Step {i+1}: {step['direction']} - {step['hash'][:16]}...")

# Verify the proof
is_valid = merkle_tree.verify_proof(target_tx, proof, root_hash)
print(f"\nβœ… Proof verification: {'Valid' if is_valid else 'Invalid'}")

# Try with tampered transaction
tampered_tx = "Bob sends 20 BTC to Carol"  # Changed amount!
is_valid_tampered = merkle_tree.verify_proof(tampered_tx, proof, root_hash)
print(f"πŸ” Tampered tx verification: {'Valid' if is_valid_tampered else 'Invalid'}")

Merkle Trees in Blockchain πŸ”—

Bitcoin Block Structure

class BitcoinBlock:
    def __init__(self, block_height: int, previous_hash: str, transactions: List[dict]):
        self.height = block_height
        self.previous_hash = previous_hash
        self.transactions = transactions
        self.merkle_tree = self._build_merkle_tree()
        self.merkle_root = self.merkle_tree.get_root()
        self.timestamp = int(time.time())
        self.nonce = 0
        self.hash = self._calculate_block_hash()
    
    def _build_merkle_tree(self) -> MerkleTree:
        """Build Merkle tree from transactions"""
        tx_strings = []
        for tx in self.transactions:
            tx_string = f"{tx['from']}:{tx['to']}:{tx['amount']}:{tx['fee']}"
            tx_strings.append(tx_string)
        
        return MerkleTree(tx_strings)
    
    def _calculate_block_hash(self) -> str:
        """Calculate block hash using header information"""
        header = f"{self.height}:{self.previous_hash}:{self.merkle_root}:{self.timestamp}:{self.nonce}"
        return hashlib.sha256(header.encode()).hexdigest()
    
    def verify_transaction_inclusion(self, tx_index: int) -> dict:
        """Verify a transaction is included in this block"""
        if tx_index >= len(self.transactions):
            return {'valid': False, 'error': 'Transaction index out of range'}
        
        tx = self.transactions[tx_index]
        tx_string = f"{tx['from']}:{tx['to']}:{tx['amount']}:{tx['fee']}"
        
        # Generate proof
        proof = self.merkle_tree.generate_proof(tx_string)
        
        # Verify proof
        is_valid = self.merkle_tree.verify_proof(tx_string, proof, self.merkle_root)
        
        return {
            'valid': is_valid,
            'transaction': tx,
            'proof': proof,
            'merkle_root': self.merkle_root
        }

# Demo Bitcoin block with Merkle tree
print("\nβ‚Ώ Bitcoin Block with Merkle Tree")
print("=" * 50)

block_transactions = [
    {'from': '1A1zP1eP...', 'to': '1BvBMSEY...', 'amount': 0.5, 'fee': 0.0001},
    {'from': '1BvBMSEY...', 'to': '1Cdid9KF...', 'amount': 0.2, 'fee': 0.0001},
    {'from': '1Cdid9KF...', 'to': '1A1zP1eP...', 'amount': 0.1, 'fee': 0.0001},
    {'from': '1FfmbHfn...', 'to': '1BvBMSEY...', 'amount': 1.0, 'fee': 0.0002},
]

bitcoin_block = BitcoinBlock(12345, "0000a1b2c3d4...", block_transactions)

print(f"Block #{bitcoin_block.height}")
print(f"Previous Hash: {bitcoin_block.previous_hash}")
print(f"Merkle Root: {bitcoin_block.merkle_root}")
print(f"Block Hash: {bitcoin_block.hash}")
print(f"Transactions: {len(bitcoin_block.transactions)}")

# Verify a transaction is in the block
verification = bitcoin_block.verify_transaction_inclusion(1)
print(f"\nTransaction verification: {'βœ… Valid' if verification['valid'] else '❌ Invalid'}")
print(f"Proof size: {len(verification['proof'])} steps")

Ethereum State Trees

class EthereumStateTree:
    """Simplified Ethereum state tree using Patricia Merkle Tree concept"""
    
    def __init__(self):
        self.accounts = {}
        self.state_root = ""
    
    def update_account(self, address: str, balance: float, nonce: int, code_hash: str = ""):
        """Update account state"""
        account_data = {
            'balance': balance,
            'nonce': nonce,
            'code_hash': code_hash or hashlib.sha256(b"").hexdigest(),
            'storage_root': hashlib.sha256(b"").hexdigest()
        }
        
        self.accounts[address] = account_data
        self._update_state_root()
    
    def _update_state_root(self):
        """Update state root hash"""
        account_hashes = []
        
        for address in sorted(self.accounts.keys()):
            account = self.accounts[address]
            account_string = f"{address}:{account['balance']}:{account['nonce']}:{account['code_hash']}:{account['storage_root']}"
            account_hash = hashlib.sha256(account_string.encode()).hexdigest()
            account_hashes.append(account_hash)
        
        # Build Merkle tree from account hashes
        if account_hashes:
            tree = MerkleTree(account_hashes)
            self.state_root = tree.get_root()
        else:
            self.state_root = hashlib.sha256(b"").hexdigest()
    
    def get_account_proof(self, address: str) -> dict:
        """Get Merkle proof for account inclusion in state"""
        if address not in self.accounts:
            return {'valid': False, 'error': 'Account not found'}
        
        account = self.accounts[address]
        account_string = f"{address}:{account['balance']}:{account['nonce']}:{account['code_hash']}:{account['storage_root']}"
        account_hash = hashlib.sha256(account_string.encode()).hexdigest()
        
        # Build tree from all account hashes
        account_hashes = []
        for addr in sorted(self.accounts.keys()):
            acc = self.accounts[addr]
            acc_string = f"{addr}:{acc['balance']}:{acc['nonce']}:{acc['code_hash']}:{acc['storage_root']}"
            acc_hash = hashlib.sha256(acc_string.encode()).hexdigest()
            account_hashes.append(acc_hash)
        
        tree = MerkleTree(account_hashes)
        proof = tree.generate_proof(account_hash)
        
        return {
            'valid': True,
            'account': account,
            'proof': proof,
            'state_root': self.state_root
        }

# Demo Ethereum state tree
print("\n⚑ Ethereum State Tree")
print("=" * 50)

eth_state = EthereumStateTree()

# Update some accounts
eth_state.update_account("0x742d35Cc6635C0532925a3b8D8Cf97E", 10.5, 42)
eth_state.update_account("0x8ba1f109551bD432803012645Hac136c", 5.2, 15)
eth_state.update_account("0x1f9840a85d5aF5bf1D1762F925BDADdC4201F984", 100.0, 7)

print(f"State Root: {eth_state.state_root}")
print(f"Accounts: {len(eth_state.accounts)}")

# Get proof for specific account
proof_result = eth_state.get_account_proof("0x742d35Cc6635C0532925a3b8D8Cf97E")
if proof_result['valid']:
    print(f"Account proof generated with {len(proof_result['proof'])} steps")
    print(f"Balance: {proof_result['account']['balance']} ETH")

Advanced Merkle Tree Variants πŸš€

1. Patricia Merkle Tree (Trie)

class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end = False
        self.value = None
        self.hash = None

class PatriciaTrie:
    """Simplified Patricia Trie for Ethereum"""
    
    def __init__(self):
        self.root = TrieNode()
    
    def insert(self, key: str, value: str):
        """Insert key-value pair"""
        node = self.root
        
        for char in key:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
        
        node.is_end = True
        node.value = value
        self._update_hashes()
    
    def _calculate_node_hash(self, node: TrieNode) -> str:
        """Calculate hash for a node"""
        if node.is_end:
            content = f"value:{node.value}"
        else:
            content = "branch"
        
        # Add child hashes
        child_hashes = []
        for key in sorted(node.children.keys()):
            child_hash = self._calculate_node_hash(node.children[key])
            child_hashes.append(f"{key}:{child_hash}")
        
        if child_hashes:
            content += "|" + "|".join(child_hashes)
        
        return hashlib.sha256(content.encode()).hexdigest()
    
    def _update_hashes(self):
        """Update all node hashes"""
        self.root.hash = self._calculate_node_hash(self.root)
    
    def get_root_hash(self) -> str:
        """Get root hash of the trie"""
        return self.root.hash

# Demo Patricia Trie
print("\nπŸ” Patricia Trie Demo")
print("=" * 50)

trie = PatriciaTrie()
trie.insert("balance_alice", "100")
trie.insert("balance_bob", "50")
trie.insert("balance_carol", "75")
trie.insert("nonce_alice", "42")

print(f"Patricia Trie Root Hash: {trie.get_root_hash()}")

2. Sparse Merkle Trees

class SparseMerkleTree:
    """Sparse Merkle Tree for efficient storage of sparse data"""
    
    def __init__(self, depth: int = 10):
        self.depth = depth
        self.nodes = {}  # Only store non-empty nodes
        self.default_hash = "0" * 64  # Default hash for empty nodes
    
    def _hash_pair(self, left: str, right: str) -> str:
        """Hash two nodes together"""
        return hashlib.sha256((left + right).encode()).hexdigest()
    
    def update(self, index: int, value: str):
        """Update value at specific index"""
        if index >= 2**self.depth:
            raise ValueError(f"Index {index} exceeds tree capacity")
        
        # Hash the value
        leaf_hash = hashlib.sha256(value.encode()).hexdigest()
        
        # Update path from leaf to root
        current_hash = leaf_hash
        current_index = index
        
        for level in range(self.depth):
            # Determine sibling
            sibling_index = current_index ^ 1  # Flip last bit
            
            # Get sibling hash (default if not stored)
            sibling_key = f"{level}_{sibling_index}"
            sibling_hash = self.nodes.get(sibling_key, self.default_hash)
            
            # Calculate parent hash
            if current_index % 2 == 0:  # Current is left child
                parent_hash = self._hash_pair(current_hash, sibling_hash)
            else:  # Current is right child
                parent_hash = self._hash_pair(sibling_hash, current_hash)
            
            # Store current node (only if non-default)
            if current_hash != self.default_hash:
                self.nodes[f"{level}_{current_index}"] = current_hash
            
            # Move up the tree
            current_hash = parent_hash
            current_index = current_index // 2
        
        # Store root
        self.nodes[f"{self.depth}_0"] = current_hash
    
    def get_root(self) -> str:
        """Get root hash"""
        return self.nodes.get(f"{self.depth}_0", self.default_hash)
    
    def generate_proof(self, index: int) -> List[str]:
        """Generate inclusion proof for index"""
        proof = []
        current_index = index
        
        for level in range(self.depth):
            sibling_index = current_index ^ 1
            sibling_key = f"{level}_{sibling_index}"
            sibling_hash = self.nodes.get(sibling_key, self.default_hash)
            proof.append(sibling_hash)
            current_index = current_index // 2
        
        return proof
    
    def verify_proof(self, index: int, value: str, proof: List[str]) -> bool:
        """Verify inclusion proof"""
        if len(proof) != self.depth:
            return False
        
        current_hash = hashlib.sha256(value.encode()).hexdigest()
        current_index = index
        
        for level in range(self.depth):
            sibling_hash = proof[level]
            
            if current_index % 2 == 0:  # Current is left child
                current_hash = self._hash_pair(current_hash, sibling_hash)
            else:  # Current is right child
                current_hash = self._hash_pair(sibling_hash, current_hash)
            
            current_index = current_index // 2
        
        return current_hash == self.get_root()

# Demo Sparse Merkle Tree
print("\n🌌 Sparse Merkle Tree Demo")
print("=" * 50)

smt = SparseMerkleTree(depth=8)  # Can hold 256 items

# Update sparse positions
smt.update(10, "Alice has 100 tokens")
smt.update(50, "Bob has 50 tokens")  
smt.update(200, "Carol has 75 tokens")

print(f"Sparse tree root: {smt.get_root()}")
print(f"Tree capacity: {2**8} items")
print(f"Stored nodes: {len(smt.nodes)}")

# Generate and verify proof
proof = smt.generate_proof(50)
is_valid = smt.verify_proof(50, "Bob has 50 tokens", proof)
print(f"Proof for index 50: {'βœ… Valid' if is_valid else '❌ Invalid'}")
print(f"Proof size: {len(proof)} hashes")

Optimization Techniques ⚑

1. Merkle Tree Caching

class OptimizedMerkleTree:
    def __init__(self, transactions: List[str]):
        self.transactions = transactions
        self.node_cache = {}  # Cache computed hashes
        self.root = self._build_tree_cached()
    
    def _hash_cached(self, data: str) -> str:
        """Hash with caching"""
        if data not in self.node_cache:
            self.node_cache[data] = hashlib.sha256(data.encode()).hexdigest()
        return self.node_cache[data]
    
    def _build_tree_cached(self) -> str:
        """Build tree with node caching"""
        if not self.transactions:
            return ""
        
        current_level = [self._hash_cached(tx) for tx in self.transactions]
        
        while len(current_level) > 1:
            next_level = []
            
            for i in range(0, len(current_level), 2):
                left = current_level[i]
                right = current_level[i + 1] if i + 1 < len(current_level) else current_level[i]
                
                # Cache the parent computation
                combined = left + right
                parent_hash = self._hash_cached(combined)
                next_level.append(parent_hash)
            
            current_level = next_level
        
        return current_level[0]
    
    def add_transaction(self, new_tx: str):
        """Efficiently add new transaction"""
        self.transactions.append(new_tx)
        # In a real implementation, we'd incrementally update the tree
        # For demo, we rebuild (but with caching benefit)
        self.root = self._build_tree_cached()
    
    def get_cache_stats(self) -> dict:
        """Get caching statistics"""
        return {
            'cached_nodes': len(self.node_cache),
            'cache_hit_ratio': len(self.node_cache) / (len(self.transactions) * 2) if self.transactions else 0
        }

# Demo optimized tree
print("\n⚑ Optimized Merkle Tree")
print("=" * 50)

opt_tree = OptimizedMerkleTree(transactions[:4])
stats = opt_tree.get_cache_stats()

print(f"Initial root: {opt_tree.root}")
print(f"Cached nodes: {stats['cached_nodes']}")

# Add more transactions
opt_tree.add_transaction("Eve sends 1 BTC to Frank")
opt_tree.add_transaction("Frank sends 2 BTC to Grace")

stats = opt_tree.get_cache_stats()
print(f"After additions: {stats['cached_nodes']} cached nodes")

2. Parallel Hash Computation

import concurrent.futures
import multiprocessing

class ParallelMerkleTree:
    def __init__(self, transactions: List[str]):
        self.transactions = transactions
        self.root = self._build_tree_parallel()
    
    def _hash_batch(self, data_batch: List[str]) -> List[str]:
        """Hash a batch of data items"""
        return [hashlib.sha256(item.encode()).hexdigest() for item in data_batch]
    
    def _build_tree_parallel(self) -> str:
        """Build tree using parallel processing"""
        if not self.transactions:
            return ""
        
        current_level = self.transactions[:]
        
        while len(current_level) > 1:
            # Determine optimal batch size
            num_cores = multiprocessing.cpu_count()
            batch_size = max(1, len(current_level) // num_cores)
            
            # Create batches for parallel processing
            batches = []
            for i in range(0, len(current_level), 2):
                left = current_level[i]
                right = current_level[i + 1] if i + 1 < len(current_level) else current_level[i]
                batches.append(left + right)
            
            # Process batches in parallel
            with concurrent.futures.ThreadPoolExecutor(max_workers=num_cores) as executor:
                # Split batches among workers
                batch_groups = [batches[i:i + batch_size] for i in range(0, len(batches), batch_size)]
                
                futures = []
                for batch_group in batch_groups:
                    future = executor.submit(self._hash_batch, batch_group)
                    futures.append(future)
                
                # Collect results
                next_level = []
                for future in concurrent.futures.as_completed(futures):
                    next_level.extend(future.result())
            
            current_level = next_level
        
        return current_level[0] if current_level else ""

# Note: Parallel processing demo would require actual computation load
# This is a conceptual implementation
print("\nπŸš€ Parallel Merkle Tree (Concept)")
print("=" * 50)
print("Parallel processing can significantly speed up large tree construction")
print(f"Available CPU cores: {multiprocessing.cpu_count()}")

Real-World Applications 🌍

1. Git Version Control

class GitCommitTree:
    """Simplified Git-style commit tree using Merkle concepts"""
    
    def __init__(self):
        self.commits = []
        self.file_objects = {}
    
    def add_file(self, filename: str, content: str) -> str:
        """Add file and return its hash"""
        file_hash = hashlib.sha256(content.encode()).hexdigest()
        self.file_objects[file_hash] = {
            'type': 'blob',
            'filename': filename,
            'content': content
        }
        return file_hash
    
    def create_commit(self, files: dict, message: str, parent: str = None) -> str:
        """Create a commit with file tree"""
        # Create tree object from files
        tree_entries = []
        for filename, content in files.items():
            file_hash = self.add_file(filename, content)
            tree_entries.append(f"{filename}:{file_hash}")
        
        # Sort for consistent hashing
        tree_entries.sort()
        tree_content = "|".join(tree_entries)
        tree_hash = hashlib.sha256(tree_content.encode()).hexdigest()
        
        # Store tree object
        self.file_objects[tree_hash] = {
            'type': 'tree',
            'entries': tree_entries
        }
        
        # Create commit object
        commit_content = f"tree:{tree_hash}|message:{message}"
        if parent:
            commit_content += f"|parent:{parent}"
        
        commit_hash = hashlib.sha256(commit_content.encode()).hexdigest()
        
        commit_obj = {
            'type': 'commit',
            'tree': tree_hash,
            'message': message,
            'parent': parent,
            'timestamp': int(time.time())
        }
        
        self.file_objects[commit_hash] = commit_obj
        self.commits.append(commit_hash)
        
        return commit_hash
    
    def verify_commit(self, commit_hash: str) -> bool:
        """Verify commit integrity"""
        if commit_hash not in self.file_objects:
            return False
        
        commit = self.file_objects[commit_hash]
        if commit['type'] != 'commit':
            return False
        
        # Verify tree exists
        tree_hash = commit['tree']
        if tree_hash not in self.file_objects:
            return False
        
        tree = self.file_objects[tree_hash]
        if tree['type'] != 'tree':
            return False
        
        # Verify all files in tree exist
        for entry in tree['entries']:
            filename, file_hash = entry.split(':', 1)
            if file_hash not in self.file_objects:
                return False
        
        return True

# Demo Git-style commits
print("\nπŸ“ Git-Style Commit Tree")
print("=" * 50)

git_tree = GitCommitTree()

# First commit
files1 = {
    'README.md': '# My Project\nHello world!',
    'main.py': 'print("Hello, world!")'
}
commit1 = git_tree.create_commit(files1, "Initial commit")
print(f"Commit 1: {commit1[:16]}...")

# Second commit
files2 = {
    'README.md': '# My Project\nHello world!\n\nAdded features!',
    'main.py': 'print("Hello, world!")\nprint("Version 2.0")',
    'utils.py': 'def helper():\n    return "Utility function"'
}
commit2 = git_tree.create_commit(files2, "Added new features", commit1)
print(f"Commit 2: {commit2[:16]}...")

# Verify commits
print(f"Commit 1 valid: {git_tree.verify_commit(commit1)}")
print(f"Commit 2 valid: {git_tree.verify_commit(commit2)}")
print(f"Total objects: {len(git_tree.file_objects)}")

2. Certificate Transparency

class CertificateTransparencyLog:
    """Certificate Transparency using Merkle Tree concept"""
    
    def __init__(self):
        self.certificates = []
        self.tree = None
        self.log_size = 0
    
    def add_certificate(self, cert_data: dict):
        """Add new certificate to transparency log"""
        cert_entry = {
            'domain': cert_data['domain'],
            'issuer': cert_data['issuer'],
            'timestamp': int(time.time()),
            'public_key_hash': cert_data['public_key_hash']
        }
        
        self.certificates.append(cert_entry)
        self.log_size += 1
        self._rebuild_tree()
        
        return self.log_size - 1  # Return certificate index
    
    def _rebuild_tree(self):
        """Rebuild Merkle tree with all certificates"""
        cert_strings = []
        for cert in self.certificates:
            cert_string = f"{cert['domain']}|{cert['issuer']}|{cert['timestamp']}|{cert['public_key_hash']}"
            cert_strings.append(cert_string)
        
        if cert_strings:
            self.tree = MerkleTree(cert_strings)
    
    def get_inclusion_proof(self, cert_index: int) -> dict:
        """Get proof that certificate is in the log"""
        if cert_index >= len(self.certificates):
            return {'valid': False, 'error': 'Certificate not found'}
        
        cert = self.certificates[cert_index]
        cert_string = f"{cert['domain']}|{cert['issuer']}|{cert['timestamp']}|{cert['public_key_hash']}"
        
        proof = self.tree.generate_proof(cert_string)
        
        return {
            'valid': True,
            'certificate': cert,
            'proof': proof,
            'tree_size': self.log_size,
            'root_hash': self.tree.get_root()
        }
    
    def verify_certificate_inclusion(self, cert_index: int, expected_cert: dict, proof: List[dict], root_hash: str) -> bool:
        """Verify certificate is included in specific tree state"""
        if cert_index >= len(self.certificates):
            return False
        
        cert_string = f"{expected_cert['domain']}|{expected_cert['issuer']}|{expected_cert['timestamp']}|{expected_cert['public_key_hash']}"
        
        return self.tree.verify_proof(cert_string, proof, root_hash)

# Demo Certificate Transparency
print("\nπŸ“‹ Certificate Transparency Log")
print("=" * 50)

ct_log = CertificateTransparencyLog()

# Add certificates
certs = [
    {'domain': 'example.com', 'issuer': 'Let\'s Encrypt', 'public_key_hash': 'abc123...'},
    {'domain': 'secure.com', 'issuer': 'DigiCert', 'public_key_hash': 'def456...'},
    {'domain': 'mysite.org', 'issuer': 'Comodo', 'public_key_hash': 'ghi789...'}
]

for cert in certs:
    index = ct_log.add_certificate(cert)
    print(f"Added certificate for {cert['domain']} at index {index}")

# Get inclusion proof
proof_result = ct_log.get_inclusion_proof(1)
if proof_result['valid']:
    print(f"\nInclusion proof for {proof_result['certificate']['domain']}:")
    print(f"Proof steps: {len(proof_result['proof'])}")
    print(f"Tree size: {proof_result['tree_size']}")
    print(f"Root hash: {proof_result['root_hash'][:16]}...")

Performance Analysis πŸ“Š

Time Complexity

def analyze_merkle_performance():
    """Analyze Merkle tree performance characteristics"""
    
    print("πŸ“Š Merkle Tree Performance Analysis")
    print("=" * 50)
    
    # Test different tree sizes
    sizes = [10, 100, 1000, 10000]
    
    for size in sizes:
        # Generate test data
        test_data = [f"Transaction {i}" for i in range(size)]
        
        # Measure construction time
        start_time = time.time()
        tree = MerkleTree(test_data)
        construction_time = time.time() - start_time
        
        # Measure proof generation time
        start_time = time.time()
        proof = tree.generate_proof(test_data[size // 2])
        proof_time = time.time() - start_time
        
        # Calculate metrics
        tree_height = math.ceil(math.log2(size)) if size > 0 else 0
        proof_size = len(proof)
        
        print(f"\nSize: {size:,} transactions")
        print(f"  Tree height: {tree_height}")
        print(f"  Construction time: {construction_time:.4f}s")
        print(f"  Proof generation: {proof_time:.6f}s") 
        print(f"  Proof size: {proof_size} hashes")
        print(f"  Verification complexity: O(log n) = O({tree_height})")

import time
import math

analyze_merkle_performance()

Space Efficiency

def analyze_space_efficiency():
    """Analyze space efficiency of Merkle trees"""
    
    print("\nπŸ’Ύ Space Efficiency Analysis")
    print("=" * 50)
    
    # Compare different approaches
    transaction_sizes = [100, 1000, 10000]
    
    for tx_count in transaction_sizes:
        # Assume each transaction is 250 bytes (typical Bitcoin transaction)
        tx_size = 250
        hash_size = 32  # SHA-256 hash size
        
        # Original data size
        original_size = tx_count * tx_size
        
        # Merkle tree size (all internal nodes)
        tree_nodes = tx_count - 1  # n-1 internal nodes for n leaves
        tree_size = tree_nodes * hash_size
        
        # Root hash size
        root_size = hash_size
        
        # Proof size for verification
        proof_height = math.ceil(math.log2(tx_count))
        proof_size = proof_height * hash_size
        
        print(f"\n{tx_count:,} transactions:")
        print(f"  Original data: {original_size:,} bytes ({original_size/1024:.1f} KB)")
        print(f"  Tree overhead: {tree_size:,} bytes ({tree_size/1024:.1f} KB)")
        print(f"  Root hash: {root_size} bytes")
        print(f"  Verification proof: {proof_size} bytes")
        print(f"  Compression ratio: {original_size/root_size:,.0f}:1")
        
        # Bandwidth savings for light clients
        full_download = original_size
        light_client = proof_size + tx_size  # Only need proof + transaction
        bandwidth_saving = (1 - light_client/full_download) * 100
        
        print(f"  Light client bandwidth saving: {bandwidth_saving:.1f}%")

analyze_space_efficiency()

Security Considerations πŸ”’

1. Second Pre-image Attacks

def demonstrate_second_preimage_resistance():
    """Show why Merkle trees resist second pre-image attacks"""
    
    print("πŸ›‘οΈ Second Pre-image Resistance")
    print("=" * 50)
    
    # Original transactions
    original_txs = [
        "Alice sends 5 BTC to Bob",
        "Bob sends 2 BTC to Carol"
    ]
    
    # Build original tree
    original_tree = MerkleTree(original_txs)
    original_root = original_tree.get_root()
    
    print(f"Original root: {original_root[:16]}...")
    
    # Attacker tries to find different transactions with same root
    print("\nπŸ” Attacker attempting to find collision...")
    
    # This would require breaking SHA-256 (computationally infeasible)
    attempts = 0
    max_attempts = 1000
    
    for i in range(max_attempts):
        # Generate random alternative transactions
        fake_txs = [
            f"Alice sends {5+i} BTC to Bob",
            f"Bob sends {2+i} BTC to Carol"
        ]
        
        fake_tree = MerkleTree(fake_txs)
        fake_root = fake_tree.get_root()
        
        attempts += 1
        
        if fake_root == original_root:
            print(f"❌ COLLISION FOUND after {attempts} attempts!")
            print(f"   Fake transactions: {fake_txs}")
            break
    else:
        print(f"βœ… No collision found after {attempts:,} attempts")
        print("   Merkle tree provides strong collision resistance")

demonstrate_second_preimage_resistance()

2. Proof Verification Security

def secure_proof_verification():
    """Demonstrate secure proof verification practices"""
    
    print("\nπŸ” Secure Proof Verification")
    print("=" * 50)
    
    class SecureMerkleVerifier:
        def __init__(self, trusted_root: str):
            self.trusted_root = trusted_root
        
        def verify_transaction(self, transaction: str, proof: List[dict]) -> dict:
            """Securely verify transaction with comprehensive checks"""
            result = {
                'valid': False,
                'transaction': transaction,
                'errors': []
            }
            
            try:
                # 1. Validate proof structure
                if not isinstance(proof, list):
                    result['errors'].append("Proof must be a list")
                    return result
                
                for i, step in enumerate(proof):
                    if not isinstance(step, dict):
                        result['errors'].append(f"Proof step {i} must be a dictionary")
                        return result
                    
                    if 'hash' not in step or 'direction' not in step:
                        result['errors'].append(f"Proof step {i} missing required fields")
                        return result
                    
                    if step['direction'] not in ['left', 'right']:
                        result['errors'].append(f"Invalid direction in step {i}")
                        return result
                    
                    # Validate hash format
                    hash_value = step['hash']
                    if not isinstance(hash_value, str) or len(hash_value) != 64:
                        result['errors'].append(f"Invalid hash format in step {i}")
                        return result
                    
                    try:
                        int(hash_value, 16)  # Verify it's hex
                    except ValueError:
                        result['errors'].append(f"Hash in step {i} is not valid hex")
                        return result
                
                # 2. Verify proof depth is reasonable
                if len(proof) > 64:  # Sanity check for massive trees
                    result['errors'].append("Proof depth exceeds reasonable limit")
                    return result
                
                # 3. Compute root hash step by step
                current_hash = hashlib.sha256(transaction.encode()).hexdigest()
                
                for step in proof:
                    if step['direction'] == 'right':
                        combined = current_hash + step['hash']
                    else:
                        combined = step['hash'] + current_hash
                    
                    current_hash = hashlib.sha256(combined.encode()).hexdigest()
                
                # 4. Compare with trusted root
                if current_hash == self.trusted_root:
                    result['valid'] = True
                    result['computed_root'] = current_hash
                else:
                    result['errors'].append("Computed root does not match trusted root")
                    result['computed_root'] = current_hash
                    result['expected_root'] = self.trusted_root
            
            except Exception as e:
                result['errors'].append(f"Verification error: {str(e)}")
            
            return result
    
    # Demo secure verification
    test_txs = ["Alice sends 5 BTC to Bob", "Bob sends 2 BTC to Carol"]
    tree = MerkleTree(test_txs)
    trusted_root = tree.get_root()
    
    verifier = SecureMerkleVerifier(trusted_root)
    
    # Test valid proof
    valid_proof = tree.generate_proof(test_txs[0])
    result = verifier.verify_transaction(test_txs[0], valid_proof)
    print(f"Valid proof result: {'βœ… Valid' if result['valid'] else '❌ Invalid'}")
    
    # Test invalid proof
    invalid_proof = [{'hash': 'invalid_hash', 'direction': 'right'}]
    result = verifier.verify_transaction(test_txs[0], invalid_proof)
    print(f"Invalid proof result: {'βœ… Valid' if result['valid'] else '❌ Invalid'}")
    if result['errors']:
        print(f"Errors: {result['errors']}")

secure_proof_verification()

Quiz: Test Your Knowledge πŸ“

  1. What is the main advantage of Merkle trees in blockchain?

    • Efficient verification of large datasets with a single hash
  2. How many hashes are needed to verify inclusion in a Merkle tree of 1000 items?

    • Approximately logβ‚‚(1000) β‰ˆ 10 hashes
  3. What happens to the root hash if you change one leaf node?

    • The root hash completely changes due to the avalanche effect
  4. Why are Merkle trees better than simply hashing all transactions together?

    • Allow verification of individual transactions without downloading all data
  5. What is a Sparse Merkle Tree optimized for?

    • Efficiently storing data where most positions are empty

Summary 🎯

Merkle Trees are the backbone of efficient blockchain verification, providing:

Key Benefits

  • Scalable Verification: O(log n) proof size instead of O(n)
  • Data Integrity: Any change is immediately detectable
  • Bandwidth Efficiency: Light clients can verify without full data
  • Storage Optimization: Single hash represents entire dataset

Core Concepts

  1. Binary Tree Structure: Each internal node is hash of children
  2. Root Hash: Single hash representing entire tree
  3. Merkle Proofs: Logarithmic-size inclusion proofs
  4. Avalanche Effect: Small changes cascade to root

Applications

  • Bitcoin: Transaction verification in blocks
  • Ethereum: State trees and transaction trees
  • IPFS: Content addressing and verification
  • Git: Version control and file integrity
  • Certificate Transparency: Public audit logs

Variants

  • Patricia Trie: Optimized for key-value storage (Ethereum)
  • Sparse Merkle Trees: Efficient for sparse datasets
  • Binary Merkle Trees: Standard blockchain implementation

Next Steps


🌳 Merkle Trees: Where mathematics meets efficiency - enabling blockchain to scale from concept to global infrastructure.