Maps the main paper tables, figures, and inline claims to the scripts that produced them and the raw data files that back them. All raw artifacts listed here are bundled with the repository; some scripts are turnkey, while others require explicit GGUF model paths, cached Hugging Face weights, or a local llama.cpp checkout at the tested commit. Supplementary local artifacts released with the repository are listed separately below; their JSON/log files use sanitized placeholder runner/model paths wherever provenance fields were preserved from the original capture environment. LaTeX table/figure numbers follow source order (all floats use [ht] or [t]).
| # | Label | Description | Script | Data |
|---|---|---|---|---|
| 1 | thresh_ablation | Threshold formula ablation (Qwen2.5-0.5B real K) | scripts/analysis/f4_threshold_ablation.py | results/f4_thresh_ablation_results.json |
| 2 | ppl_crossmodel | PPL across 7+2 models, 3 configs | scripts/eval/run_ppl.sh | results/results_all.json, results/ppl/*/ |
| 3 | niah | NIAH retrieval 45/45 (3 models) | scripts/eval/f5_niah.py | results/logs/final_tests_full.log |
| 4 | longctx | Long-context PPL (4K/8K/16K) | scripts/eval/run_ppl.sh | results/long_context_results.json |
| 5 | sink | Sink-token ablation (Mistral-7B) | scripts/eval/run_ppl.sh | results/logs/final_tests_full.log |
| 6 | throughput | RTX 5090 throughput (llama-bench) | scripts/eval/throughput_bench.sh | results/throughput/bench_*.md |
| 7 | longbench_overall | LongBench-v2 overall accuracy (503 ex, 3 seeds) | scripts/eval/longbench_v2_eval.py | results/longbench/*/results_all.json |
| 8 | longbench_domain | Qwen2.5-7B per-domain LongBench-v2 | scripts/eval/longbench_v2_eval.py | results/longbench/qwen25_seed*/ |
| 9 | divergence | PPL degradation vs task accuracy | scripts/analysis/f1_ppl_analysis.py | results/results_sweet.json, results/longbench/ |
| 10 | entropy | Attention entropy under simulated EKV | scripts/analysis/simulate_attention_haze.py | results/attention_entropy_v2_results.json |
| 11 | ablation_random | EKV vs random zeroing (matched sparsity) | scripts/analysis/f4_threshold_ablation.py | results/ablation_random_vs_ekv_results.json |
| 12 | paired_chunks | Paired per-chunk analysis (†F1 models) | scripts/analysis/f1_deep_analysis.py | results/ppl/*/, results/logs/final_tests_full.log |
| 13 | spr_asi | SPR and ASI across architectures | scripts/analysis/f2_spr_real_kv.py | results/f2_spr_real_kv_results.json, paper Table 13 |
| # | Label | Description | Script | File |
|---|---|---|---|---|
| 1 | sparsity_curve | Sparsity–quality trade-off (Llama-3.1-8B) | scripts/plot/gen_fig65_sparsity_curve.py | paper/figures/fig65_sparsity_curve.pdf |
| 2 | architecture | ElasticKV pipeline diagram | scripts/plot/generate_paper_figures.py | paper/figures/fig0_architecture.pdf |
| 3 | f1_kv_regression | KV heads vs PPL regression (7 models) | scripts/analysis/f1_ppl_analysis.py | paper/figures/f1_kv_heads_vs_ppl.png |
| 4 | three_seed | Three-seed LongBench reproducibility | scripts/plot/generate_paper_figures.py | paper/figures/fig25_three_seed.pdf |
| 5 | kv_head_dependence | KV head count moderation effect | scripts/plot/generate_paper_figures.py | paper/figures/fig35_kv_head_dependence.pdf |
| 6 | attention_haze | Attention haze visualization (64-token) | scripts/plot/generate_paper_figures.py | paper/figures/fig15_attention_haze.pdf |
| 7 | entropy_sparsity | Entropy change vs sparsity (EKV + random) | scripts/plot/generate_paper_figures.py | paper/figures/fig45_entropy_vs_sparsity.pdf |
| 8 | precision_recall | Per-domain accuracy change (Qwen2.5-7B) | scripts/plot/generate_paper_figures.py | paper/figures/fig55_precision_recall.pdf |
| Claim | Location | Script | Data |
|---|---|---|---|
| Compression 1.48× (Mistral sweet) | Abstract, §4.1 | scripts/eval/run_ppl.sh | results/results_sweet.json |
| <0.31% PPL degradation (GQA) | Abstract, §4.1 | scripts/eval/run_ppl.sh | results/results_all.json |
| 3,400× ablation divergence | Abstract, §5.3 | scripts/analysis/f4_threshold_ablation.py | results/ablation_random_vs_ekv_results.json |
| SPR_cond = 54.6% (real KV) | §5.2 | scripts/analysis/f2_spr_real_kv.py | results/f2_spr_real_kv_results.json |
| EKV removes 36.6% of elements but only 0.53% of L2 at s=0.45 | §4.3, §5.2 | scripts/analysis/f4b_tau_band_analysis.py | results/supplementary/local_baselines/attention_haze_tau_bands_qwen05b.json |
| H2O/ScissorHands remove 46.3% / 75.2% of L2 under the same τ_h frame | §4.3, §5.2 | scripts/analysis/f4c_h2o_scissor_eviction_analysis.py | results/supplementary/local_baselines/attention_h2o_scissor_evictions_qwen05b.json |
| zlib 1.52× / lz4 1.23× | §7 | scripts/analysis/f7_compression_proof.py | results/f7_compression_proof.json |
| +0.8pp Qwen2.5-7B regularization | §4.4, §5.3 | scripts/eval/longbench_v2_eval.py | results/longbench/qwen25_seed*/ |
| ΔPPL regression R²=0.53 | §4.1, §5.1 | scripts/analysis/f1_ppl_analysis.py | results/results_all.json |
| 0.4% decode overhead | §4.3 | scripts/eval/throughput_bench.sh | results/throughput/bench_*.md |
| Paired t>100 (chunk-level) | §5.1 | scripts/analysis/f1_deep_analysis.py | results/ppl/*/ |
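The element-fraction vs L2-energy contrast above (many elements removed, little energy lost) is easy to reproduce in miniature: under a heavy-tailed magnitude distribution, a below-threshold ablation drops many elements but almost no energy, while random zeroing at matched sparsity drops energy roughly in proportion to the elements it removes. This is an illustrative sketch on synthetic magnitudes with an arbitrary threshold, not the repository's f4_threshold_ablation.py implementation:

```python
import random

random.seed(0)

# Toy K-cache: cubing Gaussians gives a heavy-tailed magnitude distribution,
# so most elements are tiny while a few dominate the L2 energy.
k = [random.gauss(0.0, 1.0) ** 3 for _ in range(100_000)]
total_l2 = sum(x * x for x in k)

# EKV-style ablation: remove every element below a magnitude threshold.
tau = 0.1  # arbitrary toy threshold, not the paper's tau_h
removed = [x for x in k if abs(x) < tau]

elem_frac = len(removed) / len(k)
l2_frac = sum(x * x for x in removed) / total_l2

# Random zeroing at matched sparsity: same element count, chosen uniformly,
# so it takes out L2 energy roughly in proportion to elements removed.
rand_removed = random.sample(k, len(removed))
rand_l2_frac = sum(x * x for x in rand_removed) / total_l2

print(f"threshold: {elem_frac:.1%} of elements, {l2_frac:.2%} of L2 removed")
print(f"random @ matched sparsity: {rand_l2_frac:.1%} of L2 removed")
```

The gap between `l2_frac` and `rand_l2_frac` is the mechanism behind both the matched-sparsity ablation row and the 3,400× divergence claim, though the exact figures depend on the real KV distributions.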
| Artifact | Description | Script or patch | Data |
|---|---|---|---|
| local_baselines | Same-run local H2O, ScissorHands, and EKV comparison on GTX 1070 (llama-perplexity, flash_attn off, c=256, b=256, ub=32, 8 chunks) | cuda_hook/experimental/llama-context_attention_baselines.cpp | results/supplementary/local_baselines/h2o_wsl_compatible_models_comparison.json, results/supplementary/local_baselines/logs/ |
| h2o_smoke | Short smoke validation showing score capture and heavy+recent eviction in the local H2O path | cuda_hook/experimental/llama-context_attention_baselines.cpp | results/supplementary/local_baselines/h2o_smoke_qwen05b_comparison.json |
| tau_band_qwen05b | Real-KV pre-threshold `\|K\|/τ_h` band analysis showing that EKV's removed region is high-density but low-energy on Qwen2.5-0.5B | scripts/analysis/f4b_tau_band_analysis.py | results/supplementary/local_baselines/attention_haze_tau_bands_qwen05b.json |
| baseline_eviction_geometry | Real-KV replay that projects H2O and ScissorHands evictions onto the same `\|K\|/τ_h` bands for mechanistic comparison with EKV | scripts/analysis/f4c_h2o_scissor_eviction_analysis.py | results/supplementary/local_baselines/attention_h2o_scissor_evictions_qwen05b.json |
| gtx1070_llama_bench | Local GTX 1070 throughput sweep for evict_every sensitivity and sweet/aggressive decode trade-offs | scripts/eval/run_gtx1070_llama_bench.sh | results/supplementary/throughput_gtx1070_llama_bench/ |
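The two band analyses above share one idea: bucket each cache element by its magnitude in τ_h units and compare each band's element share against its L2 share. A toy version of that bucketing, on synthetic Gaussian magnitudes with an arbitrary τ_h rather than the repository's real-KV dumps:

```python
import random
from collections import defaultdict

random.seed(1)

tau_h = 0.5  # hypothetical threshold standing in for the paper's tau_h
ks = [abs(random.gauss(0.0, 1.0)) for _ in range(50_000)]

# band -> [element count, L2 mass]; bands are [0,1), [1,2), ..., [4,inf)
# in tau_h units, with everything above 4*tau_h folded into the top band.
bands = defaultdict(lambda: [0, 0.0])
for k in ks:
    band = min(int(k / tau_h), 4)
    bands[band][0] += 1
    bands[band][1] += k * k

n, l2 = len(ks), sum(k * k for k in ks)
for band in sorted(bands):
    cnt, mass = bands[band]
    print(f"|K|/tau_h in [{band},{band + 1}): "
          f"{cnt / n:.1%} of elements, {mass / l2:.1%} of L2")
```

The sub-threshold band (index 0) comes out dense in elements but light in L2, which is the qualitative pattern the tau_band_qwen05b artifact documents on real keys.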
| File | Format | Description |
|---|---|---|
| results_all.json | JSON | All PPL results across models and configs (Mistral primary) |
| results_baseline.json | JSON | Baseline (no EKV) PPL |
| results_sweet.json | JSON | Sweet-spot config (sK=0.45/sV=0.50) |
| results_aggressive.json | JSON | Aggressive config (sK=sV=0.50) |
| ablation_random_vs_ekv_results.json | JSON | Matched-sparsity EKV vs random ablation |
| attention_entropy_v2_results.json | JSON | Entropy simulation results |
| f4_thresh_ablation_results.json | JSON | Threshold-formula ablation archive for Table 1 |
| f2_spr_real_kv_results.json | JSON | Archived SPR/ASI empirical summary for Table 13 and real-KV claims |
| f7_compression_proof.json | JSON | Archived compression-proof summary for Section 7 |
| long_context_results.json | JSON | 4K/8K/16K context PPL scaling |
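The baseline/sweet/aggressive file trio invites a quick degradation check. A minimal sketch of such a consumer, assuming each results file carries a top-level `"ppl"` field; the repository's actual JSON schema is not documented in this table and may nest things differently:

```python
import json
from pathlib import Path

def ppl_degradation(baseline_path: str, config_path: str) -> float:
    """Percent PPL change of a config run relative to its baseline run.

    Assumes a top-level "ppl" field in both files (hypothetical schema).
    """
    base = json.loads(Path(baseline_path).read_text())
    conf = json.loads(Path(config_path).read_text())
    return (conf["ppl"] - base["ppl"]) / base["ppl"] * 100.0

# e.g. ppl_degradation("results/results_baseline.json",
#                      "results/results_sweet.json")
```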
| Directory | Model | Contents |
|---|---|---|
| ppl/llama3/ | Llama-3-8B FP16 | results_{all,baseline,sweet,aggressive}.json |
| ppl/qwen25/ | Qwen2.5-7B FP16 | results_{all,baseline,sweet,aggressive}.json |
| ppl/qwen3/ | Qwen3-4B BF16 | results_{all,baseline,sweet,aggressive}.json |
| Directory | Model × Seed | Contents |
|---|---|---|
| longbench/qwen25_seed{1,2,3}/ | Qwen2.5-7B × 3 seeds | results_{all,baseline,sweet,aggressive}.json |
| longbench/qwen3_seed{1,2,3}/ | Qwen3-4B × 3 seeds | results_{all,baseline,sweet,aggressive}.json |
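Every per-run directory above is expected to contain the same four per-config JSONs, so completeness is mechanically checkable. A small sketch of such a check (the helper name is ours, not a repository script):

```python
from pathlib import Path

# The four per-config files each run directory should contain,
# per the ppl/ and longbench/ directory tables.
EXPECTED = [f"results_{c}.json" for c in ("all", "baseline", "sweet", "aggressive")]

def missing_results(root: str) -> dict[str, list[str]]:
    """Report which expected JSONs are absent in each run directory under root."""
    gaps = {}
    for run_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        absent = [f for f in EXPECTED if not (run_dir / f).is_file()]
        if absent:
            gaps[run_dir.name] = absent
    return gaps

# e.g. missing_results("results/ppl") or missing_results("results/longbench")
```

An empty dict means every run directory under the given root is complete.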
| File | Description |
|---|---|
| bench_baseline.md | llama-bench output, ELASTICKV=0 |
| bench_sweet.md | llama-bench output, sweet config |
| bench_aggressive.md | llama-bench output, aggressive config |
| File | Description |
|---|---|
| final_tests_full.log | Complete PPL+NIAH+sink test run |
| ablation_results.log | Ablation experiment log |
| longbench_3seeds_full.log | 3-seed LongBench Qwen2.5-7B run |
| longbench_3seeds_qwen3_full.log | 3-seed LongBench Qwen3-4B run |