
# Results Manifest

Maps the main paper tables, figures, and inline claims to the scripts that produced them and to the raw data files that back them. The raw artifacts listed here are bundled with the repository; some scripts are turnkey, while others still require explicit GGUF model paths, cached Hugging Face weights, or a local llama.cpp checkout at the tested commit. Supplementary local artifacts released with the repository are listed separately below. Where provenance fields from the original capture environment were preserved, supplementary JSON/log files use sanitized placeholder runner/model paths.

LaTeX table/figure numbers follow source order (all floats use `[ht]` or `[t]`).


## Tables (13)

| # | Label | Description | Script | Data |
|---|-------|-------------|--------|------|
| 1 | thresh_ablation | Threshold formula ablation (Qwen2.5-0.5B real K) | `scripts/analysis/f4_threshold_ablation.py` | `results/f4_thresh_ablation_results.json` |
| 2 | ppl_crossmodel | PPL across 7+2 models, 3 configs | `scripts/eval/run_ppl.sh` | `results/results_all.json`, `results/ppl/*/` |
| 3 | niah | NIAH retrieval 45/45 (3 models) | `scripts/eval/f5_niah.py` | `results/logs/final_tests_full.log` |
| 4 | longctx | Long-context PPL (4K/8K/16K) | `scripts/eval/run_ppl.sh` | `results/long_context_results.json` |
| 5 | sink | Sink-token ablation (Mistral-7B) | `scripts/eval/run_ppl.sh` | `results/logs/final_tests_full.log` |
| 6 | throughput | RTX 5090 throughput (llama-bench) | `scripts/eval/throughput_bench.sh` | `results/throughput/bench_*.md` |
| 7 | longbench_overall | LongBench-v2 overall accuracy (503 examples, 3 seeds) | `scripts/eval/longbench_v2_eval.py` | `results/longbench/*/results_all.json` |
| 8 | longbench_domain | Qwen2.5-7B per-domain LongBench-v2 | `scripts/eval/longbench_v2_eval.py` | `results/longbench/qwen25_seed*/` |
| 9 | divergence | PPL degradation vs task accuracy | `scripts/analysis/f1_ppl_analysis.py` | `results/results_sweet.json`, `results/longbench/` |
| 10 | entropy | Attention entropy under simulated EKV | `scripts/analysis/simulate_attention_haze.py` | `results/attention_entropy_v2_results.json` |
| 11 | ablation_random | EKV vs random zeroing (matched sparsity) | `scripts/analysis/f4_threshold_ablation.py` | `results/ablation_random_vs_ekv_results.json` |
| 12 | paired_chunks | Paired per-chunk analysis (†F1 models) | `scripts/analysis/f1_deep_analysis.py` | `results/ppl/*/`, `results/logs/final_tests_full.log` |
| 13 | spr_asi | SPR and ASI across architectures | `scripts/analysis/f2_spr_real_kv.py` | `results/f2_spr_real_kv_results.json`, paper Table 13 |

## Figures (8)

| # | Label | Description | Script | File |
|---|-------|-------------|--------|------|
| 1 | sparsity_curve | Sparsity–quality trade-off (Llama-3.1-8B) | `scripts/plot/gen_fig65_sparsity_curve.py` | `paper/figures/fig65_sparsity_curve.pdf` |
| 2 | architecture | ElasticKV pipeline diagram | `scripts/plot/generate_paper_figures.py` | `paper/figures/fig0_architecture.pdf` |
| 3 | f1_kv_regression | KV heads vs PPL regression (7 models) | `scripts/analysis/f1_ppl_analysis.py` | `paper/figures/f1_kv_heads_vs_ppl.png` |
| 4 | three_seed | Three-seed LongBench reproducibility | `scripts/plot/generate_paper_figures.py` | `paper/figures/fig25_three_seed.pdf` |
| 5 | kv_head_dependence | KV head count moderation effect | `scripts/plot/generate_paper_figures.py` | `paper/figures/fig35_kv_head_dependence.pdf` |
| 6 | attention_haze | Attention haze visualization (64-token) | `scripts/plot/generate_paper_figures.py` | `paper/figures/fig15_attention_haze.pdf` |
| 7 | entropy_sparsity | Entropy change vs sparsity (EKV + random) | `scripts/plot/generate_paper_figures.py` | `paper/figures/fig45_entropy_vs_sparsity.pdf` |
| 8 | precision_recall | Per-domain accuracy change (Qwen2.5-7B) | `scripts/plot/generate_paper_figures.py` | `paper/figures/fig55_precision_recall.pdf` |

## Inline Claims

| Claim | Location | Script | Data |
|-------|----------|--------|------|
| Compression 1.48× (Mistral sweet) | Abstract, §4.1 | `scripts/eval/run_ppl.sh` | `results/results_sweet.json` |
| <0.31% PPL degradation (GQA) | Abstract, §4.1 | `scripts/eval/run_ppl.sh` | `results/results_all.json` |
| 3,400× ablation divergence | Abstract, §5.3 | `scripts/analysis/f4_threshold_ablation.py` | `results/ablation_random_vs_ekv_results.json` |
| SPR_cond = 54.6% (real KV) | §5.2 | `scripts/analysis/f2_spr_real_kv.py` | `results/f2_spr_real_kv_results.json` |
| EKV removes 36.6% of elements but only 0.53% of L2 at s=0.45 | §4.3, §5.2 | `scripts/analysis/f4b_tau_band_analysis.py` | `results/supplementary/local_baselines/attention_haze_tau_bands_qwen05b.json` |
| H2O/ScissorHands remove 46.3% / 75.2% of L2 under the same τ_h frame | §4.3, §5.2 | `scripts/analysis/f4c_h2o_scissor_eviction_analysis.py` | `results/supplementary/local_baselines/attention_h2o_scissor_evictions_qwen05b.json` |
| zlib 1.52× / lz4 1.23× | §7 | `scripts/analysis/f7_compression_proof.py` | `results/f7_compression_proof.json` |
| +0.8 pp Qwen2.5-7B regularization | §4.4, §5.3 | `scripts/eval/longbench_v2_eval.py` | `results/longbench/qwen25_seed*/` |
| ΔPPL regression R²=0.53 | §4.1, §5.1 | `scripts/analysis/f1_ppl_analysis.py` | `results/results_all.json` |
| 0.4% decode overhead | §4.3 | `scripts/eval/throughput_bench.sh` | `results/throughput/bench_*.md` |
| Paired t>100 (chunk-level) | §5.1 | `scripts/analysis/f1_deep_analysis.py` | `results/ppl/*/` |
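
The "36.6% of elements but only 0.53% of L2" claim rests on a simple property of magnitude thresholding: the smallest-magnitude elements carry disproportionately little squared-L2 energy. A minimal sketch of that arithmetic on synthetic Gaussian data (the real-KV numbers come from `f4b_tau_band_analysis.py`; real K tensors are heavier-tailed, so their removed-energy fraction is far smaller than this toy's):

```python
import numpy as np

rng = np.random.default_rng(0)
k = rng.standard_normal(100_000)  # synthetic stand-in for a K tensor

# Zero the smallest-magnitude elements until the target sparsity is hit,
# then compare the fraction of elements removed with the fraction of the
# tensor's squared-L2 energy those elements carried.
target_sparsity = 0.45
tau = np.quantile(np.abs(k), target_sparsity)  # magnitude threshold
removed = np.abs(k) < tau

frac_elements = removed.mean()
frac_l2 = (k[removed] ** 2).sum() / (k ** 2).sum()
print(f"elements removed: {frac_elements:.1%}, L2 removed: {frac_l2:.1%}")
```

Even on a Gaussian, removing 45% of elements this way removes only about 5% of the energy; the gap widens with heavier-tailed activations.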

## Supplementary Local Artifacts

| Artifact | Description | Script or patch | Data |
|----------|-------------|-----------------|------|
| local_baselines | Same-run local H2O, ScissorHands, and EKV comparison on GTX 1070 (llama-perplexity, flash_attn off, c=256, b=256, ub=32, 8 chunks) | `cuda_hook/experimental/llama-context_attention_baselines.cpp` | `results/supplementary/local_baselines/h2o_wsl_compatible_models_comparison.json`, `results/supplementary/local_baselines/logs/` |
| h2o_smoke | Short smoke validation showing score capture and heavy+recent eviction in the local H2O path | `cuda_hook/experimental/llama-context_attention_baselines.cpp` | `results/supplementary/local_baselines/h2o_smoke_qwen05b_comparison.json` |
| tau_band_qwen05b | Real-KV pre-threshold `\|K\|/τ_h` band analysis showing that EKV's removed region is high-density but low-energy on Qwen2.5-0.5B | `scripts/analysis/f4b_tau_band_analysis.py` | `results/supplementary/local_baselines/attention_haze_tau_bands_qwen05b.json` |
| baseline_eviction_geometry | Real-KV replay that projects H2O and ScissorHands evictions onto the same `\|K\|/τ_h` bands for mechanistic comparison with EKV | `scripts/analysis/f4c_h2o_scissor_eviction_analysis.py` | `results/supplementary/local_baselines/attention_h2o_scissor_evictions_qwen05b.json` |
| gtx1070_llama_bench | Local GTX 1070 throughput sweep for evict_every sensitivity and sweet/aggressive decode trade-offs | `scripts/eval/run_gtx1070_llama_bench.sh` | `results/supplementary/throughput_gtx1070_llama_bench/` |

## Data Files

### Top-level aggregates (`results/`)

| File | Format | Description |
|------|--------|-------------|
| `results_all.json` | JSON | All PPL results across models and configs (Mistral primary) |
| `results_baseline.json` | JSON | Baseline (no EKV) PPL |
| `results_sweet.json` | JSON | Sweet-spot config (sK=0.45/sV=0.50) |
| `results_aggressive.json` | JSON | Aggressive config (sK=sV=0.50) |
| `ablation_random_vs_ekv_results.json` | JSON | Matched-sparsity EKV vs random ablation |
| `attention_entropy_v2_results.json` | JSON | Entropy simulation results |
| `f4_thresh_ablation_results.json` | JSON | Threshold-formula ablation archive for Table 1 |
| `f2_spr_real_kv_results.json` | JSON | Archived SPR/ASI empirical summary for Table 13 and real-KV claims |
| `f7_compression_proof.json` | JSON | Archived compression-proof summary for Section 7 |
| `long_context_results.json` | JSON | 4K/8K/16K context PPL scaling |
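
Degradation claims such as "<0.31% PPL degradation" compare a config's perplexity against the baseline aggregate. A minimal sketch of that arithmetic with invented perplexities (the actual values and JSON schema live in `results_baseline.json` / `results_sweet.json` and are not reproduced here):

```python
# Hypothetical baseline vs sweet-config perplexities; numbers are invented
# for illustration, not taken from the repository's result files.
baseline_ppl = 5.920
sweet_ppl = 5.936

# Relative PPL degradation in percent.
delta_pct = 100.0 * (sweet_ppl - baseline_ppl) / baseline_ppl
print(f"PPL degradation: {delta_pct:+.3f}%")
```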

### Per-model PPL (`results/ppl/`)

| Directory | Model | Contents |
|-----------|-------|----------|
| `ppl/llama3/` | Llama-3-8B FP16 | `results_{all,baseline,sweet,aggressive}.json` |
| `ppl/qwen25/` | Qwen2.5-7B FP16 | `results_{all,baseline,sweet,aggressive}.json` |
| `ppl/qwen3/` | Qwen3-4B BF16 | `results_{all,baseline,sweet,aggressive}.json` |

### LongBench-v2 3-seed (`results/longbench/`)

| Directory | Model × Seed | Contents |
|-----------|--------------|----------|
| `longbench/qwen25_seed{1,2,3}/` | Qwen2.5-7B × 3 seeds | `results_{all,baseline,sweet,aggressive}.json` |
| `longbench/qwen3_seed{1,2,3}/` | Qwen3-4B × 3 seeds | `results_{all,baseline,sweet,aggressive}.json` |
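
The three-seed layout supports mean-and-spread reporting per model. A minimal sketch of that aggregation with invented accuracies (the real per-seed values live under `results/longbench/qwen25_seed{1,2,3}/`; the schema assumed here is hypothetical):

```python
from statistics import mean, stdev

# Hypothetical overall accuracies for the three seeds; numbers are invented
# for illustration, not read from the repository's result files.
seed_accuracy = [0.312, 0.318, 0.306]

print(f"mean={mean(seed_accuracy):.3f} sd={stdev(seed_accuracy):.3f} "
      f"n={len(seed_accuracy)}")
```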

### Throughput (`results/throughput/`)

| File | Description |
|------|-------------|
| `bench_baseline.md` | llama-bench output, `ELASTICKV=0` |
| `bench_sweet.md` | llama-bench output, sweet config |
| `bench_aggressive.md` | llama-bench output, aggressive config |

### Logs (`results/logs/`)

| File | Description |
|------|-------------|
| `final_tests_full.log` | Complete PPL+NIAH+sink test run |
| `ablation_results.log` | Ablation experiment log |
| `longbench_3seeds_full.log` | 3-seed LongBench Qwen2.5-7B run |
| `longbench_3seeds_qwen3_full.log` | 3-seed LongBench Qwen3-4B run |