
ForecastBench_expert_personas

Repository for analyzing how expert personas behave on ForecastBench

Setup

Set up a uv environment. If you have Homebrew:

```shell
brew install uv
```

Then, from the repository root:

```shell
# one-time setup
uv venv
uv pip install -e .

# activate the environment
source .venv/bin/activate

# deactivate when done
deactivate
```

Experiment 1 = select the top-k or bottom-k forecasts and have the LLM generate a forecast

Experiment 2 = human evaluation of forecasts, cross-checked against an LLM judge

Experiment 3 = trying a system prompt

Experiment 4 = expert elicitation with few-shot examples (selected by an LLM judge or manually)

Experiment 5 = sample one question per topic and have all 7 topic experts forecast on each question, to see how experts perform on topics they know nothing about

Experiment 6 = sample one question per topic and randomly select X filtered forecasts for the few-shot prompt (based on the suggestion: "Could you try random selection of filtered forecasts, instead of topic-relevant? There might be a difference")

Experiment 7 = comparison of reasonings: define a rubric, rate similarity on a 1-5 scale, and check whether an LLM judge can apply the rubric reliably
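Experiments 5 and 6 both start from the same sampling step: one question per topic, and (for Experiment 6) a random subset of filtered forecasts for the few-shot prompt. A minimal sketch of that sampling, assuming questions are dicts with a `topic` key (the field name is hypothetical, not from this repo's data schema):

```python
import random

def sample_one_per_topic(questions, seed=0):
    """Pick one question per topic.

    `questions` is assumed to be a list of dicts with a 'topic' key
    (hypothetical schema for illustration).
    """
    rng = random.Random(seed)
    by_topic = {}
    for q in questions:
        by_topic.setdefault(q["topic"], []).append(q)
    return {topic: rng.choice(qs) for topic, qs in by_topic.items()}

def sample_few_shot(filtered_forecasts, k, seed=0):
    """Randomly select k filtered forecasts for the few-shot prompt
    (Experiment 6), instead of topic-relevant selection."""
    rng = random.Random(seed)
    return rng.sample(filtered_forecasts, min(k, len(filtered_forecasts)))
```

Seeding the RNG keeps the sampled questions and few-shot examples fixed across runs, so experiment results stay reproducible.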

Analysis

  - Add mean Brier score and relative Brier score
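A minimal sketch of these two metrics for binary questions, assuming forecasts are probabilities in [0, 1] and "relative" means the difference against a reference forecaster's Brier score (that interpretation is an assumption, not stated above):

```python
def brier_score(p, outcome):
    """Squared error between a probability forecast p and a binary outcome (0 or 1)."""
    return (p - outcome) ** 2

def mean_brier(forecasts, outcomes):
    """Mean Brier score over a set of binary questions (lower is better)."""
    return sum(brier_score(p, o) for p, o in zip(forecasts, outcomes)) / len(forecasts)

def relative_brier(forecasts, reference_forecasts, outcomes):
    """Mean Brier of the model minus mean Brier of a reference forecaster
    (assumed definition of "relative"). Negative values mean the model
    outperforms the reference on these questions."""
    return mean_brier(forecasts, outcomes) - mean_brier(reference_forecasts, outcomes)
```

For example, a 0.8 forecast on a question that resolves yes scores (0.8 - 1)² = 0.04, against 0.25 for an uninformed 0.5 reference, giving a relative Brier of -0.21.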

Appendix

Supporting documentation for this project:

  1. Rationale & Methodology - Detailed explanation of experimental design and approach
  2. Feedback Variants - All feedback variants for all feedback types
  3. LLM as a Judge Framework - Rubric and evaluation methodology
  4. Additional Results & Analysis - Supplementary findings and visualizations
