lechmazur

CEO, Advameg, Inc.

writing (Public)

This benchmark tests how well LLMs incorporate a set of 10 mandatory story elements (characters, objects, core concepts, attributes, motivations, etc.) into a short creative story.

Stars: 122 · Forks: 2
confabulations (Public)

Hallucinations (Confabulations) Document-Based Benchmark for RAG. Includes human-verified questions and answers.

Language: HTML · Stars: 101 · Forks: 3
step_game (Public)

Multi-Agent Step Race Benchmark: Assessing LLM Collaboration and Deception Under Pressure. A multi-player “step-race” that challenges LLMs to engage in public conversation before secretly picking a…

Stars: 42 · Forks: 3
elimination_game (Public)

A multi-player tournament benchmark that tests LLMs on social reasoning, strategy, and deception. Players engage in public and private conversations, form alliances, and vote to eliminate each other.

Stars: 42 · Forks: 3
nyt-connections (Public)

A benchmark that evaluates LLMs using 601 NYT Connections puzzles extended with extra trick words.

Language: Python · Stars: 40 · Forks: 3
generalization (Public)

Thematic Generalization Benchmark: measures how effectively various LLMs can infer a narrow or specific "theme" (category/rule) from a small set of examples and anti-examples, then detect which ite…

Stars: 40 · Forks: 1