cxcscmu

Popular repositories

  1. Craw4LLM Public

    Official repository for "Craw4LLM: Efficient Web Crawling for LLM Pretraining"

    Python · 649 stars · 60 forks

  2. RAGViz Public

    Official repository for RAGViz: Diagnose and Visualize Retrieval-Augmented Generation [EMNLP 2024]

    TypeScript · 88 stars · 13 forks

  3. MATES Public

    Official repository for MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models [NeurIPS 2024]

    Python · 79 stars · 9 forks

  4. Montessori-Instruct Public

    Official repository for Montessori-Instruct: Generate Influential Training Data Tailored for Student Learning [ICLR 2025]

    Python · 50 stars · 4 forks

  5. AutoGEO Public

    [ICLR'26] AutoGEO: a framework that automatically learns generative engine preferences and rewrites web content for more traction.

    Python · 38 stars · 4 forks

  6. deepresearch_benchmarking Public

    Python · 24 stars · 1 fork

Repositories

Showing 10 of 25 repositories
