From 5aa362f3b3cf63407f6395bb029245dd9c72b3e4 Mon Sep 17 00:00:00 2001
From: Hailang
Date: Wed, 27 Nov 2024 15:29:29 +0800
Subject: [PATCH] Add MMGenBench

Add MMGenBench

---
 README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README.md b/README.md
index efe958f..0599354 100644
--- a/README.md
+++ b/README.md
@@ -430,6 +430,7 @@ This is the first work to correct hallucination in multimodal large language mod
 ## Evaluation
 | Title | Venue | Date | Page |
 |:--------|:--------:|:--------:|:--------:|
+| ![Stars](https://img.shields.io/github/stars/lerogo/MMGenBench?style=social&label=Star) <br> [**MMGenBench: Evaluating the Limits of LMMs from the Text-to-Image Generation Perspective**](https://arxiv.org/abs/2411.14062) <br> | arXiv | 2024-11-21 | [Github](https://github.com/lerogo/MMGenBench) |
 | ![Stars](https://img.shields.io/github/stars/multimodal-art-projection/OmniBench?style=social&label=Star) <br> [**OmniBench: Towards The Future of Universal Omni-Language Models**](https://arxiv.org/pdf/2409.15272) <br> | arXiv | 2024-09-23 | [Github](https://github.com/multimodal-art-projection/OmniBench) |
 | ![Stars](https://img.shields.io/github/stars/yfzhang114/MME-RealWorld?style=social&label=Star) <br> [**MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?**](https://arxiv.org/pdf/2408.13257) <br> | arXiv | 2024-08-23 | [Github](https://github.com/yfzhang114/MME-RealWorld) |
 | ![Stars](https://img.shields.io/github/stars/guoyang9/UNK-VQA?style=social&label=Star) <br> [**UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models**](https://arxiv.org/pdf/2310.10942) <br> | TPAMI | 2023-10-17 | [Github](https://github.com/guoyang9/UNK-VQA) |
@@ -586,6 +587,7 @@ This is the first work to correct hallucination in multimodal large language mod
 ## Benchmarks for Evaluation
 | Name | Paper | Link | Notes |
 |:-----|:-----:|:----:|:-----:|
+| **MMGenBench** | [MMGenBench: Evaluating the Limits of LMMs from the Text-to-Image Generation Perspective](https://arxiv.org/abs/2411.14062) | [Link](https://github.com/lerogo/MMGenBench) | A benchmark that only uses images and text-to-image models for evaluation |
 | **LiveXiv** | [LiveXiv -- A Multi-Modal Live Benchmark Based on Arxiv Papers Content](https://arxiv.org/pdf/2410.10783) | [Link](https://huggingface.co/datasets/LiveXiv/LiveXiv) | A live benchmark based on arXiv papers |
 | **TemporalBench** | [TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models](https://arxiv.org/pdf/2410.10818) | [Link](https://huggingface.co/datasets/microsoft/TemporalBench) | A benchmark for evaluation of fine-grained temporal understanding |
 | **OmniBench** | [OmniBench: Towards The Future of Universal Omni-Language Models](https://arxiv.org/pdf/2409.15272) | [Link](https://huggingface.co/datasets/m-a-p/OmniBench) | A benchmark that evaluates models' capabilities of processing visual, acoustic, and textual inputs simultaneously |
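For context on what the new rows describe: MMGenBench evaluates an LMM by having it write an image-generation prompt for each benchmark image, re-rendering that prompt with a fixed text-to-image model, and scoring how closely the regenerated image matches the original, so no question-answer annotations are needed. Below is a minimal sketch of that loop, not the official implementation: `query_lmm` and `run_text_to_image` are hypothetical hooks for the models under test, and CLIP image embeddings are used as one plausible similarity backend (the repo at https://github.com/lerogo/MMGenBench defines the exact metrics).

```python
# Sketch of an MMGenBench-style evaluation loop.
# Assumptions: query_lmm / run_text_to_image are hypothetical stand-ins for
# the LMM being evaluated and a fixed text-to-image model; only the CLIP
# scoring below uses a real, concrete API (Hugging Face transformers).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")


def query_lmm(image: Image.Image) -> str:
    """Hypothetical hook: ask the LMM under test for an image-generation prompt."""
    raise NotImplementedError("plug in the LMM being evaluated")


def run_text_to_image(prompt: str) -> Image.Image:
    """Hypothetical hook: render the prompt with a fixed text-to-image model."""
    raise NotImplementedError("plug in e.g. a diffusion pipeline")


@torch.no_grad()
def sim_score(original: Image.Image, generated: Image.Image) -> float:
    """Cosine similarity between CLIP image embeddings of the two images."""
    inputs = processor(images=[original, generated], return_tensors="pt")
    feats = clip.get_image_features(**inputs)
    feats = feats / feats.norm(dim=-1, keepdim=True)
    return float(feats[0] @ feats[1])


def evaluate(images: list[Image.Image]) -> float:
    """Average similarity over a benchmark split; higher suggests the LMM
    captured more of each image in its prompt."""
    scores = []
    for image in images:
        prompt = query_lmm(image)                     # 1. LMM describes the image
        regenerated = run_text_to_image(prompt)       # 2. T2I model re-renders it
        scores.append(sim_score(image, regenerated))  # 3. compare the two images
    return sum(scores) / len(scores)
```

Keeping the text-to-image model and the scorer fixed across all LMMs means score differences reflect only the quality of the generated prompts, which is the benchmark's core idea.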