
Suggestion to cite prior work mPLUG-Owl3 and other related cross-attention-based efficient MLLMs #18

Open
LukeForeverYoung opened this issue Feb 8, 2025 · 4 comments

Comments

@LukeForeverYoung

Excellent work on advancing efficient MLLMs. I observe that LLaVA-Mini employs transformer layers initialized from the language model for multimodal fusion through cross-attention mechanisms, which shares similarities with prior works such as mPLUG-Owl3, which repurposes the transformer layers within the language model to execute cross-attention and self-attention operations in parallel. To strengthen the contextual foundation of efficient MLLM research, we suggest adding related cross-attention architectures to your references. Specifically, foundational works like Flamingo, EVLM, and LLaMA-Vision could be cited to better situate your work within the landscape of efficient MLLM development.

mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models
[https://arxiv.org/abs/2408.04840](https://arxiv.org/abs/2408.04840)
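
For concreteness, here is a minimal PyTorch sketch of the parallel self-/cross-attention fusion pattern I am referring to. This is only an illustration under my own assumptions; the module names, the gating, and the dimensions are made up and are not taken from the mPLUG-Owl3 or LLaVA-Mini code.

```python
import torch
import torch.nn as nn


class ParallelFusionLayer(nn.Module):
    """Toy decoder layer: self-attention over text tokens and cross-attention
    to visual features computed in parallel, then merged through a gate.
    (Causal masking omitted for brevity.)"""

    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        # Self-attention over the language-model hidden states.
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Cross-attention from text queries to visual keys/values.
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)
        # Per-token gate controlling how much visual context is mixed in.
        self.gate = nn.Linear(d_model, 1)

    def forward(self, text: torch.Tensor, visual: torch.Tensor) -> torch.Tensor:
        h = self.norm(text)
        self_out, _ = self.self_attn(h, h, h, need_weights=False)
        cross_out, _ = self.cross_attn(h, visual, visual, need_weights=False)
        g = torch.sigmoid(self.gate(h))          # (B, T_text, 1)
        return text + self_out + g * cross_out   # residual + parallel fusion


# Toy usage: 1 sample, 16 text tokens, 64 visual tokens, hidden size 512.
layer = ParallelFusionLayer()
text = torch.randn(1, 16, 512)
visual = torch.randn(1, 64, 512)
print(layer(text, visual).shape)  # torch.Size([1, 16, 512])
```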

@MonolithFoundation

Many works use this kind of design, but they add a lot of parameters. With that much additional computation, I'm not sure whether using fewer tokens is still meaningful.

@MiloQ

MiloQ commented Feb 12, 2025

@MonolithFoundation Do you mean LLaVA-Mini or mPLUG-Owl3?

@MonolithFoundation

Anything with a Resampler

@MiloQ

MiloQ commented Feb 13, 2025

@MonolithFoundation Then how do you explain the speed gains reported in these papers, e.g., in terms of FLOPs?
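
To make the question concrete, here is the kind of rough per-layer FLOPs comparison I have in mind. It is only a back-of-envelope sketch with hypothetical sizes, not numbers taken from either paper, and it assumes the cross-attention path is added in every layer.

```python
# Compare per-layer FLOPs of (a) a decoder that prepends all visual tokens to
# the context versus (b) one that keeps a short context but adds a
# cross-attention path to the full visual features.
d = 4096          # hidden size (hypothetical)
n_text = 128      # text tokens
n_vis_full = 576  # visual tokens if injected directly into the context
n_vis_comp = 1    # compressed visual tokens kept in the context

def self_attn_flops(n_ctx, d):
    # QKV + output projections (~4*n*d^2) plus the attention matmuls (~2*n^2*d).
    return 4 * n_ctx * d**2 + 2 * n_ctx**2 * d

def cross_attn_flops(n_q, n_kv, d):
    # Query projection, key/value projections over visual features,
    # output projection, plus the two attention matmuls.
    return 2 * n_q * d**2 + 2 * n_kv * d**2 + 2 * n_q * n_kv * d

def ffn_flops(n_ctx, d, expansion=4):
    return 2 * n_ctx * d * (expansion * d)

# (a) Baseline: all visual tokens go through self-attention + FFN.
baseline = self_attn_flops(n_text + n_vis_full, d) + ffn_flops(n_text + n_vis_full, d)

# (b) Cross-attention variant: short context plus cross-attention to visual features.
variant = (self_attn_flops(n_text + n_vis_comp, d)
           + cross_attn_flops(n_text + n_vis_comp, n_vis_full, d)
           + ffn_flops(n_text + n_vis_comp, d))

print(f"baseline per-layer FLOPs  : {baseline:.3e}")
print(f"cross-attn per-layer FLOPs: {variant:.3e}")
print(f"ratio (variant / baseline): {variant / baseline:.2f}")
```

Under these made-up settings the variant still comes out well below the baseline, because the FFN and self-attention costs scale with the context length, but I'd like to understand how the papers account for the added cross-attention parameters.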
