About poolformer as a tool for demonstration of MetaFormer #51

@YoojLee

Hi, thanks for the wonderful work. I am really impressed by the proposed MetaFormer concept and the experimental results you have provided! While reading the paper, I had some questions about PoolFormer and the concept of MetaFormer that I would like to share with you.

  1. As far as I understand, MetaFormer basically consists of an input embedding followed by repeated blocks of [norm - token mixer - residual connection - norm - channel mixer - residual connection]. Does this mean MetaFormer makes no assumption about whether the input is split into non-overlapping patches or given as a sequence of flattened patches? If so, is 'MetaFormer' essentially the combination of a token mixer and a channel mixer with these other components, regardless of whether the network is hierarchical or what shape its inputs have?
  2. PoolFormer uses non-parametric 2D pooling as its token mixer, which is extremely simple compared to previous token mixers. However, the patch embedding inserted between the blocks seems to perform implicit token mixing, since it is a convolution whose stride is smaller than its kernel size and therefore yields overlapping patches. With overlapping patches, I believe neighboring patches share information from the same spatial locations.
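
To make point 2 concrete, here is a minimal 1D sketch of what I mean by overlapping patches. It is not taken from the PoolFormer code: the helper names `patch_windows` and `shared_positions` are hypothetical, and the kernel, stride, and padding values are illustrative assumptions rather than the configuration used in the repository. It only lists which input positions each output token of a strided convolution can see.

```python
def patch_windows(length, kernel, stride, padding=0):
    """For a 1D strided convolution, return the set of input positions
    covered by each output token (padded positions are dropped)."""
    n_out = (length + 2 * padding - kernel) // stride + 1
    windows = []
    for i in range(n_out):
        start = i * stride - padding
        window = {p for p in range(start, start + kernel) if 0 <= p < length}
        windows.append(window)
    return windows


def shared_positions(windows):
    """Return the input positions that are covered by more than one output token."""
    seen, shared = set(), set()
    for window in windows:
        shared |= seen & window  # positions already seen by an earlier token
        seen |= window
    return shared


# Non-overlapping patches: kernel == stride (assumed values).
non_overlap = patch_windows(length=8, kernel=2, stride=2)

# Overlapping patches: stride < kernel (assumed values).
overlap = patch_windows(length=8, kernel=3, stride=2, padding=1)

print(shared_positions(non_overlap))  # no position is shared between tokens
print(shared_positions(overlap))      # positions 1, 3 and 5 are each seen by two tokens
```

In the second case every pair of adjacent output tokens reads one common input position, which is the sense in which I suspect the patch embedding already mixes information across tokens before any pooling is applied.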

Thanks!
