About poolformer as a tool for demonstration of MetaFormer

Hi, Thanks for the wonderful work, and I am really impressed with the proposed 'MetaFormer' concepts and experimental results you have provided! While reading the paper, some questions were raised regarding the poolformer and the concept of MetaFormer that I wanted to share with you.

1. As far as I understand, the metaformer basically consists of 'input embedding + iteration of blocks with [norm - token mixer - residual connection - norm - channel mixer - residual connection].' Then does MetaFormer not have consideration for non-overlapping patches or a sequence of flattened patches? If so, is the combination of token mixer and channel mixer with other components basically what we have for the 'MetaFormer' regardless of the hierarchical structure of networks or shape of inputs?
2. The poolformer has non-parametric 2D pooling for the token mixer, which is extremely simple compared to previous token mixers. However, the patch embedding inserted between the blocks seems to have implicit token mixing since it is a convolution with a smaller stride than its kernel size and eventually yields overlapped patches. Under the assumption of overlapping patches, I believe the resulting patches share information on the same spatial locations.


Thanks!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

About poolformer as a tool for demonstration of MetaFormer #51

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

About poolformer as a tool for demonstration of MetaFormer #51

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions