Skip to content

MultiHeadAttention parameter setting #180

Open
@LXXiaogege

Description

@LXXiaogege

Is the output linear layer parameter of the MultiHeadAttention class incorrectly set in mha.py file? in_features should be heads*d_k?

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions