How can I perform multiple concurrent calls to speed up the entire runtime?

When I use the example provided by cpm.cu for multi-concurrent inference, the calls fail, so it seems that cpm.cu does not support multiple concurrent calls. It reports `RuntimeError: The size of tensor a (13) must match the size of tensor b (12) at non-singleton dimension 0`. The error occurs at `tokens[1+i:1+i+append_length].copy_(self.tree_draft_ids[:append_length])`. How can I perform multiple concurrent calls to speed up the entire runtime?
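One possible reading of the error, stated as an assumption and not confirmed by cpm.cu: overlapping calls on one engine instance share internal buffers (`self.tree_draft_ids` would be one such buffer), so one call can change a size that another call is relying on. The sketch below shows a lock-based way to keep concurrent callers from overlapping. `FakeEngine`, `SerializedEngine`, and `run_all` are hypothetical names for illustration only and are not part of cpm.cu.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeEngine:
    """Hypothetical stand-in for an engine that reuses one buffer per instance."""

    def __init__(self):
        # Shared mutable state; assumed to be analogous to tree_draft_ids.
        self.draft_buffer = []

    def generate(self, prompt_ids):
        # Overwrites the shared buffer, so overlapping calls could interfere.
        self.draft_buffer = [t + 1 for t in prompt_ids]
        return list(self.draft_buffer)


class SerializedEngine:
    """Wraps an engine so that only one generate() call runs at a time."""

    def __init__(self, engine):
        self._engine = engine
        self._lock = threading.Lock()

    def generate(self, prompt_ids):
        with self._lock:
            return self._engine.generate(prompt_ids)


def run_all(prompts, workers=4):
    """Submit prompts from several threads; results come back in input order."""
    engine = SerializedEngine(FakeEngine())
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(engine.generate, prompts))
```

Note that a lock only avoids the crash: the calls still run one after another, so this does not shorten the total runtime. Real speedup would need batching support in the engine or separate engine instances, and whether cpm.cu offers either is not stated here.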

How can I perform multiple concurrent calls to speed up the entire runtime #9

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

How can I perform multiple concurrent calls to speed up the entire runtime #9

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions