
undesirable behaviour of find_lengths function #138

Open
mbaroni opened this issue Nov 5, 2020 · 3 comments

Labels: enhancement (New feature or request)

mbaroni commented Nov 5, 2020

def find_lengths(messages: torch.Tensor) -> torch.Tensor:
    """
    :param messages: A tensor of term ids, encoded as Long values, of size (batch size, max sequence length).
    :returns A tensor with lengths of the sequences, including the end-of-sequence symbol <eos> (in EGG, it is 0).
             If no <eos> is found, the full length is returned (i.e. messages.size(1)).
    """

This leads to counterintuitive behaviour in which, if max_len is 3, [1, 2, 3] and [1, 2, 0] have the same length.
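
For concreteness, a minimal sketch of the documented semantics (an illustration, not the EGG implementation) shows the two messages coming out with the same length:

import torch

def find_lengths_sketch(messages: torch.Tensor) -> torch.Tensor:
    # Sketch of the documented behaviour: the length includes the first
    # <eos> symbol (0); if no <eos> occurs, the full messages.size(1)
    # is returned.
    max_len = messages.size(1)
    lengths = torch.full((messages.size(0),), max_len, dtype=torch.long)
    for i, row in enumerate(messages):
        eos_positions = (row == 0).nonzero(as_tuple=True)[0]
        if eos_positions.numel() > 0:
            lengths[i] = eos_positions[0] + 1  # count the <eos> symbol too
    return lengths

msgs = torch.tensor([[1, 2, 3],   # never emits <eos>
                     [1, 2, 0]])  # emits <eos> in the last position
print(find_lengths_sketch(msgs))  # tensor([3, 3]) -- both get length 3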


robertodessi commented Nov 5, 2020

Quickly thinking about it, I see two possible options:

  1. not allowing messages without EOS (i.e. raising an error/throwing an exception, or returning a special value like 0 or -1)
  2. considering [1, 2, 3] as length 3 and [1, 2, 0] as length 2

I have a preference for 2. but am open to discussing it.
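
A hypothetical variant along the lines of option 2, counting only the symbols before the first <eos> (the name find_lengths_option2 and its details are illustrative, not part of EGG), could look like this:

import torch

def find_lengths_option2(messages: torch.Tensor) -> torch.Tensor:
    # Count only the symbols strictly before the first <eos> (0), so a
    # message that spends its whole budget without emitting <eos> is
    # distinguishable from one that terminates in the last position.
    eos_mask = messages == 0
    after_eos = eos_mask.long().cumsum(dim=1) > 0  # True at and after the first <eos>
    return (~after_eos).sum(dim=1)

msgs = torch.tensor([[1, 2, 3],
                     [1, 2, 0]])
print(find_lengths_option2(msgs))  # tensor([3, 2])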

@mbaroni mbaroni added the enhancement New feature or request label Nov 13, 2020
@robertodessi robertodessi changed the title undesirable behavious of find_lengths function undesirable behaviour of find_lengths function Feb 1, 2021

tomkouwenhoven commented Jun 23, 2023

Hi,

Is there any update on this issue? I am also working on variable-length communication, and appending an EOS token to each message (see the three code lines below) makes find_lengths return opts.max_len + 1 when the sender itself produces no EOS token. This is counterintuitive: one sets max_len to a specific value, yet find_lengths returns lengths longer than that value.

sequence = torch.stack(sequence).permute(1, 0)
zeros = torch.zeros((sequence.size(0), 1)).to(sequence.device)
sequence = torch.cat([sequence, zeros.long()], dim=1)
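
As an illustration (assuming opts.max_len is 3 and that find_lengths can be imported from egg.core; adjust the import to your EGG version), a message that never emits <eos> is reported with length max_len + 1 once the extra zero column is appended:

import torch
from egg.core import find_lengths  # adjust to your EGG version if needed

max_len = 3
sequence = torch.tensor([[1, 2, 3]])                           # sender output, no <eos>
zeros = torch.zeros((sequence.size(0), 1), dtype=torch.long)
sequence = torch.cat([sequence, zeros], dim=1)                 # [[1, 2, 3, 0]]
print(find_lengths(sequence))  # tensor([4]), i.e. max_len + 1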

In terms of the options you mentioned on November 5th, 2020, I would also opt for option 2.

Is the current best solution still to increase the max_len parameter by one, as mentioned in #188?

Thanks in advance,
Tom Kouwenhoven

@robertodessi

Hi,

No concrete plans to work on this in the near future. Do you want to give it a go? :)
