EmbeddingBag for ragged indices #2544

Closed
shkarupa-alex opened this issue Aug 19, 2021 · 6 comments

@shkarupa-alex
Contributor

Describe the feature and the current behavior/state.
Currently EmbeddingBag and the underlying op work only with dense tensors.
But many modern NLP ops/tasks require RaggedTensors, and converting them to dense tensors causes a performance downgrade.

For example, take the FastText ngram method.
We have a batch of words [BATCH]. We want to split them into ngrams [BATCH, NGRAMS].
Then we look them up in a vocabulary to obtain indices [BATCH, NGRAM_INDICES].
Next we want to obtain embeddings and reduce them with sum/mean/etc.
In this case the ngrams and indices are ragged tensors, so it would be great if EmbeddingBag could be used for the last two operations (embed + reduce).
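A minimal NumPy sketch of the requested semantics (the embedding table and ids below are made up for illustration; a real implementation would operate on a tf.RaggedTensor and the existing EmbeddingBag op rather than Python lists):

```python
import numpy as np

# Hypothetical embedding table: 10 ngram ids, embedding dimension 4.
table = np.arange(40, dtype=np.float64).reshape(10, 4)

# Ragged indices: each word in the batch has a different number of ngram ids.
ragged_indices = [[1, 3], [0, 2, 5], [7]]

def embedding_bag_ragged(table, ragged_indices, combiner="mean"):
    """Look up each row's ids and reduce them row by row (embed + reduce)."""
    out = []
    for ids in ragged_indices:
        vecs = table[ids]                 # [len(ids), dim] -- varies per row
        if combiner == "sum":
            out.append(vecs.sum(axis=0))
        else:                             # "mean"
            out.append(vecs.mean(axis=0))
    return np.stack(out)                  # dense result: [batch, dim]

pooled = embedding_bag_ragged(table, ragged_indices)
```

The point is that the output is dense [BATCH, dim] even though the indices are ragged, so no intermediate dense [BATCH, MAX_NGRAMS] tensor (with padding) is ever materialized.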

Relevant information

  • Are you willing to contribute it (yes/no): no
  • Are you willing to maintain it going forward? (yes/no): no
  • Is there a relevant academic paper? (if so, where): no
  • Does the relevant academic paper exceed 50 citations? (yes/no): no
  • Is there already an implementation in another framework? (if so, where): don't know
  • Was it part of tf.contrib? (if so, where): no

Which API type would this fall under? (layer, metric, optimizer, etc.)
Layer & op.

Who will benefit from this feature?
This will extend layer usage to a large number of NLP tasks (which use RaggedTensors in most cases).

@bhack
Contributor

bhack commented Aug 19, 2021

We discussed sparsity a bit at:

#2352 (comment)

/cc @tanguycdls @aartbik

@tanguycdls

Hello! +1 for this issue. We migrated from Torch to TensorFlow and we're also missing the EmbeddingBag op https://pytorch.org/docs/stable/generated/torch.nn.EmbeddingBag.html, which takes a ragged input instead of a dense one.
We started by using safe_embedding_lookup_sparse, but since we were only using the sum aggregator we recently moved to a sparse-dense matmul instead to reduce RAM consumption: the bottleneck is the conversion from the ragged format to the indicator COO format. We could share our implementation once it's finished and cleaned up!
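A rough NumPy sketch of that sum-via-matmul approach, using a dense stand-in for the sparse indicator matrix (all names and data here are illustrative; a real implementation would build a tf.SparseTensor and use tf.sparse.sparse_dense_matmul, and the ragged-to-COO conversion below is exactly the bottleneck the comment mentions):

```python
import numpy as np

# Hypothetical embedding table: vocab of 10 ids, dimension 4.
table = np.arange(40, dtype=np.float64).reshape(10, 4)

# Ragged indices for a batch of 3 rows.
ragged_indices = [[1, 3], [0, 2, 5], [7]]

# Convert ragged rows to COO-style (row, vocab_id) coordinate pairs.
rows, cols = [], []
for r, ids in enumerate(ragged_indices):
    for i in ids:
        rows.append(r)
        cols.append(i)

# Indicator matrix [batch, vocab] with a 1 per (row, id) occurrence.
# Kept dense here only for illustration; in practice it stays sparse.
indicator = np.zeros((len(ragged_indices), table.shape[0]))
indicator[rows, cols] = 1.0

# One matmul yields the per-row sum of embeddings: [batch, dim].
summed = indicator @ table
```

With a sum aggregator this is mathematically the same as embed-then-reduce, which is why the matmul formulation can replace safe_embedding_lookup_sparse for that case.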

@tanguycdls

@bhack @shkarupa-alex are you still interested in this? If so, we can open a PR and discuss whether it should be a new op or an improvement over the EmbeddingBag merged here: #2352

@bhack
Contributor

bhack commented Jan 21, 2022

As many other reusable NLP components are starting to land in keras-team/keras-hub#10, you could try opening a ticket there to check whether they are interested.

@shkarupa-alex
Contributor Author

shkarupa-alex commented Jan 31, 2022

@bhack @shkarupa-alex are you still interested by this ?

Yes, still interested

@seanpmorgan
Member

TensorFlow Addons is transitioning to a minimal maintenance and release mode. New features will not be added to this repository. For more information, please see our public messaging on this decision:
TensorFlow Addons Wind Down

Please consider sending feature requests / contributions to other repositories in the TF community with similar charters to TFA:
Keras
Keras-CV
Keras-NLP
