EmbeddingBag for ragged indices #2544

Closed
shkarupa-alex opened this issue Aug 19, 2021 · 6 comments

@shkarupa-alex
Contributor

Describe the feature and the current behavior/state.
Currently EmbeddingBag and the underlying op work only with dense tensors.
But many modern NLP ops/tasks require RaggedTensors, and converting them to dense tensors causes a performance downgrade.

For example, take the FastText ngram method.
We have a batch of words [BATCH]. We want to split them into ngrams [BATCH, NGRAMS].
Then we look them up in a vocabulary to obtain indices [BATCH, NGRAM_INDICES].
Next we want to obtain embeddings and reduce them with sum/mean/etc.
In this case the ngrams and indices are ragged tensors, so it would be great if EmbeddingBag could be used for the last two operations (embed + reduce).
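A minimal NumPy sketch of the requested semantics (the embedding table and ids below are made up for illustration; a real implementation would operate on a tf.RaggedTensor and the existing EmbeddingBag op rather than Python lists):

```python
import numpy as np

# Hypothetical embedding table: 10 ngram ids, embedding dimension 4.
table = np.arange(40, dtype=np.float64).reshape(10, 4)

# Ragged indices: each word in the batch has a different number of ngram ids.
ragged_indices = [[1, 3], [0, 2, 5], [7]]

def embedding_bag_ragged(table, ragged_indices, combiner="mean"):
    """Look up each row's ids and reduce them row by row (embed + reduce)."""
    out = []
    for ids in ragged_indices:
        vecs = table[ids]                 # [len(ids), dim] -- varies per row
        if combiner == "sum":
            out.append(vecs.sum(axis=0))
        else:                             # "mean"
            out.append(vecs.mean(axis=0))
    return np.stack(out)                  # dense result: [batch, dim]

pooled = embedding_bag_ragged(table, ragged_indices)
```

The point is that the output is dense [BATCH, dim] even though the indices are ragged, so no intermediate dense [BATCH, MAX_NGRAMS] tensor (with padding) is ever materialized.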

Relevant information

  • Are you willing to contribute it (yes/no): no
  • Are you willing to maintain it going forward? (yes/no): no
  • Is there a relevant academic paper? (if so, where): no
  • Does the relevant academic paper exceed 50 citations? (yes/no): no
  • Is there already an implementation in another framework? (if so, where): don't know
  • Was it part of tf.contrib? (if so, where): no

Which API type would this fall under? (layer, metric, optimizer, etc.)
Layer & op.

Who will benefit from this feature?
This will extend layer usage to a large number of NLP tasks (which use RaggedTensors in most cases).

@bhack
Contributor

bhack commented Aug 19, 2021

We discussed sparsity a bit at:

#2352 (comment)

/cc @tanguycdls @aartbik

@tanguycdls

Hello! +1 for this issue. We migrated from Torch to TensorFlow and we're also missing the EmbeddingBag op https://pytorch.org/docs/stable/generated/torch.nn.EmbeddingBag.html, which takes a ragged input instead of a dense one.
We started by using safe_embedding_lookup_sparse, but since we were only using the sum aggregator we recently moved to a sparse-dense matmul instead to reduce RAM consumption: the bottleneck is the conversion from the ragged format to the indicator COO format. We could share our implementation once it's finished and cleaned up!
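A rough NumPy sketch of that sum-via-matmul approach, using a dense stand-in for the sparse indicator matrix (all names and data here are illustrative; a real implementation would build a tf.SparseTensor and use tf.sparse.sparse_dense_matmul, and the ragged-to-COO conversion below is exactly the bottleneck the comment mentions):

```python
import numpy as np

# Hypothetical embedding table: vocab of 10 ids, dimension 4.
table = np.arange(40, dtype=np.float64).reshape(10, 4)

# Ragged indices for a batch of 3 rows.
ragged_indices = [[1, 3], [0, 2, 5], [7]]

# Convert ragged rows to COO-style (row, vocab_id) coordinate pairs.
rows, cols = [], []
for r, ids in enumerate(ragged_indices):
    for i in ids:
        rows.append(r)
        cols.append(i)

# Indicator matrix [batch, vocab] with a 1 per (row, id) occurrence.
# Kept dense here only for illustration; in practice it stays sparse.
indicator = np.zeros((len(ragged_indices), table.shape[0]))
indicator[rows, cols] = 1.0

# One matmul yields the per-row sum of embeddings: [batch, dim].
summed = indicator @ table
```

With a sum aggregator this is mathematically the same as embed-then-reduce, which is why the matmul formulation can replace safe_embedding_lookup_sparse for that case.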

@tanguycdls

@bhack @shkarupa-alex are you still interested in this? If so, we can open a PR and discuss whether it should be a new op or an improvement over the EmbeddingBag merged here: #2352

@bhack
Contributor

bhack commented Jan 21, 2022

As many other reusable NLP components are starting to land in keras-team/keras-hub#10, you could try opening a ticket there to check whether they are interested.

@shkarupa-alex
Contributor Author

shkarupa-alex commented Jan 31, 2022

@bhack @shkarupa-alex are you still interested by this ?

Yes, still interested

@seanpmorgan
Member

TensorFlow Addons is transitioning to a minimal maintenance and release mode. New features will not be added to this repository. For more information, please see our public messaging on this decision:
TensorFlow Addons Wind Down

Please consider sending feature requests / contributions to other repositories in the TF community with similar charters to TFA:
Keras
Keras-CV
Keras-NLP
