EmbeddingBag for ragged indices #2544
We have discussed a little bit about the sparsity at: /cc @tanguycdls @aartbik
Hello! +1 for this issue. We migrated from Torch to TensorFlow and we're also missing an EmbeddingBag op (https://pytorch.org/docs/stable/generated/torch.nn.EmbeddingBag.html) that takes a ragged input instead of a dense one.
@bhack @shkarupa-alex are you still interested in this? If so, we can open a PR and discuss whether it should be a new op or an improvement over the EmbeddingBag merged here: #2352
As many other reusable NLP components are starting to land in keras-team/keras-hub#10, you could try to open a ticket there to check if they are interested.
Yes, still interested
TensorFlow Addons is transitioning to a minimal maintenance and release mode. New features will not be added to this repository. For more information, please see our public messaging on this decision: Please consider sending feature requests / contributions to other repositories in the TF community with similar charters to TFA:
Describe the feature and the current behavior/state.
Currently, EmbeddingBag and its underlying op work only with dense tensors.
But many modern NLP ops and tasks require RaggedTensors. Converting them to dense tensors means padding every row to the longest one, which degrades performance.
For example, take the FastText n-gram method.
We have a batch of words [BATCH]. We want to split each word into n-grams [BATCH, NGRAMS].
Then we look the n-grams up in a vocabulary to obtain indices [BATCH, NGRAM_INDICES].
Next we want to obtain their embeddings and reduce them per word with sum/mean/etc.
In this case the n-grams and the indices are ragged tensors, because each word yields a different number of n-grams, so it would be great if we could use EmbeddingBag for the last two operations (embed + reduce).
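To make the requested behavior concrete, here is a minimal sketch of embed + reduce over ragged indices written with stock TensorFlow ops. It is not the Addons EmbeddingBag op or a proposed implementation: the function name `ragged_embedding_bag`, the `combiner` argument, and the toy embedding table are assumptions made up for illustration, and the sketch only handles 2-D ragged indices.

```python
import tensorflow as tf


def ragged_embedding_bag(indices, params, combiner="sum"):
    """Hypothetical helper: embed ragged indices [BATCH, (NGRAM_INDICES)]
    and reduce each row to a single vector [BATCH, DIM]."""
    # Look up one embedding per index on the flat values (no padding involved).
    flat_embeddings = tf.gather(params, indices.values)
    # value_rowids() maps every flat value to the row (word) it belongs to.
    row_ids = indices.value_rowids()
    # Passing the row count explicitly keeps empty rows in the output as zeros.
    summed = tf.math.unsorted_segment_sum(
        flat_embeddings, row_ids, num_segments=indices.nrows()
    )
    if combiner == "sum":
        return summed
    if combiner == "mean":
        counts = tf.cast(indices.row_lengths(), params.dtype)
        # divide_no_nan returns 0 for empty rows instead of NaN.
        return tf.math.divide_no_nan(summed, counts[:, tf.newaxis])
    raise ValueError(f"Unsupported combiner: {combiner}")


# Toy embedding table: 4 vocabulary entries, embedding size 2 (assumed values).
params = tf.constant([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [3.0, 1.0]])
# N-gram indices for a batch of 4 words; the third word has no known n-grams.
indices = tf.ragged.constant([[0, 1], [2], [], [3, 3, 0]], dtype=tf.int64)

bag_sum = ragged_embedding_bag(indices, params, combiner="sum")
bag_mean = ragged_embedding_bag(indices, params, combiner="mean")
print(bag_sum.numpy())
print(bag_mean.numpy())
```

The sketch works on the flat values plus row ids, so no padded dense tensor is ever materialized. A dedicated op could presumably fuse the gather and the reduction, which is what this request is about.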
Relevant information
Which API type would this fall under (layer, metric, optimizer, etc.)
Layer & op.
Who will benefit with this feature?
This will extend layer usage to a large number of NLP tasks (which use RaggedTensors in most cases).