Commit d817da2

NirantK, generall, and Anush008 authored
Add Splade v1 (#144)
* Add SPLADE v1
* WIP SPLADE Export errors
* add ONNX model to HF hub and use that
* Update sentences in Converting_SPLADE_to_ONNX.ipynb
* Remove unnecessary files and directories
* Rename var in TextEmbedding class to use EMBEDDING_MODEL_TYPE
* Add SPLADE to list of text embeddings
* Add SPLADE model support for text embedding
* Fix deprecation warning in embedding.py
* Add test for batch embedding with sparse embeddings
* Refactor import statement in test_sparse_embeddings.py
* Rename nbs
* Update vocab size in SPLADE model
* Fix canonical vector lookup in test_text_onnx_embeddings.py
* review refactoring
* restore list_supported_models in OnnxTextEmbedding
* Remove unused method _preprocess_onnx_input() in SpladePP class
* Update SPLADE_PP_en_v1 source in splade_pp.py
* Refactor onnx_model.py to change base model behavior
* extend tests to sparse values as well as indicies
* chore: pre-commit hooks

---------

Co-authored-by: generall <[email protected]>
Co-authored-by: Anush008 <[email protected]>
1 parent 68635ef commit d817da2

18 files changed, +1965 -1259 lines changed

.gitignore

+2-8
@@ -168,11 +168,5 @@ local_cache/*/*
 docs/experimental/*.parquet
 docs/experimental/*.bin
 qdrant_storage/*
-fooling_around/fast-multilingual-e5-large/config.json
-fooling_around/fast-multilingual-e5-large/model_optimized.onnx
-fooling_around/fast-multilingual-e5-large/model_optimized.onnx.data
-fooling_around/fast-multilingual-e5-large/ort_config.json
-fooling_around/fast-multilingual-e5-large/sentencepiece.bpe.model
-fooling_around/fast-multilingual-e5-large/special_tokens_map.json
-fooling_around/fast-multilingual-e5-large/tokenizer_config.json
-fooling_around/fast-multilingual-e5-large/tokenizer.json
+fooling_around/*
+experiments/models/*

docs/Getting Started.ipynb

+6-4
@@ -81,8 +81,8 @@
 "\n",
 "embeddings_generator = embedding_model.embed(documents) # reminder this is a generator\n",
 "embeddings_list = list(embedding_model.embed(documents))\n",
-" # you can also convert the generator to a list, and that to a numpy array\n",
-"len(embeddings_list[0]) # Vector of 384 dimensions"
+"# you can also convert the generator to a list, and that to a numpy array\n",
+"len(embeddings_list[0]) # Vector of 384 dimensions"
 ]
 },
 {
@@ -185,7 +185,7 @@
 }
 ],
 "source": [
-"multilingual_large_model = TextEmbedding(\"intfloat/multilingual-e5-large\") # This can take a few minutes to download"
+"multilingual_large_model = TextEmbedding(\"intfloat/multilingual-e5-large\") # This can take a few minutes to download"
 ]
 },
 {
@@ -206,7 +206,9 @@
 }
 ],
 "source": [
-"np.array(list(multilingual_large_model.embed([\"Hello, world!\", \"你好世界\", \"¡Hola Mundo!\", \"नमस्ते!\"]))).shape # Vector of 1024 dimensions"
+"np.array(\n",
+" list(multilingual_large_model.embed([\"Hello, world!\", \"你好世界\", \"¡Hola Mundo!\", \"नमस्ते!\"]))\n",
+").shape # Vector of 1024 dimensions"
 ]
 },
 {
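
The notebook cells in this diff note that `embed` returns a generator and that it can be converted to a list. As a rough illustration of what that implies, the sketch below uses `fake_embed`, a hypothetical stand-in written for this example (it is not part of the library and returns placeholder values, not real embeddings). The 384-dimension default mirrors the notebook comment; everything else is an assumption.

```python
from typing import Iterable, Iterator, List


def fake_embed(documents: Iterable[str], dim: int = 384) -> Iterator[List[float]]:
    """Hypothetical stand-in for embedding_model.embed().

    Lazily yields one placeholder vector per document; a real model
    would yield learned embeddings instead.
    """
    for doc in documents:
        # Placeholder values only: every component is the document length.
        yield [float(len(doc))] * dim


documents = ["Hello, world!", "fastembed is fast"]

# Creating the generator does no work yet; vectors are produced on demand.
embeddings_generator = fake_embed(documents)

# Materialising the generator yields one vector per document.
embeddings_list = list(embeddings_generator)

# A generator can only be consumed once: a second pass yields nothing.
leftover = list(embeddings_generator)

print(len(embeddings_list), len(embeddings_list[0]), leftover)
```

With NumPy installed, `np.array(embeddings_list)` would stack these vectors into a 2-D array, which is what the notebook does before reading `.shape`. Because the generator is exhausted after one pass, the notebook calls `embed(documents)` again rather than reusing `embeddings_generator`.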
