Semantic Search Engine

This API returns the top 10 most similar results for a user query. It uses 10% of Stack Overflow's data. You can find it here.

Getting Started

P.S. For notebook.ipynb, you can run the entire notebook directly and a Flask API will be deployed. Use that for testing purposes. All the instructions are in the notebook.

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Prerequisites

docker
python
tensorflow
flask / fastapi

Installation

Clone the repo

https://github.com/budukhyash/semantic-search-engine

Running the command below starts an Open Distro for Elasticsearch instance, with container ports 9200 and 9600 published on host ports 8200 and 8600. Read more about it here.

docker run -p 8200:9200 -p 8600:9600 -e "discovery.type=single-node" amazon/opendistro-for-elasticsearch:1.8.0

Download the dataset and extract it, then download the USE4 Universal Sentence Encoder by Google. Make sure the downloaded files are in the repository directory.
For data ingestion, run the ingestion script, where X is the number of documents to index.

Usage: python elastic_search_ingestion.py X
Example: python elastic_search_ingestion.py 20000
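
To illustrate how the X argument can cap the number of indexed documents, here is a minimal sketch. It is not the repository's elastic_search_ingestion.py. The index name, the field names, and the `toy_embed` stand-in for the sentence encoder are all assumptions made for illustration.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List

INDEX_NAME = "posts"  # hypothetical index name, not taken from the repository


def build_actions(
    titles: Iterable[str],
    embed: Callable[[str], List[float]],
    limit: int,
) -> Iterator[Dict]:
    """Yield one index action per title, stopping after `limit` documents."""
    for doc_id, title in enumerate(islice(titles, limit)):
        yield {
            "_index": INDEX_NAME,
            "_id": doc_id,
            # "title" and "title_vector" are hypothetical field names.
            "_source": {"title": title, "title_vector": embed(title)},
        }


def toy_embed(text: str) -> List[float]:
    """Stand-in for the sentence encoder: a 2-dimensional toy vector."""
    return [float(len(text)), float(text.count(" ") + 1)]


if __name__ == "__main__":
    sample = ["How to reverse a list", "What is a segfault", "Undo a git commit"]
    for action in build_actions(sample, toy_embed, limit=2):
        print(action["_id"], action["_source"]["title"])
```

Dictionaries of this shape can be passed to `elasticsearch.helpers.bulk` from the official Python client. In a real run, `embed` would wrap the downloaded Universal Sentence Encoder model instead of the toy function.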

After the ingestion is complete, start the server by running:

uvicorn server:app --reload --port 9999

Documentation

Postman Docs
After starting the server, the docs can be found here:
http://localhost:9999/docs#/
You should see something like this.
/semantic returns the top 10 most similar results. It considers the semantic meaning of the query and uses cosine similarity to rank the documents.
/keywords returns the most similar results using the traditional keyword approach with an inverted index. Elasticsearch uses a TF-IDF-based scheme to rank these documents.
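
The cosine-similarity ranking behind /semantic can be sketched in plain Python as follows. This is only an illustration of the idea, not the server's code. In the actual project the vectors would come from the sentence encoder and the scoring would presumably run inside Elasticsearch. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    k: int = 10,
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs for the k most similar documents."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(top_k([1.0, 0.0], docs, k=2))
```

Because cosine similarity depends only on the angle between vectors, a long document and a short query can still score highly if their embeddings point in a similar direction.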

Response time (with 100,000 documents ingested)

Under 300 ms for semantic search
Under 150 ms for keyword-based search
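
One way to check response times like these on your own machine is a small timing client. The sketch below uses only the standard library. The `query` URL parameter and the use of GET requests are assumptions, so check the generated docs at http://localhost:9999/docs#/ for the actual request format before relying on it.

```python
import statistics
import time
import urllib.parse
import urllib.request
from typing import Callable, List

BASE_URL = "http://localhost:9999"  # port taken from the uvicorn command


def build_url(endpoint: str, query: str) -> str:
    """Build a request URL; the `query` parameter name is an assumption."""
    return f"{BASE_URL}/{endpoint}?{urllib.parse.urlencode({'query': query})}"


def median_latency_ms(call: Callable[[], object], repeats: int = 5) -> float:
    """Run `call` several times and return the median wall-clock time in ms."""
    timings: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        call()
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


def fetch(url: str) -> bytes:
    """Send a GET request and return the raw response body."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read()


def report(query: str = "how to merge two dictionaries") -> None:
    """Print the median latency of both endpoints; needs the server running."""
    for endpoint in ("semantic", "keywords"):
        url = build_url(endpoint, query)
        latency = median_latency_ms(lambda: fetch(url))
        print(f"/{endpoint}: {latency:.1f} ms")
```

With the server running, calling `report()` prints one line per endpoint. The median is used rather than the mean so that a single slow first request has less effect on the figure.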
