A cross-modal search engine leveraging semantic search and cosine similarity, built using CLIP, ImageBind, and Flask.

ahmedembeddedxx/multimodal-search-engine

Multimodal Search Engine

This project is a multimodal search engine built on OpenAI's CLIP, with a Flask API for the backend and HTML/CSS for the frontend web application.

Introduction

This project provides a web interface where users enter a text query and the system retrieves the images that best match the description. Retrieval is based on the CLIP architecture (read the paper).
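As a rough illustration of the retrieval step, the sketch below ranks images by cosine similarity between a query embedding and stored image embeddings. It is not the project's code: the function names `cosine_similarity` and `rank_images`, the file names, and the 3-dimensional vectors are made up for the example, and real CLIP embeddings would have far more dimensions and would come from the model rather than being typed by hand.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; treat it as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_images(query_embedding, image_embeddings, top_k=3):
    """Return up to top_k (filename, score) pairs, most similar first."""
    scored = [
        (name, cosine_similarity(query_embedding, vec))
        for name, vec in image_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for CLIP embeddings (hypothetical values).
image_embeddings = {
    "dog.jpg": [0.9, 0.1, 0.0],
    "cat.jpg": [0.1, 0.9, 0.0],
    "car.jpg": [0.0, 0.1, 0.9],
}
query = [1.0, 0.2, 0.0]  # stand-in for the embedded text query

results = rank_images(query, image_embeddings, top_k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

Because cosine similarity depends only on the direction of the vectors, a text embedding and an image embedding can be compared directly even when their magnitudes differ.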

Take a look

Screenshot-2024-04-10-at-11-02-46-PM

Screenshot-2024-04-10-at-11-03-23-PM

Screenshot-2024-04-10-at-11-03-51-PM

Screenshot-2024-04-10-at-11-04-14-PM

Demo Video

Watch the YouTube video

  • This video demonstrates how to use our project's main feature.

How to use your own images?

  • Sample data of 130 images is already included, so you can try the app without adding anything.
  • To use your own images, see the video or follow the steps below.
  • Place your images in src/minidata.
  • Run the notebook src/image-processor.
  • Move the contents of src/image_embeddings to flaskapp/image_embeddings and the contents of src/minidata to flaskapp/static (caution: transfer the files, not the directories).
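The final transfer step can also be scripted. The following is a minimal sketch, not part of the repository: `copy_contents` is a hypothetical helper, and it copies files rather than moving them so the originals stay in place. It handles only the files directly inside the source directory and skips any subdirectories.

```python
import shutil
from pathlib import Path


def copy_contents(src_dir, dst_dir):
    """Copy each file inside src_dir into dst_dir (not the directory itself).

    Returns the sorted list of copied file names. Subdirectories are skipped.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)  # create the target if missing
    copied = []
    for item in sorted(src.iterdir()):
        if item.is_file():
            shutil.copy2(item, dst / item.name)  # copy2 keeps timestamps
            copied.append(item.name)
    return copied
```

Run from the repository root, the two transfers would be `copy_contents("src/image_embeddings", "flaskapp/image_embeddings")` and `copy_contents("src/minidata", "flaskapp/static")`.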

Features

  • Multi-Modal Search: Users can input textual descriptions of images to retrieve relevant images.
  • Intuitive Web Interface: The frontend is built using HTML/CSS to provide a user-friendly experience.
  • Scalable Backend: Flask API serves as the backend, handling requests and interacting with the CLIP model.

Clone the repository:

git clone https://github.com/ahmedembeddedxx/multimodal-search-engine.git

Usage

Start the backend server:

cd flaskapp/
flask run

Access the web application in your browser at http://127.0.0.1:5000/.

Stacks

  • OpenAI for developing CLIP.
  • Flask for the backend framework.

Future Expectations

  • Shift the app to ReactJS
  • Use ImageBind by Meta AI
  • More accurate model evaluation
  • Integrate audio and video functionality
