TensorCruncher/animal-image-search

Animal Image Search 🐈 🏞️ 🔎

Search through 5400 animal images (90 classes × 60 images each) using text or image queries.

Results are returned with captions.

See demo on Hugging Face Spaces.

The original image dataset is from Kaggle. We use it with slight changes (two images changed).

Tech

  • Image embeddings created using OpenClip.
  • Search provided by FAISS.
  • Captions generated using BLIP.

Scope for improvement

  • The dataset could be expanded to include more images per class, providing richer results.

  • The dataset contains some duplicate images that need to be removed.

  • OpenClip with the ViT-B-32 model and laion2b_s34b_b79k weights works well for basic queries, but fails to understand more abstract ones. For example, "Tiger food" returns images of tigers rather than deer. A larger model might help here by embedding images and text in a richer embedding space.

  • The BLIP model sometimes repeats the last word of a caption; this needs investigation. BLIP-2 might provide better captions at the cost of being slower on CPU.
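Until the repetition is root-caused, the observed symptom could be patched in post-processing. A minimal, illustrative stopgap (not the project's actual code) that trims a run of duplicated trailing words from a caption:

```python
def dedupe_trailing_words(caption: str) -> str:
    """Collapse a run of repeated words at the end of a caption.

    Stopgap for the observed BLIP artefact where the final word is
    emitted more than once, e.g. "a tiger in the grass grass".
    """
    words = caption.split()
    while len(words) >= 2 and words[-1] == words[-2]:
        words.pop()
    return " ".join(words)

print(dedupe_trailing_words("a tiger in the grass grass"))
# -> "a tiger in the grass"
```

If the repetition happens during decoding rather than in the text, passing `no_repeat_ngram_size` or `repetition_penalty` to the Hugging Face `generate()` call is another option worth trying.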

Project extensions

  • Audio input could be added using a Whisper model. The model would convert speech to text, which can then be fed into the existing pipeline.

  • We could use an LLM as a judge to rank / grade returned results based on how closely they match the search query.
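The audio extension is mostly glue code. A sketch of that glue, with the transcriber and the existing text-search entry point injected as callables (both names are hypothetical; in practice `transcribe` would wrap a speech-to-text model such as Whisper, and `text_search` would be the project's existing text-query function):

```python
from typing import Callable, List

def audio_search(audio_path: str,
                 transcribe: Callable[[str], str],
                 text_search: Callable[[str], List[str]]) -> List[str]:
    """Proposed audio input: speech -> text -> existing text-query pipeline.

    Both callables are hypothetical names used for illustration;
    they are not part of the current codebase.
    """
    query = transcribe(audio_path).strip()
    if not query:
        raise ValueError("transcription produced no text")
    return text_search(query)

# Stubbed usage; real code would pass the Whisper wrapper and the
# FAISS-backed text search instead of these lambdas.
results = audio_search("meow.wav",
                       transcribe=lambda path: " a cat sitting on a sofa ",
                       text_search=lambda q: [f"result for: {q}"])
```

Injecting the two stages as callables keeps the new input mode decoupled from the retrieval code, so the same glue would also serve any future input modality.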

About

Multimodal animal image search using FAISS & OpenClip embeddings and image captioning with an LLM
