add notebook for efficient-sam (#1557)

eaidova · web-flow · commit 198fb2e569ee · 2023-12-21T09:10:36.000+04:00
* add notebook for efficient-sam

* text descriptions and readme

* Apply suggestions from code review

* reduce pt usage

* add quantization

* apply review comments
diff --git a/.ci/spellcheck/.pyspelling.wordlist.txt b/.ci/spellcheck/.pyspelling.wordlist.txt
@@ -172,6 +172,8 @@ dropdown
 ECCV
 editability
 EfficientNet
+EfficientSAM
+EfficientSAMs
 embeddings
 EnCodec
 encodec
@@ -554,6 +556,8 @@ sagittal
 SALICON
 Saliency
 saliency
+SAMI
+sam
 SavedModel
 scalability
 Scalable
@@ -699,6 +703,7 @@ ViT
 vit
 vits
 VITS
+vitt
 VM
 Vladlen
 VOC
diff --git a/README.md b/README.md
@@ -47,7 +47,8 @@ Check out the latest notebooks that show how to optimize and deploy popular mode
 | [Audio LDM 2](notebooks/270-sound-generation-audioldm2/)<br> | Sound Generation with AudioLDM2 and OpenVINO™ | <img src="https://github.com/openvinotoolkit/openvino_notebooks/assets/76463150/c93a0f86-d9cf-4bd1-93b9-e27532170d75" width=300> |
 | [SDXL-Turbo](notebooks/271-sdxl-turbo/)<br> | Single-step image generation using SDXL-turbo and OpenVINO | <img src="https://github.com/openvinotoolkit/openvino_notebooks/assets/29454499/79b625c7-0f0a-4f19-8e38-e6f896f75c3e" width=300> |
 | [Segmind-VegaRT](notebooks/248-stable-diffusion-xl/248-segmind-vegart.ipynb)<br> | High-resolution image generation with Segmind-VegaRT and OpenVINO | <img src="https://github.com/openvinotoolkit/openvino_notebooks/assets/29454499/66bfe823-01c8-4749-a8aa-419a1d78a070" width=300> |
-| [Stable-Zephyr chatbot](notebooks/273-stable-zephyr-3b-chatbot/)<br> | Use Stable-Zephyr as chatbot assistant with OpenVINO | <img src=https://github.com/openvinotoolkit/openvino_notebooks/assets/29454499/cfac6ddb-6f22-4343-855c-e513269cf2bf width=300> |  
+| [Stable-Zephyr chatbot](notebooks/273-stable-zephyr-3b-chatbot/)<br> | Use Stable-Zephyr as chatbot assistant with OpenVINO | <img src=https://github.com/openvinotoolkit/openvino_notebooks/assets/29454499/cfac6ddb-6f22-4343-855c-e513269cf2bf width=300> |
+| [Efficient-SAM](notebooks/274-efficient-sam)<br> | Object segmentation with EfficientSAM and OpenVINO | <img src=https://github.com/openvinotoolkit/openvino_notebooks/assets/29454499/15d0a23a-0550-43c6-9ca9-f689e772a79a width=300> |
 
 ## Table of Contents
 
@@ -220,6 +221,8 @@ Demos that demonstrate inference on a particular model.
 | [270-sound-generation-audioldm2](notebooks/270-sound-generation-audioldm2/)<br> | Sound Generation with AudioLDM2 and OpenVINO™ | <img src="https://github.com/openvinotoolkit/openvino_notebooks/assets/76463150/c93a0f86-d9cf-4bd1-93b9-e27532170d75" width=225> |
 | [271-sdxl-turbo](notebooks/271-sdxl-turbo/)<br> | Single-step image generation using SDXL-turbo and OpenVINO | <img src="https://github.com/openvinotoolkit/openvino_notebooks/assets/29454499/79b625c7-0f0a-4f19-8e38-e6f896f75c3e" width=225> |
 | [272-paint-by-example](notebooks/272-paint-by-example/)<br>| Exemplar based image editing using diffusion models, [Paint-by-Example](https://github.com/Fantasy-Studio/Paint-by-Example), and OpenVINO™ | <img width="225" alt="ui_example" src="https://user-images.githubusercontent.com/103226580/236958011-0550ba7b-19b3-4dc6-8b50-eaedb84c1681.png"> |
+| [274-efficient-sam](notebooks/274-efficient-sam/)<br>| Object segmentation with EfficientSAM and OpenVINO™ | <img width="225" src="https://github.com/openvinotoolkit/openvino_notebooks/assets/29454499/15d0a23a-0550-43c6-9ca9-f689e772a79a"> |
+
 <div id='-model-training'></div>
 
 ### 🏃 Model Training
diff --git a/notebooks/274-efficient-sam/274-efficient-sam.ipynb b/notebooks/274-efficient-sam/274-efficient-sam.ipynb
diff --git a/notebooks/274-efficient-sam/README.md b/notebooks/274-efficient-sam/README.md
@@ -0,0 +1,32 @@
+# Object segmentation with EfficientSAM and OpenVINO
+
+The Segment Anything Model (SAM) has emerged as a powerful tool for numerous vision applications. Its strong zero-shot transfer and versatility come from a very large Transformer model trained on the extensive, high-quality SA-1B dataset. However, the high computation cost of SAM limits its use in many real-world applications. To address this limitation, EfficientSAMs were proposed: lightweight SAM models that exhibit decent performance with largely reduced complexity. EfficientSAM leverages masked image pretraining, SAMI, which learns to reconstruct features from the SAM image encoder for effective visual representation learning.
+
+![overview.png](https://yformer.github.io/efficient-sam/EfficientSAM_files/overview.png)
+
+More details about the model can be found in the [paper](https://arxiv.org/pdf/2312.00863.pdf), on the [model web page](https://yformer.github.io/efficient-sam/), and in the [original repository](https://github.com/yformer/EfficientSAM).
+
+In this tutorial, we show how to convert and run EfficientSAM using OpenVINO. We also demonstrate how to quantize the model using [NNCF](https://github.com/openvinotoolkit/nncf).
+
+The image below shows the area of an image segmented from the provided points:
+
+![example.png](https://github.com/openvinotoolkit/openvino_notebooks/assets/29454499/15d0a23a-0550-43c6-9ca9-f689e772a79a)
+
+
+## Notebook Contents
+
+The tutorial consists of the following steps:
+
+- Install prerequisites
+- Load PyTorch model
+- Run PyTorch model inference
+- Convert PyTorch model to OpenVINO Intermediate Representation
+- Run OpenVINO model inference
+- Optimize OpenVINO model using [NNCF](https://github.com/openvinotoolkit/nncf)
+- Launch interactive segmentation demo
+
+## Installation Instructions
+
+This is a self-contained example that relies solely on its own code.<br>
+We recommend running the notebook in a virtual environment. You only need a Jupyter server to start.
+For details, please refer to [Installation Guide](../../README.md).
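
For reviewers, the convert, quantize, and run steps listed in the README above can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: the helper name `convert_and_quantize`, its parameters, the default IR path `efficient_sam.xml`, and the use of default NNCF quantization settings are all assumptions made for this example. It only relies on the public `openvino` and `nncf` entry points (`ov.convert_model`, `ov.save_model`, `nncf.Dataset`, `nncf.quantize`, `ov.Core().compile_model`).

```python
def convert_and_quantize(pt_model, example_input, calibration_samples,
                         ir_path="efficient_sam.xml", device="CPU"):
    """Hypothetical helper: PyTorch model -> OpenVINO IR -> INT8 model via NNCF."""
    # Imported lazily so the sketch can be imported without the packages installed.
    import nncf
    import openvino as ov

    # Trace the PyTorch module with a representative input and convert it to an ov.Model.
    ov_model = ov.convert_model(pt_model, example_input=example_input)
    # Serialize to OpenVINO IR (an .xml file plus a .bin file with the weights).
    ov.save_model(ov_model, ir_path)

    # Wrap the calibration samples; each item is assumed to already match the model inputs.
    calibration_dataset = nncf.Dataset(calibration_samples)
    # Post-training quantization with default settings (an assumption, not the notebook's setup).
    quantized_model = nncf.quantize(ov_model, calibration_dataset)

    # Compile the quantized model for the target device and return it for inference.
    core = ov.Core()
    return core.compile_model(quantized_model, device)
```

Calling the helper requires a loaded EfficientSAM PyTorch module, one example input in the format the model expects, and a small list of calibration inputs; the returned compiled model can then be called directly on a new input.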