DepthPro is a foundation model for zero-shot monocular depth estimation. Built on a multi-scale vision transformer (ViT-based, DINOv2), it is optimized for dense predictions by processing images at multiple scales: each image is split into patches, encoded with a patch encoder shared across scales, then merged, upsampled, and fused via a DPT decoder.
- Research Paper: Depth Pro: Sharp Monocular Metric Depth in Less Than a Second
- Authors: Aleksei Bochkovskii, Amaël Delaunoy, et al.
- Official Code: apple/ml-depth-pro
- Official Weights: apple/DepthPro
- Unofficial Weights: geetu040/DepthPro
- Web UI Interface: spaces/geetu040/DepthPro
- Interface in Transformers (Open PR): huggingface/transformers#34583
In this repository, we use this architecture and the available pretrained weights for depth estimation to explore its capabilities on further image-processing tasks like image segmentation and image super-resolution.
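As a quick reference, here is a minimal depth-estimation inference sketch following the usage documented in the official apple/ml-depth-pro repository (the `transformers` interface is still an open PR at the time of writing, so this uses the official package):

```python
# Minimal zero-shot depth estimation with the official apple/ml-depth-pro package.
import depth_pro

# Load the pretrained model and its preprocessing transform.
model, transform = depth_pro.create_model_and_transforms()
model.eval()

# Load an RGB image; f_px is the focal length recovered from EXIF, if present.
image, _, f_px = depth_pro.load_rgb("example.jpg")
image = transform(image)

# Run inference: returns metric depth and the estimated focal length in pixels.
prediction = model.infer(image, f_px=f_px)
depth = prediction["depth"]                    # depth map in meters
focallength_px = prediction["focallength_px"]  # focal length in pixels
```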
Quick Links
Task | Web UI Interface | Code-Based Inference and Weights | Training Code on Colab | Training Code on Kaggle | Training Logs | Validation Outputs |
---|---|---|---|---|---|---|
Depth Estimation | DepthPro | geetu040/DepthPro | - | - | - | - |
Human Segmentation | DepthPro Segmentation Human | geetu040/DepthPro_Segmentation_Human | - | | Training Logs | Validation Outputs |
Super Resolution (4x 256p) | DepthPro SR 4x 256p | geetu040/DepthPro_SR_4x_256p | | | Training Logs | Validation Outputs |
Super Resolution (4x 384p) | DepthPro SR 4x 384p | geetu040/DepthPro_SR_4x_384p | | | Training Logs | Validation Outputs |
Human Segmentation

- For Web UI Interface: spaces/geetu040/DepthPro_Segmentation_Human
- For Code-Based Inference and model weights: geetu040/DepthPro_Segmentation_Human
- For Training, check the notebooks linked in the Quick Links table above.
Sample validation results: input image, ground-truth mask, and predicted mask (example images omitted; see the Validation Outputs link above).
We modify Apple's DepthPro for Monocular Depth Estimation model for the image segmentation task.
- The pretrained depth-estimation model is reused, with slight changes to the head layer to make it compatible with the segmentation task (a minimal sketch follows at the end of this section).
- Hidden feature maps are extracted to inspect the encoder and fusion stages of the model.
- For `training` and `validation`, we use the `Human Segmentation Dataset - Supervise.ly` from Kaggle: tapakah68/supervisely-filtered-segmentation-person-dataset
  - It contains 2667 samples, which are randomly split into 80% training and 20% validation.
  - Each sample contains an image and its corresponding mask.
- The model produces exceptional results on the validation set, with an `IoU score of 0.964` and a `Dice score of 0.982`, beating the previous state-of-the-art IoU score of 0.95 on this dataset.
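The actual head definition lives in the training notebook; the following is only a minimal sketch of the idea under stated assumptions: a hypothetical single-channel convolutional head (`SegmentationHead` is an illustrative name, not the repository's) replaces the depth head on top of the fused decoder features, and IoU/Dice are computed on thresholded binary masks.

```python
import torch
import torch.nn as nn

class SegmentationHead(nn.Module):
    """Hypothetical stand-in for DepthPro's depth head: maps fused decoder
    features to a single-channel person/background logit map."""
    def __init__(self, in_channels: int):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(in_channels, in_channels // 2, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels // 2, 1, kernel_size=1),  # 1 channel: binary mask
        )

    def forward(self, fused_features: torch.Tensor) -> torch.Tensor:
        return self.head(fused_features)  # raw logits; sigmoid gives probabilities

def iou_and_dice(pred_mask: torch.Tensor, true_mask: torch.Tensor, eps: float = 1e-7):
    """IoU and Dice for binary {0, 1} masks of identical shape."""
    pred_mask, true_mask = pred_mask.float(), true_mask.float()
    inter = (pred_mask * true_mask).sum()
    union = pred_mask.sum() + true_mask.sum() - inter
    iou = (inter + eps) / (union + eps)
    dice = (2 * inter + eps) / (pred_mask.sum() + true_mask.sum() + eps)
    return iou.item(), dice.item()
```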
Super Resolution (4x 256p)

- For Web UI Interface: spaces/geetu040/DepthPro_SR_4x_256p
- For Code-Based Inference and model weights: geetu040/DepthPro_SR_4x_256p
- For Training, check the notebooks linked in the Quick Links table above.
Sample results: low-resolution 256px input, 1024px Depth Pro super-resolution output, and 1024px ground truth (example images omitted).
We then modify Apple's DepthPro for Monocular Depth Estimation model for the image super-resolution task.
- The base model architecture is modified for the task of image super-resolution from 256px to 1024px (4x upsampling); a minimal sketch follows at the end of this section.
- For `training` and `validation`, we use the `Div2k` dataset, introduced in NTIRE 2017 Challenge on Single Image Super-Resolution: Dataset and Study.
  - It contains high-resolution images at 2K resolution, which have been downsampled to `LR_SIZE=256` and `HR_SIZE=1024` for training and validation.
  - It contains 800 training samples and 200 validation samples.
  - The dataset has been downloaded from Kaggle: soumikrakshit/div2k-high-resolution-images
- For `testing`, we use the `Urban100` dataset, introduced in Single Image Super-Resolution From Transformed Self-Exemplars.
  - It contains 100 samples, each available at two resolutions: 256 (low) and 1024 (high).
  - The dataset has been downloaded from Kaggle: harshraone/urban100
- Results:
  - The model achieves its best `PSNR score of 24.80` and `SSIM score of 0.74` on the validation set, and a `PSNR score of 21.36` and `SSIM score of 0.62` on the test set.
  - The model is able to restore some of the information lost in the low-resolution images.
  - Results are better than most of the generative techniques applied on Kaggle, but still fall well short of state-of-the-art results.
  - This is partly because Vision Transformers are not specifically designed for super-resolution tasks.
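The exact reconstruction head is defined in the training notebooks; as a rough illustration of the 4x modification, here is a minimal PixelShuffle-style sketch. The class name, layer sizes, and the assumption that the fused decoder features sit at the low-resolution spatial size are all illustrative, not the repository's exact configuration.

```python
import torch
import torch.nn as nn

class SuperResolutionHead(nn.Module):
    """Hypothetical 4x reconstruction head: fused decoder features at the
    low-resolution spatial size (e.g. 256px) -> RGB output at 4x (e.g. 1024px)."""
    def __init__(self, in_channels: int, scale: int = 4):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(in_channels, 3 * scale ** 2, kernel_size=3, padding=1),
            nn.PixelShuffle(scale),  # rearranges channels into a 4x larger spatial grid
            nn.Conv2d(3, 3, kernel_size=3, padding=1),  # light refinement of the RGB output
        )

    def forward(self, fused_features: torch.Tensor) -> torch.Tensor:
        return self.head(fused_features)

# e.g. features of shape (1, 256, 256, 256) -> output of shape (1, 3, 1024, 1024)
```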
Super Resolution (4x 384p)

- For Web UI Interface: spaces/geetu040/DepthPro_SR_4x_384p
- For Code-Based Inference and model weights: geetu040/DepthPro_SR_4x_384p
- For Training, check the notebooks linked in the Quick Links table above.
Sample results: low-resolution 384px input, 1536px Depth Pro super-resolution output, and 1536px ground truth (example images omitted).
We use the same modification of Apple's DepthPro for Monocular Depth Estimation model for the image super-resolution task, at a higher input resolution.
- The base model architecture is modified, as in the 256p variant, for the task of image super-resolution from 384px to 1536px (4x upsampling).
- For `training` and `validation`, we use the `Div2k` dataset, introduced in NTIRE 2017 Challenge on Single Image Super-Resolution: Dataset and Study.
  - It contains high-resolution images at 2K resolution, which have been downsampled to `LR_SIZE=384` and `HR_SIZE=1536` for training and validation.
  - It contains 800 training samples and 200 validation samples.
  - The dataset has been downloaded from Kaggle: soumikrakshit/div2k-high-resolution-images
- Results:
  - The model achieves its best `PSNR score of 27.19` and `SSIM score of 0.81` on the validation set (see the metric sketch below for how these scores can be computed).
  - The model is able to restore some of the information lost in the low-resolution images.
  - Results are better than the generative techniques applied on Kaggle, but fall slightly short of state-of-the-art results.
  - This is partly because Vision Transformers are not specifically designed for super-resolution tasks.
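For reference, the PSNR/SSIM numbers quoted above can be computed with standard implementations such as `scikit-image`. A minimal sketch, assuming `sr` and `hr` are hypothetical same-shaped HWC uint8 arrays (e.g. 1536x1536x3) holding the super-resolved output and the ground truth:

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(sr: np.ndarray, hr: np.ndarray) -> tuple[float, float]:
    """PSNR and SSIM between a super-resolved image and its ground truth.
    Both inputs are HWC uint8 arrays of identical shape."""
    psnr = peak_signal_noise_ratio(hr, sr, data_range=255)
    ssim = structural_similarity(hr, sr, channel_axis=-1, data_range=255)
    return psnr, ssim
```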