Commit 8d8ffaf

Links update (#251)
* Updated desc for odd size in readme
* fixed language
* Added odd size issue to issues description in tutorial:
* Removed unnecessary line causing a warning message
* Updated instructions for skipping notebook execution
* Updated absolute links to relative links in documentation
* added hidden tags to dataset download cells
* Updated link checker'
* Updated link checker'
* Updated link checker'
* Updated link checker'
* Updated link checker'
* Updated tutorial
* Revert accidentally hidden cells
* Updated tqdm to tqdm.auto
* Updated docs requirements
* Updated tutorial notebooks
* Updated tags
1 parent 972f060 commit 8d8ffaf

11 files changed

+72 -46 lines changed

.github/workflows/links.yml

+9-4
Original file line number | Diff line number | Diff line change
@@ -17,10 +17,15 @@ jobs:
1717
find . -name '*.html' -delete
1818
- run: |
1919
find . -name '*.md' -exec pandoc -i {} -o {}.html \;
20-
- uses: anishathalye/proof-html@v1
20+
- uses: anishathalye/proof-html@v2
2121
with:
2222
directory: .
23+
check_html: false
2324
check_favicon: false
24-
empty_alt_ignore: true
25-
url_ignore_re: |
26-
^https:\/\/twitter.com\/CleanlabAI
25+
ignore_missing_alt: true
26+
ignore_empty_alt: true
27+
tokens: |
28+
{"https://github.com": "${{ secrets.GITHUB_TOKEN }}"}
29+
swap_urls: |
30+
{"^(\\..*)\\.md(#?.*)$": "\\1.md.html\\2",
31+
"^(https://github\\.com/.*)#.*$": "\\1"}

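The two `swap_urls` rules added above can be read as plain regex rewrites. The following is a minimal Python sketch of their effect, not the link checker's own code: it assumes the rules are applied in order to each URL, and the `#setup` fragment and `example.com` URL are made up for illustration. The double backslashes in the workflow file are JSON escaping, so `\\.` there corresponds to `\.` here.

```python
import re

# The two rewrite rules from `swap_urls`, with JSON string escaping removed.
SWAP_RULES = [
    # Relative link to a Markdown file: point it at the pandoc-generated HTML.
    (r"^(\..*)\.md(#?.*)$", r"\1.md.html\2"),
    # GitHub link with a fragment: drop the fragment before checking.
    (r"^(https://github\.com/.*)#.*$", r"\1"),
]


def swap_url(url: str) -> str:
    """Apply each rule in order and return the rewritten URL (assumed behavior)."""
    for pattern, replacement in SWAP_RULES:
        url = re.sub(pattern, replacement, url)
    return url


for example in [
    "./DEVELOPMENT.md#setup",  # hypothetical fragment
    "./CONTRIBUTING.md",
    "https://github.com/cleanlab/cleanvision/blob/main/CONTRIBUTING.md#x",
    "https://example.com/page#frag",  # matches neither rule, left unchanged
]:
    print(example, "->", swap_url(example))
```

The first rule only matches links that start with a dot, which is why the workflow converts every `*.md` file to `*.md.html` beforehand.
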
DEVELOPMENT.md

+1-1
@@ -123,7 +123,7 @@ pip install -r docs/requirements.txt
123123
sphinx-build docs/source cleanvision-docs
124124
```
125125

126-
**Note for faster build**: Executing the Jupyter Notebooks (i.e., the .ipynb files) that make up some portion of the docs, such as the tutorials, takes a long time. If you want to skip rendering these, set the environment variable `SKIP_NOTEBOOKS=1`. You can either set this using `export SKIP_NOTEBOOKS=1`
126+
**Note for faster build**: Executing the Jupyter Notebooks (i.e., the .ipynb files) that make up some portion of the docs, such as the tutorials, takes a long time. If you want to skip rendering these, add `nbsphinx_execute = 'never' to [sphinx configuration](docs/source/conf.py)
127127

128128
4. To view the docs open the file `cleanvision-docs/index.html` file in a browser.
129129

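The updated note asks contributors to set `nbsphinx_execute = 'never'` by hand in `docs/source/conf.py`. As a rough sketch of how that setting could instead be toggled from the environment, the helper below reuses the `SKIP_NOTEBOOKS` variable named in the removed text. The helper name and the idea of wiring it into `conf.py` are assumptions for illustration and are not part of this commit.

```python
import os


def notebook_execution_mode(environ=os.environ) -> str:
    """Return a value for nbsphinx's `nbsphinx_execute` option.

    Hypothetical helper: "never" skips notebook execution, "auto" is
    nbsphinx's default (execute notebooks that have no stored outputs).
    """
    if environ.get("SKIP_NOTEBOOKS") == "1":
        return "never"
    return "auto"


# In a Sphinx conf.py this would be a module-level assignment:
nbsphinx_execute = notebook_execution_mode()
print(nbsphinx_execute)
```

With something like this in place, `SKIP_NOTEBOOKS=1 sphinx-build docs/source cleanvision-docs` would skip notebook execution without editing the configuration file.
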
README.md

+1-1
@@ -89,7 +89,7 @@ In any collection of image files (most [formats](https://pillow.readthedocs.io/e
8989
| 6 | Light | Irregularly bright images (*over*exposed) | light | ![](https://raw.githubusercontent.com/cleanlab/assets/master/cleanvision/example_issue_images/light.jpg) |
9090
| 7 | Grayscale | Images lacking color | grayscale | ![](https://raw.githubusercontent.com/cleanlab/assets/master/cleanvision/example_issue_images/grayscale.jpg) |
9191
| 8 | Odd Aspect Ratio | Images with an unusual aspect ratio (overly skinny/wide) | odd_aspect_ratio | ![](https://raw.githubusercontent.com/cleanlab/assets/master/cleanvision/example_issue_images/odd_aspect_ratio.jpg) |
92-
| 9 | Odd Size | Images that are abnormally large or small | odd_size | <img src="https://raw.githubusercontent.com/cleanlab/assets/master/cleanvision/example_issue_images/odd_size.png" width=20% height=20%> |
92+
| 9 | Odd Size | Images that are abnormally large or small compared to the rest of the dataset | odd_size | <img src="https://raw.githubusercontent.com/cleanlab/assets/master/cleanvision/example_issue_images/odd_size.png" width=20% height=20%> |
9393

9494
CleanVision supports Linux, macOS, and Windows and runs on Python 3.7+.
9595

docs/requirements.txt

+11-18
@@ -1,19 +1,12 @@
1-
sphinx==5.1.1
2-
sphinx-tabs==3.4.1
3-
nbsphinx==0.8.8
4-
autodocsumm==0.2.9
1+
sphinx==7.1.2
2+
sphinx-tabs==3.4.5
3+
nbsphinx==0.9.3
4+
autodocsumm==0.2.12
55
sphinx-multiversion==0.2.4
6-
sphinx-copybutton==0.5.0
7-
sphinxcontrib-katex==0.8.6
8-
sphinx-autodoc-typehints==1.19.2
9-
furo==2022.06.21
10-
numpy>=1.20.0
11-
pandas>=1.1.5
12-
Pillow>=9.3
13-
matplotlib>=3.4
14-
tqdm>=4.53.0
15-
imagehash>=4.2.0
16-
datasets>=2.7.0
17-
torchvision>=0.12.0
18-
ipykernel==6.8.0
19-
ipywidgets==7.6.5
6+
sphinx-copybutton==0.5.2
7+
sphinxcontrib-katex==0.9.9
8+
sphinx-autodoc-typehints==1.25.2
9+
furo==2023.09.10
10+
ipykernel==6.29.0
11+
ipywidgets==8.1.1
12+
ipython==8.0.1

docs/source/conf.py

-1
@@ -77,7 +77,6 @@
7777

7878
html_title = ""
7979
html_theme = "furo"
80-
html_static_path = ["_static"]
8180
html_logo = "https://raw.githubusercontent.com/cleanlab/assets/master/cleanlab/cleanlab_logo_only.png"
8281

8382
html_theme_options = {

docs/source/faq.rst

+1-1
@@ -10,7 +10,7 @@ CleanVision is independent of any machine learning tasks as it directly works on
1010
2. **Can I check for specific issues in my dataset?**
1111

1212

13-
Yes, you can specify issues like ``light`` or ``blurry`` in the issue_types argument when calling ``Imagelab.find_issues``
13+
Yes, you can specify issues like ``light`` or ``blurry`` in the issue_types argument when calling :py:meth:`~cleanvision.imagelab.Imagelab.find_issues`
1414

1515
.. code-block:: python3
1616

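For context on the FAQ answer touched here, the `issue_types` argument is a dictionary keyed by issue name, as in the tutorial's `issue_types = {issue_name: {}}`. The sketch below builds such a dictionary for the two keys the FAQ mentions. `build_issue_types` is a hypothetical helper, the image folder path is a placeholder, and the CleanVision calls are left as comments so the snippet runs without the package installed.

```python
def build_issue_types(issue_keys):
    """Map each issue key to an empty settings dict, as the tutorial does."""
    return {key: {} for key in issue_keys}


issue_types = build_issue_types(["light", "blurry"])
print(issue_types)  # {'light': {}, 'blurry': {}}

# With CleanVision installed, the dictionary would be used along these lines
# (the data_path value is a placeholder):
#
#   from cleanvision import Imagelab
#   imagelab = Imagelab(data_path="path/to/images/")
#   imagelab.find_issues(issue_types=issue_types)
#   imagelab.report()
```
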
docs/source/index.rst

+4-4
@@ -5,7 +5,7 @@
55
Documentation
66
=======================================
77

8-
CleanVision automatically detects various issues in image datasets, such as images that are: (near) duplicates, blurry,
8+
CleanVision automatically detects various issues in your image data, such as images that are: (near) duplicates, blurry,
99
over/under-exposed, etc. This data-centric AI package is designed as a quick first step for any computer vision project
1010
to find problems in your dataset, which you may want to address before applying machine learning.
1111

@@ -120,9 +120,9 @@ CleanVision works smoothly with Torchvision datasets too:
120120
121121
Additional Resources
122122
--------------------
123-
- Get started with our `Example Notebook <https://cleanvision.readthedocs.io/en/latest/tutorials/tutorial.html>`_
124-
- Explore more `Example Notebooks <https://github.com/cleanlab/cleanvision-examples>`_
125-
- Learn how to contribute in the `Contribution Guide <https://github.com/cleanlab/cleanvision/blob/main/CONTRIBUTING.md>`_
123+
- Get started with `Starter Tutorial <tutorials/tutorial.ipynb>`_.
124+
- View more `code examples <https://github.com/cleanlab/cleanvision-examples>`_ that demonstrate how to use CleanVision on various datasets.
125+
- Interested in contributing to CleanVision? Check out our `Contribution Guide <https://github.com/cleanlab/cleanvision/blob/main/CONTRIBUTING.md>`_ to get started.
126126

127127

128128
.. toctree::

docs/source/tutorials/custom_issue_manager.py

+1-1
@@ -3,7 +3,7 @@
33
import numpy as np
44
import pandas as pd
55
from PIL import Image
6-
from tqdm import tqdm
6+
from tqdm.auto import tqdm
77

88
from cleanvision.dataset.base_dataset import Dataset
99
from cleanvision.issue_managers import register_issue_manager

docs/source/tutorials/huggingface_dataset.ipynb

+19-4
@@ -44,6 +44,19 @@
4444
"from cleanvision import Imagelab"
4545
]
4646
},
47+
{
48+
"cell_type": "code",
49+
"execution_count": null,
50+
"metadata": {
51+
"nbsphinx": "hidden"
52+
},
53+
"outputs": [],
54+
"source": [
55+
"import warnings\n",
56+
"\n",
57+
"warnings.filterwarnings(\"ignore\")"
58+
]
59+
},
4760
{
4861
"cell_type": "markdown",
4962
"metadata": {},
@@ -60,7 +73,9 @@
6073
{
6174
"cell_type": "code",
6275
"execution_count": null,
63-
"metadata": {},
76+
"metadata": {
77+
"tags": []
78+
},
6479
"outputs": [],
6580
"source": [
6681
"dataset = load_dataset(\"cats_vs_dogs\", split=\"train\")"
@@ -184,7 +199,7 @@
184199
"metadata": {},
185200
"outputs": [],
186201
"source": [
187-
"imagelab.issues"
202+
"imagelab.issues.head()"
188203
]
189204
},
190205
{
@@ -243,7 +258,7 @@
243258
"cell_type": "markdown",
244259
"metadata": {},
245260
"source": [
246-
"**For more detailed guide on how to use CleanVision, check the [tutorial notebook](https://github.com/cleanlab/cleanvision/blob/main/docs/source/tutorials/tutorial.ipynb).**"
261+
"**For more detailed guide on how to use CleanVision, check the** [tutorial notebook](tutorial.ipynb)."
247262
]
248263
}
249264
],
@@ -263,7 +278,7 @@
263278
"name": "python",
264279
"nbconvert_exporter": "python",
265280
"pygments_lexer": "ipython3",
266-
"version": "3.8.5"
281+
"version": "3.11.7"
267282
}
268283
},
269284
"nbformat": 4,

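The new warnings cell in this notebook is kept out of the rendered docs by the `"nbsphinx": "hidden"` entry in its metadata. As a rough illustration of that mechanism, the sketch below sets the same metadata key on matching code cells of a notebook loaded as JSON. The `hide_cells` helper and the in-memory notebook are made up for the example. The commit itself edits the notebook files directly.

```python
import json


def hide_cells(notebook: dict, marker: str) -> int:
    """Mark code cells whose source contains `marker` as hidden for nbsphinx.

    Returns the number of cells that were marked.
    """
    hidden = 0
    for cell in notebook.get("cells", []):
        # nbformat stores source as a list of lines (or a single string).
        source = "".join(cell.get("source", []))
        if cell.get("cell_type") == "code" and marker in source:
            cell.setdefault("metadata", {})["nbsphinx"] = "hidden"
            hidden += 1
    return hidden


# A tiny stand-in for a notebook file's JSON content.
notebook = {
    "cells": [
        {
            "cell_type": "code",
            "metadata": {},
            "source": ["import warnings\n", "\n", 'warnings.filterwarnings("ignore")'],
        },
        {"cell_type": "code", "metadata": {}, "source": ["imagelab.issues.head()"]},
    ],
    "nbformat": 4,
}

count = hide_cells(notebook, "warnings.filterwarnings")
print(count)
print(json.dumps(notebook["cells"][0]["metadata"]))
```
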
docs/source/tutorials/torchvision_dataset.ipynb

+7-4
@@ -70,9 +70,12 @@
7070
"cell_type": "code",
7171
"execution_count": null,
7272
"id": "3d207006",
73-
"metadata": {},
73+
"metadata": {
74+
"tags": []
75+
},
7476
"outputs": [],
7577
"source": [
78+
"%%capture\n",
7679
"train_set = CIFAR10(root=\"./\", download=True)\n",
7780
"test_set = CIFAR10(root=\"./\", train=False, download=True)"
7881
]
@@ -200,7 +203,7 @@
200203
"metadata": {},
201204
"outputs": [],
202205
"source": [
203-
"imagelab.issues"
206+
"imagelab.issues.head()"
204207
]
205208
},
206209
{
@@ -264,7 +267,7 @@
264267
"id": "75912aea",
265268
"metadata": {},
266269
"source": [
267-
"**For more detailed guide on how to use CleanVision, check the [tutorial notebook](https://github.com/cleanlab/cleanvision/blob/main/docs/source/tutorials/tutorial.ipynb).**"
270+
"**For more detailed guide on how to use CleanVision, check the** [tutorial notebook](tutorial.ipynb)."
268271
]
269272
}
270273
],
@@ -284,7 +287,7 @@
284287
"name": "python",
285288
"nbconvert_exporter": "python",
286289
"pygments_lexer": "ipython3",
287-
"version": "3.11.0"
290+
"version": "3.10.0"
288291
}
289292
},
290293
"nbformat": 4,

docs/source/tutorials/tutorial.ipynb

+18-7
@@ -59,6 +59,7 @@
5959
"| 6 | Blurry | Images that are blurry or out of focus | blurry |\n",
6060
"| 7 | Grayscale | Images that are grayscale (lacking color) | grayscale |\n",
6161
"| 8 | Low Information | Images that lack much information (e.g. a completely black image with a few white dots) | low_information |\n",
62+
"| 9 | Odd Size | Images that are abnormally large or small compared to the rest of the dataset | odd_size |\n",
6263
"\n",
6364
"\n",
6465
"The **Issue Key** column specifies the name for each type of issue in CleanVision code. See our examples which use these keys to detect only particular issue types and specify nondefault parameter settings to use when checking for certain issues."
@@ -150,7 +151,7 @@
150151
"cell_type": "markdown",
151152
"metadata": {},
152153
"source": [
153-
"The main way to interface with your data is via the [Imagelab](https://cleanvision.readthedocs.io/en/latest/cleanvision/imagelab.html#cleanvision.imagelab.Imagelab) class. This class can be used to understand the issues in your dataset at a high level (global overview) and low level (issues and quality scores for each image) as well as additional information about the dataset. It has three main attributes:\n",
154+
"The main way to interface with your data is via the [Imagelab](../cleanvision/imagelab.rst#cleanvision.imagelab.Imagelab) class. This class can be used to understand the issues in your dataset at a high level (global overview) and low level (issues and quality scores for each image) as well as additional information about the dataset. It has three main attributes:\n",
154155
"\n",
155156
"- `Imagelab.issue_summary`\n",
156157
"- `Imagelab.issues`\n",
@@ -645,7 +646,7 @@
645646
"cell_type": "markdown",
646647
"metadata": {},
647648
"source": [
648-
"You can also create a custom issue type by extending the base class `IssueManager`. CleanVision can then detect your custom issue along with other pre-defined issues in any image dataset! Here's an example of a custom issue manager, which can also be found in the [examples/](https://github.com/cleanlab/cleanvision/blob/main/examples/custom_issue_manager.py) folder of the source code."
649+
"You can also create a custom issue type by extending the base class [IssueManager](../cleanvision/utils/base_issue_manager.rst#cleanvision.utils.base_issue_manager.IssueManager). CleanVision can then detect your custom issue along with other pre-defined issues in any image dataset! Here's an example of a custom issue manager, which can also be found [here](https://github.com/cleanlab/cleanvision/blob/main/docs/source/tutorials/custom_issue_manager.py)"
649650
]
650651
},
651652
{
@@ -659,7 +660,7 @@
659660
"import numpy as np\n",
660661
"import pandas as pd\n",
661662
"from PIL import Image\n",
662-
"from tqdm import tqdm\n",
663+
"from tqdm.auto import tqdm\n",
663664
"\n",
664665
"from cleanvision.dataset.base_dataset import Dataset\n",
665666
"from cleanvision.issue_managers import register_issue_manager\n",
@@ -778,11 +779,21 @@
778779
{
779780
"cell_type": "code",
780781
"execution_count": null,
781-
"metadata": {},
782+
"metadata": {
783+
"tags": []
784+
},
782785
"outputs": [],
783786
"source": [
784787
"issue_types = {issue_name: {}}\n",
785-
"imagelab.find_issues(issue_types)\n",
788+
"imagelab.find_issues(issue_types)"
789+
]
790+
},
791+
{
792+
"cell_type": "code",
793+
"execution_count": null,
794+
"metadata": {},
795+
"outputs": [],
796+
"source": [
786797
"imagelab.report()"
787798
]
788799
},
@@ -791,7 +802,7 @@
791802
"cell_type": "markdown",
792803
"metadata": {},
793804
"source": [
794-
"Beyond the collection of image files demonstrated here, you can alternatively run CleanVision on: [Hugging Face datasets](https://github.com/cleanlab/cleanvision/blob/main/docs/source/tutorials/huggingface_dataset.ipynb), [torchvision datasets](https://github.com/cleanlab/cleanvision/blob/main/docs/source/tutorials/torchvision_dataset.ipynb), as well as [files in cloud storage buckets like S3, GCS, or Azure](https://github.com/cleanlab/cleanvision-examples/blob/main/cloud_dataset.ipynb)."
805+
"Beyond the collection of image files demonstrated here, you can alternatively run CleanVision on: [Hugging Face datasets](huggingface_dataset.ipynb), [torchvision datasets](torchvision_dataset.ipynb), as well as [files in cloud storage buckets like S3, GCS, or Azure](https://github.com/cleanlab/cleanvision-examples/blob/main/cloud_dataset.ipynb)."
795806
]
796807
}
797808
],
@@ -811,7 +822,7 @@
811822
"name": "python",
812823
"nbconvert_exporter": "python",
813824
"pygments_lexer": "ipython3",
814-
"version": "3.11.2"
825+
"version": "3.11.7"
815826
}
816827
},
817828
"nbformat": 4,
