Commit d52e7b5

diberry and gewarren authored
CleanRepo docs - update from contribut guide review (#387)
Co-authored-by: Genevieve Warren <[email protected]>
1 parent ea044c0 commit d52e7b5

1 file changed: +13 -43 lines changed

cleanrepo/README.md

Lines changed: 13 additions & 43 deletions
@@ -49,65 +49,35 @@ This command-line tool helps you clean up a DocFx-based content repo. It can:
 CleanRepo.exe --orphaned-images
 ```
 
-## Text to image examples
+## Image to text examples
 
-The text-to-image functionality supported in the `--catalog-images-with-text` and `--filter-images-for-text` options is provided by the [Tesseract](https://www.nuget.org/packages/tesseract/) NuGet package.
+The text-to-image functionality supported in the `--catalog-images-with-text` and `--filter-images-for-text` options is provided by the [Tesseract](https://www.nuget.org/packages/tesseract/) NuGet package.
 
 ### Get the Tesseract models
 
-You must determine which Tesseract models you want to use and install them on your system. Tesseract models are generated per operating system. Tesseract models come in a variety of sizes. You will also need to download the language data files for tesseract 4.0.0 or above from [tesseract-tessdata](https://github.com/tesseract-ocr/tessdata/). Use the `--ocr-model-directory` value to set the path.
+You must determine which Tesseract models you want to use and install them on your system. Tesseract models are generated per operating system. Tesseract models come in a variety of sizes. You also need to download the language data files for Tesseract 4.0.0 or later from [tesseract-tessdata](https://github.com/tesseract-ocr/tessdata/). Use the `--ocr-model-directory` value to set the path.
 
 ### Catalog images with text
 
-To catalog the images with text:
+To catalog all the images in a specified directory along with the text shown in each image:
 
-```console
-CleanRepo --catalog-images-with-text \
---url-base-path=/azure/developer/javascript \
---articles-directory=c:\\Users\\diberry\\repos\\writing\\docs\\azure-dev-docs-pr-2\\articles \
---media-directory=c:\\Users\\diberry\\repos\\writing\\docs\\azure-dev-docs-pr-2\\articles\\javascript\\media
---ocr-model-directory=c:\\Users\\diberry\\repos\\temp\\tesseract\\tessdata_fast
+```shell
+CleanRepo.exe --catalog-images-with-text --url-base-path=/azure/developer/javascript
+--articles-directory=c:\azure-docs-pr\articles --media-directory=c:\azure-docs-pr\articles\javascript\media --ocr-model-directory=c:\tesseract\tessdata_fast
 ```
 
-The output file is prefixed with `ImageFiles-` and looks like:
-
-```json
-
-```
+The output file is prefixed with `OcrImageFiles-`
 
 ### Filter images with text
-
 
-To file images based on an array of string, use the `--filter-text-json-file` path to the JSON file with the text to filter for:
+To filter images based on one or more strings, use the `--filter-text-json-file` path to the JSON file with the text to filter for:
 
 ```json
 ["Azure","Microsoft"]
 ```
 
+```shell
+CleanRepo.exe --filter-images-for-text --filter-text-json-file=c:\filter-text.json --url-base-path=/azure/developer/javascript --ocr-model-directory=c:\tesseract\tessdata_fast --articles-directory=c:\azure-docs-pr\articles --media-directory=c:\azure-docs-pr\articles\javascript\media
+```
 
-```console
-CleanRepo --filter-images-for-text \
---filter-text-json-file=c:\\Users\\diberry\\repos\\filter-text.json \
---url-base-path=/azure/developer/javascript \
---ocr-model-directory=c:\\Users\\diberry\\repos\\temp\\tesseract\\tessdata_fast \
---articles-directory=c:\\Users\\diberry\\repos\\writing\\docs\\azure-dev-docs-pr-2\\articles \
---media-directory=c:\\Users\\diberry\\repos\\writing\\docs\\azure-dev-docs-pr-2\\articles\\javascript\\media
-```
-
-The output file is prefixed with `FilteredOcrImageFiles-` and looks like:
-
-```json
-{
-"Azure": [
-{
-"Key": "c:\\Users\\diberry\\repos\\writing\\docs\\azure-dev-docs-pr-2\\articles\\javascript\\media\\visual-studio-code-azure-resources-extension-remove-resource-group.png",
-"Value": "*J File Edit Selection View Go Run Terminal Help\n\nQa AZURE oo\n\n\u003E FUNCTIONS\n-v RESOURCE GROUPS\n\\ \u0026 Pay-As-You-Go-diberry Y\n|\nEdit Tags...\n\u00A3\nView Properties\nte Open in Portal\nRefresh\n90\n\n \n\n \n\n \n\n \n"
-}],
-"Microsoft": [
-{
-"Key": "c:\\Users\\diberry\\repos\\writing\\docs\\azure-dev-docs-pr-2\\articles\\javascript\\media\\azure-function-resource-group-management\\azure-portal-function-application-insights-link.png",
-"Value": "Function App\n\n\u00AE Overview\n\n \n\n| View Application Insights data G)\n\n \n\n \n\n \n\n \n\nActivity log Link to an Application Insights resource\n8. Access control (IAM)\n\u00A9 tes \u00A9 temepiseaieiin yt eb ise ea\n\n@ Diagnose and solve problems\n\n\u00A9 Microsoft Defender for Cloud @ totum Apptzation ihe of check that Applicaton nights OK ard the insramentaion key are removed rm your apliaton,\n\n\u0026 events (preview)\n\nFunctions O) \u00E9sarteg etiam caer toe Gorman Vier Tc home\nApplication Insights. You have the option to disable non-essential data collection, Learn more\n(A) Functions\n\u00A9 App keys\nChange your resource\nB App files\n\n \n\nDeployment\n\n= Deployment slots\n@ Deployment Center\nSettings\n\nHl Configuration\n\n\u0026\u0026 Authentication\n\n\u00AE Application insights\n\n \n"
-},
-]
-}
-```
+The output file is prefixed with `FilteredOcrImageFiles-`.
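
Since this commit drops the sample output from the README, here is a minimal Python sketch of how a reader might consume a `FilteredOcrImageFiles-` report. It assumes the report keeps the shape shown in the removed sample: a JSON object that maps each filter string to a list of `Key` (image path) and `Value` (OCR text) pairs. The function name `images_by_term` and the sample data are hypothetical and not part of CleanRepo.

```python
import json


def images_by_term(report_text: str) -> dict[str, list[str]]:
    """Map each filter term to the paths of the images whose OCR text matched it.

    Assumes the report is a JSON object of the form
    {"<term>": [{"Key": "<image path>", "Value": "<OCR text>"}, ...], ...}.
    """
    report = json.loads(report_text)
    return {
        term: [entry["Key"] for entry in entries]
        for term, entries in report.items()
    }


# Hypothetical, shortened report in the assumed shape (paths are placeholders).
sample_report = """
{
  "Azure": [{"Key": "media/example-one.png", "Value": "AZURE FUNCTIONS"}],
  "Microsoft": []
}
"""

for term, paths in images_by_term(sample_report).items():
    print(f"{term}: {len(paths)} image(s)")
```

To use it on a real report, read the file contents into a string first and pass that string to `images_by_term`. Note that the removed sample contained a trailing comma, so a hand-edited report may need cleanup before strict JSON parsing succeeds.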
