-[ ] local search linking to pages such as `https://jocoknow.vercel.app/docs/jocoknow-extras/translate-your-site/docs/jocoknow-extras/translate-your-site#translate-a-doc` which do not exist
-[ ] add resources for developing a data management plan (https://dmponline.dcc.ac.uk/) and data curation logs
-[ ] add a template for logging data curation steps
-[ ] need to add 'levels' to the docs/intro document and make it more user-friendly
-[ ] update `intro` document with links and more information
-[ ] write instructions for a curation log example using JSON and code to parse the file into Markdown; include methods for adding ToDo items to the log to help encourage users to make use of the log; add ability to include question or checklist types to a curation task **IMPORTANT**

## [1.0.0] - Sept. 20, 2024

-[x] update `intro` document with links and more information
-[x] removed default Docusaurus styling
-[x] create Python packages for handling basic API functions
-[x] added cheerio resolutions to the package.json file to correct an issue with local search
-[x] adding additional documentation for a data curation checklist and steps to help automate a curation log
@@ -26,4 +21,16 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
-[x] updated Docusaurus to version 3.5.2
-[x] testing orama search plugin, but both the local and cloud versions have an error needing to be fixed in https://github.com/askorama/orama/issues/728
-[x] testing DocSearch crawler https://docsearch.algolia.com/docs/legacy/run-your-own/#run-the-crawl-from-the-docker-image and setting up an .env.prod set of environment variables, but it keeps throwing an `Unreachable hosts` error
-[x] adding a basic `cmfcmf/docusaurus-search-local` search plugin (needs configuration changes but is a good starter)
-[x] adding a basic `cmfcmf/docusaurus-search-local` search plugin (needs configuration changes but is a good starter search tool)
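
For reference, registering that starter plugin is usually a small change to the Docusaurus config. The sketch below is a minimal example only; the site values and the plugin options shown are assumptions to check against the plugin's README, not this repository's actual configuration:

```ts
// docusaurus.config.ts: a minimal sketch of registering the local-search theme.
// The title/url/baseUrl values and the option values below are placeholders
// (assumptions), not the settings used by this repository.
import type {Config} from '@docusaurus/types';

const config: Config = {
  title: 'JoCoKnow',
  url: 'https://jocoknow.vercel.app',
  baseUrl: '/',
  themes: [
    [
      '@cmfcmf/docusaurus-search-local',
      {
        indexDocs: true,  // index the docs pages so local search can find them
        indexBlog: false, // assumed: this site has no blog to index
      },
    ],
  ],
};

export default config;
```
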
## [x.x.x] - future ideas

-[ ] write instructions on how to link this documentation site builder with the Jupyter notebooks used for dataset curation to publish the curation logs to this site under dataset-specific directories/pages (using shutil commands to copy files from the notebook directories to this site builder)
-[ ] add Algolia search once the site is production ready and write up a document regarding how users can best search for data
-[ ] write a document which contains the code and resources to build a curation log generator and how it compares to the curation log for https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/SKP9IB
-[ ] local search linking to pages such as `https://jocoknow.vercel.app/docs/jocoknow-extras/translate-your-site/docs/jocoknow-extras/translate-your-site#translate-a-doc` which do not exist
-[ ] add resources for developing a data management plan (https://dmponline.dcc.ac.uk/) and data curation logs
-[ ] add a template for logging data curation steps
-[ ] write instructions for a curation log example using JSON and code to parse the file into Markdown; include methods for adding ToDo items to the log to help encourage users to make use of the log; add ability to include question or checklist types to a curation task **IMPORTANT**
-[ ] try to query the Dataverse API to retrieve a list of datasets to help autogenerate a dataset listing README file that links to the individual README files of each dataset (remove dataset category JSON and replace with this)
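
As a starting point for that last item, the sketch below shows one way the Dataverse Search API could be queried to list datasets and emit a Markdown listing. The base URL, the `joco` collection alias, and the output format are assumptions for illustration only, not decisions recorded in this repository:

```ts
// listDatasets.ts: a minimal sketch of querying the Dataverse Search API to list
// datasets so a dataset listing README could be autogenerated. The base URL and the
// "joco" subtree alias are placeholders (assumptions), not confirmed project values.
const BASE_URL = 'https://dataverse.harvard.edu';

interface DatasetItem {
  name: string;
  global_id: string;   // e.g. "doi:10.7910/DVN/XXXXXX"
  description?: string;
}

async function listDatasets(subtree: string): Promise<DatasetItem[]> {
  const url = `${BASE_URL}/api/search?q=*&type=dataset&subtree=${subtree}&per_page=100`;
  const res = await fetch(url);
  if (!res.ok) {
    throw new Error(`Dataverse search failed: ${res.status}`);
  }
  const body = await res.json();
  return body.data.items as DatasetItem[];
}

// Turn the search results into a simple Markdown listing that links each dataset's
// persistent identifier; links to per-dataset README pages would be filled in separately.
async function buildListing(): Promise<string> {
  const items = await listDatasets('joco'); // hypothetical collection alias
  const lines = items.map(
    (d) => `- [${d.name}](${BASE_URL}/dataset.xhtml?persistentId=${d.global_id})`
  );
  return ['# Dataset listing', '', ...lines].join('\n');
}

buildListing().then((md) => console.log(md)).catch(console.error);
```
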
This repository was started in August of 2024 and uses [Docusaurus 3](https://docusaurus.io), a modern static website generator. The idea behind the repository is to provide an knowledgebase guide into understanding and working with the JoCo publicly available data.
This repository was started in August of 2024 and uses [Docusaurus 3](https://docusaurus.io), a modern static website generator. The idea behind the repository is to provide a knowledgebase guide into understanding and working with the JoCo publicly available data.
## Why Docusaurus
@@ -9,21 +9,22 @@ The reason for Docusaurus is it is a CMS framework with integrations for things
Other benefits are:

- The Docusaurus framework automatically checks for broken links in your documents when you build the code (whether you are building locally or letting Vercel build for you). This is great for quality assurance.
- Looking good from the start, means a framework that has been designed by professional graphic artists for accessibility and out-of-the-box visual appeal. So you can simply focus on content and not building a site from scratch.
- Looking good from the start means a framework that has been designed by professional graphic artists for accessibility and out-of-the-box visual appeal. You can simply focus on content and not on creating a site from scratch.
## Using this code on Vercel
The easiest way use this code (for your own project or to help contribute to this documentation) is to fork this repository in GitHub then log into Vercel.com and create a new project using your forked repo. Vercel can be used as a cloud testing environment, to help you avoid the need to setup NodeJs and NPM on your local computer (but you could do that as well). Vercel will build from your GitHub code repository and setup a test site so you could simply make changes to your forked repository code in GitHub.com, and see the changes in Vercel as you make code updates.
The easiest way to use this code (for your own project or to help contribute to this documentation) is to fork this repository in GitHub, then log into Vercel.com and create a new project using your forked repo. You can use Vercel as a cloud testing environment to help you avoid the need to set up Node.js and npm on your local computer (but you could do that as well). Vercel will build from your GitHub code repository and set up a test site. This allows you to simply make changes to your forked repository code on GitHub.com and see the changes in Vercel as you make code updates.
Note, some of the application settings are set as environment variables. The `.env.example` file contains place-holder values for the environment. These variables will need to be added to your Vercel project since you do not want sensitive environment API keys or passwords published/embedded in your repository code. Once you have the code deployed to Vercel from GitHub, a link to your test site on Vercel will added to your main repository page on GitHub.
Note, several application settings are set as environment variables. The `.env.example` file contains placeholder values for the environment. You will need to add these variables to your Vercel project since you do not want sensitive API keys or passwords published/embedded in your repository code. After deploying the code to Vercel from GitHub, a link to your test site on Vercel will be included on your main repository page on GitHub.
## Using the code locally
See Docusaurus documentation for this. It is not recommended. Try using StackBlitz (which is free but slow to rebuild the site) or some other virtual service first, to save you the headache of installing dependencies on your computer.
See the Docusaurus documentation for this. Try using StackBlitz as an alternative (which is free but slow to rebuild the site) or some other virtual service first, to save you the headache of installing dependencies on your computer.
For me, I open a WSL terminal and paste the following command to ensure I am in the correct directory before I try to run Docusaurus:
`cd "/mnt/c/Users/pgale/University of North Carolina at Chapel Hill/TarcStudyDataRepository - Files/DataPull/364-dp/Note3/jocoknow"`
**Technical Note: If you are testing Docusaurus locally and the command line that serves the Docusaurus site is closed unexpectedly, it is best to simply log out of the computer and log back in so you can start the site using a new instance (since `port in use` issues will be difficult to resolve otherwise).**
## Update or add packages
Use Yarn via https://yarnpkg.com/getting-started/install. When adding something like a search module to the `package.json` file, run `yarn up` to update the packages or `yarn add` to add a package.
@@ -34,12 +35,13 @@ To run the build first run `yarn build` then `yarn run serve` (this is ideal bec
Review the documentation at https://docsearch.algolia.com/docs/legacy/run-your-own/ or https://docsearch.algolia.com/docs/legacy/config-file
- First we need to log into Algolia and select or create the application we want to use (which in this case is JoCoKnow)
- Log into Algolia and select or create the application we want to use (which in this case is JoCoKnow)
- Next we need to create the index for this application and give the index a name
- Next, add an API key with the ACLs addObject, editSettings, and deleteIndex.
- Then find the `Search-Only API Key` and copy that to our environment file.
Here we will use the `.env.prod` file to define our APPLICATION_ID and API_KEY.
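
To show where those two values end up, here is a minimal sketch of the Algolia portion of a `docusaurus.config.ts`, reading the keys from the environment (for example, supplied through the Vercel project settings). The `APPLICATION_ID` and `API_KEY` names come from the text above; the index name and the rest of the snippet are assumptions rather than this site's actual configuration:

```ts
// docusaurus.config.ts (excerpt): a sketch of wiring the Algolia/DocSearch values
// from environment variables into the theme config. The site values and the
// index name "jocoknow" are placeholder assumptions.
import type {Config} from '@docusaurus/types';

const config: Config = {
  title: 'JoCoKnow',
  url: 'https://jocoknow.vercel.app',
  baseUrl: '/',
  themeConfig: {
    algolia: {
      appId: process.env.APPLICATION_ID ?? '',
      // the Search-Only API Key copied from the Algolia dashboard, never the admin key
      apiKey: process.env.API_KEY ?? '',
      indexName: 'jocoknow',
    },
  },
};

export default config;
```
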
docs/contact.md: 5 additions & 1 deletion
@@ -4,4 +4,8 @@ sidebar_position: 2
# Contact Us
If you have questions for us regarding the JoCo data or would like to leave us suggestions and feedback, please use the following form: [https://unc.az1.qualtrics.com/jfe/form/SV_3ruQfOlS87zcyZE](https://unc.az1.qualtrics.com/jfe/form/SV_3ruQfOlS87zcyZE?RuPath=https://jocoknow.vercel.app/).
Use the form below to submit any questions you may have regarding the JoCo data or to leave us suggestions and feedback.
docs/curation-tools/CURATED-checklist.md: 9 additions & 8 deletions
@@ -109,12 +109,13 @@ In this step, examine the dataset closely to understand what it is, how the file
-[ ] Examine files, organization, and documentation more thoroughly.
-[ ] Are there changes that could enhance the dataset?
-[ ] Are there missing data?
-[ ] Could a user with similar qualifications to the author's understand and reuse these data and reproduce the results? Are the data, documentation and/or metadata presented in a way that aids in interpretation? (e.g., [README Example](https://deepblue.lib.umich.edu/data/Deep\_Blue\_Data\_Example\_Readme.txt))
-[ ] Could a user with similar qualifications to the author's understand and reuse these data and reproduce the results?
-[ ] Are the data, documentation and/or metadata presented in a way that aids in interpretation? (e.g., [README Example](https://deepblue.lib.umich.edu/data/Deep\_Blue\_Data\_Example\_Readme.txt))
-[ ] Record all questions and concerns in the Curation Log.

*Tasks vary based on file formats and subject domain. Sample tasks based on format:*

**Tabular Data (e.g, Microsoft Excel) Questions:**
**Tabular Data (e.g., Microsoft Excel) Questions:**

-[ ] Check the organization of the data–is it well-structured?
-[ ] Are headers/codes clearly defined?
@@ -133,8 +134,8 @@ In this step, examine the dataset closely to understand what it is, how the file
-[ ] Is the code commented, i.e., did the author provide descriptive information on sections of code?
-[ ] Is data for input missing? Are environmental conditions and parameters noted? Is it clear which language(s) and version(s) are used?
-[ ] Does the code use absolute paths or relative paths? If absolute paths, is this documented in the README?
-[ ] Are packages or additional libraries used? Is so, is this noted with clear use instructions?
-[ ]Are any data organized consistently for access by the code?
-[ ] Are packages or additional libraries used? If so, is this noted with clear use instructions?
-[ ] Is data organized consistently for access by the code?
-[ ] Is there an indication of whether the depositor intends users to be able to run the code and reproduce results, or just see the process used?

To view additional UNDERSTAND steps based on format, view the following primers:
@@ -238,7 +239,7 @@ In this step we ensure metadata conforms to repository and/or appropriate discip
-[ ] Add subject terms
-[ ] Ensure keywords are sufficient and representative
-[ ] Record all changes in the Curation Log
-[ ] Provide suggestions to improve accessibility of content (e.g., alt-text or additional descriptions; color contrast; etc.)
-[ ] Provide suggestions to improve accessibility of content (e.g., alt-text or additional descriptions, color contrast, etc.)

:::
@@ -249,7 +250,7 @@ In this step we ensure metadata conforms to repository and/or appropriate discip
In this step, consider the file formats in the dataset to make them more interoperable, reusable, preservation friendly, and non-proprietary when possible.<sup>[1](#footnote-1)</sup> Common TRANSFORM steps include:

- Identify specialized file formats and their restrictions (e.g., Is the software freely available? If so, link to it or archive it alongside the data)
- Propose opensource or more reusable formats when appropriate
- Propose open-source or more reusable formats when appropriate
- Retain original file formats

#### Footnote 1
@@ -267,7 +268,7 @@ In this step, consider the file formats in the dataset to make them more interop
-[ ] If not, recommend conversion
-[ ] Retain original formats
-[ ] Check whether software needed is readily available
-[ ] Suggest opensource options, if applicable and appropriate
-[ ] Suggest open-source options, if applicable and appropriate
-[ ] Ensure software and software version is documented
-[ ] Convert any data visualization(s) that are not accessible (e.g., R [visualizations](https://github.com/DataCurationNetwork/data-primers/blob/master/R%20Data%20Curation%20Primer/R-data-curation-primer.md#accessibility-considerations), which need to be converted for screen reader use, or visualizations that do not meet color contrast guidelines)
-[ ] Reorganize files as appropriate
-[ ] Standardize file names
@@ -294,7 +295,7 @@ In this step, review the dataset and companion data record against international
### Key Ethical Considerations

- Final review--remember it is not too late to surface any ethical concerns.
- Verify the words/language being used are not racist/harmful.
- Verify you are not using racist/harmful words/language.
- Remind the submitter of their responsibility if they choose to ignore requests for de-identification or similar concerns.
docs/curation-tools/curation_template.md: 7 additions & 6 deletions
@@ -89,12 +89,13 @@ In this step, examine the dataset closely to understand what it is, how the file
-[ ] Examine files, organization, and documentation more thoroughly.
-[ ] Are there changes that could enhance the dataset?
-[ ] Are there missing data?
-[ ] Could a user with similar qualifications to the author's understand and reuse these data and reproduce the results? Are the data, documentation and/or metadata presented in a way that aids in interpretation? (e.g., [README Example](https://deepblue.lib.umich.edu/data/Deep\_Blue\_Data\_Example\_Readme.txt))
-[ ] Could a user with similar qualifications to the author's understand and reuse these data and reproduce the results?
-[ ] Are the data, documentation and/or metadata presented in a way that aids in interpretation? (e.g., [README Example](https://deepblue.lib.umich.edu/data/Deep\_Blue\_Data\_Example\_Readme.txt))
-[ ] Record all questions and concerns in the Curation Log.

*Tasks vary based on file formats and subject domain. Sample tasks based on format:*

**Tabular Data (e.g, Microsoft Excel) Questions:**
**Tabular Data (e.g., Microsoft Excel) Questions:**

-[ ] Check the organization of the data–is it well-structured?
-[ ] Are headers/codes clearly defined?
@@ -113,7 +114,7 @@ In this step, examine the dataset closely to understand what it is, how the file
-[ ] Is the code commented, i.e., did the author provide descriptive information on sections of code?
-[ ] Is data for input missing? Are environmental conditions and parameters noted? Is it clear which language(s) and version(s) are used?
-[ ] Does the code use absolute paths or relative paths? If absolute paths, is this documented in the README?
-[ ] Are packages or additional libraries used? Is so, is this noted with clear use instructions?
-[ ] Are packages or additional libraries used? If so, is this noted with clear use instructions?
-[ ] Are any data organized consistently for access by the code?
-[ ] Is there an indication of whether the depositor intends users to be able to run the code and reproduce results, or just see the process used?
@@ -229,7 +230,7 @@ In this step we ensure metadata conforms to repository and/or appropriate discip
In this step, consider the file formats in the dataset to make them more interoperable, reusable, preservation friendly, and non-proprietary when possible.<sup>[1](#footnote-1)</sup> Common TRANSFORM steps include:

- Identify specialized file formats and their restrictions (e.g., Is the software freely available? If so, link to it or archive it alongside the data)
- Propose opensource or more reusable formats when appropriate
- Propose open-source or more reusable formats when appropriate
- Retain original file formats

#### Footnote 1
@@ -247,7 +248,7 @@ In this step, consider the file formats in the dataset to make them more interop
-[ ] If not, recommend conversion
-[ ] Retain original formats
-[ ] Check whether software needed is readily available
-[ ] Suggest opensource options, if applicable and appropriate
-[ ] Suggest open-source options, if applicable and appropriate
-[ ] Ensure software and software version is documented
-[ ] Convert any data visualization(s) that are not accessible (e.g., R [visualizations](https://github.com/DataCurationNetwork/data-primers/blob/master/R%20Data%20Curation%20Primer/R-data-curation-primer.md#accessibility-considerations), which need to be converted for screen reader use, or visualizations that do not meet color contrast guidelines)
-[ ] Reorganize files as appropriate
-[ ] Standardize file names
@@ -274,7 +275,7 @@ In this step, review the dataset and companion data record against international
### Key Ethical Considerations

- Final review--remember it is not too late to surface any ethical concerns.
- Verify the words/language being used are not racist/harmful.
- Verify you are not using racist/harmful words/language.
- Remind the submitter of their responsibility if they choose to ignore requests for de-identification or similar concerns.