
Commit 4825fbd

Author: w. Patrick Gale (committed)
Commit message: version 1
1 parent 4bb9b2d, commit 4825fbd

20 files changed: +420 -68 lines changed

CHANGELOG.md

Lines changed: 17 additions & 10 deletions
@@ -5,16 +5,11 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
-## [0.0.2] - future
 
-- [ ] add Algolia search once the site is production ready
-- [ ] `cmfcmf/docusaurus-search-local` search plugin needs better configuration
-- [ ] local search linking to pages such as `https://jocoknow.vercel.app/docs/jocoknow-extras/translate-your-site/docs/jocoknow-extras/translate-your-site#translate-a-doc` which do not exist
-- [ ] add resources for developing data management plan (https://dmponline.dcc.ac.uk/) and data curation logs
-- [ ] add a template for logging data curation steps
-- [ ] need to add 'levels' to the docs/intro document and make it more user-friendly
-- [ ] update `intro` document with links and more information
-- [ ] write instructions for a curation log example using JSON and code to parse the file into Markdown; include methods for adding ToDo items to the log to help encourage users to make use of the log; add ability to include question or checklist types to a curation task **IMPORTANT**
+## [1.0.0] - Sept. 20, 2024
+
+- [x] update `intro` document with links and more information
+- [x] removed default Docusaurus styling
 - [x] create python packages for handling basic API functions
 - [x] added cheerio resolutions to the package.json file to correct an issue with local search
 - [x] adding additional documentation for a data curation checklist and steps to help automate a curation log
@@ -26,4 +21,16 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - [x] updated Docusaurus to version 3.5.2
 - [x] testing orama search plugin, but both the local and cloud version have an error needing to be fixed in https://github.com/askorama/orama/issues/728
 - [x] testing DocSearch crawler https://docsearch.algolia.com/docs/legacy/run-your-own/#run-the-crawl-from-the-docker-image and setting up an .env.prod set of environment variables, but it keeps throwing `Unreachable hosts` error
-- [x] adding a basic `cmfcmf/docusaurus-search-local` search plugin (needs configuration changes but is a good starter)
+- [x] adding a basic `cmfcmf/docusaurus-search-local` search plugin (needs configuration changes but is a good starter search tool)
+
+## [x.x.x] - future ideas
+
+- [ ] write instructions on how to link this documentation site builder with the Jupyter notebooks used for dataset curation to publish the curation logs to this site under dataset-specific directories/pages (using shutil commands to copy files from the notebook directories to this site builder)
+- [ ] add Algolia search once the site is production ready and write up a document regarding how users can best search for data
+- [ ] write a document which contains the code and resources to build a curation log generator and how it compares to the curation log for https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/SKP9IB
+- [ ] `cmfcmf/docusaurus-search-local` search plugin needs better configuration
+- [ ] local search linking to pages such as `https://jocoknow.vercel.app/docs/jocoknow-extras/translate-your-site/docs/jocoknow-extras/translate-your-site#translate-a-doc` which do not exist
+- [ ] add resources for developing a data management plan (https://dmponline.dcc.ac.uk/) and data curation logs
+- [ ] add a template for logging data curation steps
+- [ ] write instructions for a curation log example using JSON and code to parse the file into Markdown; include methods for adding ToDo items to the log to help encourage users to make use of the log; add the ability to include question or checklist types to a curation task **IMPORTANT** (a starting sketch follows this list)
+- [ ] try to query the Dataverse API to retrieve a list of datasets to help autogenerate a dataset listing README file that links to the individual README files of each dataset (remove the dataset category JSON and replace it with this)
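The JSON-to-Markdown curation log flagged **IMPORTANT** above could start from a sketch like the one below. The log schema (`dataset`, `tasks`, `date`, `step`, `note`, `todo`) is entirely hypothetical and only illustrates how jq (already used elsewhere in this repository) could render such a file into a Markdown task list.

```bash
# curation-log.json -- hypothetical schema, for illustration only:
# {
#   "dataset": "example-dataset",
#   "tasks": [
#     { "date": "2024-09-20", "step": "UNDERSTAND", "note": "Checked headers", "todo": false },
#     { "date": "2024-09-21", "step": "DOCUMENT", "note": "Draft README", "todo": true }
#   ]
# }

# Render the log as a Markdown checklist; open ToDo items become unchecked boxes.
jq -r '
  "# Curation log: \(.dataset)\n",
  (.tasks[] | "- [\(if .todo then " " else "x" end)] \(.date) \(.step): \(.note)")
' curation-log.json > curation-log.md
```

The generated `curation-log.md` could then be copied into the site's docs folder, for example with the shutil-based copy mentioned in the Jupyter notebook idea above.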

README.md

Lines changed: 11 additions & 9 deletions
@@ -1,6 +1,6 @@
 # About this code
 
-This repository was started in August of 2024 and uses [Docusaurus 3](https://docusaurus.io), a modern static website generator. The idea behind the repository is to provide an knowledgebase guide into understanding and working with the JoCo publicly available data.
+This repository was started in August of 2024 and uses [Docusaurus 3](https://docusaurus.io), a modern static website generator. The idea behind the repository is to provide a knowledgebase guide to understanding and working with the JoCo publicly available data.
 
 ## Why Docusaurus
 
@@ -9,21 +9,22 @@ The reason for Docusaurus is it is a CMS framework with integrations for things
 Other benefits are:
 
 - The Docusaurus framework automatically checks for broken links in your documents when you build the code (whether you are building locally or letting Vercel build for you). This is great for quality assurance.
-- Looking good from the start, means a framework that has been designed by professional graphic artists for accessibility and out-of-the-box visual appeal. So you can simply focus on content and not building a site from scratch.
+- Looking good from the start means a framework that has been designed by professional graphic artists for accessibility and out-of-the-box visual appeal. You can simply focus on content and not on creating a site from scratch.
 
 ## Using this code on Vercel
 
-The easiest way use this code (for your own project or to help contribute to this documentation) is to fork this repository in GitHub then log into Vercel.com and create a new project using your forked repo. Vercel can be used as a cloud testing environment, to help you avoid the need to setup NodeJs and NPM on your local computer (but you could do that as well). Vercel will build from your GitHub code repository and setup a test site so you could simply make changes to your forked repository code in GitHub.com, and see the changes in Vercel as you make code updates.
+The easiest way to use this code (for your own project or to help contribute to this documentation) is to fork this repository in GitHub, then log into Vercel.com and create a new project using your forked repo. You can use Vercel as a cloud testing environment, to help you avoid the need to set up Node.js and NPM on your local computer (but you could do that as well). Vercel will build from your GitHub code repository and set up a test site. This allows you to simply make changes to your forked repository code in GitHub.com and see the changes in Vercel as you make code updates.
 
-Note, some of the application settings are set as environment variables. The `.env.example` file contains place-holder values for the environment. These variables will need to be added to your Vercel project since you do not want sensitive environment API keys or passwords published/embedded in your repository code. Once you have the code deployed to Vercel from GitHub, a link to your test site on Vercel will added to your main repository page on GitHub.
+Note that several application settings are set as environment variables. The `.env.example` file contains placeholder values for the environment. You will need to add these variables to your Vercel project since you do not want sensitive API keys or passwords published/embedded in your repository code. After deploying the code to Vercel from GitHub, a link to your test site on Vercel will be included on your main repository page on GitHub.
 
 ## Using the code locally
 
-See Docusaurus documentation for this. It is not recommended. Try using StackBlitz (which is free but slow to rebuild the site) or some other virtual service first, to save you the headache of installing dependencies on your computer.
-
-
+See the Docusaurus documentation for this. Try using StackBlitz as an alternative (which is free but slow to rebuild the site) or some other virtual service first, to save you the headache of installing dependencies on your computer.
+For me, I open a WSL terminal and paste the following command to ensure I am in the correct directory before I try to run Docusaurus:
 `cd "/mnt/c/Users/pgale/University of North Carolina at Chapel Hill/TarcStudyDataRepository - Files/DataPull/364-dp/Note3/jocoknow"`
 
+**Technical Note: If you are testing Docusaurus locally and the command line that serves the Docusaurus site is closed unexpectedly, it is best to simply log out of the computer and log back in so you can start the site using a new instance (since `port in use` issues will be difficult to resolve otherwise).**
+
 ## Update or add packages
 
 Use Yarn via https://yarnpkg.com/getting-started/install. When adding something like a search module to the package.json script, run `yarn up` to update the packages or `yarn add` to add a package.
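For reference, a minimal sketch of that local workflow, assuming the standard Docusaurus `start`/`build`/`serve` scripts and the Yarn commands mentioned above. The `@cmfcmf/docusaurus-search-local` name is assumed to be the npm package for the plugin referenced in the changelog (check its README for the exact name), and the port-freeing line is only an optional alternative to logging out, assuming the default port 3000.

```bash
# From a WSL terminal, change into the repository (path from the note above)
cd "/mnt/c/Users/pgale/University of North Carolina at Chapel Hill/TarcStudyDataRepository - Files/DataPull/364-dp/Note3/jocoknow"

# Install dependencies and start the local dev server
yarn install
yarn start

# Add a package (for example, the local-search plugin mentioned in the changelog)
yarn add @cmfcmf/docusaurus-search-local
# Update already-installed packages to newer versions
yarn up "@docusaurus/*"

# Production-style check: build the site, then serve the build output
yarn build
yarn run serve

# Optional: if the dev server was closed unexpectedly, free the port
# instead of logging out (assumes the default port 3000)
lsof -ti :3000 | xargs -r kill
```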
@@ -34,12 +35,13 @@ To run the build first run `yarn build` then `yarn run serve` (this is ideal bec
 
 Review the documentation at https://docsearch.algolia.com/docs/legacy/run-your-own/ or https://docsearch.algolia.com/docs/legacy/config-file
 
-- First we need to log into Algolia and select or create the application we want to use (which in this case is JoCoKnow)
+- Log into Algolia and select or create the application we want to use (which in this case is JoCoKnow)
 - Next we need to create the index for this application and give the index a name
 - Next, add an API key that needs the ACLs addObject, editSettings and deleteIndex.
 
 Then take the `Search-Only API Key` and copy that to our environment file.
 
-
 Here we will use the `.env.prod` file to define our APPLICATION_ID and API_KEY
 `docker run -it --env-file=.env.prod -e "CONFIG=$(cat jocoknow.config.json | jq -r tostring)" algolia/docsearch-scraper`
+
+
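Putting those steps together, a minimal sketch of the crawl setup might look like the following. The two variable names come from the `.env.prod` description above; the values are placeholders you would replace with keys from your Algolia application.

```bash
# .env.prod -- placeholder values only; use the keys from your Algolia application
cat > .env.prod <<'EOF'
APPLICATION_ID=YOUR_ALGOLIA_APP_ID
API_KEY=YOUR_CRAWLER_API_KEY
EOF

# Run the legacy DocSearch scraper with the crawler config from this repository
docker run -it --env-file=.env.prod \
  -e "CONFIG=$(cat jocoknow.config.json | jq -r tostring)" \
  algolia/docsearch-scraper
```

If the crawl reports the `Unreachable hosts` error noted in the changelog, that usually points at the URLs listed in the crawler config or at the site not being publicly reachable from the container.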

docs/contact.md

Lines changed: 5 additions & 1 deletion
@@ -4,4 +4,8 @@ sidebar_position: 2
 
 # Contact Us
 
-If you have questions for us regarding the JoCo data or would like to leave us suggestions and feedback, please use the following form: [https://unc.az1.qualtrics.com/jfe/form/SV_3ruQfOlS87zcyZE](https://unc.az1.qualtrics.com/jfe/form/SV_3ruQfOlS87zcyZE?RuPath=https://jocoknow.vercel.app/).
+Use the form below to submit any questions you have regarding the JoCo data, or to leave us suggestions and feedback.
+
+:::info[Get in touch]
+### [https://unc.az1.qualtrics.com/jfe/form/SV_3ruQfOlS87zcyZE](https://unc.az1.qualtrics.com/jfe/form/SV_3ruQfOlS87zcyZE?RuPath=https://jocoknow.vercel.app/)
+:::

docs/curation-tools/CURATED-checklist.md

Lines changed: 9 additions & 8 deletions
@@ -109,12 +109,13 @@ In this step, examine the dataset closely to understand what it is, how the file
 - [ ] Examine files, organization, and documentation more thoroughly.
 - [ ] Are there changes that could enhance the dataset?
 - [ ] Are there missing data?
-- [ ] Could a user with similar qualifications to the author's understand and reuse these data and reproduce the results? Are the data, documentation and/or metadata presented in a way that aids in interpretation? (e.g., [README Example](https://deepblue.lib.umich.edu/data/Deep\_Blue\_Data\_Example\_Readme.txt))
+- [ ] Could a user with similar qualifications to the author's understand and reuse these data and reproduce the results?
+- [ ] Are the data, documentation and/or metadata presented in a way that aids in interpretation? (e.g., [README Example](https://deepblue.lib.umich.edu/data/Deep\_Blue\_Data\_Example\_Readme.txt))
 - [ ] Record all questions and concerns in Curation Log.
 
 *Tasks vary based on file formats and subject domain. Sample tasks based on format:*
 
-**Tabular Data (e.g, Microsoft Excel) Questions:**
+**Tabular Data (e.g., Microsoft Excel) Questions:**
 
 - [ ] Check the organization of the data–is it well-structured?
 - [ ] Are headers/codes clearly defined?
@@ -133,8 +134,8 @@ In this step, examine the dataset closely to understand what it is, how the file
 - [ ] Is the code commented, i.e., did the author provide descriptive information on sections of code?
 - [ ] Is data for input missing? Are environmental conditions and parameters noted? Is it clear which language(s) and version(s) are used?
 - [ ] Does the code use absolute paths or relative paths? If absolute paths, is this documented in the README?
-- [ ] Are packages or additional libraries used? Is so, is this noted with clear use instructions?
-- [ ] Are any data organized consistently for access by the code?
+- [ ] Are packages or additional libraries used? If so, is this noted with clear use instructions?
+- [ ] Is data organized consistently for access by the code?
 - [ ] Is there an indication of whether the depositor intends users to be able to run the code and reproduce results, or just see the process used?
 
 To view additional UNDERSTAND steps based on format, view the following primers:
@@ -238,7 +239,7 @@ In this step we ensure metadata conforms to repository and/or appropriate discip
 - [ ] Add subject terms
 - [ ] Ensure keywords are sufficient and representative
 - [ ] Record all changes in the Curation Log
-- [ ] Provide suggestions to improve accessibility of content (e.g., alt-text or additional descriptions; color contrast; etc.)
+- [ ] Provide suggestions to improve accessibility of content (e.g., alt-text or additional descriptions, color contrast, etc.)
 
 :::
 
@@ -249,7 +250,7 @@ In this step we ensure metadata conforms to repository and/or appropriate discip
 In this step, consider the file formats in the dataset to make them more interoperable, reusable, preservation friendly, and non-proprietary when possible.<sup>[1](#footnote-1)</sup> Common TRANSFORM steps include:
 
 - Identify specialized file formats and their restrictions (e.g., Is the software freely available? If so, link to it or archive it alongside the data)
-- Propose open source or more reusable formats when appropriate
+- Propose open-source or more reusable formats when appropriate
 - Retain original file formats
 
 #### Footnote 1
@@ -267,7 +268,7 @@ In this step, consider the file formats in the dataset to make them more interop
 - [ ] If not, recommend conversion
 - [ ] Retain original formats
 - [ ] Check whether software needed is readily available
-- [ ] Suggest open source options, if applicable and appropriate
+- [ ] Suggest open-source options, if applicable and appropriate
 - [ ] Ensure software and software version is documented
 - [ ] Convert any data visualization(s) that are not accessible (e.g., R [visualizations](https://github.com/DataCurationNetwork/data-primers/blob/master/R%20Data%20Curation%20Primer/R-data-curation-primer.md#accessibility-considerations), which need to be converted for screen reader use, or visualizations that do not meet color contrast guidelines). Reorganize files as appropriate
 - [ ] Standardize file names
@@ -294,7 +295,7 @@ In this step, review the dataset and companion data record against international
 ### Key Ethical Considerations
 
 - Final review--remember it is not too late to surface any ethical concerns.
-- Verify the words/language being used are not racist/harmful.
+- Verify you are not using racist/harmful words/language.
 
 - Remind the submitter of their responsibility if they choose to ignore requests for de-identification or similar concerns.
 ### Essential Tasks

docs/curation-tools/_category_.json

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 {
   "label": "Data Curation Tools",
-  "position": 4,
+  "position": 6,
   "link": {
     "type": "generated-index",
     "description": "We want to help you manage your data as well as we have, by providing you with the toolset we used to manage this project."

docs/curation-tools/curation_template.md

Lines changed: 7 additions & 6 deletions
@@ -89,12 +89,13 @@ In this step, examine the dataset closely to understand what it is, how the file
 - [ ] Examine files, organization, and documentation more thoroughly.
 - [ ] Are there changes that could enhance the dataset?
 - [ ] Are there missing data?
-- [ ] Could a user with similar qualifications to the author's understand and reuse these data and reproduce the results? Are the data, documentation and/or metadata presented in a way that aids in interpretation? (e.g., [README Example](https://deepblue.lib.umich.edu/data/Deep\_Blue\_Data\_Example\_Readme.txt))
+- [ ] Could a user with similar qualifications to the author's understand and reuse these data and reproduce the results?
+- [ ] Are the data, documentation and/or metadata presented in a way that aids in interpretation? (e.g., [README Example](https://deepblue.lib.umich.edu/data/Deep\_Blue\_Data\_Example\_Readme.txt))
 - [ ] Record all questions and concerns in Curation Log.
 
 *Tasks vary based on file formats and subject domain. Sample tasks based on format:*
 
-**Tabular Data (e.g, Microsoft Excel) Questions:**
+**Tabular Data (e.g., Microsoft Excel) Questions:**
 
 - [ ] Check the organization of the data–is it well-structured?
 - [ ] Are headers/codes clearly defined?
@@ -113,7 +114,7 @@ In this step, examine the dataset closely to understand what it is, how the file
 - [ ] Is the code commented, i.e., did the author provide descriptive information on sections of code?
 - [ ] Is data for input missing? Are environmental conditions and parameters noted? Is it clear which language(s) and version(s) are used?
 - [ ] Does the code use absolute paths or relative paths? If absolute paths, is this documented in the README?
-- [ ] Are packages or additional libraries used? Is so, is this noted with clear use instructions?
+- [ ] Are packages or additional libraries used? If so, is this noted with clear use instructions?
 - [ ] Are any data organized consistently for access by the code?
 - [ ] Is there an indication of whether the depositor intends users to be able to run the code and reproduce results, or just see the process used?
 
@@ -229,7 +230,7 @@ In this step we ensure metadata conforms to repository and/or appropriate discip
 In this step, consider the file formats in the dataset to make them more interoperable, reusable, preservation friendly, and non-proprietary when possible.<sup>[1](#footnote-1)</sup> Common TRANSFORM steps include:
 
 - Identify specialized file formats and their restrictions (e.g., Is the software freely available? If so, link to it or archive it alongside the data)
-- Propose open source or more reusable formats when appropriate
+- Propose open-source or more reusable formats when appropriate
 - Retain original file formats
 
 #### Footnote 1
@@ -247,7 +248,7 @@ In this step, consider the file formats in the dataset to make them more interop
 - [ ] If not, recommend conversion
 - [ ] Retain original formats
 - [ ] Check whether software needed is readily available
-- [ ] Suggest open source options, if applicable and appropriate
+- [ ] Suggest open-source options, if applicable and appropriate
 - [ ] Ensure software and software version is documented
 - [ ] Convert any data visualization(s) that are not accessible (e.g., R [visualizations](https://github.com/DataCurationNetwork/data-primers/blob/master/R%20Data%20Curation%20Primer/R-data-curation-primer.md#accessibility-considerations), which need to be converted for screen reader use, or visualizations that do not meet color contrast guidelines). Reorganize files as appropriate
 - [ ] Standardize file names
@@ -274,7 +275,7 @@ In this step, review the dataset and companion data record against international
 ### Key Ethical Considerations
 
 - Final review--remember it is not too late to surface any ethical concerns.
-- Verify the words/language being used are not racist/harmful.
+- Verify you are not using racist/harmful words/language.
 
 - Remind the submitter of their responsibility if they choose to ignore requests for de-identification or similar concerns.
 ### Essential Tasks
