Commit 3b42901

committed
module 3 edits
1 parent e0ddbd3 commit 3b42901

1 file changed

+14
-14
lines changed

03-project-organization.Rmd

Lines changed: 14 additions & 14 deletions
Original file line number | Diff line number | Diff line change
@@ -17,7 +17,7 @@ Keeping your files organized is a skill that has a high long-term payoff. As you
1717
ottrpal::include_slide("https://docs.google.com/presentation/d/1LMurysUhCjZb7DVF6KS9QmJ5NBjwWVjRn40MS9f2noE/edit#slide=id.gf7bed24491_1_56")
1818
```
1919

20-
@Tayo2019 discusses four particular reasons why it is important to organize your project:
20+
@Tayo2019 discusses four reasons why it is important to organize your project:
2121

2222
> 1. Organization increases productivity. If a project is well organized, with everything placed in one directory, it makes it easier to avoid wasting time searching for project files such as datasets, codes, output files, and so on.
2323
> 2. A well-organized project helps you to keep and maintain a record of your ongoing and completed data science projects.
@@ -48,8 +48,8 @@ Getting more specific, here's some ideas of how to organize your project:
4848

4949
- **Make file names informative** to those who don't have knowledge of the project but avoid using spaces, quotes, or unusual characters in your filenames and folders -- these only serve to make reading in files a nightmare in some programs.
5050
- **Number scripts** in the order that they are run.
51-
- **Keep like-files together** in their own directory: results tables with other results tables, etc. _Including most importantly keeping raw data separate from processed data or other results!_
52-
- **Put source scripts and functions in their own directory**. Things that should never need to be called directly by yourself or anyone else.
51+
- **Keep like-files together** in their own directory: results tables with other results tables, etc. _Most importantly, keep raw data separate from processed data or other results!_
52+
- **Put source scripts and functions in their own directory**. These are things that should never need to be called directly by yourself or anyone else.
5353
- **Put output in its own directories** like `results` and `plots`.
5454
- **Have a central document (like a README)** that describes the basic information about the analysis and how to re-run it.
5555
- Make it easy on yourself, **dates aren't necessary**. The computer keeps track of those.
@@ -62,14 +62,14 @@ Let's see what these principles might look like put into practice.
6262
Here's an example of what this might look like:
6363
```
6464
project-name/
65-
├── run_analysis.sh
65+
├── run-analysis.sh
6666
├── 00-download-data.sh
6767
├── 01-make-heatmap.Rmd
6868
├── README.md
6969
├── plots/
7070
│ └── project-name-heatmap.png
7171
├── results/
72-
│ └── top_gene_results.tsv
72+
│ └── top-gene-results.tsv
7373
├── raw-data/
7474
│ ├── project-name-raw.tsv
7575
│ └── project-name-metadata.tsv
@@ -82,14 +82,14 @@ project-name/
8282

8383
**What these hypothetical files and folders contain:**
8484

85-
- `run_analysis.sh` - A central script that runs everything again
86-
- `00-download-data.sh` - The script that needs to be run first and is called by run_analysis.sh
87-
- `01-make-heatmap.Rmd` - The script that needs to be run second and is also called by run_analysis.sh
88-
- `README.md` - The document that has the information that will orient someone to this project, we'll discuss more about how to create a helpful README in [an upcoming chapter](https://jhudatascience.org/Reproducibility_in_Cancer_Informatics/documenting-analyses.html#readmes).
85+
- `run-analysis.sh` - A central script that runs everything again
86+
- `00-download-data.sh` - The script that needs to be run first and is called by `run-analysis.sh`
87+
- `01-make-heatmap.Rmd` - The script that needs to be run second and is also called by `run-analysis.sh`
88+
- `README.md` - A document that will orient someone to this project. We'll discuss more about how to create a helpful README in [an upcoming chapter](https://jhudatascience.org/Reproducibility_in_Cancer_Informatics/documenting-analyses.html#readmes).
8989
- `plots` - A folder of plots and resulting images
9090
- `results` - A folder of results
91-
- `raw-data` - Data files as they first arrive and **nothing** has been done to them yet.
92-
- `processed-data` - Data that has been modified from the raw in some way.
91+
- `raw-data` - Data files as they first arrive (**nothing** has been done to them yet)
92+
- `processed-data` - Data that has been modified from their raw form in some way
9393
- `util` - A folder of utilities that never needs to be called or touched directly unless troubleshooting something
9494

9595
## Readings about organizational strategies for data science projects:
@@ -99,7 +99,7 @@ You can read through some of these articles to think about what kind of organiza
9999

100100
- [Jenny Bryan's organizational strategies](https://www.stat.ubc.ca/~jenny/STAT545A/block19_codeFormattingOrganization.html) [@Bryan2021].
101101
- [Danielle Navarro's organizational strategies](https://www.youtube.com/playlist?list=PLRPB0ZzEYegPiBteC2dRn95TX9YefYFyy) [@Navarro2021].
102-
- [Jenny Bryan on Project-oriented workflows](https://www.tidyverse.org/blog/2017/12/workflow-vs-script/)[@Bryan2017].
102+
- [Jenny Bryan on Project-oriented workflows](https://www.tidyverse.org/blog/2017/12/workflow-vs-script/) [@Bryan2017].
103103
- [Data Carpentry mini-course about organizing projects](https://datacarpentry.org/organization-genomics/) [@DataCarpentry2021].
104104
- [Andrew Severin's strategy for organization](https://bioinformaticsworkbook.org/projectManagement/Intro_projectManagement.html#gsc.tab=0) [@Severin2021].
105105
- [A BioStars thread where many individuals share their own organizational strategies](https://www.biostars.org/p/821/) [@Biostars2021].
@@ -141,8 +141,8 @@ unzip -o chapter-zips/r-heatmap-chapt-3.zip -d chapter-zips/
141141

142142
Using your computer's GUI (drag, drop, and clicking), organize the files that are part of this project.
143143

144-
1. Organized these files using an organizational scheme similar to [what is described above](#example organizational-scheme).
145-
1. Create folders like `plots`, `results`, and `data` folder. Note that `aggregated_metadata.json` and `LICENSE.TXT` also belong in the `data` folder.
144+
1. Organize these files using an organizational scheme similar to [what is described above](#example-organizational-scheme).
145+
1. Create folders like `plots`, `results`, and `data`. Note that `aggregated_metadata.json` and `LICENSE.TXT` also belong in the `data` folder.
146146
1. You will want to delete any files that say "OLD". Keeping multiple versions of your scripts around is a recipe for mistakes and confusion. In the advanced course we will discuss how to use version control to help you track this more elegantly.
147147

148148
After your files are organized, you are ready to move on to the next chapter and create a notebook!
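
The example layout shown in this diff could also be created from the command line instead of the GUI. The following is a minimal sketch, assuming a POSIX shell. The function name `make_project_skeleton` is hypothetical and not part of the course materials, and the directory and file names are simply the ones listed in the example tree. The script files it creates are empty placeholders, not working scripts.

```shell
#!/bin/sh
set -eu

# Hypothetical helper (an assumption for illustration, not from the course):
# create the example project skeleton under the directory given as $1.
make_project_skeleton() {
  root="$1"

  # One directory per kind of file: output, raw inputs, processed inputs, utilities.
  for dir in plots results raw-data processed-data util; do
    mkdir -p "$root/$dir"
  done

  # Empty placeholders for the numbered scripts, the central script, and the README.
  # A file is only created if it is missing, so rerunning never overwrites real work.
  for file in run-analysis.sh 00-download-data.sh 01-make-heatmap.Rmd README.md; do
    [ -e "$root/$file" ] || : > "$root/$file"
  done
}

make_project_skeleton "project-name"
```

Because `mkdir -p` succeeds when a directory already exists and existing files are skipped, the sketch can be rerun safely on a project that is already partly organized.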
