@Tayo2019 discusses four reasons why it is important to organize your project:

> 1. Organization increases productivity. If a project is well organized, with everything placed in one directory, it makes it easier to avoid wasting time searching for project files such as datasets, codes, output files, and so on.
> 2. A well-organized project helps you to keep and maintain a record of your ongoing and completed data science projects.

- **Make file names informative** to people who don't know the project, but avoid using spaces, quotes, or unusual characters in your file and folder names. These only serve to make reading in files a nightmare in some programs.
- **Number scripts** in the order that they are run.
- **Keep like-files together** in their own directory: results tables with other results tables, etc. _Most importantly, keep raw data separate from processed data or other results!_
- **Put source scripts and functions in their own directory**. These are things that should never need to be called directly by yourself or anyone else.
- **Put output in its own directories** like `results` and `plots`.
- **Have a central document (like a README)** that describes the basic information about the analysis and how to re-run it.
- Make it easy on yourself: **dates aren't necessary** in file names. The computer keeps track of those.
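
As a rough illustration of the naming advice, the Python sketch below flags file and folder names that contain spaces, quotes, or other characters outside a conservative set. The allowed set (letters, digits, `.`, `_`, and `-`) and the function names `is_safe_name` and `find_problem_names` are our own assumptions for this example. They are not a standard, so adjust them to whatever your tools handle.

```python
import re
from pathlib import Path

# Assumed "safe" character set: letters, digits, dot, underscore, and hyphen.
# Anything else (spaces, quotes, parentheses, accented letters, ...) is flagged.
SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")


def is_safe_name(name: str) -> bool:
    """Return True if a single file or folder name uses only the safe characters."""
    return SAFE_NAME.fullmatch(name) is not None


def find_problem_names(project_dir: str) -> list[str]:
    """Walk project_dir recursively and return relative paths whose last part is unsafe."""
    root = Path(project_dir)
    problems = []
    for path in sorted(root.rglob("*")):
        if not is_safe_name(path.name):
            problems.append(str(path.relative_to(root)))
    return problems
```

For example, calling `find_problem_names(".")` from the project root would list a file such as `my results (final).tsv`, while a name like `top-gene-results.tsv` passes.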

Here's an example of what this might look like:

```
project-name/
├── run-analysis.sh
├── 00-download-data.sh
├── 01-make-heatmap.Rmd
├── README.md
├── plots/
│   └── project-name-heatmap.png
├── results/
│   └── top-gene-results.tsv
├── raw-data/
│   ├── project-name-raw.tsv
│   └── project-name-metadata.tsv
├── processed-data/
└── util/
```

**What these hypothetical files and folders contain:**

- `run-analysis.sh` - A central script that re-runs the entire analysis from start to finish
- `00-download-data.sh` - The script that needs to be run first; it is called by `run-analysis.sh`
- `01-make-heatmap.Rmd` - The script that needs to be run second; it is also called by `run-analysis.sh`
- `README.md` - A document that will orient someone to this project. We'll discuss more about how to create a helpful README in [an upcoming chapter](https://jhudatascience.org/Reproducibility_in_Cancer_Informatics/documenting-analyses.html#readmes).
- `plots` - A folder of plots and resulting images
- `results` - A folder of results tables
- `raw-data` - Data files as they first arrive (**nothing** has been done to them yet)
- `processed-data` - Data that has been modified from their raw form in some way
- `util` - A folder of utilities that should never need to be called or touched directly unless you are troubleshooting something
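
The numbered prefixes are what allow a central script like `run-analysis.sh` to run each step in order. The Python sketch below is a hypothetical stand-in for the ordering logic only; it is not the contents of that script. It keeps names that start with a two-digit prefix and a hyphen, as in the example tree (an assumed convention), and sorts them by that number. A real runner would then execute each file with the appropriate tool, which is omitted here.

```python
import re
from pathlib import Path

# Assumed convention from the example tree: two digits, then a hyphen,
# e.g. "00-download-data.sh", "01-make-heatmap.Rmd".
NUMBERED = re.compile(r"(\d{2})-.+")


def scripts_in_run_order(names: list[str]) -> list[str]:
    """Keep only numbered script names and sort them by their numeric prefix."""
    numbered = []
    for name in names:
        match = NUMBERED.fullmatch(name)
        if match:
            numbered.append((int(match.group(1)), name))
    # Sort on the number first; the full name breaks any ties deterministically.
    return [name for _, name in sorted(numbered)]


def project_run_order(project_dir: str) -> list[str]:
    """Return the numbered top-level files of project_dir in the order they should run."""
    names = [p.name for p in Path(project_dir).iterdir() if p.is_file()]
    return scripts_in_run_order(names)


files = ["README.md", "01-make-heatmap.Rmd", "run-analysis.sh", "00-download-data.sh"]
print(scripts_in_run_order(files))
# ['00-download-data.sh', '01-make-heatmap.Rmd']
```

Zero-padding the prefixes (`00`, `01`, ... rather than `0`, `1`) matters for anything that sorts names as text, such as a file browser. Without padding, `10-...` would be listed before `2-...`.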

## Readings about organizational strategies for data science projects:

- [Jenny Bryan on Project-oriented workflows](https://www.tidyverse.org/blog/2017/12/workflow-vs-script/)[@Bryan2017].
- [Data Carpentry mini-course about organizing projects](https://datacarpentry.org/organization-genomics/)[@DataCarpentry2021].
- [Andrew Severin's strategy for organization](https://bioinformaticsworkbook.org/projectManagement/Intro_projectManagement.html#gsc.tab=0)[@Severin2021].
- [A BioStars thread where many individuals share their own organizational strategies](https://www.biostars.org/p/821/)[@Biostars2021].

Using your computer's GUI (dragging, dropping, and clicking), organize the files that are part of this project.

1. Organize these files using an organizational scheme similar to [what is described above](#example-organizational-scheme).
1. Create folders like `plots`, `results`, and `data`. Note that `aggregated_metadata.json` and `LICENSE.TXT` also belong in the `data` folder.
1. Delete any files whose names include "OLD". Keeping multiple versions of your scripts around is a recipe for mistakes and confusion. In the advanced course, we will discuss how to use version control to track this more elegantly.

After your files are organized, you are ready to move on to the next chapter and create a notebook!