Commit 7550af7

bms63, manciniedoardo, and StefanThoma authored
Closes #231 #267 blog post data packages relevant to the pharmaverse (#263)
* Delete posts/2024-01-04_end_of__year__up... directory
* feat: #231 data blog init
* docs: working on title and image
* feat: #231 laid out 4 packages...was hoping for more!
* feat: #231 added NEST and SafetyData
* chore: a post was accidently deleted
* Update posts/zzz_DO_NOT_EDIT_data__packages/data__packages.qmd
  Co-authored-by: Edoardo Mancini <[email protected]>
* Update posts/zzz_DO_NOT_EDIT_data__packages/data__packages.qmd
  Co-authored-by: Edoardo Mancini <[email protected]>
* Update posts/zzz_DO_NOT_EDIT_data__packages/data__packages.qmd
  Co-authored-by: Edoardo Mancini <[email protected]>
* Update posts/zzz_DO_NOT_EDIT_data__packages/data__packages.qmd
  Co-authored-by: Edoardo Mancini <[email protected]>
* Update posts/zzz_DO_NOT_EDIT_data__packages/data__packages.qmd
  Co-authored-by: Edoardo Mancini <[email protected]>
* Update posts/zzz_DO_NOT_EDIT_data__packages/data__packages.qmd
  Co-authored-by: Edoardo Mancini <[email protected]>
* Update posts/zzz_DO_NOT_EDIT_data__packages/data__packages.qmd
  Co-authored-by: Edoardo Mancini <[email protected]>
* Update posts/zzz_DO_NOT_EDIT_data__packages/data__packages.qmd
  Co-authored-by: Edoardo Mancini <[email protected]>
* feat: feedback from review; nice image at the end
* chore: new spelling for the cause
* chore: did the LLm fix my action?
* chore: test linkchecker
* fix: linkchecker back in action; chore: words again
* Update posts/zzz_DO_NOT_EDIT_data__packages/data__packages.qmd
  Co-authored-by: StefanThoma <[email protected]>
* Update posts/zzz_DO_NOT_EDIT_data__packages/data__packages.qmd
  Co-authored-by: StefanThoma <[email protected]>
* Update data__packages.qmd
* #231 attempt to split out update-post-dates and publishing workflows

---------

Co-authored-by: bms63 <[email protected] config --listgit config --global user.email [email protected]>
Co-authored-by: Edoardo Mancini <[email protected]>
Co-authored-by: StefanThoma <[email protected]>
Co-authored-by: Edoardo Mancini <[email protected]>
1 parent 1795070 commit 7550af7

File tree

9 files changed

+240
-40
lines changed

.github/workflows/link_check.yml

+18-5
Original file line number | Diff line number | Diff line change
@@ -1,20 +1,33 @@
11
name: Links (Fail Fast)
22

33
on:
4-
pull_request: {branches: ['main']}
4+
pull_request:
5+
branches:
6+
- main
7+
58
jobs:
69
linkChecker:
710
runs-on: ubuntu-latest
811
steps:
912
- uses: actions/checkout@v4
1013

11-
- name: switch .qmd to .md
12-
run: 'source("R/switch.R")'
13-
shell: Rscript -e
14+
- name: Set up R # Install R from CRAN
15+
uses: r-lib/actions/setup-r@v2
16+
with:
17+
r-version: '4.3.3' # You can specify a different R version if needed
18+
19+
- name: Install R packages
20+
run: |
21+
Rscript -e 'install.packages("fs")'
22+
shell: bash
23+
24+
- name: Switch .qmd to .md
25+
run: Rscript R/switch.R
26+
shell: bash
1427

1528
- name: Link Checker
1629
uses: lycheeverse/[email protected]
1730
with:
1831
fail: true
1932
env:
20-
GITHUB_TOKEN: ${{secrets.GITHUB_TOKEN}}
33+
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
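
The `Switch .qmd to .md` step runs `R/switch.R`, which is not part of this diff. As a rough illustration only, a script of that kind might rename Quarto sources so the link checker treats them as Markdown. The function name `switch_qmd_to_md` and the demo directory are hypothetical, and base R file functions are used here instead of `fs` so the sketch runs without extra packages:

```r
# Hypothetical stand-in for R/switch.R; the real script is not shown in this commit.
switch_qmd_to_md <- function(dir) {
  # Find every .qmd file below `dir`
  qmd <- list.files(dir, pattern = "\\.qmd$", recursive = TRUE, full.names = TRUE)
  # Build the matching .md paths and rename in place
  md <- sub("\\.qmd$", ".md", qmd)
  file.rename(qmd, md)
  invisible(md)
}

# Try it on a throwaway directory rather than on real posts
demo_dir <- file.path(tempdir(), "switch_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines("[pharmaverse](https://pharmaverse.org)", file.path(demo_dir, "post.qmd"))
switch_qmd_to_md(demo_dir)
list.files(demo_dir)
```

Renaming in place is acceptable in CI because the checkout is discarded after the job; running the same thing locally would modify the working tree.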

.github/workflows/publish.yml

+3-31
@@ -2,39 +2,11 @@ name: Quarto Publish
22

33
on:
44
workflow_dispatch:
5-
push:
6-
branches: [main]
5+
repository_dispatch:
6+
types: [quarto-publish]
77

88
jobs:
9-
Update-post-dates:
10-
runs-on: ubuntu-latest
11-
container:
12-
image: "rocker/tidyverse:4.2.1"
13-
steps:
14-
- name: Checkout repository
15-
uses: actions/checkout@v4
16-
with:
17-
ref: main
18-
token: ${{ secrets.PHARMAVERSE_BOT }}
19-
20-
- name: Run update_post_dates
21-
run: Rscript R/update_post_dates.R # running the R script with Rscript
22-
23-
- name: Configure Git safe directory
24-
run: git config --global --add safe.directory /__w/blog/blog
25-
26-
- name: Commit and push changes
27-
uses: stefanzweifel/git-auto-commit-action@v5
28-
with:
29-
commit_message: "[skip actions] Auto-update blog post date"
30-
file_pattern: "."
31-
commit_user_name: github-actions
32-
commit_user_email: >-
33-
41898282+github-actions[bot]@users.noreply.github.com
34-
continue-on-error: true
35-
36-
build-deploy:
37-
needs: Update-post-dates
9+
build_deploy:
3810
runs-on: ubuntu-latest
3911
permissions:
4012
contents: write
+40
@@ -0,0 +1,40 @@
1+
name: Update Post Dates
2+
3+
on:
4+
workflow_dispatch:
5+
push:
6+
branches: [main]
7+
8+
jobs:
9+
update_post_dates:
10+
runs-on: ubuntu-latest
11+
container:
12+
image: "rocker/tidyverse:4.2.1"
13+
steps:
14+
- name: Checkout repository
15+
uses: actions/checkout@v4
16+
with:
17+
ref: main
18+
token: ${{ secrets.PHARMAVERSE_BOT }}
19+
20+
- name: Run update_post_dates
21+
run: Rscript R/update_post_dates.R # running the R script with Rscript
22+
23+
- name: Configure Git safe directory
24+
run: git config --global --add safe.directory /__w/blog/blog
25+
26+
- name: Commit and push changes
27+
uses: stefanzweifel/git-auto-commit-action@v5
28+
with:
29+
commit_message: "[skip actions] Auto-update blog post date"
30+
file_pattern: "."
31+
commit_user_name: github-actions
32+
commit_user_email: >-
33+
41898282+github-actions[bot]@users.noreply.github.com
34+
continue-on-error: true
35+
36+
- name: Trigger Quarto Publish
37+
uses: peter-evans/repository-dispatch@v2
38+
with:
39+
token: ${{ secrets.GITHUB_TOKEN }}
40+
event-type: quarto-publish

README.md

+3-1
@@ -112,7 +112,9 @@ install.packages(c("jsonlite",
112112
"rtables",
113113
"teal",
114114
"riskmetric",
115-
"tidyCDISC"))
115+
"tidyCDISC",
116+
"mirai",
117+
"admiralmetabolic"))
116118
```
117119
## How to Use the `blog` Docker Image for Local Development
118120

inst/WORDLIST.txt

+7-3
@@ -102,11 +102,11 @@ AMBUL
102102
amd
103103
amongst
104104
Amor
105-
Anders
106105
analysing
107106
analytics
108107
Analytics
109108
aNCA
109+
Anders
110110
anderson
111111
andre
112112
André
@@ -170,6 +170,7 @@ BILIBL
170170
bindCache
171171
bindEvent
172172
biogen
173+
Biologics
173174
biomarker
174175
Biomarker
175176
biometrics
@@ -851,6 +852,8 @@ s
851852
sa
852853
sadchla
853854
Sadchla
855+
safetyData
856+
SafetyGraphics
854857
Salzburg
855858
Sanofi
856859
Sanofi's
@@ -940,6 +943,7 @@ Syon
940943
tagList
941944
tamor
942945
targetdatatype
946+
TAs
943947
Taşlıçukur
944948
Tatiana
945949
TatianaPXL
@@ -1016,8 +1020,8 @@ ubuntu
10161020
ucla
10171021
ug
10181022
ui
1019-
uk
10201023
UI
1024+
uk
10211025
Ul
10221026
un
10231027
Unardi
@@ -1073,8 +1077,8 @@ WAISTHGT
10731077
waisthip
10741078
WAISTHIP
10751079
Walkowiak
1076-
Walkthrough
10771080
walkthrough
1081+
Walkthrough
10781082
wasm
10791083
WAWA
10801084
wayback

media/data.jpg

21 KB
@@ -0,0 +1,73 @@
1+
suppressMessages(library(dplyr))
2+
# markdown helpers --------------------------------------------------------
3+
4+
markdown_appendix <- function(name, content) {
5+
paste(paste("##", name, "{.appendix}"), " ", content, sep = "\n")
6+
}
7+
markdown_link <- function(text, path) {
8+
paste0("[", text, "](", path, ")")
9+
}
10+
11+
12+
13+
# worker functions --------------------------------------------------------
14+
15+
insert_source <- function(repo_spec, name,
16+
collection = "posts",
17+
branch = "main",
18+
host = "https://github.com",
19+
text = "Source",
20+
file_name) {
21+
path <- paste(
22+
host,
23+
repo_spec,
24+
"tree",
25+
branch,
26+
collection,
27+
name,
28+
file_name,
29+
sep = "/"
30+
)
31+
return(markdown_link(text, path))
32+
}
33+
34+
insert_timestamp <- function(tzone = Sys.timezone()) {
35+
time <- lubridate::now(tzone = tzone)
36+
stamp <- as.character(time, tz = tzone, usetz = TRUE)
37+
return(stamp)
38+
}
39+
40+
insert_lockfile <- function(repo_spec, name,
41+
collection = "posts",
42+
branch = "main",
43+
host = "https://github.com",
44+
text = "Session info") {
45+
path <- "https://pharmaverse.github.io/blog/session_info.html"
46+
47+
return(markdown_link(text, path))
48+
}
49+
50+
51+
52+
# top level function ------------------------------------------------------
53+
54+
insert_appendix <- function(repo_spec, name, collection = "posts", file_name) {
55+
appendices <- paste(
56+
markdown_appendix(
57+
name = "Last updated",
58+
content = insert_timestamp()
59+
),
60+
" ",
61+
markdown_appendix(
62+
name = "Details",
63+
content = paste(
64+
insert_source(repo_spec, name, collection, file_name = file_name),
65+
# get renv information,
66+
insert_lockfile(repo_spec, name, collection),
67+
sep = ", "
68+
)
69+
),
70+
sep = "\n"
71+
)
72+
knitr::asis_output(appendices)
73+
}
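
The two Markdown helpers at the top of this file are small enough to check by hand. A minimal usage sketch, with the function bodies copied from the diff above and the repository URL used purely as an example input:

```r
# Helper bodies as they appear in the diff
markdown_appendix <- function(name, content) {
  paste(paste("##", name, "{.appendix}"), " ", content, sep = "\n")
}
markdown_link <- function(text, path) {
  paste0("[", text, "](", path, ")")
}

# Build a link, then wrap it in an appendix section
link <- markdown_link("Source", "https://github.com/pharmaverse/blog")
section <- markdown_appendix("Details", link)

# Show the raw Markdown that knitr::asis_output() would receive
cat(section)
```

The section is a heading line, a line holding a single space, and the content, joined by newlines.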
21 KB
@@ -0,0 +1,96 @@
1+
---
2+
title: "Collecting all the data!"
3+
author:
4+
- name: Ben Straub
5+
description: "Where is all the data? An intermittent attempt to continuously compile, collate, consolidate, and curate publicly available CDISC data useful for Clinical Reporting in R"
6+
# Note that the date below will be auto-updated when the post is merged.
7+
date: "2025-02-14"
8+
# Please do not use any non-default categories.
9+
# You can find the default categories in the repository README.md
10+
categories: [SDTM, ADaM, Community, Technical]
11+
# Feel free to change the image
12+
image: "data.jpg"
13+
14+
---
15+
16+
<!--------------- typical setup ----------------->
17+
18+
```{r setup, include=FALSE}
19+
long_slug <- "zzz_DO_NOT_EDIT_data__packages"
20+
library(link)
21+
link::auto(keep_pkg_prefix = FALSE)
22+
```
23+
24+
<!--------------- post begins here ----------------->
25+
26+
The purpose of this blog post is to maintain an ongoing list of publicly available data packages, data in packages, or data sources that align to CDISC standards. My hope is that this could be a resource for:
27+
28+
* those intrepid individuals looking to showcase new documentation, functions, packages and other tools
29+
* those enterprising individuals wanting to learn more about CDISC standards and exploring open-source tools.
30+
31+
The datasets presented below are just a start and are shown in the order I found them. Feel free to get in touch with me for additions or clarifications. You can find me on the pharmaverse Slack by joining [here](https://pharmaverse.slack.com/). In fact, I encourage, nay implore, you to get in touch, as this can't be all the data that we have available to us!
32+
33+
## pharmaversesdtm: SDTM Test Data for the Pharmaverse Family of Packages
34+
35+
A set of Study Data Tabulation Model (SDTM) datasets from the Clinical Data Interchange Standards Consortium (CDISC) pilot project, used for testing and developing Analysis Data Model (ADaM) datasets inside the pharmaverse family of packages. The CDISC pilot was conducted somewhere between 2008 and 2010; this package contains that pilot data, gradually brought up to current CDISC standards. The {admiral} team and the {admiral} extension package teams have also added new datasets in the same style (same `STUDYID`, `USUBJID`s, etc.) that provide test data for new domains or specific TAs (ophthalmology, vaccines, etc.).
36+
37+
Most common SDTM datasets are included, as well as some disease-area-specific SDTM datasets that are not part of the CDISC pilot data.
38+
39+
Available on [CRAN](https://cloud.r-project.org/web/packages/pharmaversesdtm/index.html). This package is actively maintained on [GitHub](https://github.com/pharmaverse/pharmaversesdtm).
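
To see which datasets an installed data package ships, base R's `utils::data()` can be queried. The following is a minimal sketch: the helper name `list_package_datasets` is made up for this post and is not part of any pharmaverse package, and the {pharmaversesdtm} lines only run if that package is installed.

```r
# Hypothetical helper: names of the datasets bundled with an installed package
list_package_datasets <- function(pkg) {
  results <- utils::data(package = pkg)$results
  unname(results[, "Item"])
}

# Base R's own 'datasets' package is used first so the sketch runs anywhere
head(list_package_datasets("datasets"))

# With the data package installed, the same call works for it
if (requireNamespace("pharmaversesdtm", quietly = TRUE)) {
  print(head(list_package_datasets("pharmaversesdtm")))
  # Datasets are lazy-loaded, so they can be referenced directly
  print(head(pharmaversesdtm::dm[, c("STUDYID", "USUBJID")]))
}
```

The same helper can be pointed at any of the packages listed in this post to get a quick inventory before digging into the documentation.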
40+
41+
## pharmaverseadam: ADaM Test Data for the Pharmaverse Family of Packages
42+
43+
A set of Analysis Data Model (ADaM) datasets constructed using the Study Data Tabulation Model (SDTM) datasets contained in the {pharmaversesdtm} package and the template scripts from the {admiral} family of packages.
44+
45+
Available on [CRAN](https://cloud.r-project.org/web/packages/pharmaverseadam/index.html). This package is actively maintained on [GitHub](https://github.com/pharmaverse/pharmaverseadam).
46+
47+
## admiral: ADaM in R Asset Library
48+
49+
A toolbox for programming Clinical Data Interchange Standards Consortium (CDISC) compliant Analysis Data Model (ADaM) datasets in R. ADaM datasets are a mandatory part of any New Drug or Biologics License Application submitted to the United States Food and Drug Administration (FDA). Analysis derivations are implemented in accordance with the "Analysis Data Model Implementation Guide".
50+
51+
Only a limited set of datasets, such as `ADSL` and `ADLB`, is provided in {admiral}, because the template scripts in this package are used to create the ADaMs in {pharmaverseadam}.
52+
53+
Available on [CRAN](https://cran.r-project.org/web/packages/admiral/index.html). This package is actively maintained on [GitHub](https://github.com/pharmaverse/admiral).
54+
55+
## random.cdisc.data: Create Random ADaM Datasets
56+
57+
A set of functions to create *random* Analysis Data Model (ADaM) datasets and cached datasets. You can find a list of the possible random CDISC datasets generated [here](https://insightsengineering.github.io/random.cdisc.data/main/index.html). ADaM dataset specifications are described by the Clinical Data Interchange Standards Consortium (CDISC) Analysis Data Model Team. These datasets are used to power the [TLG Catalog](https://insightsengineering.github.io/tlg-catalog/stable/), though the NEST team is actively replacing them with {pharmaverseadam} datasets; see [a recent blog post](https://pharmaverse.github.io/blog/posts/2025-01-15_nest_and_pharmaverseadam/nest_and_pharmaverseadam.html) about this very effort!
58+
59+
60+
61+
Available on [CRAN](https://cran.r-project.org/web/packages/random.cdisc.data/index.html). The package is actively maintained on [GitHub](https://github.com/insightsengineering/random.cdisc.data) by the NEST team.
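
Generating a small random subject-level dataset might look like the sketch below. The wrapper `make_random_adsl` is a hypothetical convenience written for this post; it returns `NULL` when {random.cdisc.data} is not installed, so the snippet is safe to paste anywhere.

```r
# Hypothetical wrapper around random.cdisc.data::radsl()
make_random_adsl <- function(n = 10, seed = 1) {
  if (!requireNamespace("random.cdisc.data", quietly = TRUE)) {
    return(NULL) # package not available, nothing to generate
  }
  # N sets the number of subjects; seed makes the draw reproducible
  random.cdisc.data::radsl(N = n, seed = seed)
}

adsl <- make_random_adsl()
if (!is.null(adsl)) {
  print(dim(adsl))
}
```

Fixing the seed keeps examples and tests stable between runs, which matters when the data feed documentation or snapshot tests.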
62+
63+
## safetyData: Clinical Trial Data
64+
65+
The package re-formats PHUSE's sample ADaM and SDTM datasets as an R package following R data best practices.
66+
67+
PHUSE released the data under the permissive MIT license, so reuse with attribution is encouraged. The data are especially useful for prototyping new tables, listings and figures and for writing automated tests.
68+
69+
Basic documentation for each data file is provided in help files (e.g. `?adam_adae`). Full data specifications in the form of define.xml files are also available as PDFs for both ADaM and SDTM.
70+
71+
Available on [CRAN](https://cran.r-project.org/web/packages/safetyData/index.html). The package is available on [GitHub](https://github.com/SafetyGraphics/safetyData).
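
As an illustration of the automated-testing use case, here is a minimal sketch of a check one might run against ADSL-style data. The helper `has_unique_subjects` and the toy data frame are invented for this post; the final lines assume {safetyData} is installed and that its `adam_adsl` dataset carries a `USUBJID` column.

```r
# Hypothetical check: ADSL-style data should have exactly one row per subject
has_unique_subjects <- function(data, id = "USUBJID") {
  id %in% names(data) && !anyDuplicated(data[[id]])
}

# Toy data so the check can be tried without any package installed
toy <- data.frame(USUBJID = c("01-001", "01-002"), AGE = c(63, 71))
has_unique_subjects(toy)

# The same check against the packaged data, if it is available
if (requireNamespace("safetyData", quietly = TRUE)) {
  print(has_unique_subjects(safetyData::adam_adsl))
}
```

Checks like this are cheap to write and catch accidental row duplication after joins, a common failure when deriving ADaM datasets.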
72+
73+
74+
## NEST: Accelerating Clinical Reporting
75+
76+
[NEST](https://insightsengineering.github.io/nest/) is a collection of open-sourced R packages that enable faster and more efficient insight generation in clinical research settings, for both exploratory and regulatory purposes.
77+
78+
The NEST packages contain a wealth of data generated for documentation, demonstrations and testing. You can find all the datasets and the packages they live in [here](https://insightsengineering.r-universe.dev/datasets).
79+
80+
## Collect all the data!
81+
82+
As you can see, the list is short! Let me know if you have sources (big or small) and we can add them to this list.
83+
84+
![](data.jpg){fig-align="center" width="220"}
85+
86+
<!--------------- appendices go here ----------------->
87+
88+
```{r, echo=FALSE}
89+
source("appendix.R")
90+
insert_appendix(
91+
repo_spec = "pharmaverse/blog",
92+
name = long_slug,
93+
# file_name should be the name of your file
94+
file_name = list.files() %>% stringr::str_subset(".qmd") %>% first()
95+
)
96+
```
