Closes #231 #267 blog post data packages relevant to the pharmaverse #263

Merged
25 commits
535170b
Delete posts/2024-01-04_end_of__year__up... directory
bms63 Oct 30, 2024
a853e2f
feat: #231 data blog init
Oct 30, 2024
f272f42
Merge remote-tracking branch 'origin/main' into 231-blog-post-data-pa…
bms63 Dec 14, 2024
52ab692
docs: working on title and image
bms63 Dec 14, 2024
424ded7
feat: #231 laid out 4 packages...was hoping for more!
bms63 Dec 14, 2024
588a824
Merge remote-tracking branch 'origin/main' into 231-blog-post-data-pa…
bms63 Feb 14, 2025
17e4d03
feat: #231 added NEST and SafetyData
bms63 Feb 14, 2025
e7622c4
chore: a post was accidently deleted
bms63 Feb 14, 2025
88c99b8
Update posts/zzz_DO_NOT_EDIT_data__packages/data__packages.qmd
bms63 Feb 14, 2025
995c6f0
Update posts/zzz_DO_NOT_EDIT_data__packages/data__packages.qmd
bms63 Feb 14, 2025
7143391
Update posts/zzz_DO_NOT_EDIT_data__packages/data__packages.qmd
bms63 Feb 14, 2025
1bd884f
Update posts/zzz_DO_NOT_EDIT_data__packages/data__packages.qmd
bms63 Feb 14, 2025
35e996a
Update posts/zzz_DO_NOT_EDIT_data__packages/data__packages.qmd
bms63 Feb 14, 2025
eb71bca
Update posts/zzz_DO_NOT_EDIT_data__packages/data__packages.qmd
bms63 Feb 14, 2025
0a4e4f5
Update posts/zzz_DO_NOT_EDIT_data__packages/data__packages.qmd
bms63 Feb 14, 2025
6c1c3b3
Update posts/zzz_DO_NOT_EDIT_data__packages/data__packages.qmd
bms63 Feb 14, 2025
90216c2
feat: feedback from review; nice image at the end
bms63 Feb 14, 2025
f6a50f6
chore: new spelling for the cause
bms63 Feb 14, 2025
1267256
chore: did the LLm fix my action?
bms63 Feb 14, 2025
dbb20e7
chore: test linkchecker
bms63 Feb 14, 2025
cc58dd1
fix: linkchecker back in action; chore: words again
bms63 Feb 14, 2025
c699f1a
Update posts/zzz_DO_NOT_EDIT_data__packages/data__packages.qmd
bms63 Feb 17, 2025
0e7b9ee
Update posts/zzz_DO_NOT_EDIT_data__packages/data__packages.qmd
bms63 Feb 17, 2025
f10aa2c
Update data__packages.qmd
bms63 Feb 17, 2025
978dcbd
#231 attempt to split out update-post-dates and publishing workflows
manciniedoardo Feb 17, 2025
23 changes: 18 additions & 5 deletions .github/workflows/link_check.yml
@@ -1,20 +1,33 @@
name: Links (Fail Fast)

on:
pull_request: {branches: ['main']}
pull_request:
branches:
- main

jobs:
linkChecker:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4

- name: switch .qmd to .md
run: 'source("R/switch.R")'
shell: Rscript -e
- name: Set up R # Install R from CRAN
uses: r-lib/actions/setup-r@v2
with:
r-version: '4.3.3' # You can specify a different R version if needed

- name: Install R packages
run: |
Rscript -e 'install.packages("fs")'
shell: bash

- name: Switch .qmd to .md
run: Rscript R/switch.R
shell: bash

- name: Link Checker
uses: lycheeverse/[email protected]
with:
fail: true
env:
GITHUB_TOKEN: ${{secrets.GITHUB_TOKEN}}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
34 changes: 3 additions & 31 deletions .github/workflows/publish.yml
@@ -2,39 +2,11 @@ name: Quarto Publish

on:
workflow_dispatch:
push:
branches: [main]
repository_dispatch:
types: [quarto-publish]

jobs:
Update-post-dates:
runs-on: ubuntu-latest
container:
image: "rocker/tidyverse:4.2.1"
steps:
- name: Checkout repository
uses: actions/checkout@v4
with:
ref: main
token: ${{ secrets.PHARMAVERSE_BOT }}

- name: Run update_post_dates
run: Rscript R/update_post_dates.R # running the R script with Rscript

- name: Configure Git safe directory
run: git config --global --add safe.directory /__w/blog/blog

- name: Commit and push changes
uses: stefanzweifel/git-auto-commit-action@v5
with:
commit_message: "[skip actions] Auto-update blog post date"
file_pattern: "."
commit_user_name: github-actions
commit_user_email: >-
41898282+github-actions[bot]@users.noreply.github.com
continue-on-error: true

build-deploy:
needs: Update-post-dates
build_deploy:
runs-on: ubuntu-latest
permissions:
contents: write
40 changes: 40 additions & 0 deletions .github/workflows/update_post_dates.yml
@@ -0,0 +1,40 @@
name: Update Post Dates

on:
workflow_dispatch:
push:
branches: [main]

jobs:
update_post_dates:
runs-on: ubuntu-latest
container:
image: "rocker/tidyverse:4.2.1"
steps:
- name: Checkout repository
uses: actions/checkout@v4
with:
ref: main
token: ${{ secrets.PHARMAVERSE_BOT }}

- name: Run update_post_dates
run: Rscript R/update_post_dates.R # running the R script with Rscript

- name: Configure Git safe directory
run: git config --global --add safe.directory /__w/blog/blog

- name: Commit and push changes
uses: stefanzweifel/git-auto-commit-action@v5
with:
commit_message: "[skip actions] Auto-update blog post date"
file_pattern: "."
commit_user_name: github-actions
commit_user_email: >-
41898282+github-actions[bot]@users.noreply.github.com
continue-on-error: true

- name: Trigger Quarto Publish
uses: peter-evans/repository-dispatch@v2
with:
token: ${{ secrets.GITHUB_TOKEN }}
event-type: quarto-publish
4 changes: 3 additions & 1 deletion README.md
@@ -112,7 +112,9 @@ install.packages(c("jsonlite",
"rtables",
"teal",
"riskmetric",
"tidyCDISC"))
"tidyCDISC",
"mirai",
"admiralmetabolic"))
```
## How to Use the `blog` Docker Image for Local Development

10 changes: 7 additions & 3 deletions inst/WORDLIST.txt
@@ -102,11 +102,11 @@ AMBUL
amd
amongst
Amor
Anders
analysing
analytics
Analytics
aNCA
Anders
anderson
andre
André
@@ -170,6 +170,7 @@ BILIBL
bindCache
bindEvent
biogen
Biologics
biomarker
Biomarker
biometrics
@@ -851,6 +852,8 @@ s
sa
sadchla
Sadchla
safetyData
SafetyGraphics
Salzburg
Sanofi
Sanofi's
@@ -940,6 +943,7 @@ Syon
tagList
tamor
targetdatatype
TAs
Taşlıçukur
Tatiana
TatianaPXL
@@ -1016,8 +1020,8 @@ ubuntu
ucla
ug
ui
uk
UI
uk
Ul
un
Unardi
@@ -1073,8 +1077,8 @@ WAISTHGT
waisthip
WAISTHIP
Walkowiak
Walkthrough
walkthrough
Walkthrough
wasm
WAWA
wayback
Binary file added media/data.jpg
73 changes: 73 additions & 0 deletions posts/zzz_DO_NOT_EDIT_data__packages/appendix.R
@@ -0,0 +1,73 @@
suppressMessages(library(dplyr))
# markdown helpers --------------------------------------------------------

markdown_appendix <- function(name, content) {
paste(paste("##", name, "{.appendix}"), " ", content, sep = "\n")
}
markdown_link <- function(text, path) {
paste0("[", text, "](", path, ")")
}



# worker functions --------------------------------------------------------

insert_source <- function(repo_spec, name,
collection = "posts",
branch = "main",
host = "https://github.com",
text = "Source",
file_name) {
path <- paste(
host,
repo_spec,
"tree",
branch,
collection,
name,
file_name,
sep = "/"
)
return(markdown_link(text, path))
}

insert_timestamp <- function(tzone = Sys.timezone()) {
time <- lubridate::now(tzone = tzone)
stamp <- as.character(time, tz = tzone, usetz = TRUE)
return(stamp)
}

insert_lockfile <- function(repo_spec, name,
collection = "posts",
branch = "main",
host = "https://github.com",
text = "Session info") {
path <- "https://pharmaverse.github.io/blog/session_info.html"

return(markdown_link(text, path))
}



# top level function ------------------------------------------------------

insert_appendix <- function(repo_spec, name, collection = "posts", file_name) {
appendices <- paste(
markdown_appendix(
name = "Last updated",
content = insert_timestamp()
),
" ",
markdown_appendix(
name = "Details",
content = paste(
insert_source(repo_spec, name, collection, file_name = file_name),
# get renv information,
insert_lockfile(repo_spec, name, collection),
sep = ", "
)
),
sep = "\n"
)
knitr::asis_output(appendices)
}
Binary file added posts/zzz_DO_NOT_EDIT_data__packages/data.jpg
96 changes: 96 additions & 0 deletions posts/zzz_DO_NOT_EDIT_data__packages/data__packages.qmd
@@ -0,0 +1,96 @@
---
title: "Collecting all the data!"
author:
- name: Ben Straub
description: "Where is all the data? An intermittent attempt to continuously compile, collate, consolidate, and curate publicly available CDISC data useful for Clinical Reporting in R"
# Note that the date below will be auto-updated when the post is merged.
date: "2025-02-14"
# Please do not use any non-default categories.
# You can find the default categories in the repository README.md
categories: [SDTM, ADaM, Community, Technical]
# Feel free to change the image
image: "data.jpg"

---

<!--------------- typical setup ----------------->

```{r setup, include=FALSE}
long_slug <- "zzz_DO_NOT_EDIT_data__packages"
library(link)
link::auto(keep_pkg_prefix = FALSE)
```

<!--------------- post begins here ----------------->

The purpose of this blog post is to maintain an ongoing list of publicly available data packages, data in packages, or data sources that align with CDISC standards. My hope is that this could be a resource for:

* those intrepid individuals looking to showcase new documentation, functions, packages and other tools
* those enterprising individuals wanting to learn more about CDISC standards and explore open-source tools.

The data sources presented below are just a start and are listed in the order in which I found them. Feel free to get in touch with me for additions or clarifications. You can find me on the pharmaverse Slack by joining [here](https://pharmaverse.slack.com/). In fact, I encourage, nay implore, you to get in touch, as this can't be all the data that we have available to us!

## pharmaversesdtm: SDTM Test Data for the Pharmaverse Family of Packages

A set of Study Data Tabulation Model (SDTM) datasets from the Clinical Data Interchange Standards Consortium (CDISC) pilot project used for testing and developing Analysis Data Model (ADaM) datasets inside the pharmaverse family of packages. A CDISC Pilot was conducted somewhere between 2008 and 2010. This is that Pilot data but slowly brought up to current CDISC standards. There are also new datasets in the same style (same `STUDYID`, `USUBJID`s, etc.) added by the {admiral} and the {admiral} extension package teams that provide test data for new domains or specific TAs (ophthalmology, vaccines, etc.).

The package contains most common SDTM datasets as well as some disease-area-specific SDTM datasets that are not part of the CDISC pilot data.

Available on [CRAN](https://cloud.r-project.org/web/packages/pharmaversesdtm/index.html). This package is actively maintained on [GitHub](https://github.com/pharmaverse/pharmaversesdtm).
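
As a quick illustration, here is a minimal sketch of pulling one of these datasets into an R session. It assumes {pharmaversesdtm} is already installed from CRAN and that the demographics domain is exported under the name `dm`; the three columns shown are picked only for display.

```r
# Assumption: pharmaversesdtm is installed, e.g. via install.packages("pharmaversesdtm")
library(pharmaversesdtm)

# The SDTM demographics domain ships as a regular data frame
dm <- pharmaversesdtm::dm

# Peek at a few standard identifier variables
head(dm[, c("STUDYID", "USUBJID", "ARM")])
```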

## pharmaverseadam: ADaM Test Data for the Pharmaverse Family of Packages

A set of Analysis Data Model (ADaM) datasets constructed using the Study Data Tabulation Model (SDTM) datasets contained in the {pharmaversesdtm} package and the template scripts from the {admiral} family of packages.

Available on [CRAN](https://cloud.r-project.org/web/packages/pharmaverseadam/index.html). This package is actively maintained on [GitHub](https://github.com/pharmaverse/pharmaverseadam).
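
The ADaM datasets can be used in much the same way. The sketch below assumes {pharmaverseadam} is installed and that the subject-level dataset is exported as `adsl`; the duplicate check is just one simple sanity check you might run, not something the package requires.

```r
# Assumption: pharmaverseadam is installed, e.g. via install.packages("pharmaverseadam")
library(pharmaverseadam)

# Subject-level analysis dataset
adsl <- pharmaverseadam::adsl

# ADSL is meant to hold one record per subject, so this looks for repeated USUBJIDs
# (anyDuplicated() returns 0 when there are none)
anyDuplicated(adsl$USUBJID)
```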

## admiral: ADaM in R Asset Library

A toolbox for programming Clinical Data Interchange Standards Consortium (CDISC) compliant Analysis Data Model (ADaM) datasets in R. ADaM datasets are a mandatory part of any New Drug or Biologics License Application submitted to the United States Food and Drug Administration (FDA). Analysis derivations are implemented in accordance with the "Analysis Data Model Implementation Guide".

Only a limited number of datasets, such as `ADSL` and `ADLB`, are provided in {admiral} itself, because the template scripts available in this package are used to create the ADaM datasets in {pharmaverseadam}.

Available on [CRAN](https://cran.r-project.org/web/packages/admiral/index.html). This package is actively maintained on [GitHub](https://github.com/pharmaverse/admiral).
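
To see which example datasets your installed version of {admiral} actually ships, something like the following should work. It relies on base R's `data()` listing and assumes the bundled ADSL is exported as `admiral_adsl`; dataset names may differ between releases.

```r
# Assumption: admiral is installed, e.g. via install.packages("admiral")
library(admiral)

# List the datasets bundled with the installed version of admiral
data(package = "admiral")$results[, c("Item", "Title")]

# Load the example subject-level dataset and check its dimensions
adsl <- admiral::admiral_adsl
dim(adsl)
```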

## random.cdisc.data: Create Random ADaM Datasets

A set of functions to create *random* Analysis Data Model (ADaM) datasets and cached datasets. You can find a list of the possible random CDISC datasets generated [here](https://insightsengineering.github.io/random.cdisc.data/main/index.html). ADaM dataset specifications are described by the Clinical Data Interchange Standards Consortium (CDISC) Analysis Data Model Team. These datasets are used to power the [TLG Catalog](https://insightsengineering.github.io/tlg-catalog/stable/), though the NEST team is actively substituting them for {pharmaverseadam} datasets instead - see [a recent blog post](https://pharmaverse.github.io/blog/posts/2025-01-15_nest_and_pharmaverseadam/nest_and_pharmaverseadam.html) about this very effort!



Available on [CRAN](https://cran.r-project.org/web/packages/random.cdisc.data/index.html). The package is actively maintained on [GitHub](https://github.com/insightsengineering/random.cdisc.data) by the NEST team.
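
For a feel of the difference between the generated and the cached datasets, here is a small sketch. It assumes {random.cdisc.data} is installed and uses its `radsl()` generator and the cached `cadsl` dataset; the values of `N` and `seed` are arbitrary choices for illustration.

```r
# Assumption: random.cdisc.data is installed, e.g. via install.packages("random.cdisc.data")
library(random.cdisc.data)

# Generate a small random subject-level dataset; fixing the seed makes it reproducible
adsl_random <- radsl(N = 10, seed = 1)

# The package also ships pre-generated (cached) datasets, prefixed with "c"
adsl_cached <- random.cdisc.data::cadsl

# Compare the sizes of the generated and cached versions
dim(adsl_random)
dim(adsl_cached)
```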

## safetyData: Clinical Trial Data

The package re-formats PHUSE's sample ADaM and SDTM datasets as an R package following R data best practices.

PHUSE released the data under the permissive MIT license, so reuse with attribution is encouraged. The data are especially useful for prototyping new tables, listings and figures and for writing automated tests.

Basic documentation for each data file is provided in help files (e.g. ?adam_adae). Full data specifications in the form of define.xml files can also be found at the links above (pdf for ADaM and pdf for SDTM).

Available on [CRAN](https://cran.r-project.org/web/packages/safetyData/index.html). The package is available on [GitHub](https://github.com/SafetyGraphics/safetyData).
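
A minimal sketch of using one of these datasets, assuming {safetyData} is installed. The `adam_adae` name comes from the help-file example mentioned above; the `list.len` value is only there to keep the printed structure short.

```r
# Assumption: safetyData is installed, e.g. via install.packages("safetyData")
library(safetyData)

# ADaM adverse events dataset, documented under ?adam_adae
adae <- safetyData::adam_adae

# Show the structure of the first few variables
str(adae, list.len = 5)
```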


## NEST: Accelerating Clinical Reporting

[NEST](https://insightsengineering.github.io/nest/) is a collection of open-source R packages that enable faster and more efficient insight generation in clinical research settings, for both exploratory and regulatory purposes.

The NEST packages include a wealth of data generated for documentation, demonstrations and testing. You can find all the datasets and the packages they live in [here](https://insightsengineering.r-universe.dev/datasets).

## Collect all the data!

As you can see, the list is short! Let me know if you have sources (big or small) and we can add them to this list.

![](data.jpg){fig-align="center" width="220"}

<!--------------- appendices go here ----------------->

```{r, echo=FALSE}
source("appendix.R")
insert_appendix(
repo_spec = "pharmaverse/blog",
name = long_slug,
# file_name should be the name of your file
file_name = list.files() %>% stringr::str_subset("\\.qmd$") %>% first()
)
```