Consolidating metadata for a collection of files #4

fBedecarrats · 2024-03-06T16:27:29Z

Congratulations and many thanks for this great tool @dusadrian !
I understand how to use the convert function to produce DDI files with a one-to-one correspondence between source files (in Stata, for instance) and XML files. But I can't figure out how to consolidate them. Here is a reproducible example using the survey models provided by DHS:

# Install the latest version of DDIwR
remotes::install_github("https://github.com/dusadrian/DDIwR")
library(DDIwR)
library(tidyverse)

# Set our variables to acquire the data
dhs_dir <- "test/"
dhs_models <- c("zzbr62dt.zip", # Births Recode
                "zzcr61dt.zip", # Couples' Recode
                "zzhr62dt.zip", # Household Recode
                "zzir62.zip") # Individual Recode
dhs_base_url <- "https://www.dhsprogram.com/data/model_data/dhs/"

# Acquire the data
dir.create(dhs_dir)
dhs_urls <- paste0(dhs_base_url, dhs_models)
dhs_dest <- paste0(dhs_dir, dhs_models)
map2(dhs_urls, dhs_dest, download.file, mode = "wb") # Download (binary mode keeps the zip archives intact on Windows)
map(dhs_dest, unzip, exdir = dhs_dir) # Unzip
stata_files <- list.files(dhs_dir, pattern = "\\.DTA$") # List data files

Here I have four Stata files that correspond to different questionnaire sections or different formatting of the same data. I would like to make a consolidated DDI file out of them. I have two questions:
How can I use DDIwR to convert them to children of a common parent object?
How can I use DDIwR to add general metadata documenting the overview, scope and coverage, sampling, and other attributes common to all the files?
Thanks in advance for your feedback.

dusadrian · 2024-03-07T13:57:54Z

Hello Florent,

I only got three .DTA files using your script, but the question is still the same.
In principle, it is difficult to say how to integrate them without knowing more about your datasets, and this is somewhat outside the immediate scope of the DDIwR package. The so-called "Codebook" variant of the DDI is intended to document individual datasets (one at a time).
Now, if all of these datasets are part of the same study, there are two options, depending on the particular situation of your research:

If the datasets can be combined (merged) into a single one (for instance, I see hhid, which I assume is the household ID), then I would try to merge them in R and convert the resulting R data frame into a consolidated XML file. The command is still the same, something like convert(finalRdata, to = "path/to/ddi.xml")
If the datasets cannot be merged because they really are supposed to be separate (despite coming from the same study), which I believe is the case, then:

you can generate separate XML files and merge them manually (the DDI elements are repeatable) in a suitable text or XML editor
you could also read the metadata from each individual Stata file into R, create the DDI elements (also in R) and save the final R version of the DDI Codebook into an XML file on the disk
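
As a rough sketch of the first option (merge first, then convert), the snippet below joins two toy data frames on a shared household identifier using base R. The data frames and the columns region and member_age are invented for illustration and are not taken from the DHS model files; only hhid and the convert() call come from the discussion above.

```r
# Toy stand-ins for two recode files that share a household identifier.
# Column names other than hhid are illustrative assumptions.
households <- data.frame(
  hhid   = c(1, 2, 3),
  region = c("north", "south", "north")
)
individuals <- data.frame(
  hhid       = c(1, 1, 2, 3),
  member_age = c(34, 7, 51, 29)
)

# Left join: keep one row per individual and attach the household attributes.
merged <- merge(individuals, households, by = "hhid", all.x = TRUE)

# The merged data frame could then be passed to DDIwR as suggested above:
# DDIwR::convert(merged, to = "path/to/ddi.xml")
```

Whether such a merge is meaningful depends on the unit of analysis of each file; if the files describe different units that should stay separate, the second option applies instead.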

This version of the DDIwR package contains all elements from the DDI Codebook 2.6. You can browse them (see for instance ?showDetails) to learn about their structure; elements can be created (see ?makeElement) and added to parent elements (see ?addChildren), and there are more such useful commands in the manual.

I tried to play with your files and for the moment I am getting errors (don't yet know why, but I will investigate).
Hope this helps to get you going, at least for the moment,
Adrian

fBedecarrats · 2024-03-11T16:07:03Z

Hello Adrian, thank you for your reply. I was referring to the Demographic and Health Surveys (DHS) in my example for three reasons: to my knowledge it is the most widely used standardized household survey in the world (run in more than 90 countries); the DHS program provides "mock-up" survey datasets for testing (the ones downloaded in the reproducible example above); and the DDIs produced from these surveys are used by many online catalogues (NADA or others), such as that of the International Household Survey Network (IHSN). See for instance a recent DHS survey entry on the IHSN catalogue that was created with DDI Codebook 2.5 (hundreds of others can be found by searching "DHS" on this catalogue).

The DDI codebook can be downloaded here, but it seems to have a different structure from what I get from DDIwR: it has a docDscr, a stdyDscr, one fileDscr per Stata file, and a dataDscr that includes one entry per variable, each with a files attribute that refers to the ID of one of the fileDscr elements.

I can try to figure it out myself, but I think it would serve common use cases to provide some guidance on how to prepare a multi-file DDI with your package. I think it would also be useful to have some handy functions to populate the docDscr and stdyDscr sections.
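
For reference, the multi-file layout described in this comment can be sketched roughly as follows. This is a skeleton written for illustration, not an excerpt from the IHSN file: the IDs, file names, variable names, and labels are placeholders, and a real codebook carries many more elements in each section.

```xml
<!-- Skeleton only: IDs, file names, variable names, and labels are placeholders. -->
<codeBook xmlns="ddi:codebook:2_5" version="2.5">
  <!-- Describes the DDI document itself -->
  <docDscr>
    <citation><titlStmt><titl>(title of this DDI document)</titl></titlStmt></citation>
  </docDscr>
  <!-- Describes the study: overview, scope, coverage, sampling, ... -->
  <stdyDscr>
    <citation><titlStmt><titl>(title of the study)</titl></titlStmt></citation>
  </stdyDscr>
  <!-- One fileDscr per data file -->
  <fileDscr ID="F1">
    <fileTxt><fileName>household.dta</fileName></fileTxt>
  </fileDscr>
  <fileDscr ID="F2">
    <fileTxt><fileName>individual.dta</fileName></fileTxt>
  </fileDscr>
  <!-- One var per variable; the files attribute points to a fileDscr ID -->
  <dataDscr>
    <var ID="V1" name="hhid" files="F1"><labl>Household identifier</labl></var>
    <var ID="V2" name="caseid" files="F2"><labl>Case identifier</labl></var>
  </dataDscr>
</codeBook>
```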

dusadrian · 2024-09-15T16:28:58Z

Returning to this issue: it is still open and will likely stay open for a "little" while. It requires me to write a guide (either an introduction, as in getting started, or probably that plus more advanced topics).

But there actually are handy functions to populate docDscr and stdyDscr. In fact, the latest functions allow one to write the entire codeBook, see for instance:
?makeElement
?addChildren
?addAttributes
?addContent
etc.

The DDI Codebook elements are standard, so the structure of the XML file produced by DDIwR has to be compatible with the IHSN files (it cannot be otherwise). The reason they seem different must be that the IHSN codebook files are completely documented, while the ones (automatically) produced by DDIwR thoroughly document the variables in the dataDscr element but contain no other information about the study. The other elements of the Codebook have to be created manually (using the above commands), or with a script that makes use of these commands to populate the Codebook from a database.

pitkant mentioned this issue Sep 12, 2024

convert function #6

Closed
