Properly attribute code and data sources #327

nmdefries · 2023-06-08T16:18:05Z

as_slide_computation borrows heavily from rlang::as_function. Currently attribution is given informally via the roxygen @source tag and in the function description. We would also want to list JHU for data usage.

Logan looked into possibilities here, but it looks like there's no single official way.


brookslogan · 2023-06-14T16:44:33Z

Potentially missing from the above investigation is adding a LICENSE file that notes the (non-MIT) data set licenses.

nmdefries · 2023-06-28T20:07:50Z

In the knitRProgressBar package, the author added the original license text and a description of changes to the source code, and added the original authors to the DESCRIPTION as contributors (ctb). No other changes were made for attribution.

dajmcdon · 2023-06-29T18:19:36Z

Are these attributions actually necessary? I'm not sure that knitRProgressBar is the appropriate model. How do packages in the tidyverse/r-lib actually approach this? That is, do the people who work directly with rlang contributors credit rlang in this way?

(I get that the Writing R packages document recommends this, but if followed, I suspect that all packages on CRAN would have dozens of aut/ctb fields, which doesn't seem to be the case.)

nmdefries · 2023-06-29T22:11:12Z

The motivation for this attribution is more than just that epiprocess uses rlang functions. The initial version of epiprocess:::as_slide_computation was copy-pasted, including documentation, from rlang's source code. I added maybe a line of changes.

As for tidyverse, the only member package with additional external authors listed is readr, see its DESCRIPTION. mio and grisu3 seem to be C++ packages that readr copy-pasted code from, e.g. see readr's mio.h vs mio's mmap.hpp. Both C++ packages use the MIT license.

In a historical version, R Core Team was included for an adaptation of its date-time code.

So the standard seems to be to add the authors of the original package to the DESCRIPTION, and note the source and license in the code itself. In readr, no COPYRIGHT file is provided.
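A minimal sketch of what the DESCRIPTION part of that pattern could look like, written as the R expression that an `Authors@R` field evaluates. Every name, email address, and comment string below is a placeholder and is not taken from readr, rlang, or epiprocess.

```r
# Sketch only: the kind of value an Authors@R field in DESCRIPTION would hold.
# All names, the email address, and the comment text are placeholders.
authors <- c(
  # Package maintainer: author ("aut") and creator/maintainer ("cre").
  person("Package", "Maintainer",
         email = "maintainer@example.org",
         role = c("aut", "cre")),
  # Author of the upstream code that was copied or adapted: contributor ("ctb").
  person("Upstream", "Author",
         role = "ctb",
         comment = "Author of code adapted in this package")
)

# Print the entries to check how they will be rendered.
print(authors)
```

The source and license note would still live next to the adapted function itself, e.g. in its roxygen block, as described above.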

Our as_slide_computation function now has additional differences from the original, but it is more than "inspired" by rlang::as_function. However, the amount of borrowed/adapted code is certainly substantially less than all of the examples above. I'm sure there's a point at which attribution no longer makes sense.

nmdefries · 2023-06-29T22:18:16Z

I haven't looked for examples of dataset attributions yet.

nmdefries · 2023-07-12T21:51:49Z

Re: data attributions, it looks like almost no one does them. As above, I looked at some tidyverse packages and assumed that their approach is idiomatic/best-case.

The tidyverse packages I looked at don't list any dtc (data contributor) authors and don't include license info for any datasets. That includes datasets that are clearly under copyright rather than in the public domain, e.g. the billboard ratings, where even the linked source says that "This data is almost certainly a violation of Billboard’s copyright, and probably infringes on Record Research’s books too. The analysis I’m publishing here should fall under fair use, but redistributing the spreadsheet would not"; and the Star Wars character info, whose license requires attribution.

Some other not-very-official packages list dtc authors, but don't include licenses (although their data might not require it). So listing data contributors and copyright holders in the DESCRIPTION and including license info is better than what the average package is doing.

RE attribution for imported data packages, I think it makes sense to include data contributors in epiprocess and epipredict. Can't find any formal guidance on attribution for this case, but since we're reexporting the datasets and they're being made available as part of the packages, it makes sense to do.
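As a rough sketch, a data contributor entry could be built with the `dtc` role (plus `cph` where the provider also holds copyright). The organization name and comment below are placeholders, not the actual providers of the epiprocess or epipredict datasets.

```r
# Sketch only: a data-provider entry for the Authors@R field in DESCRIPTION.
# The organization name and comment are placeholders.
data_authors <- c(
  # "dtc" = data contributor, "cph" = copyright holder.
  person("Example Data Provider",
         role = c("dtc", "cph"),
         comment = "Source of a re-exported example dataset")
)

# Print the entry to check how it will be rendered.
print(data_authors)
```

An entry like this would sit alongside the package's own authors in `Authors@R`, with the dataset's license text recorded separately.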

nmdefries mentioned this issue Jun 8, 2023

Pass ref_time_value to epix_slide for functions and formulas #313

Merged

nmdefries mentioned this issue Jun 29, 2023

Attribute code borrowed/modified from rlang for use in as_slide_computation #338

Merged

nmdefries mentioned this issue Jul 24, 2023

Decide on a format for inst/COPYRIGHT #349

Open

nmdefries self-assigned this Oct 26, 2023

nmdefries added this to the Epiprocess Issue Triage milestone Feb 20, 2024

nmdefries mentioned this issue Oct 8, 2024

Import datasets and documentation from epidatasets #520

Merged


brookslogan closed this as completed in #520 Oct 15, 2024
