-
-
Notifications
You must be signed in to change notification settings - Fork 104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
dataset: Create Data Frames that are Easier to Exchange and Reuse #681
Comments
Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon. Type |
This comment has been minimized.
This comment has been minimized.
@antaldaniel I am so sorry, you didn't use the right template... because we broke the right template last time we edited it! I put it back in shape, thanks for helping us catch it. You can find it by opening a new issue (it'll be at the top of the list) or use it from here https://github.com/ropensci/software-review/blob/main/.github/ISSUE_TEMPLATE/A-submit-software-for-review.md So sorry, thanks for your patience. |
Checks for dataset (v0.3.4002)git hash: 7bf85ac7
Important: All failing checks above must be addressed prior to proceeding (Checks marked with 👀 may be optionally addressed.) Package License: GPL (>= 3) 1. Package DependenciesDetails of Package Dependency Usage (click to open)
The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.
Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats(<path/to/repo>)', and examining the 'external_calls' table. baseifelse (41), as.character (40), is.null (39), list (20), c (14), vapply (8), lapply (7), names (7), data.frame (6), logical (6), paste (6), paste0 (6), character (5), inherits (5), which (5), contributors (4), date (4), seq_along (4), substr (4), Sys.time (4), for (3), format (3), invisible (3), length (3), t (3), with (3), all (2), attr (2), class (2), drop (2), gsub (2), labels (2), nrow (2), args (1), as.data.frame (1), as.Date (1), as.POSIXct (1), cbind (1), do.call (1), double (1), if (1), nchar (1), ncol (1), rbind (1), Sys.Date (1), tolower (1), vector (1) datasetget_bibentry (26), creator (11), dataset_title (11), subject (10), publisher (7), rights (7), get_creator (6), identifier (6), description (5), language (5), new_Subject (5), provenance (5), dataset_df (4), get_publisher (4), get_type (4), agent (3), convert_column (3), n_triple (3), publication_year (3), var_definition (3), var_namespace (3), var_unit (3), as_dataset_df (2), as_dublincore (2), datacite (2), default_provenance (2), definition_attribute (2), geolocation (2), get_author (2), get_person_iri (2), idcol_find (2), is_person (2), is.dataset_df (2), n_triples (2), namespace_attribute (2), new_my_tibble (2), prov_author (2), unit_attribute (2), as_character (1), as_character.haven_labelled_defined (1), as_datacite (1), as_numeric (1), as_numeric.haven_labelled_defined (1), as.character.haven_labelled_defined (1), create_iri (1), dataset_to_triples (1), defined (1), describe (1), dublincore (1), dublincore_to_triples (1), fix_contributor (1), fix_publisher (1), get_definition_attribute (1), get_namespace_attribute (1), get_unit_attribute (1), id_to_column (1), is_dataset_df (1), is_doi (1), is.datacite (1), is.datacite.datacite (1), is.defined (1), is.dublincore (1), is.dublincore.dublincore (1), is.subject (1), label_attribute (1), names.dataset_df (1), new_datacite (1), new_datetime_defined (1), new_dublincore (1), new_labelled_defined (1), print.dataset_df (1), set_default_bibentry (1), set_definition_attribute (1), set_namespace_attribute (1), set_unit_attribute (1), set_var_labels (1), subject_create (1), summary.dataset_df (1), summary.haven_labelled_defined (1), tbl_sum.dataset_df (1), var_definition.default (1), var_label.dataset_df (1), var_label.defined (1), var_namespace.default (1) assertthatassert_that (29) graphicstitle (11) utilsperson (5), bibentry (2), citation (1) labelledvar_label (4), to_labelled (1) rlangcaller_env (1), env_is_user_facing (1) statsdf (1), family (1) clicat_line (1) havenlabelled (1) tibblenew_tibble (1) NOTE: Some imported packages appear to have no associated function calls; please ensure with author that these 'Imports' are listed appropriately. 2. Statistical PropertiesThis package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing. Details of statistical properties (click to open)
The package has:
Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages
All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by the The final measure (
2a. Network visualisationClick to see the interactive network visualisation of calls between objects in package 3.
|
package | version |
---|---|
pkgstats | 0.2.0.48 |
pkgcheck | 0.1.2.77 |
Editor-in-Chief Instructions:
Processing may not proceed until the items marked with ✖️ have been resolved.
@maelle let me know if this works now :) |
Yes, thank you! |
@ropensci-review-bot assign @maelle as editor |
Assigned! @maelle is now the editor |
Thanks again for your submission! Editor checks:
Editor commentsDocumentationMy main comment before I can proceed to looking for reviewers is that the case of the package could be made better. On the one hand, it'd be interesting to read how dataset compares to other approaches to the same "problem", such as (if I follow correctly)
On the other hand, how would an user take advantage of dataset?
In short, could you exemplify "release" and "re-use" in a vignette or more, as use cases, potentially using as roles the type of users you mention in the submission under "target audience". For instance https://wbdataset.dataobservatory.eu/ is a good example, but it is mentioned in a vignette. A tiny comment: I find "reuse" harder to parse than "re-use" but that might be a personal preference. Installation instructionsI'd recommend documenting the two methods of installation (CRAN and GitHub) in distinct chunks so readers could copy-paste the entire code chunk of interest. Instead of devtools you could recommend using pak. install.packages("pak")
pak::pak("dataobservatory-eu/dataset") Default git branchYou might want to rename the master branch to main as some people can be offended by the word "master", see https://www.tidyverse.org/blog/2021/10/renaming-default-branch/ that includes links to context, and practical advice on renaming the default branch. Contributing guideThe contributing guide does not seem customized. Since you are looking for co-developers, and mentioned one of the articles could be relevant to potential contributors, I'd recommend having some text related to design and wishes for feedback in the contributing guide. The contributing guide mentions "AppVeyor" which is not used any more as far as I can tell. Continuous integration
Project managementFrom the open issues, which ones are meant to be tackled soon? Code styleI'd recommend running styler (on R scripts including tests) to make spacing more consistent. For instance in https://github.com/dataobservatory-eu/dataset/blob/7bf85ac7abe9477d02b429b4d335179d94993a77/R/agent.R#L56C1-L57C55 the space before CodeThe code could be simplified so that reviewers might more easily follow the logic.
Code like and (and other similar pipelines) reminds me of Jenny Bryan's advice in her talk Code smells and feels
So instead of the complex logic, you'd define methods. In some files like Since dataset imports rlang, you could use the creators <- ifelse(is.null(dataset_bibentry$author), ":tba", dataset_bibentry$author) would become creators <- dataset_bibentry$author %||% ":tba" There are many other occurrences of the In the Example datasetThe iris dataset is very well-known, but it is also infamous because of its eugenics links. TestsShould the line https://github.com/dataobservatory-eu/dataset/blob/7bf85ac7abe9477d02b429b4d335179d94993a77/tests/testthat/test-agent.R#L9 be removed as it is not used in the test? I don't understand why the iris object needs to be duplicated in lines like https://github.com/dataobservatory-eu/dataset/blob/7bf85ac7abe9477d02b429b4d335179d94993a77/tests/testthat/test-creator.R#L23
should be expect_type(as_datacite(iris_dataset, "list"), "list") What is https://github.com/dataobservatory-eu/dataset/blob/master/tests/testthat/test-dataset_prov.bak? When using Thank you! Happy to discuss any of the items. |
@ropensci-review-bot check package |
Thanks, about to send the query. |
🚀 Editor check started 👋 |
Checks for dataset (v0.3.4002)git hash: 7bf85ac7
Important: All failing checks above must be addressed prior to proceeding (Checks marked with 👀 may be optionally addressed.) Package License: GPL (>= 3) 1. Package DependenciesDetails of Package Dependency Usage (click to open)
The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.
Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats(<path/to/repo>)', and examining the 'external_calls' table. baseifelse (41), as.character (40), is.null (39), list (20), c (14), vapply (8), lapply (7), names (7), data.frame (6), logical (6), paste (6), paste0 (6), character (5), inherits (5), which (5), contributors (4), date (4), seq_along (4), substr (4), Sys.time (4), for (3), format (3), invisible (3), length (3), t (3), with (3), all (2), attr (2), class (2), drop (2), gsub (2), labels (2), nrow (2), args (1), as.data.frame (1), as.Date (1), as.POSIXct (1), cbind (1), do.call (1), double (1), if (1), nchar (1), ncol (1), rbind (1), Sys.Date (1), tolower (1), vector (1) datasetget_bibentry (26), creator (11), dataset_title (11), subject (10), publisher (7), rights (7), get_creator (6), identifier (6), description (5), language (5), new_Subject (5), provenance (5), dataset_df (4), get_publisher (4), get_type (4), agent (3), convert_column (3), n_triple (3), publication_year (3), var_definition (3), var_namespace (3), var_unit (3), as_dataset_df (2), as_dublincore (2), datacite (2), default_provenance (2), definition_attribute (2), geolocation (2), get_author (2), get_person_iri (2), idcol_find (2), is_person (2), is.dataset_df (2), n_triples (2), namespace_attribute (2), new_my_tibble (2), prov_author (2), unit_attribute (2), as_character (1), as_character.haven_labelled_defined (1), as_datacite (1), as_numeric (1), as_numeric.haven_labelled_defined (1), as.character.haven_labelled_defined (1), create_iri (1), dataset_to_triples (1), defined (1), describe (1), dublincore (1), dublincore_to_triples (1), fix_contributor (1), fix_publisher (1), get_definition_attribute (1), get_namespace_attribute (1), get_unit_attribute (1), id_to_column (1), is_dataset_df (1), is_doi (1), is.datacite (1), is.datacite.datacite (1), is.defined (1), is.dublincore (1), is.dublincore.dublincore (1), is.subject (1), label_attribute (1), names.dataset_df (1), new_datacite (1), new_datetime_defined (1), new_dublincore (1), new_labelled_defined (1), print.dataset_df (1), set_default_bibentry (1), set_definition_attribute (1), set_namespace_attribute (1), set_unit_attribute (1), set_var_labels (1), subject_create (1), summary.dataset_df (1), summary.haven_labelled_defined (1), tbl_sum.dataset_df (1), var_definition.default (1), var_label.dataset_df (1), var_label.defined (1), var_namespace.default (1) assertthatassert_that (29) graphicstitle (11) utilsperson (5), bibentry (2), citation (1) labelledvar_label (4), to_labelled (1) rlangcaller_env (1), env_is_user_facing (1) statsdf (1), family (1) clicat_line (1) havenlabelled (1) tibblenew_tibble (1) NOTE: Some imported packages appear to have no associated function calls; please ensure with author that these 'Imports' are listed appropriately. 2. Statistical PropertiesThis package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing. Details of statistical properties (click to open)
The package has:
Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages
All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by the The final measure (
2a. Network visualisationClick to see the interactive network visualisation of calls between objects in package 3.
|
package | version |
---|---|
pkgstats | 0.2.0.48 |
pkgcheck | 0.1.2.77 |
Editor-in-Chief Instructions:
Processing may not proceed until the items marked with ✖️ have been resolved.
Coverage is also something noted in my comments. |
@maelle Thank you. It is unfortunate that you have joined this review after two years, because I find many of your comments very useful. However, I must say that some of them would be unusual to add to a vignette, and I would find an article format more useful, i.e., the your question about frictionless, datapack, datapackage.org, and researchobject, as about 2 years of research went into this package that is usually not a vignette material, other reviewers of my other packages hated extended vignettes. I think that the frictionless package family follows a very different approach, and when I started the development of this package and this review, it did not even seem relevant. Eventually I see that in some use cases both can be useful and a choice could be offered, so I will argue that, but I think that this is more a paper considering statistical exchange formats and their best representation in R. A small question: what it the package coverage you are aiming at? I think that the package already has a very high coverage, and it exceeded the requirements when I started the review. |
@antaldaniel thank you for your answer!
|
Thank you @maelle and I will look in to why the coverage is not updating... However, do you have an explicit coverage target? |
Yes it's 75% https://devguide.ropensci.org/pkg_building.html#testing -- But I'd recommend also looking at the coverage report to find the not covered lines and make a judgement call on how important/risky these lines are (vs how hard to test they'd be) just so you're sure there's nothing dangerous in the remaining 25%. 🙂 The idea really is to cover "key functionality" (phrasing from the dev guide). In my comments I recommend updating the test-coverage workflow, the fix might be as simple as that. |
@maelle Thank you agian for your useful comments and your PRs. I reorganised the issues and created a new milestone. The milestone currently breaks up your review comments to 7 issues, but they could of course be broken down to more. I set myself a deadline of 16 February to resolve them, though it may happen earlier. I will tag you in the issues when they are ready for review and will also add a new comment here. |
Thank you! I'll put the issue on hold for now but will remove the label as soon as you are ready. Happy to respond to comments (and issues/PRs in your repo) in the meantime if needed. |
@ropensci-review-bot put on hold |
Submission on hold! |
Dear @antaldaniel this is to mark the start of my EiC rotation. I'm reviewing all open issues and noting what I see:
It all looks good to me. I'll step back. Thanks! |
Hi @maurolepore, I just solved three issues with new commits, so their evolution and solution is on the dataset package place. I was hoping and I did receive some broader and useful review, as this is the early stages of the development of a package family. But responding to those is perhaps best placed in an accompanying paper, not a package, for the purpose of this review, I add them as a vignette but it is a bit too philsophical for a normal vignette. I will note when all are solved, although I have a very serious objection to one. I think that the use of the |
Submitting Author Name: Daniel Antal
Submitting Author Github Handle: @antaldaniel
Repository: https://github.com/dataobservatory-eu/dataset
Version submitted: 0.3.4002
Submission type: Standard
Editor: @maelle
Reviewers: TBD
Archive: TBD
Version accepted: TBD
Language: en
Scope
Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):
Explain how and why the package falls under these categories (briefly, 1-2 sentences):
The package works with various semantic interoperability standards, therefore it allows the users to retrieve RDF annotated, rich and platform-independent data and reconstruct it as an R data.frame with rich metadata attributes, or to release interoperable, RDF annotated datasets on linked open data platforms from native R objects.
Production-side statisticans. Scientists who want to update their sources from various data repositories and exchanges. Scientists and research data managers who want to release new scientific or professional datasets that follow modern interoperability standards.
The package aimst to complement the rdflib and the datapsice package.
(If applicable) Does your package comply with our guidance around Ethics, Data Privacy and Human Subjects Research? Not applicable.
If you made a pre-submission inquiry, please paste the link to the corresponding issue, forum post, or other discussion, or
@tag
the editor you contacted.Explain reasons for any
pkgcheck
items which your package is unable to pass.Technical checks
Confirm each of the following by checking the box.
This package:
Publication options
Do you intend for this package to go on CRAN?
Do you intend for this package to go on Bioconductor?
Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:
Code of conduct
The text was updated successfully, but these errors were encountered: