* feat: add more to our gen ai policy
* Apply suggestion from @lwasser
* Minor text edit
Co-authored-by: Kylen Solvik <kysolvik@gmail.com>
* Minor text edit
Co-authored-by: Kylen Solvik <kysolvik@gmail.com>
* Apply suggestion from @MicahGale
Co-authored-by: Micah Gale <mgale@fastmail.com>
* Apply suggestion from @kysolvik
Co-authored-by: Kylen Solvik <kysolvik@gmail.com>
---------
Co-authored-by: Inessa Pawson <inessapawson@gmail.com>
Co-authored-by: Kylen Solvik <kysolvik@gmail.com>
Co-authored-by: Micah Gale <mgale@fastmail.com>
File: about/package-scope.md (11 additions & 24 deletions)
@@ -140,7 +140,7 @@ Tools for depositing data into scientific research repositories.
- Examples: [This is an example from rOpenSci - eml](https://github.com/ropensci/software-review/issues/80)
-### Data validation and testing:
+### Data validation and testing
Tools that enable automated validation and checking of data quality and
completeness. These tools should be able to support scientific workflows.
@@ -169,9 +169,9 @@ reproducible workflows. These
tools may include build systems and tools to manage continuous integration.
This also includes tools that support version control.
-- Examples: Neither of these tools has been reviewed by pyOpenSci yet, but both are examples of tools that might be in scope for this category - [snakemake](https://snakemake.readthedocs.io/en/stable/), [pyGitHub](https://github.com/PyGithub/PyGithub)
+- Examples: Neither of these tools has been reviewed by pyOpenSci yet, but both are examples of tools that might be in scope for this category - [snakemake](https://snakemake.readthedocs.io/en/stable/), [pyGitHub](https://github.com/PyGithub/PyGithub)
-### Citation management and bibliometrics:
+### Citation management and bibliometrics
Tools that facilitate managing references, such as for writing manuscripts,
creating CVs or otherwise attributing scientific contributions, or accessing,
@@ -207,7 +207,11 @@ The review for this package:
- requires at least 1 domain specialist
- will never vet the analytical method itself.
+
+2. We cannot review a package that introduces a new or novel analytic approach unless it has already been **vetted or accepted by a scientific journal**. We also cannot review projects that serve as proof-of-concept demonstrations of a model or analytical approach that might accompany a paper. If your package falls under either of these cases, please submit it to a scientific journal for peer review before requesting a review here.
+
3. If your package implements a novel approach that **has** been peer-reviewed and accepted by a credible scientific journal, it may be eligible for our [publication fast-track review](publication-fast-track). Fast-track review is a streamlined process focused on software quality and packaging standards. A fast-track review is performed by one reviewer rather than two, focusing solely on packaging rather than the scientific methods applied. Since the domain/scientific component has already been vetted by the journal, the pyOpenSci fast-track reviewer is not expected to re-evaluate the underlying scientific method. To apply for the fast-track route, please first submit a pre-submission inquiry and include the publication details.
@@ -230,7 +234,7 @@ we will expand this list.
Packages focused on the retrieval, manipulation, and analysis of spatial data.
@@ -300,31 +304,14 @@ that may be outside JOSS scope while maintaining our partnership for
packages that meet both organizations' criteria.
:::
-### Telemetry & user-informed consent
-
-Your package should not collect usage analytics without first informing your users about what data are being collected and what is being done with that data. With
-that in mind, we understand that package-use data can be invaluable for the
-development process. If the package does collect such data, it should do so
-by prioritizing user-informed consent. This means that before any data are
-collected, the user understands:
-
-1. What data are collected
-2. How the data are collected
-3. What you plan to do with the data
-4. How and where the data are stored
-
-Once the user is informed of what will be collected and how that data will be handled, stored, and used, you can implement `opt-in` consent. `opt-in` means that the user agrees to usage-data collection before it is collected (rather than having to opt out when using your package).
-
-We will evaluate usage data collected by packages on a case-by-case basis
-and reserve the right not to review a package if the data collection is overly
-invasive.
-
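The opt-in consent flow described in the telemetry section above could be sketched in a few lines. This is a hypothetical illustration only: the `Telemetry` class and all names here are invented for this sketch, not a real or pyOpenSci-endorsed package API.

```python
# Hypothetical sketch of opt-in telemetry consent.
# No data are collected unless the user has explicitly agreed
# after seeing the disclosure (opt-in, never opt-out).

CONSENT_DISCLOSURE = """\
This package can collect anonymous usage analytics.
1. What is collected: command names and error types (no file contents).
2. How: events are batched locally, then uploaded over HTTPS.
3. Why: to prioritize bug fixes and new features.
4. Where: stored for 90 days on the project's server.
"""


class Telemetry:
    def __init__(self):
        self.consented = False  # opt-in: collection is OFF by default
        self.events = []

    def opt_in(self, user_agreed: bool) -> None:
        """Show the disclosure, then record the user's decision."""
        print(CONSENT_DISCLOSURE)
        self.consented = bool(user_agreed)

    def record(self, event: str) -> None:
        # Silently drop events unless the user has opted in.
        if self.consented:
            self.events.append(event)


telemetry = Telemetry()
telemetry.record("import")         # dropped: no consent yet
telemetry.opt_in(user_agreed=True)
telemetry.record("run_analysis")   # recorded: user opted in
print(telemetry.events)            # ['run_analysis']
```

The key design point is that the default state is non-collecting, and the disclosure is shown before the consent decision is stored.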
To be in technical scope for a pyOpenSci review, your package:
- Should have maintenance workflows documented.
- Should declare vendor dependencies using standard approaches rather than including code from other packages within your repository.
- Should not have an exceedingly complex structure. Others should be able to contribute and/or take over maintenance if needed.
+
+See our [policy for use of generative AI / LLMs](../our-process/policies.md#generative-ai-and-open-source-development) for additional expectations regarding AI-generated code and documentation.
+
:::{admonition} pyOpenSci's goal is to support long(er) term maintenance
pyOpenSci has a goal of supporting long term maintenance of open source
Python tools. It is thus important for us to know that, if you need to step down as a maintainer, the package infrastructure and documentation is
@@ -363,7 +350,7 @@ Your package might be out of technical scope if it is:
A few examples of packages that may be too technically challenging for us to
find a new maintainer for in the future are below.
-### Example 1: Your package is an out of sync fork of another package repository that is being actively maintained.
+### Example 1: Your package is an out of sync fork of another package repository that is being actively maintained
We understand that sometimes a package maintainer may need to step down. In
that case, we strongly suggest that the original package owner transfer the
File: appendices/gen-ai-checklist.md (10 additions & 3 deletions)
@@ -1,11 +1,18 @@
```markdown
-- [ ] Generative AI was used to produce some of the material in this submission.
-- [ ] If generative AI was used in this project, the authors affirm that all generated material has been reviewed and edited for clarity, correctness, and completeness. The authors are responsible for the content of their work and affirm that it is in a state where reviewers will not be responsible for primary editing and review of machine-generated material.
+- [ ] This package has a public development history spanning 3-6 months, with commits distributed over time that reflect **iterative, thoughtful development.**
+- [ ] All code in this package has been **carefully reviewed by a human**. Its implementation is also understood by the authors submitting the package.
+- [ ] All communication on this issue will be written by a human (someone on your maintainer team). We embrace the use of LLMs for translation and grammar correction. We prefer honest interactions over ones that prioritize perfect language and grammar. Use as little aid from an LLM as possible.
+- [ ] **Generative AI tools were used to develop and maintain this package.**
-If you checked the first box above, please describe how generative AI was used, including:
+**Please list the tools and frameworks that you used below.**
+If you checked the last box above, please describe the tools and frameworks that you used, including:
- **Which parts** of the submission were generated (e.g., documentation, tests, code). In addition to a general description, please specifically indicate any substantial portions of code (classes, modules, subpackages) that were wholly or primarily generated by AI.
- **The approximate scale** of the generated portions (e.g., "all of the tests were generated and then checked by a human," "small routines were generated and copied into the code").
- **How the generative AI was used** (e.g., line completion, help with translation, queried separately and integrated, agentic workflow).
If you have a policy around generative AI use in your project, please provide a link to it below:
The Generative AI policy below was co-developed by the pyOpenSci community. Its goals are:
+
+* **Acknowledge the widespread use of Generative AI tools** (LLMs) and promote transparency and responsible use that ensures better software outputs and supports sound open source development practices.
+* **Ensure an equitable balance of effort in peer review**: authors are responsible for human review of AI-generated content before submission; our volunteer reviewers are not responsible for identifying and/or correcting machine-generated errors or issues.
+* **Protect volunteer reviewers** from being the first line of review for generated code.
+* Give reviewers and editors the information they need to make informed decisions about what they choose to review.
+* **Support and promote packages that follow sustainable software practices** that enable future discovery and uphold the foundational principles of scientific open source.
+* Raise awareness of the broader challenges Generative AI presents to the scientific open source community.
+* Promote transparency and privacy in user data.
+
+[Please see this GitHub issue for a discussion of the topic.](https://github.com/pyOpenSci/software-peer-review/issues/331)
+
+In generating our Generative AI policy, we acknowledge some of the other policies in the open source ecosystem that inspired our work here, including:
+
+* [Melissa Mendonça’s Collection of GenAI Policies](https://github.com/melissawm/open-source-ai-contribution-policies)
+
+::::
+
+## Generative AI and open source development
+
+We understand and support your use of Generative AI tools to improve your software development workflows and make them more efficient. We want you to use these tools thoughtfully and effectively, and in ways that improve both the open source ecosystem and your development trajectory.
+
+We expect that all code and documentation submitted to our peer review process should have meaningful human review, intervention, judgment, and context. We understand that the use of current Generative AI tools is often tightly woven into development workflows, making disclosure challenging. But **we still require disclosure** to support transparency and to allow reviewers and editors to understand what they are reviewing.
+
+The policies below support adherence to thoughtful open source development best practices. A pyOpenSci package submission should demonstrate both need and sustained value to the research community. **Short-lived, single-use codebases are out of scope for pyOpenSci.**
+
+## Communication in review issues
+
+* We prefer that all communication in our software review issues is written by a human. We embrace the use of LLMs for translation and grammar correction. We prefer honest interactions over ones that prioritize perfect language and grammar. Use as little aid from an LLM as possible.
+* We will block accounts that spam our repositories or burden our volunteers with repeated, automated comments that aren't directly related to and in support of productive conversations in a review.
+
+## Package development and design approach
+
+* **Development History Timeline:** Projects should have at least **3-6 months of public development history**, with evidence of releases, public issues, and pull requests that reflect **iterative, thoughtful development** rather than rapid and recent code generation.
+* If the human effort put into the package is less than the effort required to review it, please don't submit the package.
+* Software should be developed openly, rather than developed in private and then moved to a public repository with an OSI-approved license to meet minimal open source requirements.
+* **Development History Approach:** We encourage thoughtful development history and patterns, including tightly scoped commits with clear commit messages that follow iterative development best practices, rather than large commits that address multiple issues and touch large volumes of files throughout the package. These workflows signal careful design and development, and produce changes to a codebase that can be reviewed by a human.
+* Projects with very short, rapid development timelines (weeks to a few months) will face higher scrutiny from our review teams than those with a significant development history (more than 6 months).
+* **Package Scope & Design:** We value packages with a thoughtful, well-scoped design. When submitting, we will ask you to describe the key design decisions behind your package: the tradeoffs you considered and why you built it the way you did.
+* We place greater value on packages that have been adopted or used by a wide user base, since this demonstrates that the package has design and performance characteristics that meet multiple use cases.
+* Be sure to situate your package within the broader Python ecosystem: identify related tools, explain how your package differs from them, and explain how it complements, extends, or builds upon them.
+* We particularly value **work that builds upon or extends existing tools rather than reinventing solutions** where quality alternatives already exist.
+
+Below is the checklist that you will need to respond to in our submission form: