Harmonization of Output Across Annotation Modules and Sub-Workflows #87

Open
5 tasks
anwarMZ opened this issue Nov 20, 2024 · 0 comments
Labels
enhancement New feature or request

anwarMZ commented Nov 20, 2024

Description of feature

To enhance interoperability and standardization within the BACPAQ workflow, output harmonization across various annotation modules and sub-workflows is essential. The goal is to create two standardized output formats:

  1. GFF (General Feature Format) for bioinformatics analysis compatibility.
  2. JSON (JavaScript Object Notation) for seamless integration and interoperability with external systems such as IRIDA-next.

By standardizing these formats, we can improve the usability of BACPAQ outputs both within bioinformatics pipelines and across other platforms, allowing users to easily share, integrate, and analyze annotation results. Additionally, using bioinformatics ontologies will allow further standardization of output data and enhance compatibility with other tools.

Tasks:

  • Develop Helper Functions for GFF Conversion:
      • Create helper functions to convert non-standardized outputs from various annotation modules to a consistent GFF format.
      • Ensure the functions can handle different annotation tools’ outputs and translate these into the GFF standard.
      • Test conversions to ensure compliance with the GFF specification and compatibility across bioinformatics tools.

  • Identify and Implement a JSON Schema for Annotation Output:
      • Research and identify a JSON schema that can accommodate structured annotation outputs from BACPAQ.
      • Ensure that the schema aligns with relevant bioinformatics ontologies (e.g., OBO Foundry) to standardize terms across tools.
      • Test the schema by manually structuring sample data outputs and validating them against the schema requirements to confirm suitability.

  • Write Helper Function to Convert GFF to JSON:
      • Develop a helper function that translates GFF-formatted data into the identified JSON schema.
      • Ensure the function maps relevant GFF fields to their JSON schema counterparts in a structured and consistent manner.
      • Add error handling to alert users of any incompatible or missing fields during the conversion process.

  • Test Harmonized Outputs in Bioinformatics and External Systems:
      • Validate the generated GFF and JSON files by running tests with popular bioinformatics tools (for GFF) and ensuring compatibility with external systems such as IRIDA-next (for JSON).
      • Gather feedback on functionality and ease of integration from beta users and developers working with BACPAQ outputs.

  • Documentation and User Guide Updates:
      • Update the BACPAQ documentation with details on the standardized output formats and instructions on how users can leverage them in bioinformatics analysis and external systems.
      • Provide examples of the GFF and JSON formats and include scenarios where each format can be most effectively utilized.
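
To make the two conversion tasks above more concrete, here is a rough Python sketch of what the helpers might look like. Everything in it is an assumption for discussion: the function names `to_gff3_line` and `gff3_line_to_dict`, the input record layout, and the flat JSON structure are hypothetical, and the JSON shape is only a placeholder until a schema has actually been chosen. It assumes GFF3 (nine tab-separated columns, `key=value` attributes separated by `;`).

```python
import json
from urllib.parse import unquote

# The first eight GFF3 columns; the ninth is the attributes column.
GFF_COLUMNS = ("seqid", "source", "type", "start", "end", "score", "strand", "phase")

# Characters with special meaning in the GFF3 attributes column.
_RESERVED = {"%": "%25", ";": "%3B", "=": "%3D", "&": "%26", ",": "%2C",
             "\t": "%09", "\n": "%0A"}


def _escape(value) -> str:
    """Percent-encode characters that are reserved in GFF3 attributes."""
    return "".join(_RESERVED.get(ch, ch) for ch in str(value))


def to_gff3_line(record: dict) -> str:
    """Render one annotation record (hypothetical dict layout) as a GFF3 line."""
    # Columns absent from the record are written as ".", the GFF3 placeholder.
    columns = [str(record.get(name, ".")) for name in GFF_COLUMNS]
    attributes = ";".join(
        f"{_escape(key)}={_escape(value)}"
        for key, value in record.get("attributes", {}).items()
    )
    return "\t".join(columns + [attributes or "."])


def gff3_line_to_dict(line: str) -> dict:
    """Parse one GFF3 feature line into a JSON-serializable dict.

    Raises ValueError on a wrong column count, missing required fields,
    non-integer coordinates, or a malformed attribute.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")

    feature = dict(zip(GFF_COLUMNS, fields[:8]))
    missing = [name for name in ("seqid", "type", "start", "end")
               if feature[name] in ("", ".")]
    if missing:
        raise ValueError(f"missing required GFF fields: {', '.join(missing)}")

    # int() raises ValueError itself if a coordinate is not numeric.
    feature["start"] = int(feature["start"])
    feature["end"] = int(feature["end"])
    # Map the "." placeholder to null for the optional columns.
    for name in ("source", "score", "strand", "phase"):
        if feature[name] == ".":
            feature[name] = None

    attributes = {}
    if fields[8] != ".":
        for pair in fields[8].split(";"):
            if not pair:
                continue  # tolerate a trailing ";"
            key, sep, value = pair.partition("=")
            if not sep:
                raise ValueError(f"malformed attribute: {pair!r}")
            attributes[unquote(key)] = unquote(value)
    feature["attributes"] = attributes
    return feature


if __name__ == "__main__":
    # Illustrative record; the values are made up.
    record = {
        "seqid": "contig_1", "source": "example_tool", "type": "CDS",
        "start": 10, "end": 400, "strand": "+", "phase": 0,
        "attributes": {"ID": "cds_1", "product": "efflux pump; putative"},
    }
    line = to_gff3_line(record)
    print(line)
    print(json.dumps(gff3_line_to_dict(line), indent=2))
```

Raising `ValueError` with the offending field name is one simple way to cover the error-handling task; a real implementation might instead collect all problems per file and report them together. Mapping the `type` column to ontology terms and validating against the chosen JSON schema are left out here because they depend on decisions this issue has not made yet.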

@anwarMZ anwarMZ added the enhancement New feature or request label Nov 20, 2024