[FEATURE] Implement Automated NMDC Workflow Data Ingestion Pipeline for JAMO #365

Open
16 tasks
Shalsh23 opened this issue Jan 23, 2025 · 0 comments
Shalsh23 commented Jan 23, 2025

Overview

We need a systematic process for ingesting NMDC workflow outputs from CFS into JAMO. This will improve data management, give the outputs organized storage, and enable efficient retrieval through JAMO's features.

Current Status

  • Workflow outputs currently reside on CFS
  • No automated ingestion process exists
  • Manual ingestion is time-consuming and error-prone

Objectives

  1. Establish standardized JAT templates for all workflow types
  2. Implement and validate ingestion process
  3. Automate the entire ingestion pipeline

Implementation Tasks

  • Develop JAT templates #366

    • Identify all workflow types requiring templates
    • Create standardized templates following JAMO specifications
    • Document template structure and usage
  • Validate NMDC JAT Templates Through Test Ingestion #367

    • Perform manual ingestion tests using sample workflow data
    • Verify data integrity post-ingestion
    • Document any issues or edge cases
  • Automation Development

    • Develop script for automated ingestion
    • Implement error handling and logging
    • Add validation checks
  • Production Deployment

    • Execute full data ingestion
    • Monitor and verify results
    • Document operational procedures
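The Automation Development tasks above (error handling, logging, validation checks) could take roughly the following shape. This is a minimal Python sketch under stated assumptions, not the planned implementation. The path-to-checksum manifest and the names `md5sum`, `validate`, `run_ingestion`, `IngestReport`, and the `ingest_fn` hook are all hypothetical. The actual JAMO/JAT submission step is left as a callback because this issue does not specify it.

```python
import hashlib
import logging
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, Dict, List, Optional, Tuple

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("nmdc_jamo_ingest")  # hypothetical logger name


@dataclass
class IngestReport:
    """Outcome of one ingestion run, kept for monitoring and reruns."""
    ingested: List[Path] = field(default_factory=list)
    skipped: List[Tuple[Path, str]] = field(default_factory=list)
    failed: Dict[Path, str] = field(default_factory=dict)


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute an MD5 checksum in chunks so large outputs need not fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate(path: Path, expected_md5: Optional[str]) -> Optional[str]:
    """Return a reason string if the file fails pre-ingestion checks, else None."""
    if not path.is_file():
        return "missing"
    if path.stat().st_size == 0:
        return "empty"
    if expected_md5 is not None and md5sum(path) != expected_md5:
        return "checksum mismatch"
    return None


def run_ingestion(
    files: Dict[Path, Optional[str]],  # path on CFS -> expected MD5 (None = skip check)
    ingest_fn: Callable[[Path], None],  # hypothetical hook: JAT/JAMO submission goes here
) -> IngestReport:
    """Validate and ingest each file, logging every outcome instead of stopping early."""
    report = IngestReport()
    for path, expected in files.items():
        reason = validate(path, expected)
        if reason:
            log.warning("skipping %s: %s", path, reason)
            report.skipped.append((path, reason))
            continue
        try:
            ingest_fn(path)
        except Exception as exc:  # record and continue so one failure doesn't halt the batch
            log.error("failed to ingest %s: %s", path, exc)
            report.failed[path] = str(exc)
        else:
            log.info("ingested %s", path)
            report.ingested.append(path)
    return report
```

Collecting skipped and failed files in a report, rather than aborting on the first error, would let a rerun target only the files that did not make it into JAMO.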

Technical Resources

Reference Documentation

Related Code & Resources

Success Criteria

  • All workflow data successfully ingested into JAMO
  • Automated pipeline operational and documented
  • Error handling and recovery procedures in place
  • Documentation complete and accessible

Dependencies

  • Access to CFS and JAMO systems
  • Required permissions and credentials

cc @shreddd @aclum @kaijli

@Shalsh23 Shalsh23 self-assigned this Jan 23, 2025
@Shalsh23 Shalsh23 transferred this issue from microbiomedata/issues Jan 28, 2025
@Shalsh23 Shalsh23 moved this to Todo in JAMO Ingest Jan 31, 2025
Labels
None yet
Projects
Status: Todo
Development

No branches or pull requests

1 participant