Create folder structure for versioning IBA GAF releases #34

Open
dustine32 opened this issue Sep 5, 2019 · 11 comments
@dustine32
Collaborator

Currently we really only have one set of IBA GAFs available at any time: the "published", "released" version, accessible via:

ftp://ftp.pantherdb.org/downloads/paint/presubmission/

We could retain this URL as a pointer to the "published", "released" version of the IBA GAFs while also creating dated folders for each release that can then be used for testing/reproducing fixes/bugs:

ftp://ftp.pantherdb.org/downloads/paint/2019-07-16/

But then how do we highlight which version of Panther was used to generate the IBAs? Try this:

ftp://ftp.pantherdb.org/downloads/paint/13.1/2019-06-12/
ftp://ftp.pantherdb.org/downloads/paint/14.1/2019-07-16/
ftp://ftp.pantherdb.org/downloads/paint/14.1/2019-09-05/

The PANTHER version should also be in the GAF header if anyone needs more assurance:

!gaf-version: 2.1
!Created on Tue Jul 16 09:40:00 2019.
!PANTHER version: v.14.1.
!GO version: 2019-07-03.
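As a rough sketch of how the header above could be used to derive the proposed `<version>/<date>` folder name, here is a small parser. The function name `release_path_from_header` is hypothetical, and the regular expressions assume the header lines look exactly like the four shown above (including the trailing periods); real GAF headers may vary.

```python
import re
from datetime import datetime


def release_path_from_header(header_lines):
    """Return '<panther_version>/<YYYY-MM-DD>' from GAF '!' header lines.

    Assumes the '!PANTHER version: v.X.Y.' and '!Created on ...' formats
    shown in this thread; anything else raises ValueError.
    """
    version = None
    created = None
    for line in header_lines:
        line = line.strip()
        m = re.match(r"!PANTHER version:\s*v\.(\d+\.\d+)\.?$", line)
        if m:
            version = m.group(1)
            continue
        m = re.match(r"!Created on (.+?)\.?$", line)
        if m:
            # Example value: 'Tue Jul 16 09:40:00 2019'
            created = datetime.strptime(m.group(1), "%a %b %d %H:%M:%S %Y").date()
    if version is None or created is None:
        raise ValueError("header is missing the PANTHER version or creation date")
    return f"{version}/{created.isoformat()}"


header = [
    "!gaf-version: 2.1",
    "!Created on Tue Jul 16 09:40:00 2019.",
    "!PANTHER version: v.14.1.",
    "!GO version: 2019-07-03.",
]
print(release_path_from_header(header))
```

Something like this might help when moving the archived GAFs, since the folder date is meant to be the file generation date rather than the archive date.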

And we already do version the Panther tree files (up to a certain point in history):
http://data.pantherdb.org/PANTHER14.1/globals/tree_files.tar.gz
http://data.pantherdb.org/PANTHER13.1/globals/tree_files.tar.gz

We have a somewhat regular history of these GAFs since late 2017 stored under an "archive" folder. Just need to move these to match the above path convention. Ex:
ftp://ftp.pantherdb.org/downloads/paint/archive/09302017/ -> ftp://ftp.pantherdb.org/downloads/paint/11.1/2017-05-11/
Note that the date discrepancy here is due to 2017-05-11 being the file generation date and 09302017 being the date the files were archived/replaced by a newer version.

I'll set this up and we can see if it satisfies the needs of the GO pipeline workflow. @mugitty Would you be affected if we moved around the contents of ftp://ftp.pantherdb.org/downloads/paint/archive/?

@kltm

kltm commented Sep 5, 2019

To clarify, if I'm reading this correctly, this would mean that every few months somebody would need to update the PAINT GAF location (by date), while the tree location remains the same for a given PANTHER release?

This would certainly meet our criteria.

Possible implementation of geneontology/pipeline#86

@dustine32
Collaborator Author

@kltm Yep correct for all that! At IBA GAF release time, we would have to create the new folder (e.g. .../14.1/2019-09-06/) and change the ftp://ftp.pantherdb.org/downloads/paint/presubmission/ symlink to point to it. The file URLs in paint.yaml wouldn't change.

When we release the new 15.0 version of Panther trees we'll swap the http://data.pantherdb.org/current/ symlink to point to http://data.pantherdb.org/PANTHER15.0/ containing the current tree_files.tar.gz file.
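The release-time step for the PAINT folder (create the dated folder, then repoint the `presubmission` symlink) could be sketched roughly as follows. This is only an illustration under assumptions: `publish_release` and its parameters are hypothetical names, it assumes a POSIX filesystem, and it assumes `presubmission` is already a symlink (or absent) rather than a real directory.

```python
import os


def publish_release(paint_root, panther_version, release_date, link_name="presubmission"):
    """Create <paint_root>/<version>/<date>/ and point the symlink at it.

    Hypothetical helper: the new link is created under a temporary name
    and then renamed over the old one, so readers never see a missing link.
    """
    release_dir = os.path.join(paint_root, panther_version, release_date)
    os.makedirs(release_dir, exist_ok=True)

    link_path = os.path.join(paint_root, link_name)
    tmp_link = link_path + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)

    # Relative target, so the link still resolves if paint_root is mounted elsewhere.
    os.symlink(os.path.join(panther_version, release_date), tmp_link)
    # Renaming replaces an existing symlink; it fails if link_path is a real directory.
    os.replace(tmp_link, link_path)
    return release_dir
```

Calling `publish_release("/path/to/paint", "14.1", "2019-09-06")` would leave `presubmission` resolving to the new dated folder, while the URLs in paint.yaml stay unchanged.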

@dustine32
Collaborator Author

dustine32 commented Sep 6, 2019

So I decided to add another folder level to each dated "release" folder that will contain the IBA GAFs meant for the GO pipeline. To maintain convention, this new folder is called presubmission:

ftp://ftp.pantherdb.org/downloads/paint/14.1/2019-07-23/presubmission/

Intuitively, the ftp://ftp.pantherdb.org/downloads/paint/presubmission symlink will always point to a specific release's presubmission folder.

With this extra level, we can separate out the IBA GAFs from other products we want to attach to a specific release. For sure the "IBD" file that's generated in parallel with the IBAs, as well as other metadata, will go somewhere here (not in presubmission).

@kltm

kltm commented Sep 6, 2019

@dustine32 A URL is a URL is a URL, but I'm a little confused as to what "presubmission" means in this case. These are essentially the versioned locations of the product that the GO will consume? To re-clarify, we are concerned with three things:

  • http://data.pantherdb.org/PANTHERAA.B/globals/tree_files.tar.gz: PANTHER version AA.B tree files
  • ftp://ftp.pantherdb.org/downloads/paint/presubmission/: the location of the latest GAF set that the GO is interested in, linked to the latest PANTHER version and release available
  • ftp://ftp.pantherdb.org/downloads/paint/AA.B/XXXX-YY-ZZ/presubmission/: dated release XXXX-YY-ZZ for PANTHER version AA.B.

Noting that there is no way to get the latest release of a particular PANTHER version in this setup.

Does this all sound correct to you? With this, we'd essentially have two new variables in the pipeline: PANTHER_VERSION and PANTHER_RELEASE and would thread those in.

@dustine32
Collaborator Author

@kltm Sorry, I'm actually not sure what is meant by "presubmission" either. I just reused the name to hopefully reduce confusion during this versioning transition, though I probably just created more confusion. @mugitty Thoughts on where the "presubmission" name came from?

Re: the URL path variables, I believe this new setup should satisfy those three patterns. For example:

http://data.pantherdb.org/current/globals/tree_files.tar.gz
ftp://ftp.pantherdb.org/downloads/paint/presubmission/

Both will get you the latest data, and they currently point to:

http://data.pantherdb.org/PANTHER14.1/globals/tree_files.tar.gz
ftp://ftp.pantherdb.org/downloads/paint/14.1/2019-08-16/presubmission/

> Noting that there is no way to get the latest release of a particular PANTHER version in this setup.

Do you mean the latest Panther version's tree_files.tar.gz? You should still be able to use http://data.pantherdb.org/current/globals/tree_files.tar.gz.

Also I should note the two new variables in ftp://ftp.pantherdb.org/downloads/paint/PANTHER_VERSION/PANTHER_RELEASE/presubmission/ are not independent. You can currently get IBA GAFs from

ftp://ftp.pantherdb.org/downloads/paint/14.1/2019-08-16/presubmission/

But you can't get the 13.1 PANTHER_VERSION of 2019-08-16 PANTHER_RELEASE by using

ftp://ftp.pantherdb.org/downloads/paint/13.1/2019-08-16/presubmission/

I'd actually consider calling the release variable PAINT_RELEASE (I know I'm being nitpicky), since the date primarily reflects the PAINT curation data as of that date. Anyhow, is this flexibility required? If yes, there are ways we could support this. But it would probably require discussion over whether these different PANTHER_VERSION/PAINT_RELEASE combos reasonably reflect the curators' original intentions when they annotated to a single Panther version.
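To illustrate why the two variables are not independent, here is a small sketch that indexes known `<version>/<date>` folders and checks whether a given combination exists. The function names are hypothetical and the sample folder list is just a few values mentioned in this thread, not a listing of the FTP server. Because the dates are ISO formatted, plain string sorting also gives the latest release for one version.

```python
from collections import defaultdict


def index_releases(paths):
    """Map each PANTHER version to its sorted list of release dates."""
    by_version = defaultdict(list)
    for path in paths:
        version, release = path.strip("/").split("/")[:2]
        by_version[version].append(release)
    # ISO dates (YYYY-MM-DD) sort chronologically as plain strings.
    return {version: sorted(dates) for version, dates in by_version.items()}


def is_valid_combo(index, panther_version, paint_release):
    """True only if this release date exists under this PANTHER version."""
    return paint_release in index.get(panther_version, [])


def latest_release(index, panther_version):
    """Most recent release date for one PANTHER version, or None."""
    releases = index.get(panther_version)
    return releases[-1] if releases else None


# Sample values taken from this thread, for illustration only.
index = index_releases(["13.1/2019-06-12", "14.1/2019-07-16", "14.1/2019-08-16"])
print(is_valid_combo(index, "14.1", "2019-08-16"))
print(is_valid_combo(index, "13.1", "2019-08-16"))
print(latest_release(index, "14.1"))
```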

@mugitty

mugitty commented Sep 7, 2019

@huaiyumi once instructed me to copy the GAF files into the 'presubmission' directory. But, I don't know why it is called 'presubmission'.

@kltm

kltm commented Sep 8, 2019

@dustine32 You're right: I missed the pattern for the most current trees. I think all the uses are satisfied.

I'm happy with any input as to the variable names. Mainly, I just want to prevent copy/paste mistakes from creeping in whenever we change what we're actively pointing at, with the "version" and "date" being the only things that change. Let's pencil in what I believe is your suggestion: PANTHER_VERSION AA.B and PAINT_RELEASE XXXX-YY-ZZ.
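A minimal sketch of threading those two variables into the URL patterns discussed above might look like this. The helper names and the format checks are assumptions for illustration; only the URL patterns themselves come from this thread.

```python
import re

PAINT_BASE = "ftp://ftp.pantherdb.org/downloads/paint"
TREE_BASE = "http://data.pantherdb.org"


def _check_version(panther_version):
    # Assumed format AA.B, e.g. '14.1'.
    if not re.fullmatch(r"\d+\.\d+", panther_version):
        raise ValueError(f"unexpected PANTHER_VERSION: {panther_version!r}")


def paint_gaf_url(panther_version, paint_release):
    """Versioned location of the IBA GAFs for one PAINT release."""
    _check_version(panther_version)
    # Assumed format XXXX-YY-ZZ, e.g. '2019-08-16'.
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", paint_release):
        raise ValueError(f"unexpected PAINT_RELEASE: {paint_release!r}")
    return f"{PAINT_BASE}/{panther_version}/{paint_release}/presubmission/"


def tree_files_url(panther_version):
    """Location of the tree files for one PANTHER version."""
    _check_version(panther_version)
    return f"{TREE_BASE}/PANTHER{panther_version}/globals/tree_files.tar.gz"


print(paint_gaf_url("14.1", "2019-08-16"))
print(tree_files_url("14.1"))
```

Keeping the version and date as the only inputs means a pointer change touches two values rather than several pasted URLs.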

@dustine32
Collaborator Author

Thanks @mugitty ! I guess it's probably not that big of a deal what it's called? But, just noting so that we're all aware: if we did decide to change it, we would need to update the URLs in paint.yaml.

@dustine32
Collaborator Author

@kltm Awesome thank you!

The ftp://ftp.pantherdb.org/downloads/paint/AA.B/XXXX-YY-ZZ/presubmission/ pattern is live for the previous and current versions. You should be able to browse from ftp://ftp.pantherdb.org/downloads/paint/ to get some test values for the new variables.

I can also set up some go-site/pipeline test branches (similar to go-site/dustine32-issue-1127 and pipeline/issue-78-test-panther-14_1) for Jenkins to run.

@kltm

kltm commented Sep 9, 2019

@dustine32 Yes, let's go ahead and do that /but/ wait until after the current release, hopefully this week. Would this be something that you could tackle? We can talk sometime next week about details and then you could switch us over and close out geneontology/pipeline#86 ?

@dustine32
Collaborator Author

OK @kltm , yep, that would be "fun" for me to set up. I'll wait until I hear about the release and then see when you're available to chat.


3 participants