Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Git references in subset merger #987

Merged
merged 7 commits into from
Sep 5, 2024

Conversation

RickyDaMa
Copy link
Contributor

@RickyDaMa RickyDaMa commented Jun 12, 2024

Closes #986

In YAML, you can now do:

- operation: addSubset
  directory: latin-sources
  subsets:
    - from:
        repo: googlefonts/some-font@latest # <- the @ bit is new!
        path: source/some-font/SomeFont.designspace
      force: true
      name: GF_Latin_Core

Which can take any git reference, or the special value "latest" which grabs the tag from the latest GitHub release

These are appropriately cached in separate folders for org/repo/ref

Support for this is has also been incorporated into add-ds-subsets, and I've sprinkled in some type hints in parts of the codebase I've visited

Old PR description (outdated)

In YAML, you can now do:

repo:
  slug: notofonts/latin-greek-cyrillic
  ref: e7f1736c5ad0dc2abfc4dcd49ebca50abf612b29

Where before repo could only take a string (being the repo slug). This is still supported with the old behaviour of assuming you want the latest of the branch

The cache path was previously just the repo slug, but is now the slug + ref, e.g. notofonts/latin-greek-cyrillic/main

I've added a corresponding --git-ref to gftools-add-ds-subsets to expose this feature to the CLI as well

Opening this PR early without tests etc. to make sure the approach & code style are acceptable - such as the smidge of type hinting I added

Let me know your thoughts!

@simoncozens
Copy link
Contributor

This is good, but:

  1. I'd rather "repo" be the repo slug and a new optional "ref" field be added. There's no need for an additional level of nesting here.
  2. Type hints are good and I want more of them everywhere! But I don't think we need from __future__ import annotations unless you're doing self-referential typing ("Postponed Evaluation of Annotations"). Type hints are supported by default in all the Python versions we support.

@RickyDaMa
Copy link
Contributor Author

RickyDaMa commented Jun 13, 2024

I'd rather "repo" be the repo slug and a new optional "ref" field be added

Sure, I can do that. Or we could do pip-like syntax and have repo support something like org/repo@git-ref, which wouldn't even need an extra field. On GitHub at least, org & repo can't contain @, so splitting on it once should be without edge cases

I don't think we need from __future__ import annotations

They're to be able to use lowercase type hints for collections in Python 3.8 IIRC, e.g.

from __future__ import annotations

def foo(bar: dict[str, str]):
    pass

Instead of needing

from typing import Dict

def foo(bar: Dict[str, str]):
    pass

@simoncozens
Copy link
Contributor

Or we could do pip-like syntax and have repo support something like org/repo@git-ref

I like that.

They're to be able to use lowercase type hints for collections in Python 3.8

Hmm. OK, fair enough!

@RickyDaMa
Copy link
Contributor Author

Is there any testing that covers the subset merger or add-ds-subsets subcommand for me to hook into, and/or what would you like to see tested of this PR?

I've used it locally through gftools' builder to make sure it works

@RickyDaMa RickyDaMa marked this pull request as ready for review June 13, 2024 11:42
@simoncozens
Copy link
Contributor

The Noto recipe builder uses it, so it's tested through there.

But here's a thought. I can see a strong case for there to be an option to download the source of the latest tagged release. We do this already in gftools-packager when uploading a font from a downstream repository to google/fonts:

upstream = GitHubClient.from_url(metadata.source.repository_url)
res = []
# Getting files from an archive always takes precedence over a
# repo dir
if latest_release or metadata.source.archive_url:
if latest_release:
release = upstream.get_latest_release()
metadata.source.archive_url = release["assets"][0]["browser_download_url"]

That's an option which would be useful for Noto, and potentially for a project that you might be working on: when a rebuild of the language-based font is triggered, the build automatically picks up any stable updates from the subset donor font, without including any unreleased development state.

@RickyDaMa
Copy link
Contributor Author

Yes, that could be good. I'll take a look at making a special case for org/repo@latest (bikeshedding welcome)

@RickyDaMa
Copy link
Contributor Author

Fun niggle: using the GitHub API for the zipball download link downloads a zip that has the inner top-level folder named with a completely different convention to usual

Normally, it's repo-tag*, however in this case it's org-repo-tag_sha which is super helpful because the response metadata from the get latest release call doesn't contain the tag SHA (target_commitish points to the commit)

Would you rather I make another API call to get the tag SHA and can properly recreate the inner folder name for precise extraction, or shall I just hack something to make it work?

(*where if the tag name began with a "v" and looked like a version, the "v" has been stripped)

@simoncozens
Copy link
Contributor

shall I just hack something to make it work?

There's only one top-level folder, right? So I guess you can just glob for it.

@RickyDaMa RickyDaMa marked this pull request as draft June 13, 2024 16:15
@RickyDaMa RickyDaMa force-pushed the subset-repo-at-commit branch from e380711 to ba57a45 Compare June 13, 2024 16:16
@RickyDaMa RickyDaMa marked this pull request as ready for review June 14, 2024 10:31
In YAML, you can now do:
repo:
  slug: notofonts/latin-greek-cyrillic
  ref: e7f1736c5ad0dc2abfc4dcd49ebca50abf612b29

Where before repo could only take a string (being the repo slug)
This is still supported with the old behaviour of assuming you want the latest of the  branch
Keep the repo key as a string, interpret anything after @ as the git ref
Add GitHubClient.get_latest_tag_name()

This massively simplifies code paths and doesn't require SubsetMerger.download_for_subsetting
to have to do more than it should
This also allows us to know the name of the top level folder within the zip file in all scenarios
download_for_subsetting code is now a fair bit simpler and far more similar to before the ref feature
was added
@RickyDaMa RickyDaMa force-pushed the subset-repo-at-commit branch from cbf704e to 53ee046 Compare June 27, 2024 09:07
@m4rc1e
Copy link
Collaborator

m4rc1e commented Jun 27, 2024

Thanks for the force push. Seems to be working:

Screenshot 2024-06-27 at 10 11 53

@RickyDaMa
Copy link
Contributor Author

@simoncozens (or @m4rc1e) any going concerns that'd prevent this PR from getting merged soon? I think it's in a good state; just updated the original post description to reflect the current state of the feature

@simoncozens simoncozens merged commit 184c192 into googlefonts:main Sep 5, 2024
9 checks passed
@RickyDaMa RickyDaMa deleted the subset-repo-at-commit branch September 5, 2024 16:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support Git references in subset merger from repo
3 participants