fix: ensure filenames with spaces are excluded from targets #2748

Open
wants to merge 2 commits into
base: main

@mattem (Contributor) commented Apr 7, 2025

Some dependencies contain files with spaces in the name. These should be excluded as they are generally unsupported, and when placed in a runfiles manifest file, they cause it to be malformed.

This change omits files with spaces in their names from glob patterns.
It also changes where the .pyo.NNN temp file pattern added in #2743 is placed, as it seems it was slightly misplaced and was missed from 3p dependency targets.
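
To make the malformed-manifest problem concrete, here is a minimal sketch in Python. It assumes the classic runfiles manifest layout of one entry per line, with the runfiles path and the real path separated by a single space, and a reader that splits on the first space. `parse_manifest_line` and the sample paths are hypothetical and not taken from any runfiles library.

```python
def parse_manifest_line(line):
    """Split a manifest entry into (runfiles_path, real_path).

    Assumption: entries look like "<runfiles path> <real path>" and the
    reader splits on the first space, as a simple parser would.
    """
    runfiles_path, _, real_path = line.rstrip("\n").partition(" ")
    return runfiles_path, real_path


# An entry without spaces is split where intended.
ok = parse_manifest_line("pkg/data.txt /abs/pkg/data.txt")

# A file name containing a space is cut at the wrong place, so the key no
# longer names a real file and the value is not a usable path.
broken = parse_manifest_line("pkg/my file.txt /abs/pkg/my file.txt")

print(ok)
print(broken)
```

With an entry like the second one, a lookup for `pkg/my file.txt` finds nothing, which is the kind of failure that excluding such files avoids.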

"**/__pycache__/*.pyc.*",
"**/__pycache__/*.pyo.*",
# File names with spaces should also be ignored.
"**/* *",
] + glob_excludes.version_dependent_exclusions() + extra_files_glob_exclude,
Collaborator

I am thinking that it would be nice to have glob_excludes.pyc_files() and glob_excludes.pyo_files() and glob_excludes.files_with_spaces(). Then we can ensure that the explanation for why we need to do what we need to do can be next to their definitions.

I would also love to exclude .pyc and .pyc.* in the hermetic toolchain definition, so that the exclude is the same regardless of whether we are chmod-ing the dir to be read-only or not.

What do you think?
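
A rough sketch of what such helpers might look like. It is written in Python so it can be run directly (the function bodies are also valid Starlark). `SimpleNamespace` stands in for a Starlark `struct`, and which patterns each helper returns is an assumption based on the patterns in this diff, not the final implementation.

```python
from types import SimpleNamespace


def _pyc_files():
    # Assumption: only the temporary files written while a .pyc is being
    # generated (foo.pyc.NNN) are excluded; stable .pyc files are kept.
    return ["**/__pycache__/*.pyc.*"]


def _pyo_files():
    # Same idea for the .pyo.NNN temporary files.
    return ["**/__pycache__/*.pyo.*"]


def _files_with_spaces():
    # File names with spaces are excluded because they produce a malformed
    # runfiles manifest, as described in this PR.
    return ["**/* *"]


# Stand-in for a Starlark struct exported from a .bzl file.
glob_excludes = SimpleNamespace(
    pyc_files=_pyc_files,
    pyo_files=_pyo_files,
    files_with_spaces=_files_with_spaces,
)

exclude = (
    glob_excludes.pyc_files()
    + glob_excludes.pyo_files()
    + glob_excludes.files_with_spaces()
)
print(exclude)
```

Keeping each pattern behind its own function gives the explanation a single home next to the definition, so call sites do not need to repeat the comment.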

Contributor Author

Yeah can add those methods.

Re: pyc, I think we'd only want the temp files excluded here? I'd originally excluded them in a different PR in a different part of the code (removed in this PR in favor of here). This change keeps the pyc exclusion in a single place.

If the pyc files are stable, then generally it would be preferable to keep them, no?

Collaborator

Hmmm. Yeah, if they are stable it is fine and we are already setting the vars to make them stable, so SGTM.

Collaborator

If memory serves, excluding pyc was what finally got rid of the Windows jobs getting "can't delete open file" errors. My theory was that two processes both went to import a module without a pyc. Both would start generating the pyc, but one would manage to finish writing and open the pyc, then the other process would try to overwrite it. But it couldn't, because the file was open.

The secondary issue is, as pycs are created, they show as additional files added to the target, thus invalidating it, which means anything downstream has to re-run. Eventually things will settle, but they'll only stay settled as long as the repo sticks around. A similar issue can happen with the timestamps: two processes might race and end up creating slightly different timestamped pycs, thus making it look like the file changed.

Collaborator

This is only true for the pyc generation happening at repository_rule execution time. I added -B a while ago.

When the packages are used in the regular py_binary and py_test rules I expect the pyc files to be created in the sandbox and not the repository_rule output dirs, but my claim should be checked.
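
The effect of `-B` can be checked with a small experiment. The sketch below is only an illustration, not part of the PR: it creates a throwaway module (the name `demo_mod` is made up), imports it in a child interpreter started with `-B`, and looks for a `__pycache__` directory afterwards. It says nothing about where pyc files end up under `py_binary` or `py_test`, which still needs checking as noted above.

```python
import os
import subprocess
import sys
import tempfile


def import_with_flag_b(module_name="demo_mod"):
    """Import a throwaway module under `python -B`.

    Returns True if a __pycache__ directory was created next to the module.
    """
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, module_name + ".py"), "w") as f:
            f.write("VALUE = 1\n")
        # Make the temporary directory importable for the child process.
        env = dict(os.environ, PYTHONPATH=tmp)
        subprocess.run(
            [sys.executable, "-B", "-c", "import " + module_name],
            env=env,
            check=True,
        )
        return os.path.isdir(os.path.join(tmp, "__pycache__"))


wrote_cache = import_with_flag_b()
print("__pycache__ created:", wrote_cache)
```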

Collaborator

Oh good. Yeah, that should prevent that issue, then. SGTM.

glob,
exclude = [
# File names with spaces should be excluded.
"**/* *",
Collaborator

I thought that our supported bazel versions support files with spaces, so why do we need to exclude them?

Collaborator

However, it seems that someone tried it and it did not work?

https://github.com/michael-christen/toolbox/pull/184/files

Contributor Author

Yeah, we also hit the issue with setuptools in runfiles, but with the Go runfiles library. Setuptools seems to contain files with spaces, so even if bazel itself can handle them now, the runfiles libraries can't.
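
As an illustration of what the exclude does, the sketch below approximates the `"**/* *"` pattern in plain Python by dropping any path whose final component contains a space. It is an approximation of Bazel's glob behaviour written for this discussion, and the file names are invented examples.

```python
import posixpath


def has_space_in_basename(path):
    # Approximation of "**/* *": the last path segment contains a space.
    return " " in posixpath.basename(path)


def drop_files_with_spaces(paths):
    """Return the paths that would survive the exclude."""
    return [p for p in paths if not has_space_in_basename(p)]


candidates = [
    "site-packages/pkg/__init__.py",
    "site-packages/pkg/some template.tmpl",
    "site-packages/pkg/data/notes.txt",
]
kept = drop_files_with_spaces(candidates)
print(kept)
```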

@rickeylev (Collaborator) left a comment

Mostly LGTM. The only comment I have is to replace the "files with spaces should be ignored" with text that tells why they should be ignored. I suggested an edit to that effect in one spot. I'm OK with copy/pasting that same comment, or factoring out a common function with the comment there instead.

"**/__pycache__/*.pyc.*",
"**/__pycache__/*.pyo.*",
# File names with spaces should also be ignored.
"**/* *",
] + glob_excludes.version_dependent_exclusions() + extra_files_glob_exclude,
Collaborator

Suggested change
- # File names with spaces should also be ignored.
+ # Ignore files with spaces because, while Bazel supports them,
+ # the runfiles manifest format doesn't yet
