-
-
Notifications
You must be signed in to change notification settings - Fork 344
Fix check-mode.sh
display of leading-whitespace paths
#1709
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
6f9296a
to
c6b22ae
Compare
In GitoxideLabs#1708, `check-mode.sh` deliberately parsed `git ls-files -sz` records using default nonempty `IFS`, to conveniently read fields into variables. But this would strip leading and trailing whitespace from the paths. Trailing whitespace is not currently present, since only files with `.sh` suffixes are processed, so the only paths that would be displayed incorrectly were those with leading whitespace. (We do not intentionally have such files in the repository, but they can arise accidentally. They might also, in principle, be added at some point to support tests. They would not likely be kept, because they are incompatible with Windowws and `git` will not check them out there, but they could plausibly exist in some commits on a feature branch, even with no mistakes.) This was *sort of* okay, because the script never accesses the repository or filesystem using the paths, but only displays them to the user. But this was still bad for a few reasons: - The paths were at least in principle ambiguous. - A path with accidental leading whitespace would not be noticed. - The `path` variable looks like it would be usable, so it could be easily misused in future changes, if not fixed. - Having to think through whether or not this parsing bug is excessively serious, when reasoning about the code, may make the code more cumbersome to understand and maintain. This is to say that I should never have done it in that way. Therefore, this replaces the default-`IFS` logic with empty-`IFS` calls to `read` and regular-expression processing, using `bash`'s `[[` keyword, which supports a `=~` operator that matches against a regular expression and populates the `BASH_REMATCH` array variable with captures. (This approach to parsing `git ls-file -sz` is based on code I wrote in https://github.com/EliahKagan/gitscripts/blob/master/git-lsx.) As discussed in GitoxideLabs#1708, this script may be converted to Rust. This change is not intended as a substitute for that. It should actually make that slightly easier, in that a regex approach has clearer semantics, whether it is translated into code that uses regex or not. (If regex are used, it will probably still need adjustment, as not all dialects recognize POSIX classes like `[[:digit:]]`.) This commit includes some files to test the fix in the script. They are at the top level of the repository, since the bug this fixes only occurs with whitespace at the beginning of the first path component (relative to the root of the working tree). These files will be removed shortly. The messages for them are: mode +x but no shebang: $'\tb.sh' mode -x but has shebang: \ a.sh Before the fix, they would have been shown as `a.sh` instead of `\ a.sh`, and `b.sh` instead of `$'\tb.sh'`. Note that the approach to quoting has not itself changed: the `%q` format specifier for the `printf` builtin in `bash` formats them this, when the paths are parsed in such a way that the characters that would need quoting if used in the shell are not lost.
Now that the `etc/check-mode.sh` script is verified to detect them.
c6b22ae
to
a9b3b48
Compare
In bash 3.2, which is used on macOS, `[[` commands don't invoke `set -e` behavior. This adds an explicit check. This also improves some code that sliced an array instead of using the variables its elements were already assigned to. One of them is used twice, and indexing in the existing usage would make the meaning less clear. So the variables would not be reasonable to eliminate by slicing the array, without other refactoring.
Thanks a lot, and as always, I am impressed by the amount of diligence that went into this PR. And since this PR was necessary even for somebody whom I consider super-human, this should probably provide all the motivation that's needed to rewrite this script as part of Doing so would have one disadvantage though: build times. But maybe with a shared-key for the cache one can share just the right caches across all jobs that use the internal tools, helping them start relatively quickly, albeit still slowly compared to a shell script. But ultimately, it should be worth it (and fun), even for a script that's probably never touched again. |
PS: I am aware that a potential mine was just merged, with two files that seem to be displayed incorrectly in the GitHub UI even, without any quoting or hints about the whitespace they have in their name. But I suppose maybe one day it will help to uncover a shortcoming in a tool, which makes it worth it. |
It will be interesting to see how it displays in the web-based interfaces of the Codeberg and gitee mirrors next time they are fast-forwarded. I am not very worried, though. |
In #1708,
check-mode.sh
deliberately parsedgit ls-files -sz
records using default nonemptyIFS
, to conveniently read fields into variables. But this would strip leading and trailing whitespace from the paths. Trailing whitespace is not currently present, since only files with.sh
suffixes are processed, so the only paths that would be displayed incorrectly were those with leading whitespace.(We do not intentionally have such files in the repository, but they can arise accidentally. They might also, in principle, be added at some point to support tests. They would not likely be kept, because they are incompatible with Windowws and
git
will not check them out there, but they could plausibly exist in some commits on a feature branch, even with no mistakes.1)This was sort of okay, because the script never accesses the repository or filesystem using the paths, but only displays them to the user. But this was still bad for a few reasons:
path
variable looks like it would be usable, so it could be easily misused in future changes, if not fixed.This is to say that I should never have done it in that way.
Therefore, this replaces the default-
IFS
logic with empty-IFS
calls toread
and regular-expression processing, usingbash
's[[
keyword, which supports a=~
operator that matches against a regular expression and populates theBASH_REMATCH
array variable with captures.(This approach to parsing
git ls-file -sz
is based on code I wrote in https://github.com/EliahKagan/gitscripts/blob/master/git-lsx.)As discussed in #1708, this script may be converted to Rust. This change is not intended as a substitute for that. It should actually make that slightly easier, in that a regex approach has clearer semantics, whether it is translated into code that uses regex or not. (If regex are used, it will probably still need adjustment, as not all dialects recognize POSIX classes like
[[:digit:]]
.)I have tested this on Arch Linux, macOS, and Windows [edit: and again after 530948d], though I have not actually tested paths with leading whitespace on Windows, since they are not supported there and
git
will not check them out. (I have checked that it still works otherwise on all systems, though.)For macOS, in addition to that verification, I've checked a bash 3.2 manual to verify that
=~
and$BASH_REMATCH
are present in that version, with the semantics I expect. [Edit: While=~
worked the way I expected,[[
itself didn't, in its interaction withset -e
. That's fixed in 530948d.]1 I mean, in addition to this feature branch, which does have such files in 2ba1ce3, removed in a9b3b48. (If you prefer to avoid that--maybe it would get in the way of some uses of
git bisect
on Windows?--then thetwothree commits in this PR could be squashed into one or two.)