Skip to content

system_prefix strips whitespace that is significant if present #1760

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
EliahKagan opened this issue Jan 12, 2025 · 2 comments · Fixed by #1758
Closed

system_prefix strips whitespace that is significant if present #1760

EliahKagan opened this issue Jan 12, 2025 · 2 comments · Fixed by #1758

Comments

@EliahKagan
Copy link
Member

EliahKagan commented Jan 12, 2025

Current behavior 😯

I admit it would be unusual for the path to the git-core directory to end in whitespace, because the last component is usually git-core, and because it is unusual for the name of a file or directory to end in whitespace. Likewise, it would be unusual for the path to git-core to begin in whitespace, because it is usually an absolute path, and because it is unusual for the name of a file or directory to begin in whitespace. However, these conditions could in principle occur, and they are easy to produce for testing purposes.

git --exec-path has no trouble reporting the path accurately and unambiguously, even if the path is unusual in this way (or almost any other way). It writes the path literally to standard output, followed by exactly one newline character. Git does this on all platforms and regardless of what characters appear in the path.

However, in gix-path, the env::system_prefix function does not round-trip such paths correctly. It instead strips all ASCII whitespace on both sides:

.trim_with(|b| b.is_ascii_whitespace())

In addition to achieving much of the goal of #1752 without any of the problems, #1758 improves system_prefix significantly. For example, before #1758, if git --exec-path ran but exited indicating failure, system_prefix would still use its output, while in #1758, this is correctly taken to mean the attempt to obtain the path was not successful. However, it does not affect this specific behavior of stripping whitespace while parsing the path:

.trim_with(|b| b.is_ascii_whitespace())

In #1758, that code appears in the implementation of GIT_CORE_DIR, which system_path now uses. Because it is now put to broader use (at least on Windows), because #1758 also uses it to find a reasonable path to sh.exe, it might make sense to fix this in #1758. But it seems to me that this bug is both preexisting and minor (and even more minor on Windows where such paths are less often capable of referring to existing files in the filesystem, and where, when they do, the files are usually the same as referred to by the stripped path).

Therefore I believe it is fine to merge #1758 with or without including a fix for this. So I am opening this issue rather than including this in my (forthcoming) review of #1758. However, the new gix env command introduced there makes it convenient to demonstrate this bug, so I'll use that below, even though that PR neither introduces nor, as it stands, changes this bug.

Expected behavior 🤔

Instead of trimming zero or more ASCII whitespace characters from either side, we should remove a single newline character from the end.

We should not remove any character other than a newline, nor multiple newlines even if present, and we should require the newline to be present and refuse to accept the path if it is does not have a newline after it:

  • The only situation where it would be absent is if the output were incompletely received, in which case it should not be used.
  • Treating it as optional is slightly more complicated.
  • In the especially (hopefully) rare case that the path ends in one or more newlines, removing a single newline character still does the right thing (and is the only approach that does the right thing). git --exec-path outputs whatever the path is, including any newlines anywhere in it, and follows it by a newline of its own.

I say "character" but really I am talking about a b'\n' byte. The way I did this in #1712 was with a call to strip_suffix(b"\n"), which I suspect would work here. Note that some aspects of the code there would not be suitable outside tests, because we should return None rather than panicking. However, for reference, here is that code:

output
.stdout
.strip_suffix(b"\n")
.expect("`git --exec-path` output to be well-formed")
.to_os_str()
.expect("no invalid UTF-8 in `--exec-path` except as OS allows")
.into()

As applied to the version in #1758 that has been moved into GIT_CORE_DIR and improved in other ways already, it looks like this change would be sufficient (and cursory testing seems to confirm this):

     BString::new(output.stdout)
-        .trim_with(|b| b.is_ascii_whitespace())
+        .strip_suffix(b"\n")?
         .to_path()
         .ok()?
         .to_owned()

For convenience, I'll include a review comment suggesting that, but I emphasize that I don't think this needs to be a blocker for #1758.

Git behavior

As described above (and shown below).

Steps to reproduce 🕹

These commands were run on an Arch Linux system. First, I confirm that git --exec-path and gix env are both usable to obtain the git-core directory ("core dir") and agree in typical cases:

ek in 🌐 catenary in gitoxide on  git-shell [$] is 📦 v0.40.0 via 🦀 v1.84.0
❯ git --exec-path
/usr/lib/git-core

ek in 🌐 catenary in gitoxide on  git-shell [$] is 📦 v0.40.0 via 🦀 v1.84.0
❯ cargo run --bin=gix -- env
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.18s
     Running `target/debug/gix env`
          shell: /bin/sh
  config prefix: Some("/home/ek")
         config: Some("/home/ek/.gitconfig")
        git exe: git
  system prefix: Some("/")
       core dir: Some("/usr/lib/git-core")

Now, note that git recognizes GIT_EXEC_PATH to override the path to the git-core directory to an arbitrary user-specified value, that gix-path picks this up:

ek in 🌐 catenary in gitoxide on  git-shell [$] is 📦 v0.40.0 via 🦀 v1.84.0
❯ GIT_EXEC_PATH=/to/some/other/place git --exec-path
/to/some/other/place

ek in 🌐 catenary in gitoxide on  git-shell [$] is 📦 v0.40.0 via 🦀 v1.84.0
❯ GIT_EXEC_PATH=/to/some/other/place cargo run --bin=gix -- env
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.16s
     Running `target/debug/gix env`
          shell: /bin/sh
  config prefix: Some("/home/ek")
         config: Some("/home/ek/.gitconfig")
        git exe: git
  system prefix: Some("/")
       core dir: Some("/to/some/other/place")

To be clear, I am not recommending sanitizing GIT_EXEC_PATH. If the user has set that, then presumably they intend a different or otherwise specific git installation's git-core executables to be used, and for any paths those executables resolve relative to it to be resolved as it implies. Either that, or they are doing testing, as here. I bring this up because I am going to use it as a simple way to test weird git-core paths. That GIT_EXEC_PATH is respected is not part of this bug, and it is probably the right thing to do.

(In gitpython-developers/GitPython#1791 (comment), I mentioned a worry that respecting GIT_EXEC_PATH, when using git --exec-path to find things outside of it at relative locations that are expected with Git for Windows but not in general, might not be ideal. I now think that specific worry of mine was misguided. Even in this kind of Windows use, I think respecting GIT_EXEC_PATH in the rare case that it is set will make it less likely to detect a wrong, broken, or unintended Git installation when there is more than one. In addition, if it is set, then simply ignoring it and using a value obtained by running git --exec-path with it removed from the environment would run too high a risk of using the wrong directory or of detecting different parts of different Git installation and assuming they are associated with the same one. Furthermore, while finding files in Git for Windows is part of its use here, it is also being made available for other, more typical purposes, for which it would simply not be correct without respecting GIT_EXEC_PATH.)

We can use the same technique to test a path with a trailing space. To make the git --exec-path output unambiguous, I use sed to add a literal $ at the end of the line. (This is not needed in the output of gix env, since the "core dir" item is an Option and thus, when displayed as Some, the unambiguous debug representation is shown.) This reveals that the trailing whitespace is preserved by Git but stripped away by gix-path:

ek in 🌐 catenary in gitoxide on  git-shell [$] is 📦 v0.40.0 via 🦀 v1.84.0
❯ GIT_EXEC_PATH='/usr/lib/git-core ' git --exec-path | sed 's/$/$/'
/usr/lib/git-core $

ek in 🌐 catenary in gitoxide on  git-shell [$] is 📦 v0.40.0 via 🦀 v1.84.0
❯ GIT_EXEC_PATH='/usr/lib/git-core ' cargo run --bin=gix -- env
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.18s
     Running `target/debug/gix env`
          shell: /bin/sh
  config prefix: Some("/home/ek")
         config: Some("/home/ek/.gitconfig")
        git exe: git
  system prefix: Some("/")
       core dir: Some("/usr/lib/git-core")

As further confirmation of the cause, as well as to (partially) test a possible solution, I try the fix proposed above and observe that it preserves the trailing space:

ek in 🌐 catenary in gitoxide on  git-shell [$] is 📦 v0.40.0 via 🦀 v1.84.0
❯ git stash pop
On branch git-shell
Your branch is up to date with 'upstream/git-shell'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
 modified:   gix-path/src/env/mod.rs

no changes added to commit (use "git add" and/or "git commit -a")
Dropped refs/stash@{0} (8aaefaf2f3e9a1b37839e7a291de425165dbfe04)

ek in 🌐 catenary in gitoxide on  git-shell [$!] is 📦 v0.40.0 via 🦀 v1.84.0
❯ git diff
diff --git a/gix-path/src/env/mod.rs b/gix-path/src/env/mod.rs
index 20cdb5185..d1cd4de79 100644
--- a/gix-path/src/env/mod.rs
+++ b/gix-path/src/env/mod.rs
@@ -122,7 +122,7 @@ static GIT_CORE_DIR: Lazy<Option<PathBuf>> = Lazy::new(|| {
     }

     BString::new(output.stdout)
-        .trim_with(|b| b.is_ascii_whitespace())
+        .strip_suffix(b"\n")?
         .to_path()
         .ok()?
         .to_owned()

ek in 🌐 catenary in gitoxide on  git-shell [$!] is 📦 v0.40.0 via 🦀 v1.84.0
❯ GIT_EXEC_PATH='/usr/lib/git-core ' cargo run --bin=gix -- env
   Compiling ...
...
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 12.61s
     Running `target/debug/gix env`
          shell: /bin/sh
  config prefix: Some("/home/ek")
         config: Some("/home/ek/.gitconfig")
        git exe: git
  system prefix: Some("/")
       core dir: Some("/usr/lib/git-core ")

As shown, under that change, the value Some("/usr/lib/git-core") is replaced with the value that is correct (for this specific weird situation being tested), Some("/usr/lib/git-core ").

In the above output, I redacted a bunch of "Compiling" lines with ellipses, for readability. The full exact output can be seen in this gist.

@Byron
Copy link
Member

Byron commented Jan 12, 2025

Thanks for bringing this up.

I think this is implemented in #1758 now that I applied the proposed change.

@EliahKagan
Copy link
Member Author

Yes, I think merging #1758 fixed this. Is there a convenient way to mark this as closed specifically by that PR? (If not then it can just be closed as completed.)

@Byron Byron linked a pull request Jan 12, 2025 that will close this issue
3 tasks
@Byron Byron closed this as completed Jan 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants