-
Notifications
You must be signed in to change notification settings - Fork 184
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove git submodule for testing data #2848
Conversation
clang-tidy review says "All clean, LGTM! 👍" |
1ad9e03
to
97a6457
Compare
clang-tidy review says "All clean, LGTM! 👍" |
Makes sense to me. Hopefully this would also by design remove some of the weird annoyances around submodules not changing commit hash on main repo branch change / getting mistakenly included in The outstanding relevant question is whether there should remain two test data repositories, or whether they should be merged into one. Also: Would this be a preferable alternative to submodules for other external dependencies (#2584)? I suppose the disadvantage there is that IDEs wouldn't be able to find the code for the relevant functions unless you put the code somewhere else and pointed the IDE to it yourself. |
There might be a slight disadvantage to developer workflow, but it is tolerable. Whenever an update to the test data is required, typically one would:
Here, I presume the workflow will be:
? |
I think for this purpose, FetchContent is more suitable than
and IDEs shouldn't have trouble picking this up. |
Would like to get this merged too unless there are any objections. |
Before we do, I'm not sure I get the rationale. My understanding of the original issue was that the testing ought to run from within the build directory. I kind of get that bit. But then there's a suggestion to download the test data into the build directory, and that's where I get a bit lost. Does that mean that if we delete the build folder, the system will have to download all the test data again? That's a fair bit of data... I'm hoping that the Hopefully this is just my ignorance about how that feature works, and a quick bit of education will sort that out... |
Yes, the data will be cloned in the build directory. However, the original issue #2837 was to address the problem that our source directory can get polluted with temporary files produced by execution of tests. This also cleanly allows a different data set to be cloned when a developer changes their local branch. In #2859, we have the same problem and I implemented an alternative solution: downloaded dependencies are cloned in |
OK, so this was a valid concern... On my home network, we're talking minutes to download all the test data for the scripts. Much longer than the actual build time - especially now that we're using |
Regarding re-downloading of test data, I also have developed the habit of deleting a build directory and then re-constructing. However #2859 has these suggestions:
So it might be possible to avoid repetitive long downloads just by changing developer practise. We must however consider the case where one has multiple build directories:
Given the latter, I'm uncomfortable having non-repository files requisite for building living outside of a build directory; both here and potentially for #2859 also. I'd previously discussed the prospect of having environment variables to be interpreted at configure time for loading external dependencies from the local filesystem into the build directory rather than pulling them from the internet. Hypothetically, that argument could extend to the test data. If the developer has a test data repository cloned on their own system, and that clone contains the commit to which the source code repository currently refers, then hypothetically, the test data repository could be copied into the build directory and the requisite commit checked out (indeed, if that local clone is already pointing to the right commit, then just the data content could be pulled, skipping unnecessary duplication of git content). Perhaps not be worth the effort if just avoiding build directory deletion resolves, but thought it worth mentioning. |
I tend to agree with this. For #2859, cloning in the build directory is even less of an issue as downloading tarball for Eigen, Json for Modern C++ and Half only takes a few seconds. |
@Lestropie @jdtournier I can't exactly recall what was the verdict on this issue, but did we decide to allow for the possibility of specifying a local URL for the data directory (so that we can clone the data into the build)? If yes, what should be the default behaviour? |
Yes, that was more or less the conclusion we came to. Basically, the behaviour would be to always Does that sounds reasonable? |
Sounds right to me. Only addition would be that if the requested commit is not available, the resulting error could be different depending on whether the default GitHub repo or a manually-specified repo was used. In the former case, the issue could be either an erroneously specified commit, or it could be a failure to push committed changes to GitHub; whereas in the latter case, in addition to an erroneously specified commit it could be that the manually specified local repo is not fully up to date. |
97a6457
to
56656ab
Compare
clang-tidy review says "All clean, LGTM! 👍" |
1 similar comment
clang-tidy review says "All clean, LGTM! 👍" |
4c06bd9
to
ed5c0be
Compare
clang-tidy review says "All clean, LGTM! 👍" |
1 similar comment
clang-tidy review says "All clean, LGTM! 👍" |
I have now rebased the changes by allowing the user to specify the
To clone the data we are using
|
I think the projected workflow for developers here is that one would have a single clone of the test data repository, make all changes there, and just set an environment variable to point to that location so that all of their builds can make use of it. So while a cmake variable would be more conceptually faithful in terms of configuring a build system, if it's only there for expediting developer workflow, I think an environment variable is what would be maximally convenient for myself.
Doesn't quite address the question, which was specifically whether there is explicit feedback to the developer that an attempt was made to use a specified local clone but that the requested commit wasn't available there so it is falling back to doing a complete clone from GitHub. I'm guessing if it's an existing function it will report something sensible in that scenario. Also this quote references what happens for re-runs of cmake, not for the initial configuration. I'll try to test. |
My understanding here is that |
We now download testing data at build time instead of cloning the data as a git submodule. The user can also provide local git repos for the data by specifying the MRTRIX_BINARIES_DATA_DIR and MRTRIX_SCRIPTS_DATA_DIR environment variables during the configuration stage. This prevents from polluting the source directory and it's more in line with CMake's philosophy.
172480c
to
91ac695
Compare
@MRtrix3/mrtrix3-devs I believe this is now ready to be merged. |
clang-tidy review says "All clean, LGTM! 👍" |
This PR aims to address #2837. The idea is to remove the Git submodule for testing data from the codebase. Instead, testing data will be cloned in the build directory at build time. This is made possible by using CMake's ExternalProject.