-
Notifications
You must be signed in to change notification settings - Fork 11
load input directory and files #50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…f no input file provided
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please see my inline comment. Also, please double check to have one test per UC and put a comment which UC it addresses. Also in a comment above, copy-paste teh URL to the GH ussue where these are described
example of what I mean:
# Use cases can be found here: \https:github.com.......
["--input-file", "good_data.chi"], [".", "good_data.chi"] # UC 1
I am only looking at tests, not the code yet, but I am not clear what you are testing, in other words, for the most simple one of user specifies a single file....what is it that you want the code to do after that is specified? This is what we want a test for. Right now, I don't see that in these tests. If you want to discuss it on slack, that is ok. Don't bother working on the implementation code until we have written the tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please see my inline comment.
params1 = [ | ||
# Use cases can be found here: https://github.com/diffpy/diffpy.labpdfproc/issues/48 | ||
params_input = [ | ||
(["good_data.chi"], [".", "good_data.chi"]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to have a discussion, by comment thread of zoom if it is easier, about what we want the program to do given the inputs and therefore what we want to test. It seems to me that these tests are just testing something to do with metadata, but this PR is about much more than that. Here is a start:
single-file case:
- check the file exists
- read the file
- if valid, compute the cve and process the data
- if unreadable, error with helpful message
- find the absolute path and store it in metadata
- write this into the output file header
We want to make sure all these things are tested. Some will be tested by other functions. But we need tests here for all the things that won't be covered by other functions.
Then we would like a similar list for teh other cases (a list of files, a glob....) and tests for those too. Please can you think about this and have a crack at it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tests look ok, please see inline comments.
I don't see tests for 1., 2., 3., 5., 6. in my list above. Some of these will be covered by other functions, but it is good to comment on what and how.
Also, I don't see a similar list for the list and glob situations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, progress. Please see inline comments.
However, I don't see where you handle the lists of files. For example, if there is a list of files we presumably need to apply the correction to each file in the list and then write into the header of that file its parent file, not a glob list.
src/diffpy/labpdfproc/tools.py
Outdated
raise ValueError("Please specify valid input file or directory.") | ||
|
||
if not Path(args.input).is_dir(): | ||
input_dir = Path.cwd() / Path(args.input).parent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder whether this will fail if the user gives a filename with a path that doesn't include cwd? For example, I think a valid test would be:
cd /user/me/analysis
labpdfcor 2.5 /user/me/data/my_file.xy
Please could you add test for this situation and make sure it passes?
input_file_name = [os.path.basename(input_file_path) for input_file_path in input_files] | ||
setattr(args, "input_directory", input_dir) | ||
setattr(args, "input_file", input_file_name) | ||
return args |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
otherwise, I think this looks good.
…les for inputing a file list
I added tests for file list and edited help message for it. Currently, the program reads the file as a file list only if every line in it is a valid file within the same directory of the file list. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is a slight tweak to the tests, and then I think we can move to close.
# Use cases can be found here: https://github.com/diffpy/diffpy.labpdfproc/issues/48
params_input = [
(["good_data.chi"], [".", ["good_data.chi"]]), # single good file, same directory
(["input_dir/good_data.chi"], ["input_dir", ["good_data.chi"]]), # single good file, input directory
( # glob current directory
["."],
[
".",
["good_data.chi", "good_data.xy", "good_data.txt", "unreadable_file.txt", "binary.pkl"],
],
),
( # glob input directory
["./input_dir"],
[
"input_dir",
["good_data.chi", "good_data.xy", "good_data.txt", "unreadable_file.txt", "binary.pkl"],
],
),
( # list of files provided
["input_dir/good_data.chi ./good_data.chi unreadable_file.txt missing_file.txt"], ["input_dir", ["good_data.chi", "good_data.xy", "good_data.txt"]]),
( # file_list.txt list of files provided
["file_list_dir/file_list.txt"], ["file_list_dir", ["./good_data.chi", "input_dir/good_data.chi", "./good_data.xy"]]),
]
btw, for lists and globs I think the best behavior would be to read as many files as possible and skip over the others, letting the user know which have been skipped. Later when we make a function to do the work and we write tests for it, let's have tests for all the same sets of inputs as here and check that the right files are written to the right places (so glob the directories after the corrections are applied and check the list is as expected) and check the printed outputs are correct for the skipped files. Also, let's make sure that the args that are written to the headers are correct, where the parent file is correctly identified (i.e., not just the input file list saved to the header, but also the actual file that was used for that particular correction. This will have to be set inside the loop). Please maybe copy-paste these comments to a new issue for that function. |
I added tests for testing file lists and multiple files. I think here I went a bit further to handle files in different directories... hope this is okay. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see my inline comments.
On balance the function looks a bit complicated. Let's make sure there is not a simpler way of doing it.
p.add_argument( | ||
"input", | ||
nargs="+", | ||
help="The filename or directory of the datafile to load. Required. " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to make it clearer maybe "The filename(s) or folder(s) of the datafile(s) to load. Required.....
The next line is good except doesn't the file with the list of files need to have a specific name?
Then I would give examples of valid inputs.
# This test covers existing single input file, directory, a file list, and multiple files | ||
# We store absolute path into input_directory and file names into input_file | ||
params_input = [ | ||
(["good_data.chi"], [".", "good_data.chi"]), # single good file, same directory |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder whether it is better if we force the file-list to be a list, even if there is only one file or directory specified.
], | ||
), | ||
( # list of files provided (with invalid files and files in different directories) | ||
["input_dir/good_data.chi", "good_data.xy", "missing_file.txt"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this doesn't test the case where a file of the same name is in two different directories. I had a test for that. In principle they are distinct files (they could have different data in) and so we want to support this.
src/diffpy/labpdfproc/tools.py
Outdated
if Path(input).is_file(): | ||
input_paths.append(Path(input).resolve()) | ||
input_paths_parent.append(Path(input).resolve().parent) | ||
input_dir = Path(os.path.commonprefix([str(path) for path in input_paths_parent])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure this is the best way to do this? There is now pure Path() way to do it?
src/diffpy/labpdfproc/tools.py
Outdated
|
||
""" | ||
|
||
if len(args.input) > 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
force the input to be a list and remove this.
…tory only for simplication
I edited help message, args.input is now a list, edited the test for same file names in different directories (if this happens, we probably would want to change output name/directory too, since they are the same name in the same directory?). Now it seems that we only need input_directory to get absolute paths for each file (i.e. input_file_name is unncessary, I removed it). |
closes #23, #24