load input directory and files #50

yucongalicechen · 2024-05-07T13:52:32Z

closes #23, #24

sbillinge

Please see my inline comment. Also, please double-check that there is one test per UC, with a comment saying which UC it addresses. In a comment above the tests, also copy-paste the URL of the GH issue where the UCs are described.
example of what I mean:

# Use cases can be found here: \https:github.com.......
["--input-file", "good_data.chi"], [".", "good_data.chi"]  # UC 1

I am only looking at the tests, not the code yet, but I am not clear what you are testing. In other words, for the simplest case, where the user specifies a single file, what is it that you want the code to do after that is specified? That is what we want a test for, and right now I don't see it in these tests. If you want to discuss it on Slack, that is OK. Don't bother working on the implementation code until we have written the tests.

src/diffpy/labpdfproc/tests/test_tools.py

sbillinge

please see my inline comment.

src/diffpy/labpdfproc/labpdfprocapp.py

sbillinge · 2024-05-08T12:20:20Z

src/diffpy/labpdfproc/tests/test_tools.py

-params1 = [
+# Use cases can be found here: https://github.com/diffpy/diffpy.labpdfproc/issues/48
+params_input = [
+    (["good_data.chi"], [".", "good_data.chi"]),


I think we need to have a discussion, by comment thread or Zoom if it is easier, about what we want the program to do given the inputs and therefore what we want to test. It seems to me that these tests are just testing something to do with metadata, but this PR is about much more than that. Here is a start:

single-file case:

1. check the file exists
2. read the file
3. if valid, compute the cve and process the data
4. if unreadable, error with helpful message
5. find the absolute path and store it in metadata
6. write this into the output file header

We want to make sure all these things are tested. Some will be tested by other functions. But we need tests here for all the things that won't be covered by other functions.
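
As a rough illustration of the single-file list above, here is a minimal sketch of the existence check, the read, the helpful error, and the absolute-path bookkeeping. The helper name `describe_single_input` and the metadata keys are hypothetical and are not taken from the labpdfproc code; the correction itself and the header writing are left out.

```python
import tempfile
from pathlib import Path


def describe_single_input(filename):
    """Hypothetical helper: validate one input file and record its absolute path."""
    path = Path(filename)
    if not path.is_file():  # the file must exist
        raise FileNotFoundError(f"Cannot find {filename}. Please specify a valid input file.")
    try:
        text = path.read_text(encoding="utf-8")  # read the file
    except UnicodeDecodeError as err:  # unreadable, so fail with a helpful message
        raise ValueError(f"Cannot read {filename} as text data.") from err
    # Keep the absolute path so it can later be written to the output header.
    return {"input_file": str(path.resolve()), "n_lines": len(text.splitlines())}


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good_data.chi"
    good.write_text("1.0 2.0\n3.0 4.0\n", encoding="utf-8")
    metadata = describe_single_input(good)
    print(metadata["n_lines"])  # 2
```

Each branch of a helper like this maps onto one test: a good file, a missing file, and an unreadable (for example binary) file.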

Then we would like a similar list for the other cases (a list of files, a glob, ...) and tests for those too. Please can you think about this and have a crack at it.

sbillinge

The tests look ok, please see inline comments.

I don't see tests for 1., 2., 3., 5., 6. in my list above. Some of these will be covered by other functions, but it is good to comment on what and how.

Also, I don't see a similar list for the list and glob situations.

src/diffpy/labpdfproc/tests/test_tools.py

sbillinge

OK, progress. Please see inline comments.

However, I don't see where you handle the lists of files. For example, if there is a list of files we presumably need to apply the correction to each file in the list and then write into the header of that file its parent file, not a glob list.

sbillinge · 2024-05-08T18:38:59Z

src/diffpy/labpdfproc/tools.py

+        raise ValueError("Please specify valid input file or directory.")
+
+    if not Path(args.input).is_dir():
+        input_dir = Path.cwd() / Path(args.input).parent


I wonder whether this will fail if the user gives a filename with a path that doesn't include cwd? For example, I think a valid test would be:

cd /user/me/analysis
labpdfcor 2.5 /user/me/data/my_file.xy

Please could you add a test for this situation and make sure it passes?
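
One relevant pathlib behavior, sketched below: when the right-hand operand of `/` is absolute, the left-hand side is discarded. `PurePosixPath` stands in for `Path.cwd()` so the result does not depend on the machine, and the directories are the illustrative ones from the example above. This suggests the expression in the diff may already cope with an absolute input, but a test is still the way to confirm it.

```python
from pathlib import PurePosixPath

# Stand-in for Path.cwd(); the directory is only illustrative.
cwd = PurePosixPath("/user/me/analysis")

# Relative input: the parent is appended to the working directory.
relative_dir = cwd / PurePosixPath("data/my_file.xy").parent
# Absolute input: joining with an absolute path discards the left-hand side.
absolute_dir = cwd / PurePosixPath("/user/me/data/my_file.xy").parent

print(relative_dir)  # /user/me/analysis/data
print(absolute_dir)  # /user/me/data
```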

sbillinge · 2024-05-08T18:39:14Z

src/diffpy/labpdfproc/tools.py

+        input_file_name = [os.path.basename(input_file_path) for input_file_path in input_files]
+    setattr(args, "input_directory", input_dir)
+    setattr(args, "input_file", input_file_name)
+    return args


otherwise, I think this looks good.

yucongalicechen · 2024-05-09T03:30:54Z

I added tests for the file list and edited the help message for it. Currently, the program treats a file as a file list only if every line in it is a valid file in the same directory as the file list.
I'll add some other tests for file lists containing absolute paths (all in the same directory as the file list, or all in another directory). From there, should we print an error if some files are not in the same directory?

sbillinge

Here is a slight tweak to the tests, and then I think we can move to close.

# Use cases can be found here: https://github.com/diffpy/diffpy.labpdfproc/issues/48

params_input = [
    (["good_data.chi"], [".", ["good_data.chi"]]),   # single good file, same directory
    (["input_dir/good_data.chi"], ["input_dir", ["good_data.chi"]]),  # single good file, input directory
    (  # glob current directory
        ["."],
        [
            ".",
            ["good_data.chi", "good_data.xy", "good_data.txt", "unreadable_file.txt", "binary.pkl"],
        ],
    ),
    (  # glob input directory
        ["./input_dir"],
        [
            "input_dir",
            ["good_data.chi", "good_data.xy", "good_data.txt", "unreadable_file.txt", "binary.pkl"],
        ],
    ),
    (  # list of files provided
        ["input_dir/good_data.chi ./good_data.chi unreadable_file.txt missing_file.txt"],
        ["input_dir", ["good_data.chi", "good_data.xy", "good_data.txt"]],
    ),
    (  # file_list.txt list of files provided
        ["file_list_dir/file_list.txt"],
        ["file_list_dir", ["./good_data.chi", "input_dir/good_data.chi", "./good_data.xy"]],
    ),
]

sbillinge · 2024-05-09T11:33:21Z

btw, for lists and globs I think the best behavior would be to read as many files as possible and skip over the others, letting the user know which have been skipped.
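
A minimal sketch of that skip-and-report behavior, assuming a hypothetical helper named `partition_inputs` (not part of labpdfproc). It only checks that each entry is an existing file; deciding whether a file is readable data would need an additional check.

```python
import tempfile
from pathlib import Path


def partition_inputs(candidates):
    """Hypothetical helper: split candidates into existing files and skipped entries."""
    usable, skipped = [], []
    for candidate in candidates:
        path = Path(candidate)
        if path.is_file():
            usable.append(path.resolve())
        else:
            skipped.append(str(candidate))
    if skipped:
        # Tell the user which inputs were ignored instead of failing outright.
        print("Skipped inputs: " + ", ".join(skipped))
    if not usable:
        raise FileNotFoundError("None of the specified inputs could be found.")
    return usable, skipped


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good_data.chi"
    good.write_text("1.0 2.0\n", encoding="utf-8")
    usable, skipped = partition_inputs([good, Path(tmp) / "missing_file.txt"])
    print([path.name for path in usable])  # ['good_data.chi']
```

Returning the skipped entries alongside the usable ones makes the printed message easy to assert on in a test.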

Later when we make a function to do the work and we write tests for it, let's have tests for all the same sets of inputs as here and check that the right files are written to the right places (so glob the directories after the corrections are applied and check the list is as expected) and check the printed outputs are correct for the skipped files. Also, let's make sure that the args that are written to the headers are correct, where the parent file is correctly identified (i.e., not just the input file list saved to the header, but also the actual file that was used for that particular correction. This will have to be set inside the loop).

Please maybe copy-paste these comments to a new issue for that function.

yucongalicechen · 2024-05-09T22:44:08Z

I added tests for file lists and multiple files. I think I went a bit further here to handle files in different directories... I hope this is okay.
I don't know if this is the best way to distinguish between a file list and a data file: currently I read the first line of a file, and if it is not an existing file name, I treat it as a data file.
I added your comment above to #52.
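
The first-line heuristic described above could look roughly like this. The function name `looks_like_file_list` is hypothetical, and resolving entries relative to the list file's own directory is an assumption made for the sketch.

```python
import tempfile
from pathlib import Path


def looks_like_file_list(path):
    """Hypothetical check: treat a file as a file list if its first line names an existing file."""
    path = Path(path)
    try:
        with path.open(encoding="utf-8") as handle:
            first_line = handle.readline().strip()
    except UnicodeDecodeError:
        return False  # binary content cannot be a list of file names
    # Entries are resolved relative to the directory holding the list (an assumption).
    return bool(first_line) and (path.parent / first_line).is_file()


with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "good_data.chi"
    data.write_text("1.0 2.0\n", encoding="utf-8")
    file_list = Path(tmp) / "file_list.txt"
    file_list.write_text("good_data.chi\n", encoding="utf-8")
    print(looks_like_file_list(file_list))  # True
    print(looks_like_file_list(data))       # False
```

One limitation of checking only the first line: a genuine file list whose first entry is missing would be misread as a data file.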

sbillinge

Please see my inline comments.

On balance the function looks a bit complicated. Let's make sure there is not a simpler way of doing it.

sbillinge · 2024-05-10T02:52:05Z

src/diffpy/labpdfproc/labpdfprocapp.py

+    p.add_argument(
+        "input",
+        nargs="+",
+        help="The filename or directory of the datafile to load. Required. "


To make it clearer, maybe: "The filename(s) or folder(s) of the datafile(s) to load. Required. ..."

The next line is good except doesn't the file with the list of files need to have a specific name?

Then I would give examples of valid inputs.
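
A sketch of what such a help string might look like, using only the `input` argument. The program name, the example inputs in the help text, and the omission of the app's other arguments are assumptions for illustration. Note that with `nargs="+"`, argparse always stores a list, even when a single value is given.

```python
import argparse


def build_parser():
    # Only the positional "input" argument is sketched; other arguments are omitted.
    parser = argparse.ArgumentParser(prog="labpdfproc")
    parser.add_argument(
        "input",
        nargs="+",
        help=(
            "The filename(s) or folder(s) of the datafile(s) to load. Required. "
            "Examples: 'good_data.chi', 'input_dir', "
            "'good_data.chi input_dir/good_data.xy'."
        ),
    )
    return parser


single = build_parser().parse_args(["good_data.chi"])
several = build_parser().parse_args(["input_dir/good_data.chi", "good_data.xy"])
print(single.input)   # ['good_data.chi']
print(several.input)  # ['input_dir/good_data.chi', 'good_data.xy']
```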

sbillinge · 2024-05-10T02:54:12Z

src/diffpy/labpdfproc/tests/test_tools.py

+# This test covers existing single input file, directory, a file list, and multiple files
+# We store absolute path into input_directory and file names into input_file
+params_input = [
+    (["good_data.chi"], [".", "good_data.chi"]),  # single good file, same directory


I wonder whether it is better if we force the file-list to be a list, even if there is only one file or directory specified.

sbillinge · 2024-05-10T02:55:11Z

src/diffpy/labpdfproc/tests/test_tools.py

+        ],
+    ),
+    (  # list of files provided (with invalid files and files in different directories)
+        ["input_dir/good_data.chi", "good_data.xy", "missing_file.txt"],


this doesn't test the case where a file of the same name is in two different directories. I had a test for that. In principle they are distinct files (they could have different data in) and so we want to support this.
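
A small sketch of why this case matters: if only base names are stored, two files with the same name collapse into one entry, whereas full paths keep them distinct. The working directory below is hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical working directory, used only for illustration.
cwd = PurePosixPath("/home/user/project")
inputs = ["input_dir/good_data.chi", "good_data.chi"]

names_only = {PurePosixPath(item).name for item in inputs}
full_paths = {cwd / item for item in inputs}

print(len(names_only))  # 1
print(len(full_paths))  # 2
```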

sbillinge · 2024-05-10T02:58:59Z

src/diffpy/labpdfproc/tools.py

+            if Path(input).is_file():
+                input_paths.append(Path(input).resolve())
+                input_paths_parent.append(Path(input).resolve().parent)
+        input_dir = Path(os.path.commonprefix([str(path) for path in input_paths_parent]))


Are you sure this is the best way to do this? Is there no pure Path() way to do it?
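
For comparison, a sketch of the alternatives. `os.path.commonprefix` compares strings character by character, so its result is not guaranteed to be a real directory, while `os.path.commonpath` compares whole components. The `posixpath` module is used here so the output is the same on every platform, and `common_parent` is a hypothetical pathlib-only helper, not something from the PR.

```python
import posixpath
from pathlib import PurePosixPath

parents = ["/home/user/run1/data", "/home/user/run2/data"]

# commonprefix compares character by character, so it can stop mid-name.
print(posixpath.commonprefix(parents))  # /home/user/run
# commonpath compares whole path components.
print(posixpath.commonpath(parents))    # /home/user


def common_parent(paths):
    """Hypothetical pathlib-only variant: keep the leading components shared by every path."""
    shared = []
    for components in zip(*(PurePosixPath(p).parts for p in paths)):
        if len(set(components)) != 1:
            break
        shared.append(components[0])
    return PurePosixPath(*shared) if shared else PurePosixPath(".")


print(common_parent(parents))  # /home/user
```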

sbillinge · 2024-05-10T03:00:26Z

src/diffpy/labpdfproc/tools.py

+
+    """
+
+    if len(args.input) > 1:


force the input to be a list and remove this.

yucongalicechen · 2024-05-10T04:36:48Z

I edited the help message, made args.input a list, and edited the test for same file names in different directories (if this happens, we would probably want to change the output name/directory too, since the outputs would have the same name in the same directory?). It now seems that we only need input_directory to get absolute paths for each file (i.e. input_file_name is unnecessary, so I removed it).
I simplified the input function a bit: for every input, we check whether it is a directory, a data file, or a file list, use try/except to raise an error message if any file is invalid, and raise an error if all files are invalid.

yucongalicechen added 2 commits May 7, 2024 09:49

current progress on loading specified input file and glob all files if no input file provided

b61906b

merged updates from main

f7d1174

sbillinge mentioned this pull request May 7, 2024

closes #24: check if args.input_file is specified #35

Closed

modified input file function to accept either one file or one directory

d164dba

sbillinge reviewed May 7, 2024

yucongalicechen added 2 commits May 7, 2024 12:21

added test cases for UC1-4 and made input a required argument

3c49a18

added more test cases

f8b7203

sbillinge reviewed May 8, 2024

included comments for tests

3d5c5ee

sbillinge reviewed May 8, 2024

added tests for a file list and edited help message addressing the rules for inputing a file list

06ae14b

sbillinge reviewed May 9, 2024

added tests for file list and multiple files

a02b573

fix grammar

7d39a74

sbillinge reviewed May 10, 2024

intermediate process (more tests need to be added): using input_directory only for simplication

ead5830

yucongalicechen closed this May 10, 2024

yucongalicechen deleted the input_dir2 branch May 13, 2024 23:50
