
Refactor serialize_data with better param names and logic #222

Closed
wants to merge 2 commits

Conversation

bobleesj
Contributor

No description provided.


Warning! No news item is found for this PR. If this is a user-facing change/feature/fix,
please add a news item by copying the format from news/TEMPLATE.rst.

for headerfile in tlm_list:
    # gather data using loadData
    hdata = loadData(headerfile, headers=True)
    data_table = loadData(headerfile)

    # check path extraction
-   generated_data = serialize_data(headerfile, hdata, data_table, dt_colnames=["r", "gr"], show_path=True)
+   generated_data = serialize_data(headerfile, hdata, data_table, dt_col_names=["r", "gr"], show_path=True)
Contributor Author

df_col_names and df_colnames are used across the codebase, so sticking with dt_col_names.


if dt_col_names is not None:
    dt_col_names_count = len(dt_col_names)
    if dt_max_col_count < dt_col_names_count:  # assume numpy.loadtxt gives non-irregular array
Contributor Author

Compare the number of col_names values provided as input to this function against the number of columns determined from the data table.
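
For illustration, a minimal sketch of the check being discussed; the helper name check_col_names and the ValueError are assumptions made for the sketch, not necessarily what the real function uses:

def check_col_names(dt_col_names, data_table):
    # numpy.loadtxt is assumed to give a rectangular (non-irregular) array,
    # so every row of the data table has the same number of columns.
    dt_max_col_count = max(len(row) for row in data_table)
    if dt_col_names is not None:
        dt_col_names_count = len(dt_col_names)
        if dt_max_col_count < dt_col_names_count:
            # More column names were supplied than columns found in the data table.
            raise ValueError(
                f"Got {dt_col_names_count} column names but the data table "
                f"only has {dt_max_col_count} columns."
            )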

Contributor Author

@bobleesj bobleesj left a comment

@sbillinge ready for review. I was looking into this issue: #27

Before addressing this issue, I had to understand the code and thought I could help refactor this.


codecov bot commented Dec 13, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.69%. Comparing base (5f67f6c) to head (d940b64).
Report is 49 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #222      +/-   ##
==========================================
- Coverage   99.69%   99.69%   -0.01%     
==========================================
  Files           8        8              
  Lines         325      324       -1     
==========================================
- Hits          324      323       -1     
  Misses          1        1              
Files with missing lines | Coverage Δ
tests/test_serialization.py | 100.00% <100.00%> (ø)

@bobleesj bobleesj marked this pull request as ready for review December 13, 2024 05:43
@bobleesj
Contributor Author

Ready for review

Contributor

@sbillinge sbillinge left a comment

I started to review this, but @bobleesj like you I am a bit confused about what is going on here right from the get-go. The function is called serialize_data but it seems to be reading a data file. This seems closer to deserializing data, not serializing it. We may want to get @Sparks29032 in on the convo here, I think he may have written it.

I think we probably want code that will dump to different file formats and read from different formats. These are dumpers and loaders. If we serialize and deserialize things I think of it as a kind of (safe) round trip. The unsafe thing is just to pickle and then unpickle, but we need to be more careful now. But since we want people using diffraction objects, the serializers that are most interesting to me would be ones that serialize and deserialize complete diffraction objects.
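
For context, a minimal sketch of the kind of (safe) round trip described above, using a plain dict and the standard json module as stand-ins; the eventual DiffractionObject serializer would define its own schema and API:

import json

# A plain dict stands in for a diffraction object in this sketch.
do_dict = {"xtype": "q", "x": [0.1, 0.2, 0.3], "y": [1.0, 2.0, 3.0], "metadata": {"wavelength": 0.71}}

# Safe round trip: dump to a text format and read it back.
with open("do.json", "w") as f:
    json.dump(do_dict, f)
with open("do.json") as f:
    restored = json.load(f)
assert restored == do_dict

# The unsafe alternative (pickle then unpickle) can execute arbitrary code on
# load, so it should not be used for untrusted files.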

hdata: dict
-    File metadata (generally related to data table).
+    The file metadata (generally related to data table).
Contributor

Is there a reason we are not calling this variable metadata, or header_data? Calling it hdata means I am having to carry around too many pointers in my brain that I have to cognitively resolve as I read.

Collaborator

The hdata name came from loadData, which uses hdata for the header data. Good to change to metadata.

data_table: list or ndarray
-    Data table.
+    The data table.
dt_colnames: list
Contributor

What is a data table? A table with data on it? I think this is something like the column data read from the file.

Collaborator

These serialization functions were coded in tandem with loadData, so the inputs of these match the outputs of loadData. This assumes files structured like .chi/.gr/etc. files.
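
A usage sketch following the call shown in the test excerpt above; the import path, the example.gr filename, and the json-writing step are assumptions for illustration:

import json

from diffpy.utils.parsers import loadData, serialize_data  # import path assumed

# The inputs mirror loadData's outputs ("example.gr" is a placeholder filename).
hdata = loadData("example.gr", headers=True)  # header/metadata dictionary
data_table = loadData("example.gr")           # numerical data table

# Pack everything into a single dict keyed by the source file.
serialized = serialize_data("example.gr", hdata, data_table, dt_colnames=["r", "gr"], show_path=False)

# Writing the result out as .json is shown with the standard library here;
# default=... guards against any numpy values left in the returned dict.
with open("example.json", "w") as f:
    json.dump(serialized, f, indent=2, default=lambda o: o.tolist())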

@bobleesj
Contributor Author

@Sparks29032 when you have time this Sat, is the serialize_data method used in any of the diffpy packages? Wondering about the motivation and how it's used in practice.

@Sparks29032
Collaborator

I think this was originally added for the pdfitc rest-api project. I don't believe it is implemented in any packages.

These are dumpers and loaders, but called serialization/deserialization as we are moving from serial files to python-readable information.

This was coded a while back, so I don't remember all the details, but I believe these were coded specifically to dump and load PDF data (hence a data table [list type] + header information [dictionary type]) into serial file formats like .json.

Moving forward, it might be good to standardize so only dictionaries can be passed into these functions, and the dumpers and loaders can use specific file-format libraries given the extension of the file.
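
A minimal sketch of the extension-based dispatch idea; the dump_dict name and the supported formats are hypothetical:

import json
from pathlib import Path


def dump_dict(data: dict, filepath):
    # Hypothetical dumper: accept only a dictionary and pick the backend
    # from the file extension, as suggested above.
    path = Path(filepath)
    suffix = path.suffix.lower()
    if suffix == ".json":
        with path.open("w") as f:
            json.dump(data, f, indent=2)
    elif suffix in (".yml", ".yaml"):
        import yaml  # PyYAML, assumed to be an available optional dependency

        with path.open("w") as f:
            yaml.safe_dump(data, f)
    else:
        raise ValueError(f"Unsupported file extension: {suffix}")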

@bobleesj
Contributor Author

Thanks @Sparks29032

> I think this was originally added for the pdfitc rest-api project. I don't believe it is implemented in any packages.

To confirm, this function is used by pdfitc? Wasn't sure if the source code for pdfitc is publicly available.

> load PDF data (hence a data table [list type] + header information

Yes, there are test cases for loading .gr files and turning these into a single .json file.

@sbillinge As the next step, since we have an issue to write a DiffractionObject serializer/deserializer (#195) in 3.7.0, should we close the following issue #27 that this PR is trying to slowly address?

@Sparks29032
Collaborator

@bobleesj It is not used by pdfitc, don't worry. It was experimental.

@sbillinge
Contributor

We definitely want the dumpers and loaders and a pure serializer. We can put it off to 3.7 though to keep momentum. The serializer will change the API though, because I would like to give, as an option, a file path, or probably better a stream, so we can have different backends, and have it load the DO from there. So the serializer would be nice now I guess?
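
A sketch of the path-or-stream option being proposed; the load_do_dict name and the JSON backend are hypothetical:

import io
import json
from pathlib import Path


def load_do_dict(source):
    # Hypothetical loader that accepts either a file path or an open stream,
    # so different backends (disk, network, in-memory) can be plugged in.
    if isinstance(source, (str, Path)):
        with open(source) as f:
            return json.load(f)
    # Otherwise assume a readable text stream.
    return json.load(source)

# Usage with an in-memory stream instead of a file on disk:
buffer = io.StringIO('{"xtype": "q", "x": [0.1], "y": [1.0]}')
do_dict = load_do_dict(buffer)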

@sbillinge
Contributor

Dumpers and loaders for group-standard file formats like the ones we have are a bit specific, but probably worth having since a large community is using these file formats.

@bobleesj
Contributor Author

Since this is a PR, I will copy and paste the comments by @sbillinge into the existing issue, #195.

@bobleesj bobleesj closed this Dec 15, 2024
@bobleesj
Contributor Author

bobleesj commented Dec 15, 2024

Closing - we will probably make a new PR when we are dealing with 3.7.0. For now, this PR isn't needed for the 3.6.0 release.
