Refactor serialize_data with better param names and logic #222
Conversation
Warning! No news item is found for this PR. If this is a user-facing change/feature/fix,
```diff
 for headerfile in tlm_list:
     # gather data using loadData
     hdata = loadData(headerfile, headers=True)
     data_table = loadData(headerfile)

     # check path extraction
-    generated_data = serialize_data(headerfile, hdata, data_table, dt_colnames=["r", "gr"], show_path=True)
+    generated_data = serialize_data(headerfile, hdata, data_table, dt_col_names=["r", "gr"], show_path=True)
```
df_col_names and df_colnames are used across the codebase, so sticking with dt_col_names.
```python
if dt_col_names is not None:
    dt_col_names_count = len(dt_col_names)
    if dt_max_col_count < dt_col_names_count:  # assume numpy.loadtxt gives non-irregular array
```
Compare the number of col_names values provided as input to this function against the number of columns determined from the data table.
@sbillinge ready for review. I was looking into this issue: #27
Before addressing this issue, I had to understand the code and thought I could help refactor this.
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```
@@ Coverage Diff @@
##             main     #222      +/-   ##
==========================================
- Coverage   99.69%   99.69%   -0.01%
==========================================
  Files           8        8
  Lines         325      324       -1
==========================================
- Hits          324      323       -1
  Misses          1        1
```
Ready for review
I started to review this, but @bobleesj, like you I am a bit confused about what is going on here right from the get-go. The function is called serialize_data but it seems to be reading a data file. This seems closer to deserializing data, not serializing it. We may want to get @Sparks29032 in on the convo here; I think he may have written it.

I think we probably want code that will dump to different file formats and read from different formats. These are dumpers and loaders. If we serialize and deserialize things, I think of it as a kind of (safe) round trip. The unsafe thing is just to pickle and then unpickle, but we need to be more careful now. But since we want people using diffraction objects, the serializers most interesting to me would be ones that serialize and deserialize complete diffraction objects.
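As a rough illustration of the "safe round trip" idea, here is a sketch that uses JSON instead of pickle; the function names and payload layout are invented for the sketch and are not the diffpy.utils API:

```python
import json

import numpy as np


def serialize(metadata, data_table):
    # Dump a metadata dict plus a numeric data table to a JSON string.
    # JSON is a "safe" interchange format: unlike pickle, loading it
    # cannot execute arbitrary code.
    payload = {
        "metadata": metadata,
        "data_table": np.asarray(data_table).tolist(),
    }
    return json.dumps(payload)


def deserialize(text):
    # Rebuild the metadata dict and the data table from the string.
    payload = json.loads(text)
    return payload["metadata"], np.array(payload["data_table"])


metadata = {"wavelength": 0.71}
data_table = [[0.0, 1.0], [0.1, 0.9]]
round_tripped = deserialize(serialize(metadata, data_table))
```

A serializer for a complete diffraction object would do the same kind of round trip, just over the object's full set of attributes rather than a bare dict-plus-table.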
```diff
 hdata: dict
-    File metadata (generally related to data table).
+    The file metadata (generally related to data table).
```
Is there a reason we are not calling this variable metadata, or header_data? Calling it hdata means I am having to carry around too many pointers in my brain that I have to cognitively resolve as I read.
The name hdata came from loadData's use of hdata for header-data information. Good to change it to metadata.
```diff
 data_table: list or ndarray
-    Data table.
+    The data table.
 dt_colnames: list
```
What is a data table? A table with data on it? I think this is something like the column data read from the file.
These serialization functions were coded in tandem with loadData, so their inputs match the outputs of loadData. This assumes files structured like .chi/.gr/etc. files.

@Sparks29032 when you have time this Sat, is the
I think this was originally added for the

These are dumpers and loaders, but called serialization/deserialization since we are moving from serial files to Python-readable information. This was coded a while back, so I don't remember all the details, but I believe these were coded specifically to dump and load PDF data (hence a data table [list type] + header information [dictionary type]) into serial file formats like

Moving forward, it might be good to standardize so that only dictionaries can be passed into these functions, and the dumpers and loaders can use specific file-format libraries given the extension of the file.
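A minimal sketch of the dump/load pattern described above, writing a header dict plus a data table to a serial text format. The function names and the header/data delimiter are made up for illustration and are not the actual diffpy.utils code:

```python
import io

import numpy as np


def dump_pdf_data(stream, header, data_table):
    # Write header key-value pairs, a separator line, then the
    # whitespace-separated data table.
    for key, value in header.items():
        stream.write(f"# {key} = {value}\n")
    stream.write("#### start data\n")
    np.savetxt(stream, np.asarray(data_table))


def load_pdf_data(stream):
    # Read the header dict back, then hand the rest to numpy.loadtxt.
    header = {}
    for line in stream:
        if line.startswith("#### start data"):
            break
        key, _, value = line.lstrip("# ").partition("=")
        header[key.strip()] = value.strip()
    return header, np.loadtxt(stream)


buffer = io.StringIO()
dump_pdf_data(buffer, {"qmax": "25.0"}, [[0.0, 1.0], [0.1, 0.9]])
buffer.seek(0)
header, data_table = load_pdf_data(buffer)
```

Standardizing on dictionaries in and out, as suggested above, would let the same pair of functions sit behind multiple file-format backends chosen by extension.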
Thanks @Sparks29032
To confirm, this function is used by
Yes, there are test cases for loading

@sbillinge As the next step, since we have an issue to write a DiffractionObject serializer/deserializer (#195) in
@bobleesj It is not used by pdfitc, don't worry. It was experimental.
We definitely want the dumpers and loaders and a pure serializer. We can put it off to 3.7 though to keep momentum. The serializer will change the API though, because I would like to give as an option a file path, or probably better a stream, so we can have different backends and have it load the DO from there. So the serializer would be nice now, I guess?
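The "file path or stream" option could look roughly like this; serialize_to is a hypothetical name for the sketch, not an existing diffpy.utils function:

```python
import io
import json


def serialize_to(obj, dest):
    # Accept either a writable text stream or a file path, so different
    # backends (in-memory buffers, sockets, plain files) all work.
    if hasattr(dest, "write"):
        json.dump(obj, dest)
    else:
        with open(dest, "w") as f:
            json.dump(obj, f)


# With a stream backend, nothing touches the filesystem.
buffer = io.StringIO()
serialize_to({"wavelength": 0.71}, buffer)
```

Taking a stream rather than only a path is what makes pluggable backends possible: the caller decides where the bytes go.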
Dumpers and loaders that target standard file formats, as we have, are a bit specific, but probably worth having since a large community is using these file formats.
Since this is a PR, I will copy and paste the comments by @sbillinge from the existing issue here: #195.
Closing - we will probably make a new PR when we are dealing with