-
Notifications
You must be signed in to change notification settings - Fork 17
Squeeze morph: Adding UCs tests #182
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
beautifully done....a thing of beauty!!!
please see my inline comments
|
||
Configuration Variables | ||
----------------------- | ||
squeeze |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More recently we have been using this syntax:
squeeze : list
The polynomial coefficients [a0, a1, ..., an] for the squeeze function where the polynomial would be
of the form a0 + a1 x + a2 x ^2 and so on. The order of the polynomial is determined by the length
of the list.
I removed the "array-like". Do we want to allow that or we just want a list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's just have a list here, I think makes sense and is consistent.
|
||
This applies a polynomial to squeeze the morph non-linearly. The morphed | ||
data is returned on the same grid as the unmorphed data. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it may be nice to have a usage example in the docstring. Basically copy-paste the code from the test to show how the morph can be instantiated and then used.
We normally put docstrings in the methods and in the constructor (the def __init__():
). This class is inheriting the constructor from the base class, so I am not 100% sure how the docstring propagates through in the documentation, but without that, but I think if we put a docstring under the def morph():
which is overloading that method from the base class, it should become clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I was following the architecture of other morphs. They have a docstring bellow the class which is the main description and then another docstring under the function which are few words. Should I keep it consistent?
When I add an example in the docstring do I add it as code using >>>?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have a group standard for this, but more recently I think we are leaning towards having the class-level docstring to be just a few words about the high level intent of the class and the docstring of the constructor to be have more detail. I did look into this question at one point and formed an opinion, but I don't fully remember what I found. It could have been when I was working on my beloved "DiffractionObject"s in diffpy.utils but I am not sure. We could look there. All documentation is better than lack of documentation, so the standards are just for us to figure out the most effective (user-useful) way to write the docs so that any time we spend on it has the most impact.
tests/test_morphsqueeze.py
Outdated
# UCs from issue 181: https://github.com/diffpy/diffpy.morph/issues/181 | ||
# UC2: Same grid | ||
(np.linspace(0, 10, 101), np.linspace(0, 10, 101)), | ||
# UC4: Target extends beyond morph |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe add "same grid density" in the comment? likewise below on all the comments.
Do we have any where the grid density varies? I think we want tests for where the target is finer and when the morph is finer. We probably don't need to do the full matrix of longer/shorter/finer/coarser I think, so just for the finer/coarser tests they can be arrays over the same range.
We can discuss what we want the code to do in each of those cases. If one is numerically unstable or leads to inaccuracies we can choose to raise an exception and not allow users to do it as our desired behavior, but we should decide what we want to happen and have a test for that.
If we do raise an exception it goes in a separate test that we often give a name like test_function_bad()
to differentiate it from test_function()
and we check that if a condition is met (something coarser and we don't like that) then the right exception (and helpful error message) is raised.
We like error messages that are of the form "such and such went wrong. Try doing so and so to fix it", as this is most helpful to the user in general.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we have those cases covered. We have UC6 where the target is finer grid density and then UC8 where the morph is finer grid density. I'll update the comments in the test definitions to make this clearer, including specifying the grid density differences explicitly.
I think it's fine for the morph and target to have different grid densities in these tests, since we're not performing operations that compare the two arrays. That will become critical during refinement, where the grid densities will affect comparison between morph and target?.
I have edited the comments and docstrings as you mentioned. The test covers same grids as well as different grids for target vs morph inputs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lovely! One last comment on the docstring, but let's push on and implement the functionality!
>>> from numpy.polynomial import Polynomial | ||
>>> from diffpy.morph.morphs.morphsqueeze import MorphSqueeze | ||
|
||
>>> x_target = np.linspace(0, 10, 101) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's remove this block of text because it will confuse people. This is us building a target, but the assumption is that the user already has a target, so none of this code is needed by people just to use the morph.
>>> poly = Polynomial(squeeze_coeff) | ||
>>> y_morph = np.sin(x_morph + poly(x_morph)) | ||
|
||
>>> morph = MorphSqueeze() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this block is what we are after. We could add a few words about what the variables mean. Maybe also add a couple of lines that show other attributes that the user can access from the morph
instance. For example things like
x_morph_in = morph.x_morph_in
etc. No need to be exhaustive, just give a few examples to help users.
btw, this looks great in general. I am not sure if we have a "group standard" for these code blocks. I think that it is possible to write htese actually so that they can be run by some automated testing codes but we don't really do that. I just feel that, as a user, if there is a few lines of code showing how to use this, it can be super helpful.
@@ -25,17 +25,17 @@ | |||
] | |||
morph_target_grids = [ | |||
# UCs from issue 181: https://github.com/diffpy/diffpy.morph/issues/181 | |||
# UC2: Same grid | |||
# UC2: Same range and same grid density |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now fantastic! I love these tests. Future us will love you.
|
||
This applies a polynomial to squeeze the morph non-linearly. The morphed | ||
data is returned on the same grid as the unmorphed data. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have a group standard for this, but more recently I think we are leaning towards having the class-level docstring to be just a few words about the high level intent of the class and the docstring of the constructor to be have more detail. I did look into this question at one point and formed an opinion, but I don't fully remember what I found. It could have been when I was working on my beloved "DiffractionObject"s in diffpy.utils but I am not sure. We could look there. All documentation is better than lack of documentation, so the standards are just for us to figure out the most effective (user-useful) way to write the docs so that any time we spend on it has the most impact.
I have moved the main docstring below the function definition and the class-level docstring is the short description. I have also added comments on the example. |
Also, I just realized that some of my input 3rd order degree coefficients are too large and the x-array is not strictly increasing and loops back. When we work on the functionality, an idea is that we could add a warning statement when the x-array doesn't increase due to high values of high order polynomials and just return the input data and tell the user to change the initial values for the coefficients and run it again |
I think we are ready to write the code now. Regarding tolerances, the kind of behavior I would want is for the tolerances to be very tight on non-extrapolated regions and I could tolerate a slightly larger tolerance on extrapolated data (That is tagged as such as we discussed). It is probably a mistake to allow loose tolerances on all points to all tests to pass for a few points at the ends. For these unit tests, I think it just computes a squeezed function so I think it is not really extrapolating anything, just computing it over a wider range. maybe use 10-6 or sthg? |
This is great. Honestly, I am not sure if we need this for these tests because no regression is going on. We are just making sure that a known function is correctly computed. Let's put tight tolerances on the tests and then just see if they pass when we write the code. If not, we can go back. Since we decided we would capture the indices (and values?) of the points where the extrapolation occurs, we should write quite tests for those too. What shall we save to those attributes when there is no extrapolation? I think tl;dr: I like this approach but not sure if it is needed, so let's keep this in our back pocket in case we need it. But let's write tests for the morph low and high extrapolation indices/values. |
btw, we should keep going on this PR. |
Our tests do involve extrapolation. The morph function uses |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nearly there. Few comments
Returns | ||
------- | ||
A tuple (x_morph_out, y_morph_out, x_target_out, y_target_out, | ||
min_index, max_index) where the target values remain the same and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't need to return min and max index. These can be just updated in place. Also, these names are not so intuitive to me. maybe something like extrap_index_low
and extrap_index_high
? Let's define these and set them to None
when MorphSqueeze
is instantiated.
) | ||
left_extrap = np.where(self.x_morph_in < x_squeezed[0])[0] | ||
right_extrap = np.where(self.x_morph_in > x_squeezed[-1])[0] | ||
min_index = left_extrap[-1] if left_extrap.size else None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.extrap_index_low
(or left) and don't return it.
Describe this variable in the constructor where it is first defined.
right_extrap = np.where(self.x_morph_in > x_squeezed[-1])[0] | ||
min_index = left_extrap[-1] if left_extrap.size else None | ||
max_index = right_extrap[0] if right_extrap.size else None | ||
return self.xyallout + (min_index, max_index) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
apart from it being a bad idea in general, this breaks backwards compatibility.
tests/test_morphsqueeze.py
Outdated
low_extrap_idx, | ||
high_extrap_idx, | ||
) = morph(x_morph, y_morph, x_target, y_target) | ||
if low_extrap_idx is None and high_extrap_idx is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems too complicated to me, and also does it test the extrapolation? How about something like
if low_extrap_idx is None:
low_extrap_idx = 0
elif high_extrap_idx is None:
high_extrap_idx = -1
assert np.allclose(y_morph_actual[low_extrap_index:high_extrap_idx], y_morph_expected[low_extrap_index:high_extrap_idx], atol=1e-6)
assert np.allclose(y_morph_actual[:low_extrap_index-1], y_morph_expected[:low_extrap_index-1], atol=1e-5)
assert np.allclose(y_morph_actual[high_extrap_index+1:], y_morph_expected[high_extrap_index+1:], atol=1e-5)
assert morph.low_extrap_idx == expected_low_extrap_idx
assert morph.high_extrap_idx == expected_high_extrap_idx
Hi Simon, thank you very much for the feedback! I have updated the test and the function accordingly. There are some tests that are failing and I am pretty sure is because there are cases where the extrapolation is large and inaccurate. |
that's interesting in its own right. What do you think is a good behavior? One thing we can do in principle is to allow a user to specify a tolerance and alert them when an extrapolation exceeds that tolerance. We could then have a default tolerance for the interpolation and for the extrapolation. That could actually be a nice "scientifically rigorous" solution. However, it is making things more complicated. Maybe a suggestion would be to
|
I like your idea where the user can specify tolerance and we can warn them if it exceeds that tolerance for extrapolated values. |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #182 +/- ##
==========================================
+ Coverage 96.85% 96.99% +0.13%
==========================================
Files 18 19 +1
Lines 795 831 +36
==========================================
+ Hits 770 806 +36
Misses 25 25
🚀 New features to boost your workflow:
|
super!!!!! got it merged. That was a huge job but we learned a lot about our morph functions so it was time well spent. We should decide what is next. We could
We started this so we could do the thing of morphing a low-Qmax F(Q) onto a high Qmax one, and I think that we want (2) to do that, then for the actual UC for the function to be the function in PDFgetX3 that applies the corrections to get the F(Q)..... Thoughts? |
Thanks again for all your mentoring! |
Regarding the next steps, I think we should develop both a "multiplicative" morph and an "additive" morph to convert from I(q) to F(q) given an F(q) target (for our XFEL UC). |
I have written different UCs tests from issue 181 for the squeeze morph. I am keeping the morphed x-axis the same as the unmorphed as we discussed. I have left the morph function empty. My inputs are a uniform target, and a squeezed morph, then the expected morph is the unsqueezed morph.