
tests for scale_to #211

Merged: 9 commits into diffpy:main from scaleto on Dec 14, 2024

Conversation

@yucongalicechen (Contributor) commented on Dec 8, 2024:

closes #49, closes #186

codecov bot commented on Dec 8, 2024:

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (5f67f6c) to head (53413fb).
Report is 39 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff             @@
##             main      #211      +/-   ##
===========================================
+ Coverage   99.69%   100.00%   +0.30%     
===========================================
  Files           8         8              
  Lines         325       352      +27     
===========================================
+ Hits          324       352      +28     
+ Misses          1         0       -1     
Files with missing lines | Coverage | Δ
tests/test_diffraction_objects.py | 100.00% <100.00%> | (ø)

... and 1 file with indirect coverage changes

@@ -390,14 +390,15 @@ def scale_to(self, target_diff_object, xtype=None, xvalue=None):

data = self.on_xtype(xtype)
target = target_diff_object.on_xtype(xtype)
if len(data[0]) == 0 or len(target[0]) == 0 or len(data[0]) != len(target[0]):
raise ValueError("I cannot scale two diffraction objects with empty or different lengths.")
@yucongalicechen (Contributor Author) commented on Dec 8, 2024:

@sbillinge I added this error message here but I'm not sure if this is appropriate. Does it make more sense that we find xindex for each diffraction object? (see comment below)

Contributor:

if the arrays are empty I think it will blow up anyway, right? so we don't need to blow it up?

If they are different lengths, I think we can still handle it, maybe? If the xvalue given is outside one of the two arrays it will blow up. But in general I think it would be handy to be able to do this on different-length arrays.

It does make it more complicated, because we would have to interpolate one onto the other before doing the comparison. We could move that more awkward case to an issue on a later release, or just go for it. We can check; I think we kept adding, subtracting, etc. to only work between arrays on the same grids. But we often want to compare data on different grids... that is kind of the point of these diffraction objects: making those tricky things easy. For example, I may have a diffraction pattern I got from a paper for my sample, and then I am at the synchrotron and I measure something and I want to know if it is what I am expecting, and I just want to scale them and plot them on top of each other, but it is a super hassle because they are on different-length arrays and one is on tth and the other on q and so on. The DOs are supposed to make those hassles all go away.
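A minimal sketch of the interpolation approach described above (illustrative only, not the merged implementation; the function and array names are assumed):

import numpy as np

def scale_factor_across_grids(x_self, y_self, x_target, y_target, xvalue):
    # Resample the target curve onto self's grid so the two can be compared
    # even when the arrays have different lengths.
    y_target_on_self = np.interp(x_self, x_target, y_target)
    # Pick the grid point of self closest to the requested x-value.
    idx = np.abs(x_self - xvalue).argmin()
    return y_target_on_self[idx] / y_self[idx]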

@sbillinge (Contributor) commented:
@yucongalicechen I was explaining this functionality to colleagues in an email and the syntax that came most naturally to me was:

pyplot.plot(xpd_data.on_q[0], xpd_data.on_q[1]) 
pyplot.plot(lab_data.on_q[0], lab_data.scaled_to(xpd_data, q=10, offset=5).on_q[1])
pyplot.show()

which looks more intuitive to me (though less extensible) than lab_data.scaled_to(xpd_data, xvalue=10, xtype="q", offset=5).on_q[1]. Shall we do it this way? We will have to define tth=None etc. as options, plus logic to figure out which one to use. It is a tradeoff, easier for the user vs. easier for us....
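For illustration, a rough sketch of the argument handling this syntax would need (a sketch only, under the assumptions above; not the merged code):

def scale_to(self, target_diff_object, q=None, tth=None, d=None, offset=0):
    # Exactly one of q, tth, d selects both the x-type and the x-value.
    given = [(name, value) for name, value in (("q", q), ("tth", tth), ("d", d)) if value is not None]
    if len(given) != 1:
        raise ValueError("You must specify exactly one of 'q', 'tth', or 'd'. Please rerun specifying only one.")
    xtype, xvalue = given[0]
    ...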

Also, please let's be in the habit of making the news early in the process to make it easier to see when tests are passing.

@yucongalicechen (Contributor Author) replied, quoting the above:

@yucongalicechen I was explaining this functionality to colleagues in an email and the syntax that came most naturally to me was:

pyplot.plot(xpd_data.on_q[0], xpd_data.on_q[1]) 
pyplot.plot(lab_data.on_q[0], lab_data.scaled_to(xpd_data, q=10, offset=5).on_q[1])
pyplot.show()

which looks more intuitive to me (though less extensible) than lab_data.scaled_to(xpd_data, xvalue=10, xtype="q", offset=5).on_q[1]. Shall we do it this way? We will have to define tth=None etc. as options, plus logic to figure out which one to use. It is a tradeoff, easier for the user vs. easier for us....

Also, please let's be in the habit of making the news early in the process to make it easier to see when tests are passing.

Yeah I also like this better!

yself = data[1][xindex]
scaled.on_tth[1] = data[1] * ytarget / yself
scaled.on_q[1] = data[1] * ytarget / yself
x_data, x_target = (np.abs(data[0] - xvalue)).argmin(), (np.abs(target[0] - xvalue)).argmin()
@yucongalicechen (Contributor Author):

compute different indices for the two diffraction objects

Contributor:

let's change the variable name to xindex_data to remind us that these are indices.
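For example, a small sketch of how the renamed indices might read (assuming numpy arrays, as in the diff above):

import numpy as np

def nearest_indices(data, target, xvalue):
    # The xindex_ prefix makes clear these are array indices, not x-values.
    xindex_data = np.abs(data[0] - xvalue).argmin()
    xindex_target = np.abs(target[0] - xvalue).argmin()
    return xindex_data, xindex_target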

@yucongalicechen (Contributor Author) commented:

@sbillinge ready for another review

@sbillinge (Contributor) left a review:

few comments



Returns
-------
the rescaled DiffractionObject as a new object

"""
scaled = deepcopy(self)
Contributor:
I think we can use @bobleesj's nice copy method to do this now. Actually, it just does a deepcopy, but let's model the syntax that we would like users to use... so scaled = self.copy() or something like that?
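A minimal sketch of what such a convenience method might look like, assuming it simply wraps deepcopy as described (not necessarily the actual diffpy.utils implementation):

from copy import deepcopy

class DiffractionObject:
    def copy(self):
        """Return a deep copy of this DiffractionObject."""
        return deepcopy(self)

# so that inside scale_to the line above would read:
#     scaled = self.copy()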

@sbillinge (Contributor) commented:

@yucongalicechen some of the comments might be out of date... I may have made them but not submitted them. Please just use common sense and ask if not sure. If they don't make sense it is because we changed our desired API.

@yucongalicechen (Contributor Author) commented:

@sbillinge ready for review

["q", np.array([1, 2, 4, 6])],
),
# UC4: different x-array lengths with approximate x-value match
(
@yucongalicechen (Contributor Author):

A test example for scaling DOs with different array lengths. Here I think it makes more sense to scale them on q=61 (for self) & q=62 (for target).

@bobleesj (Contributor) commented:

@yucongalicechen this one just needs to be reviewed right?

@sbillinge (Contributor) left a review:

Sorry, forgot to finish it up

)
with pytest.raises(
ValueError, match="You can only specify one of 'q', 'tth', or 'd'. Please rerun specifying only one."
):
@yucongalicechen (Contributor Author):

added a test for the error message

# scaling factor is calculated at index = 5 for self and index = 6 for target
["tth", np.array([1, 2, 3, 4, 5, 6, 10])],
),
# UC5: user did not specify anything, use the midpoint of the current object's q-array
@yucongalicechen (Contributor Author):

added a test for specifying nothing

@yucongalicechen (Contributor Author) commented:

@sbillinge ready for review!

@sbillinge (Contributor) left a review:

please see inline

One other thought circling in my mind is to have the DOs carry around a kind of master array that goes from 0 to q=45 (about the widest range we will ever encounter) with a fine grid, and we interpolate the values onto there (linear or quadratic, whichever gives us sufficient accuracy). We could use it for various things, but this would be one of them, because we could compute the scaling for the arrays on this grid (but apply the scaling to the data on all the grids). Don't make this change, but let's think about it. The pro is it is cleaner and nicer and could be used for other things. The con is that we have to carry around another largish array in memory.
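A sketch of the master-grid idea, for discussion only (the grid spacing and the names here are assumptions, and the comment above explicitly says not to implement this yet):

import numpy as np

def scale_factor_on_master_grid(q_self, y_self, q_target, y_target, qvalue):
    # Fine common grid from 0 to q = 45 inverse angstroms.
    q_master = np.linspace(0, 45, 45001)
    # Interpolate both curves onto the master grid before comparing them.
    y_self_master = np.interp(q_master, q_self, y_self)
    y_target_master = np.interp(q_master, q_target, y_target)
    idx = np.abs(q_master - qvalue).argmin()
    # The same factor would then be applied to the data on every grid (q, tth, d).
    return y_target_master[idx] / y_self_master[idx]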


scaled.on_tth[1] = data[1] * ytarget / yself
scaled.on_q[1] = data[1] * ytarget / yself
x_data, x_target = (np.abs(data[0] - xvalue)).argmin(), (np.abs(target[0] - xvalue)).argmin()
y_data, y_target = data[1][x_data], target[1][x_target]
Contributor:

I think this line makes things less readable. I would put the data[1][xindex_data] (and so on) directly in the expression below.


params_scale_to_bad = [
# UC1: user did not specify anything
(
@yucongalicechen (Contributor Author):

added the bad test case for specifying nothing

@yucongalicechen (Contributor Author) replied, quoting the review above:

please see inline

One other thought circling in my mind is to have the DOs carry around a kind of master array that goes from 0 to q=45 (about the widest range we will ever encounter) with a fine grid, and we interpolate the values onto there (linear or quadratic, whichever gives us sufficient accuracy). We could use it for various things, but this would be one of them, because we could compute the scaling for the arrays on this grid (but apply the scaling to the data on all the grids). Don't make this change, but let's think about it. The pro is it is cleaner and nicer and could be used for other things. The con is that we have to carry around another largish array in memory.

@sbillinge the new commit is ready for review. I like this idea and I think it will be very helpful for comparing absorption-corrected curves. We can also have a function that allows the user to specify which x-array to interpolate on?

with pytest.raises(
ValueError, match="You must specify exactly one of 'q', 'tth', or 'd'. Please rerun specifying only one."
):
orig_diff_object.scale_to(target_diff_object, q=inputs[8], tth=inputs[9], d=inputs[10], offset=inputs[11])
@bobleesj (Contributor) commented on Dec 13, 2024:

having inputs up to inputs[10], etc., does not appear scalable to me, and I found this very hard to read and maintain in diffpy.snmf, which I had to refactor: https://github.com/diffpy/diffpy.snmf/pull/120/files#diff-1bd6af744434d75c63490430b955f577f60277dfe95e9ad716e3f808a2ed9d48L85-L87

Discussion here:
#225 (comment)

Contributor:

One way to resolve this future nightmare could be having reusable instances of DiffractionObject defined under conftest.py with specific UC cases. Then we import these instances through the parameters in each test function. Thoughts?

Contributor:

yes, I agree in this case, this would be helpful.

Contributor:

btw, to make it more readable we could also pass the inputs as a dict, so it would read inputs["wavelength"] instead of inputs[0]. The intent of the former is much clearer.
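A sketch of the dict-based parametrization idea (the keys and test names here are illustrative, not the merged tests):

import pytest

params_scale_to_bad = [
    # UC1: user did not specify anything
    {"q": None, "tth": None, "d": None, "offset": 0},
]

@pytest.mark.parametrize("inputs", params_scale_to_bad)
def test_scale_to_bad(inputs):
    # inputs["q"] reads much more clearly than inputs[8]
    q, tth, d, offset = inputs["q"], inputs["tth"], inputs["d"], inputs["offset"]
    # ... build the DiffractionObjects here and call scale_to(q=q, tth=tth, d=d, offset=offset)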

@sbillinge (Contributor) replied, quoting the exchange above:

please see inline
One other thought circling in my mind is to have the DOs carry around a kind of master array that goes from 0 to q=45 (about the widest range we will ever encounter) with a fine grid, and we interpolate the values onto there (linear or quadratic, whichever gives us sufficient accuracy). We could use it for various things, but this would be one of them, because we could compute the scaling for the arrays on this grid (but apply the scaling to the data on all the grids). Don't make this change, but let's think about it. The pro is it is cleaner and nicer and could be used for other things. The con is that we have to carry around another largish array in memory.

@sbillinge the new commit is ready for review. I like this idea and I think it will be very helpful for comparing absorption-corrected curves. We can also have a function that allows the user to specify which x-array to interpolate on?

yes, that had occurred to me too. I think that was where my functions came from for building arrays using arange etc. But for the most flexibility for the least code, I would suggest we just allow the user to pass in a master array. We can force them to do it on "q". If they want to specify a tth value to scale to, we will have to convert.

@yucongalicechen will you put these on new issues? In any case, let's put them on milestone 3.7.

Are you finished working on this PR so it can be merged?

@sbillinge (Contributor) left a review:

please see inline.

the diffraction object you want to scale the current one onto

q, tth, d : float, optional, must specify exactly one of them
the xvalue (in `q`, `tth`, or `d` space) to align the current and target objects
Contributor:

"The value of the x-array where you want the curves to line up vertically. Specify a value on one of the allowed grids, q, tth, or d), e.g., q=10."

"You must specify exactly one of 'q', 'tth', or 'd'. Please rerun specifying only one."
)

xtype = "q" if q is not None else "tth" if tth is not None else "d" if d is not None else "q"
Contributor:

can we drop the last 'else "q"', given our validation above?
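i.e., with the exactly-one validation already done, the trailing fallback is unreachable and the selection could read (sketch only, with a hypothetical helper name):

def _choose_xtype(q=None, tth=None, d=None):
    # the validation above guarantees exactly one of q, tth, d is not None
    return "q" if q is not None else "tth" if tth is not None else "d"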

)

xtype = "q" if q is not None else "tth" if tth is not None else "d" if d is not None else "q"
data, target = self.on_xtype(xtype), target_diff_object.on_xtype(xtype)
Contributor:

split into two lines for greater readability


@yucongalicechen (Contributor Author) commented:

Thanks for the suggestions @sbillinge @bobleesj. I've edited the docstring and tests to make the code more readable. Also created a new issue #230 for the comment above. This PR is ready to be reviewed again.

@sbillinge merged commit 75e0ef8 into diffpy:main on Dec 14, 2024
5 checks passed
@yucongalicechen deleted the scaleto branch on December 15, 2024 at 17:36
Successfully merging this pull request may close these issues: write tests for diffraction object
3 participants