Skip to content

USE CASES #181

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
Luiskitsu opened this issue Apr 10, 2025 · 13 comments
Open

USE CASES #181

Luiskitsu opened this issue Apr 10, 2025 · 13 comments

Comments

@Luiskitsu
Copy link
Contributor

Luiskitsu commented Apr 10, 2025

Generate squeeze morph tests for different user cases.

UC1: Original UC

  1. Caden has a set of PDFs measured as a function of temperature and wants to remove benign temperature effects
  2. Caden runs morph giving a target name from the set and a list of other files to morph to it
  3. morph morphs the morphs onto the target and return morph objects
  4. Caden saves as output morph parameter vs. file and the morphed curves written to file

UC2: as UC1 but programmatic API

  1. UC1.1-1.4 but Caden wants to run the code through an API in jupyter so info is input and output as np arrays and python objects

UC3: morph to synchrotron data

  1. Luis has a synchrotron diffraction pattern and is collecting xfel data over a narrower Q-range
  2. Luis runs morph on a single xfel dataset morphing it to the synchrotron data
  3. Luis saves the morphed pattern written to file and the morph parameters. All the morphed files are on the same Q-grid as the unmorphed XFEL data

UC4: as UC3 but using the API

  1. Luis wants to do UC3 working in jupyter using the API and get the data as python objects.

UC5: as UC3 but xfel data is in coarser grid than synchrotron data

  1. Luis collects xfel data over a narrower Q-range and with a coarser grid compared to synchrotron
  2. Luis runs morph xfel dataset, morphing xfel to the synchrotron data
  3. Luis saves the morphed pattern written to file and the morph parameters. All the morphed files are on the same Q-grid as the unmorphed XFEL data

UC6: as UC5 but using the API

  1. Luis wants to do UC5 working in jupyter using the API and get the data as python objects.

UC7: as UC3 but xfel data is in finer grid than synchrotron data

  1. Luis collects xfel data over a narrower Q-range and with a finer grid compared to synchrotron
  2. Luis runs morph xfel dataset, morphing xfel to the synchrotron data
  3. Luis saves the morphed pattern written to file and the morph parameters. All the morphed files are on the same Q-grid as the unmorphed XFEL data

UC8: as UC7 but using the API

  1. Luis wants to do UC7 working in jupyter using the API and get the data as python objects.

UC9: as UC3 but Qmin xfel < Qmin synchrotron

  1. Luis collects xfel data over a narrower Q-range but with a lower Qmin compared to synchrotron
  2. Luis runs morph xfel dataset, morphing xfel to the synchrotron data
  3. Luis saves the morphed pattern written to file and the morph parameters. All the morphed files are on the same Q-grid as the unmorphed XFEL data

UC10: as UC5 but using the API

  1. Luis wants to do UC9 working in jupyter using the API and get the data as python objects.

UC11: morph to MD simulated data

  1. Keith has scattering pattern simulated via molecular dynamics and wants to compare it against several lab-based scattering patterns that cover a broader Q-range than the simulated data.
  2. Keith runs morph on all the lab-based datasets, morphing them to the simulated data
  3. Keith saves as output morph parameter vs. file and the morphed data written to file. All the morphed files are on the same Q-grid as the unmorphed XFEL data

UC12: as UC5 but using the API

  1. Keith wants to do UC11.1-11.3 in jupyter notebook using the API and get the data as python objects
@sbillinge
Copy link
Contributor

These UCs are great. They need to always be in the form "Actor does Action on something". Only in the first one are we allowed an "Actor wants something", which scopes that UC. Please could you edit your UCs to keep the basic idea hyou had, but to put them into that format . You will find that following this rule has amazing properties of capturing critical details that really help when we come to the implementation step.

@sbillinge
Copy link
Contributor

I realize I broke my own rule a bit there too letting you want something in bullet ~4, so please can you fix that too..... Just change it to "Luis saves the output to file", so the "want" part is superfluous (or you wouldn't have saved it....).

Note, there is an art to choosing the actor too. I generally recommend picking a real person but that person should "exemplify" the target audience. We would have to write quite different code if the user is a high-school student or a grad student or Mike Toney, for example. Since the success definition is that our target audience is happy, it is really helpful to have a person in mind and to think what would make them happy.....

@sbillinge
Copy link
Contributor

finally, do you agree that as a general matter that we want to return the morphed data on the same grid as the input morph data? I think this makes the most sense to me from the point of view of the user. I have a pattern and I just want it back again on the same grid but slightly modified.

What we fit to what on what grid under the hood, the user probably doesn't care as long we make good choices for good accuracy. But finally we want to return it back on the original grid.....is my thought.

Also, from a user perspective, I may want to give a target that is either lower or higher grid density than my morph data. I may also want to have the target narrower or broader than my morph data. I mean, why not, right? If i have a snippet of data in the middle of Q, I could still want to morph to it. So UCs for each of these would be good. We may find that in this latter case, accuracy is low, or that the range of the snippet domain needs to related to the flexibility of the morph. e.g., if we just doing a stretch or a shift the snippet can be short, but if we are doing a 5th order polynomial squeeze this range should be longer. It is not our job to police the users. Whatever software we give them they can use it well or badly, it is up to them, but we can do simple things to help them if they easy and quick to implement.

I think the other morphs are limited to morph and target being on the same grid, so once we implement this for the squeeze I think we have some work to do to generalize them too. But the squeeze is a generalization of the others so once we figure this out, they will be quick.

@Luiskitsu
Copy link
Contributor Author

Hi Simon, thanks for the feedback, this makes more sense. I am and will be editing the UCs using the correct form.
Yes, I think it makes sense that we want to get back the morphed data on the same grid as the input data. I agree that we can make it work under the hood and probably we need some tolerance for accuracy. I’ll make sure the tests reflect that, even if the internal computations involve different Q-grids.

I agree with the UCs you just mentioned, UC3, UC5 and UC7 are examples of broader/narrower grid as well as different grids. We could create a couple of additional UCs specifically for different grid densities (e.g., morphing to a higher-density and a lower-density target), or we could add those conditions to an existing UC like UC3 for simplicity. Let me know what you'd prefer.

I am starting fresh with a new branch and started creating new tests for each of these UCs. I am calling the morph function but the function is empty. So the idea is to have these tests set up and then commit them on a new PR (we can delete the previous one). I won't touch the function until we have all the correct tests, I think that was my mistake, too focused on the function makes me blind. Few questions: Do I need to create a test for the UCs that is the same but with API? I am not sure I understand how the test differs for API vs file. Also, I am writing all the tests in the same python file and I get some warning about duplicate code. Is this okey in the meantime?

Very cool to see how I started with a very specific code for one case and now the code can be used for many more.
I will be at APS all next week for beamtime but if you have time in the near future it might be great if we could meet over zoom.

@Luiskitsu
Copy link
Contributor Author

By the way, for UC7, when the morph data axis is broader compared to the target data, do we also want to return the morphed data on the same axis as the unmorphed?

@Luiskitsu
Copy link
Contributor Author

Quick question: are we making the test/code harder than necessary by trying to unsqueeze the squeezed data, rather than just applying a squeeze directly to a dataset?

Hardcoding the expected morph (by applying a known squeeze) in the test is straightforward, inverting the squeeze polynomial is harder due to the nonlinearity?

@sbillinge
Copy link
Contributor

I agree with the UCs you just mentioned, UC3, UC5 and UC7 are examples of broader/narrower grid as well as different grids. We could create a couple of additional UCs specifically for different grid densities (e.g., morphing to a higher-density and a lower-density target), or we could add those conditions to an existing UC like UC3 for simplicity. Let me know what you'd prefer.

ok, perfect. Simpler is better. However, conditionals don't really work in this for of UC, so it is often easier making a new UC, but it would be short, something like As "UC3.1-3.5 but target is on coarser grid than morph"

@sbillinge
Copy link
Contributor

sbillinge commented Apr 11, 2025

I am starting fresh with a new branch and started creating new tests for each of these UCs. I am calling the morph function but the function is empty. So the idea is to have these tests set up and then commit them on a new PR (we can delete the previous one). I won't touch the function until we have all the correct tests, I think that was my mistake, too focused on the function makes me blind. Few questions: Do I need to create a test for the UCs that is the same but with API? I am not sure I understand how the test differs for API vs file. Also, I am writing all the tests in the same python file and I get some warning about duplicate code. Is this okey in the meantime?

Perfect. The way we can do the tests with mark.parametrize is, as well as passing in the morph parameters, to also pass in parameters for the morph and target range and nsteps. Then make the code in the test as brief and readable as possible such that it builds the target and the expected and gets the acutals and does the asserts. The test may only be a few lines long (a few longer in our case). But the list that is passed to parametrize can become very long with all the different cases we are testing. and comments. Let's put a URL of this issue and refer to the numbered UCs in the comments as we build that list.

The unit tests that we are working on here only test the API. We keep the other UCs for when we build the CLI or GUI.

@sbillinge
Copy link
Contributor

By the way, for UC7, when the morph data axis is broader compared to the target data, do we also want to return the morphed data on the same axis as the unmorphed?

Great question. I think I like the logic that, as a user, I always get back the same thing. As programmers we always have the option of returning a warning or raising an exception if the user asks for something impossible. We can also think of specific UCs to help resolve hard questions. Here would be my UC. Let's say I want to use this to find some two-theta-zero offset (so just a shift morph). I could, in principle compute a target that is a single Bragg peak at the right position, then measure an entire diffraction pattern and apply a shift morph to line up just that Bragg peak. What I would want back is my morphed complete diffraction pattern, not the single Bragg peak. So I think the answer is yes.

@sbillinge
Copy link
Contributor

Quick question: are we making the test/code harder than necessary by trying to unsqueeze the squeezed data, rather than just applying a squeeze directly to a dataset?

Hardcoding the expected morph (by applying a known squeeze) in the test is straightforward, inverting the squeeze polynomial is harder due to the nonlinearity?

I am not sure I completely understand your question. These tests just capture the behavior, so we want a hard-coded starting and ending morph and target which is what we are doing. Then we assert that the expected and the actual are the same within whatever tolerance we choose. Any inversion will happen in the morphing code itself, which we just muck around with until the test passes. We will have to decide there whether to use a linear or quadratic or cubic spline and what fineness of grid to use and what regression engine to use etc. to make it work. That is kind of the point of the tests. They are actually way more important than the code, which is why we are spending so much time on them. Then we have all kinds of flexibility on the numerical side to make different choices. We can also change those things later and every time we make any changes anywhere in the code we will run all the tests and make sure they all still pass after we made that change......

@Luiskitsu
Copy link
Contributor Author

Thanks for all the helpful feedback. I’ve implemented the suggested comments in the tests. I agree that, as users, we generally want the data to be returned on the same grid and within the same range. However, this can require extrapolation in certain UCs. Is that something we actually want to allow from a user perspective? And let them in their own risk?
The alternative is to avoid extrapolation but this means the returned morphed grid and range might differ from the unmorphed input. In this type of scenario, how do we decide which behavior is more useful from the user’s point of view? Personally I would not want to extrapolate my data as this could lead to unphysical values if not careful.

@sbillinge
Copy link
Contributor

Thanks for all the helpful feedback. I’ve implemented the suggested comments in the tests. I agree that, as users, we generally want the data to be returned on the same grid and within the same range. However, this can require extrapolation in certain UCs. Is that something we actually want to allow from a user perspective? And let them in their own risk? The alternative is to avoid extrapolation but this means the returned morphed grid and range might differ from the unmorphed input. In this type of scenario, how do we decide which behavior is more useful from the user’s point of view? Personally I would not want to extrapolate my data as this could lead to unphysical values if not careful.

in real situations the extrapolations are tiny. I wouldn't worry about it and the inconvenience of having every dataset in set of 200 all on a different grid seems like something no-one wants to deal with. The extrapolations will also be highly controlled in most cases, for example a third order polynomial morphing diffraction data over a wide range, very little can go wrong.

How about we store in the morph metadata any extrapolation points and allow the user to get this information. Then if they want to they can remove any extrapolation points themselves (and deal with the consequences of everything on a different grid).

@Luiskitsu
Copy link
Contributor Author

Thanks for all the helpful feedback. I’ve implemented the suggested comments in the tests. I agree that, as users, we generally want the data to be returned on the same grid and within the same range. However, this can require extrapolation in certain UCs. Is that something we actually want to allow from a user perspective? And let them in their own risk? The alternative is to avoid extrapolation but this means the returned morphed grid and range might differ from the unmorphed input. In this type of scenario, how do we decide which behavior is more useful from the user’s point of view? Personally I would not want to extrapolate my data as this could lead to unphysical values if not careful.

in real situations the extrapolations are tiny. I wouldn't worry about it and the inconvenience of having every dataset in set of 200 all on a different grid seems like something no-one wants to deal with. The extrapolations will also be highly controlled in most cases, for example a third order polynomial morphing diffraction data over a wide range, very little can go wrong.

How about we store in the morph metadata any extrapolation points and allow the user to get this information. Then if they want to they can remove any extrapolation points themselves (and deal with the consequences of everything on a different grid).

Your comment makes a lot of sense. I agree that for our diffraction real scenarios the extrapolation will be tinny. I was thinking in the case of a user that is trying to morph something in another field, and lets say he wants to squeeze with a 7th order polynomial, the extrapolation error will become significant. I think your idea of allowing the user to get the extrapolation points and remove them is excellent! And we can also add warnings for high order polynomials.

@sbillinge sbillinge changed the title Add tests for squeeze morph to verify desired behavior across different user cases USE CASES May 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants