Inconsistent results of materials query #964
Comments
Hi @fxcoudert! Just to check: was this meant for pymatgen or for https://github.com/materialsproject/api? I want to make sure you get the quickest feedback possible. CC @tschaume in case it's relevant to him.
Thanks @Andrew-S-Rosen! The difference is due to the new ~15k GNoMe materials that are included in the API response if a user accepted its terms on the website. You can set
The all-knowing Patrick has spoken!!
Thanks @tschaume. I don't know if I have accepted the new terms or not, but what I am sure of is that both queries were made at the same time, with the same function. So whether the terms were accepted or not, shouldn't the numbers be consistent? (169k in the first case, 154k in the second case) PS: how can I check whether I have accepted the new terms or not? I can't seem to find the information in the dashboard for my account. Re. @Andrew-S-Rosen: I have no idea if it is an API bug or a pymatgen bug. I have only queried through the pymatgen functions, and have not tried the API directly from other code.
@fxcoudert I agree we should and can make this a lot more transparent. If you see the group @yang-ruoxi We might have to add an explicit line on the dashboard that indicates whether the user has accepted the GNoMe terms or not. @tsmathis Would you mind taking a look at the example code in this issue and seeing if you can reproduce it? We might have to double-check the
@tschaume, the results here are reproducible. This is a side effect of user group access control behavior mixing with bulk download behavior in the client. I'll link you my Slack messages from when I investigated this a little while ago; we can discuss from there.
@fxcoudert I started PR #974 to address this inconsistency. It's still a work in progress and will need some data reorganization on our end. We're hoping we can get this out with our next data release.
@tschaume quick question: once I have run a query and gotten structures back, how can I identify whether a specific structure is in the gnome dataset or not? I thought it would be somewhere in the metadata, for example as struct.builder_meta.license, but that one always has the value 'BY-C' (which is actually weird, because it's not a valid license code?)
@fxcoudert both the
Python version
3.12.8
Pymatgen version
2025.1.24
Operating system version
macOS 15.2
Current behavior
The following code:
returns
Of these, there are 169385 non-deprecated materials, as returned by:
This is consistent with the number on the web portal. Good. But now, consider this:
It is exactly the same query, except I ask for all non-deprecated materials by passing deprecated=False. But it now returns:
Expected Behavior
I expect the two routes to return the same number (and same list) of non-deprecated materials.
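A minimal sketch of how this expectation could be checked once both result lists are in hand: compare the material IDs from the two routes and report which ones appear in only one of them. The function name compare_routes, the dictionary keys, and the sample records are hypothetical stand-ins for illustration; they are not actual API responses or part of pymatgen or the Materials Project client.

```python
def compare_routes(all_docs, non_deprecated_docs):
    """Compare client-side filtering against server-side filtering by material ID.

    Both arguments are assumed to be lists of dicts with "material_id" and
    "deprecated" keys (a hypothetical shape chosen for this sketch).
    """
    # Route 1: fetch everything, then drop deprecated entries client-side.
    client_side = {d["material_id"] for d in all_docs if not d["deprecated"]}
    # Route 2: IDs returned when the filter deprecated=False is applied by the server.
    server_side = {d["material_id"] for d in non_deprecated_docs}
    return {
        "only_client_side": sorted(client_side - server_side),
        "only_server_side": sorted(server_side - client_side),
    }


# Hypothetical stand-in records, not real Materials Project data.
all_docs = [
    {"material_id": "mp-1", "deprecated": False},
    {"material_id": "mp-2", "deprecated": True},
    {"material_id": "mp-3", "deprecated": False},
]
non_deprecated_docs = [{"material_id": "mp-1", "deprecated": False}]

result = compare_routes(all_docs, non_deprecated_docs)
print(result)
```

If the two routes were consistent, both lists in the result would be empty. Any IDs listed under only_client_side are the ones the filtered query failed to return, which makes it easier to see whether the missing entries share a common origin.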
Minimal example
Relevant files to reproduce this bug
No response