Dict memoization sorting before normalizing keys is unsafe #2140
Comments
Cross-reference #1591, which also talks about function memoization.
Hey @benclifford, I'd like to learn more about memoization and work on this issue.
@MundiaNderi OK, some places to start: do the Parsl tutorial to get a basic understanding of what we're trying to do here. It's linked from the front page of parsl-project.org and lives at https://mybinder.org/v2/gh/Parsl/parsl-tutorial/master There's also a section in the user guide, https://parsl.readthedocs.io/en/stable/userguide/checkpoints.html , that will give you some background information; the words "memoization", "caching" and "checkpointing" are all used fairly interchangeably there (although they have some subtle differences). There are some test cases that you can look at the source code for and check you can run them:
and
That should give you some understanding of the background for this issue and the sort of things we test here. Then a basic development sequence might be: i) make a new test case that fails, using pytest
Note from Matthias Diener that I'm adding here for future reference:
In particular, for dictionaries and sets, KeyBuilder uses an unordered hashing
This line struck me as odd: parsl/parsl/dataflow/memoization.py, line 89 in 1449ea7
My first thought was that it might not give you a stable sort if objects are hashable (can be used in a dict), but define something meaningless for comparison (my thought was comparison of memory addresses).
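To make that first worry concrete, here is a minimal sketch of a hypothetical type (AddrOrdered is invented for illustration and is not something from Parsl) that is hashable and comparable, but whose comparison is based on memory addresses:

```python
class AddrOrdered:
    """Hypothetical key type: hash/eq use the name, ordering uses id()."""

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return hash(self.name)

    def __eq__(self, other):
        return isinstance(other, AddrOrdered) and self.name == other.name

    def __lt__(self, other):
        # "Meaningless" ordering: depends on where each object was allocated.
        return id(self) < id(other)


items = [AddrOrdered(n) for n in "abc"]

# sorted() only calls __lt__, so the result follows memory addresses...
by_sort = [x.name for x in sorted(items)]
# ...which is exactly the order you get from sorting by id() directly.
by_addr = [x.name for x in sorted(items, key=id)]
```

Sorting such keys succeeds without any error, but the resulting order says nothing about the keys' contents and need not be the same in another process.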
The case I found when trying to exploit this to make something unstable is a type which is hashable but raises an error on comparison. Functions:
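A minimal sketch of how this can be reproduced (the function names f and g and the helper try_sort are made up for illustration):

```python
def f():
    return 1


def g():
    return 2


# Functions are hashable, so they are perfectly valid dict keys.
d = {f: "a", g: "b"}


def try_sort(keys):
    """Return the sorted keys, or the exception raised while sorting."""
    try:
        return sorted(keys)
    except TypeError as exc:
        return exc


# Sorting the keys needs "<" between two functions, which is not defined.
result = try_sort(d)
print(type(result).__name__, result)
```

Note that a dict with a single function key sorts without complaint, because no comparison is ever made; the failure only shows up once there are two or more such keys.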
To show that this doesn't only happen in "silly" cases, I also tried out enum.auto, and found that it also is hashable but not comparable:
In theory we might be able to find a type which does define comparison, but does it in an unreliable way which is tied to the state of the current process. I think given the presence of types which define __hash__ but not __lt__, we should assume that such objects exist.
Resolution
To solve this, I believe all we need to do is perform the key normalization step first, saving the normalization results in a temporary dict, and then operate on the normalized keyspace.
Something to the effect of
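A minimal sketch of that normalize-first idea follows. It assumes a hypothetical toy_normalize function standing in for the real key normalization (it is not Parsl's code), and normalized_items is likewise an invented name:

```python
import enum


def toy_normalize(key) -> bytes:
    """Hypothetical stand-in normalizer: map a key to deterministic bytes."""
    if isinstance(key, enum.Enum):
        return f"enum:{type(key).__qualname__}.{key.name}".encode()
    if callable(key):
        return f"func:{key.__module__}.{key.__qualname__}".encode()
    return f"{type(key).__name__}:{key!r}".encode()


def normalized_items(d, normalize=toy_normalize):
    """Normalize keys first, then sort the normalized keys."""
    # Temporary dict keyed by the normalized form of each original key.
    normed = {normalize(k): v for k, v in d.items()}
    if len(normed) != len(d):
        raise ValueError("two distinct keys normalized to the same bytes")
    # bytes objects always compare with each other, so this sort cannot
    # fail with TypeError the way sorting the raw keys can.
    return [(nk, normed[nk]) for nk in sorted(normed)]


class Color(enum.Enum):
    RED = enum.auto()
    BLUE = enum.auto()


def f():
    pass


def g():
    pass


# Same contents, different insertion order, mixed unorderable key types.
d1 = {f: 1, Color.RED: 2, "x": 3, g: 4}
d2 = {g: 4, "x": 3, Color.RED: 2, f: 1}
assert normalized_items(d1) == normalized_items(d2)
```

Because the sort only ever sees normalized bytes, the result no longer depends on whether, or how, the original key types implement comparison.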