Cache to file and auto-load from cache #169
Hi @hXtreme, thanks for the great idea! I'm generally open to PR contributions for good ideas; I just don't have much time to implement new things myself. On your idea, I have a few questions about how this might work. At a high level, the challenge is to (efficiently) detect when a result is cached versus not cached. You could do this, I think, at either the level of a pipeline stage (e.g., …)
This is how I'd think it works:

```python
# Untested code
from pathlib import Path
from typing import TypeVar

import dill
import functional as funct
from functional.pipeline import Sequence

T = TypeVar("T")

def with_caching(seq: Sequence[T], file: str, path: str = "~/.cache") -> Sequence[T]:
    cache_dir = Path(path).expanduser()
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file = cache_dir / file
    if cache_file.exists():
        # Cache hit: load the previously pickled results.
        with cache_file.open("rb") as f:
            cache = dill.load(f)
        return funct.seq(cache)
    else:
        # Cache miss: evaluate the sequence and persist the results.
        results = seq.to_list()
        with cache_file.open("wb") as f:
            dill.dump(results, f)
        return seq
```

This is an external function (because I don't know enough about PyFunctional to be able to modify it). It is used as follows:

```python
expensive_result = with_caching(
    s.map(expensive_function),
    file="expensive_result",
    path="./experiment-cache",
)
```

Hope this helps.
I also like the idea, but I think there are some things to take into account: the hashing should take into account not only the input that is processed at the caching point, but also the code of the `expensive_function`. I think joblib does exactly what you are after, taking all of the above into account: https://joblib.readthedocs.io/en/latest/memory.html
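To illustrate the point above, here is a minimal, standard-library-only sketch of a cache key that covers both the input data and the code of the expensive function. The names (`cache_key`, `expensive_function`) are illustrative, not part of PyFunctional's API, and the function's bytecode is used as a cheap proxy for its source code:

```python
import hashlib
import pickle

def cache_key(func, data) -> str:
    """Hash the function's bytecode together with its pickled input."""
    code = func.__code__.co_code     # changes when the function's logic changes
    payload = pickle.dumps(data)     # changes when the input changes
    return hashlib.sha256(code + payload).hexdigest()

def expensive_function(x):
    return x * x

# Same code + same input -> same key; different input -> different key.
assert cache_key(expensive_function, [1, 2, 3]) == cache_key(expensive_function, [1, 2, 3])
assert cache_key(expensive_function, [1, 2, 3]) != cache_key(expensive_function, [1, 2, 4])
```

A library such as joblib's `Memory` does this kind of invalidation for you, which is why it is a good fit here.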
It would be awesome if there was an easy way to cache results to a file and if the cache was found then load from the cache without recomputing.
To basically support the following use case/API:

First run:

Second run (the first run should have created the cache file, which will now be used instead of recomputing the sequence `expensive_result`):

Please let me know if you have any questions or would like some clarifications.
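Since the original first-run/second-run snippets were not captured above, here is a rough, self-contained sketch of the intended behavior using `pickle` and a temporary directory (`compute_or_load` is a hypothetical helper, not part of PyFunctional):

```python
import pickle
import tempfile
from pathlib import Path

def compute_or_load(cache_file: Path, compute):
    """Return cached results if present, otherwise compute and cache them."""
    if cache_file.exists():
        with cache_file.open("rb") as f:
            return pickle.load(f)      # second run: load from cache
    results = compute()                # first run: do the expensive work
    with cache_file.open("wb") as f:
        pickle.dump(results, f)
    return results

calls = []

def expensive(x):
    calls.append(x)
    return x * x

cache = Path(tempfile.mkdtemp()) / "expensive_result"
first = compute_or_load(cache, lambda: [expensive(i) for i in range(3)])
second = compute_or_load(cache, lambda: [expensive(i) for i in range(3)])
assert first == second == [0, 1, 4]
assert len(calls) == 3  # the expensive work ran only on the first run
```

The second call never invokes `expensive`, which is exactly the "load from cache instead of recomputing" behavior requested.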
Loving the project so far, thanks for your effort!