Do projections need re-run? (i.e. does the cache need to be cleared) #5

Open
lewisfogden opened this issue Mar 23, 2024 · 1 comment

Comments

@lewisfogden
Owner

lewisfogden commented Mar 23, 2024

The current approach in heavylight is that projections run when the instance is created, e.g. if the user model is:

class MyModel(heavylight.Model):
    def <user_method>(self, t):
        return <stuff>

Then running proj1 = MyModel(do_run=True, proj_len=10) runs the model and stores the results in the cache.

Once the projection has run, users can access individual values with proj1.<user_method>(t), an array with proj1.<user_method>.values, and, optionally, all single-parameter values as a dataframe with proj1.ToDataFrame() (handy for debugging/viewing, as it is easy to copy into Excel). There is also a sum method on the cache which returns the total, e.g. proj1.<user_method>.sum().

The proj_len variable controls how much of the projection is pre-computed (t = 0 ... proj_len-1). If the user requests a method result beyond this, the model runs through all the intermediate calculations and caches them.

e.g. with proj_len=10, proj1.<user_method>(20) would calculate a further 11 values and cache them (10, 11, ..., 20).
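To make the pre-compute-then-extend behaviour concrete, here is a minimal sketch of the idea. It is not heavylight's implementation: `CachedSeries`, `_fill_to` and `fund_value` are hypothetical names made up for illustration, and the 5% growth rule is an arbitrary example.

```python
class CachedSeries:
    """Toy stand-in for a cached model method (hypothetical, not heavylight's code)."""

    def __init__(self, func, proj_len=0):
        self.func = func
        self.values = []  # values[t] holds the cached result for time t
        # Pre-compute t = 0 .. proj_len - 1 in increasing order.
        self._fill_to(proj_len - 1)

    def _fill_to(self, t):
        # Compute missing values in increasing t, so each call to func
        # only needs results that are already cached.
        while len(self.values) <= t:
            self.values.append(self.func(self, len(self.values)))

    def __call__(self, t):
        self._fill_to(t)
        return self.values[t]

    def sum(self):
        return sum(self.values)


def fund_value(series, t):
    # Hypothetical recursive projection: start at 100, grow 5% per period.
    return 100.0 if t == 0 else series(t - 1) * 1.05


fund = CachedSeries(fund_value, proj_len=10)
cached_after_init = len(fund.values)     # t = 0 .. 9 are cached
fund(20)                                 # request beyond proj_len
cached_after_request = len(fund.values)  # t = 0 .. 20 are cached
```

Requesting t=20 after a 10-step pre-compute fills in t=10 through t=20, which matches the "further 11 values" described above.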

The rationale for pre-computing is that deep recursion can exceed Python's recursion limit.
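A small illustration of the recursion problem, as a sketch only: the `value` function and its dict cache are hypothetical and unrelated to heavylight's internals. Asking a memoised recursive function for a large t with an empty cache recurses all the way down to t=0, whereas filling the cache in increasing t keeps every call one level deep.

```python
import sys

cache = {}

def value(t):
    # Memoised recursion: each uncached t needs the result for t - 1.
    if t not in cache:
        cache[t] = 1.0 if t == 0 else value(t - 1) + 1.0
    return cache[t]

deep_t = sys.getrecursionlimit() * 2

# Cold cache: the call chain is deep_t frames deep and hits the limit.
try:
    value(deep_t)
    overflowed = False
except RecursionError:
    overflowed = True

# Warm the cache in increasing t: each call finds t - 1 already cached.
cache.clear()
for t in range(deep_t + 1):
    value(t)
result = value(deep_t)
```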

I initially allowed the cache to be cleared, but I found this risky (I use some proprietary software which doesn't always clear its cache correctly 😯), so I decided instead that if a new projection is needed, you should just create a new instance.

@MatthewCaseres
Collaborator

MatthewCaseres commented Mar 24, 2024

Either way, same pattern?

If you are running a lot of scenarios, you will have to create many instances. But this will make the memory consumption huge. So you will end up deleting them?

pseudocode

aggregated_results = ...
for scenario in scenarios:
    proj = MyModel(do_run=True, proj_len=10)
    aggregated_results += proj.results
    del proj

And the code is basically the same if you allow clearing the cache, just instead of deleting an instance you will clear the cache?
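The two patterns can be put side by side in a runnable sketch. `ToyModel`, its `rate` parameter, `total()` and `clear_cache()` are all hypothetical stand-ins written for this comparison; they are not heavylight's API.

```python
class ToyModel:
    """Hypothetical cached projection used only to compare the two patterns."""

    def __init__(self, rate, proj_len):
        self.rate = rate
        self.proj_len = proj_len
        self.cache = {}
        self.run()

    def run(self):
        # Fill the cache for t = 0 .. proj_len - 1.
        for t in range(self.proj_len):
            self.cashflow(t)

    def cashflow(self, t):
        if t not in self.cache:
            prev = 100.0 if t == 0 else self.cashflow(t - 1) * (1 + self.rate)
            self.cache[t] = prev
        return self.cache[t]

    def total(self):
        return sum(self.cache.values())

    def clear_cache(self):
        self.cache.clear()


scenarios = [0.01, 0.02, 0.03]

# Pattern A: one fresh instance per scenario, deleted after aggregation.
total_new_instances = 0.0
for rate in scenarios:
    proj = ToyModel(rate, proj_len=10)
    total_new_instances += proj.total()
    del proj

# Pattern B: one instance, cache cleared and re-run per scenario.
total_reused = 0.0
proj = ToyModel(scenarios[0], proj_len=10)
for rate in scenarios:
    proj.clear_cache()
    proj.rate = rate
    proj.run()
    total_reused += proj.total()
```

Both loops aggregate the same numbers. The difference is where a mistake would hide: in pattern B, forgetting `clear_cache()` after changing `rate` would silently reuse the previous scenario's values, which is the kind of risk raised in the first comment.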

LightModel

LightModel needs to clear the cache because it does a warmup run and then a real run. So it supports clearing the cache. If you want to have a uniform API between the two classes (maybe they should both inherit from some Abstract base class idk), then you will want to support clearing the cache.
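As a rough sketch of what a shared base class could look like, assuming a two-method interface (`run` and `clear_cache`): every name below is hypothetical, and the warm-up logic is only a placeholder for whatever LightModel actually does.

```python
from abc import ABC, abstractmethod


class ProjectionBase(ABC):
    """Hypothetical common interface for both model classes."""

    @abstractmethod
    def run(self, proj_len):
        ...

    @abstractmethod
    def clear_cache(self):
        ...


class EagerToy(ProjectionBase):
    """Placeholder for a model that runs once and keeps its cache."""

    def __init__(self):
        self.cache = {}

    def run(self, proj_len):
        for t in range(proj_len):
            self.cache[t] = t * 2  # dummy calculation

    def clear_cache(self):
        self.cache.clear()


class WarmupToy(EagerToy):
    """Placeholder for a model that does a warm-up run, clears, then runs for real."""

    def run(self, proj_len):
        super().run(1)        # warm-up run
        self.clear_cache()    # discard warm-up results
        super().run(proj_len) # real run


warm = WarmupToy()
warm.run(5)
```

With an interface like this, calling code could treat either class the same way, at the cost of exposing `clear_cache` on both.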

So I'd cast my vote on supporting the clearing of the cache, but also say that it isn't a huge deal.
