What utility/support items do we need? #6

Open
lewisfogden opened this issue Mar 23, 2024 · 4 comments
Labels
enhancement New feature or request question Further information is requested

Comments

@lewisfogden
Owner

A few topics

Inputs

  • preparing inputs (probably a simple dataframe converter)
  • validation (would need to specify the datatype of each input somewhere in the model?)
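A minimal sketch of what a dataframe converter with validation could look like, assuming the datatypes are declared in a plain dictionary. The names `INPUT_SCHEMA` and `prepare_inputs` and the example columns are hypothetical and not part of heavylight:

```python
import numpy as np
import pandas as pd

# Hypothetical schema: input name -> expected NumPy dtype.
INPUT_SCHEMA = {"age": np.int64, "sum_assured": np.float64}


def prepare_inputs(df, schema):
    """Convert DataFrame columns to NumPy arrays, checking names and dtypes."""
    missing = [name for name in schema if name not in df.columns]
    if missing:
        raise ValueError(f"missing input columns: {missing}")

    inputs = {}
    for name, dtype in schema.items():
        column = df[name]
        if column.isna().any():
            raise ValueError(f"input {name!r} contains missing values")
        # Same-kind casting allows int64 -> float64 but rejects float -> int,
        # so values are never silently truncated.
        if not np.can_cast(column.dtype, dtype, casting="same_kind"):
            raise TypeError(
                f"input {name!r} has dtype {column.dtype}, "
                f"expected {np.dtype(dtype).name}"
            )
        inputs[name] = column.to_numpy(dtype=dtype)
    return inputs


df = pd.DataFrame({"age": [30, 40], "sum_assured": [1000, 2500]})
inputs = prepare_inputs(df, INPUT_SCHEMA)
```

Keeping the schema outside the model is only one option; it could equally live as type hints or class attributes on the model itself.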

Outputs

  • function to extract one model point from a vectorised run (mostly written)
  • function to summarise all model points (mostly written)
  • function to extract specific variables, optionally aggregated (pandas.DataFrame.agg style?)
  • exporter that saves to Excel, and includes function definition as a comment/note.
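As a rough illustration of the "extract specific variables, optionally aggregated" item, here is a sketch that assumes results are stored as a nested dictionary of variable name, then time step, then a NumPy array with one entry per model point. The function name `extract_variables` and the `aggs` argument are hypothetical:

```python
import numpy as np


def extract_variables(results, aggs):
    """Return {variable: {t: value}} for the variables named in `aggs`.

    `aggs` maps a variable name to a callable applied to each array,
    or to None to keep the stored values unaggregated.
    """
    out = {}
    for name, func in aggs.items():
        by_t = results[name]  # raises KeyError if the variable was not stored
        out[name] = {
            t: func(v) if func is not None and isinstance(v, np.ndarray) else v
            for t, v in by_t.items()
        }
    return out


results = {
    "cashflow": {0: np.array([10.0, 32.5]), 1: np.array([9.9, 32.175])},
    "t": {0: 0, 1: 1},
}

# Sum cashflow over all model points, keep t as stored.
picked = extract_variables(results, {"cashflow": np.sum, "t": None})

# Extract a single model point (here the second one) with the same function.
second_point = extract_variables(results, {"cashflow": lambda v: v[1]})
```

Because the aggregation is just a callable, extracting one model point and summarising all model points become two uses of the same helper.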

Examples

  • function to generate a new run folder containing demo/example models and model templates (e.g. heavylight.demo.create_sample('numpy_template', 'path/to/folder'))
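One possible shape for such a generator, sketched with the standard library only. The `TEMPLATES` registry and its file contents are placeholders invented for illustration; `create_sample` here is a standalone function, not an existing heavylight API:

```python
from pathlib import Path

# Hypothetical template registry; a real package would ship its own files.
TEMPLATES = {
    "numpy_template": {
        "model.py": "# model definition goes here\n",
        "run.py": "# script that loads inputs and runs the model\n",
    },
}


def create_sample(template, folder):
    """Write the files of `template` into `folder`, refusing to overwrite."""
    if template not in TEMPLATES:
        raise ValueError(
            f"unknown template {template!r}; choose from {sorted(TEMPLATES)}"
        )
    target = Path(folder)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for filename, content in TEMPLATES[template].items():
        path = target / filename
        if path.exists():
            raise FileExistsError(f"{path} already exists")
        path.write_text(content, encoding="utf-8")
        written.append(path)
    return written
```

Refusing to overwrite existing files is a design choice made here so that a user's edited run folder is never clobbered by a second call.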
@lewisfogden lewisfogden added enhancement New feature or request question Further information is requested labels Mar 23, 2024
@MatthewCaseres
Collaborator

MatthewCaseres commented Mar 24, 2024

How it (sort of) currently works in heavylight.LightModel

To give users a processed dataframe, you can let them apply a custom function to the results.

model = Model(do_run=True, proj_len=10) # or whatever
model.ToDataFrame() # no aggregation or processing
model.ToDataFrame(lambda x: np.sum(x))
model.ToDataFrame(lambda x: x[0] if isinstance(x, np.ndarray) else x) # get the first value from arrays

I'm partial to this approach because it is flexible and rewards users for having a working knowledge of Python. They can select the tenth policy, for example, which you can't do in cashflower's API at the moment:

@variable(aggregation_type="first")
def my_variable(t):
    ...

Implementation details

The internal representation can be a dictionary.

self.results =  {'num_pols_if': {0: np.array([0.1  , 0.325, 0.55 , 0.775, 1.   ]),
              1: np.array([0.099  , 0.32175, 0.5445 , 0.76725, 0.99   ]),
              2: np.array([0.09801  , 0.3185325, 0.539055 , 0.7595775, 0.9801   ]),
              3: np.array([0.0970299 , 0.31534718, 0.53366445, 0.75198173, 0.970299  ])},
             'cashflow': {0: np.array([ 10. ,  32.5,  55. ,  77.5, 100. ]),
              1: np.array([ 9.9  , 32.175, 54.45 , 76.725, 99.   ]),
              2: np.array([ 9.801  , 31.85325, 53.9055 , 75.95775, 98.01   ]),
              3: np.array([ 9.70299  , 31.5347175, 53.366445 , 75.1981725, 97.0299   ])},
             'pols_death': {0: np.array([0.001  , 0.00325, 0.0055 , 0.00775, 0.01   ]),
              1: np.array([0.00099  , 0.0032175, 0.005445 , 0.0076725, 0.0099   ]),
              2: np.array([0.0009801 , 0.00318533, 0.00539055, 0.00759577, 0.009801  ]),
              3: np.array([0.0009703 , 0.00315347, 0.00533664, 0.00751982, 0.00970299])},
             't': {0: 0, 1: 1, 2: 2, 3: 3}}

You can write yourself a utility that applies a function to all of the values and returns a dataframe. By default, it does no aggregation and returns everything as is:

def ToDataFrame(self, applied_function = lambda x: x):
    return pd.DataFrame({k: {t: applied_function(v) for t, v in d.items()} for k, d in self.results.items()})

When the user asks for the dataframe, by default everything is in it:

model.ToDataFrame()
num_pols_if cashflow pols_death t
0 [0.1 0.325 0.55 0.775 1. ] [ 10. 32.5 55. 77.5 100. ] [0.001 0.00325 0.0055 0.00775 0.01 ] 0
1 [0.099 0.32175 0.5445 0.76725 0.99 ] [ 9.9 32.175 54.45 76.725 99. ] [0.00099 0.0032175 0.005445 0.0076725 0.0099 ] 1
2 [0.09801 0.3185325 0.539055 0.7595775 0.9801 ] [ 9.801 31.85325 53.9055 75.95775 98.01 ] [0.0009801 0.00318533 0.00539055 0.00759577 0.009801 ] 2
3 [0.0970299 0.31534718 0.53366445 0.75198173 0.970299 ] [ 9.70299 31.5347175 53.366445 75.1981725 97.0299 ] [0.0009703 0.00315347 0.00533664 0.00751982 0.00970299] 3
model.ToDataFrame(lambda x: np.sum(x))
   num_pols_if  cashflow  pols_death  t
0  2.75         275       0.0275      0
1  2.7225       272.25    0.027225    1
2  2.69527      269.528   0.0269528   2
3  2.66832      266.832   0.0266832   3
model.ToDataFrame(lambda x: x[-1] if isinstance(x, np.ndarray) else x)
   num_pols_if  cashflow  pols_death  t
0  1            100       0.01        0
1  0.99         99        0.0099      1
2  0.9801       98.01     0.009801    2
3  0.970299     97.0299   0.00970299  3
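The calls above can be tried end to end with a small stand-in class. This is a sketch, not the real `heavylight.LightModel`: `DemoModel` is a hypothetical name, and the values are generated by assuming a 1% decrement per period, which appears to reproduce the numbers in the dictionary above up to rounding.

```python
import numpy as np
import pandas as pd


class DemoModel:
    """Stand-in that only holds a results dict (not the real LightModel)."""

    def __init__(self, proj_len=4):
        start = np.array([0.1, 0.325, 0.55, 0.775, 1.0])
        self.results = {"num_pols_if": {}, "cashflow": {}, "pols_death": {}, "t": {}}
        for t in range(proj_len):
            # Assumed relationships, inferred from the example numbers.
            num_pols_if = start * 0.99 ** t
            self.results["num_pols_if"][t] = num_pols_if
            self.results["cashflow"][t] = 100 * num_pols_if
            self.results["pols_death"][t] = 0.01 * num_pols_if
            self.results["t"][t] = t

    def ToDataFrame(self, applied_function=lambda x: x):
        # One column per variable, one row per time step.
        return pd.DataFrame(
            {
                k: {t: applied_function(v) for t, v in d.items()}
                for k, d in self.results.items()
            }
        )


model = DemoModel()
totals = model.ToDataFrame(np.sum)
last_point = model.ToDataFrame(lambda x: x[-1] if isinstance(x, np.ndarray) else x)
```

Note that `np.sum` also works on the scalar `t` values, whereas indexing does not, which is why the second call needs the `isinstance` guard.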

Edit: if this proposed API is reasonable, we can cross off the first three tasks of your outputs list and migrate them to a separate ticket?

@lewisfogden
Owner Author

I like this - maybe we need a few helper functions (mainly for vectorised libraries). I might do a utils sub-package (openpyxl style)

@MatthewCaseres
Collaborator

I sort of don't want to even give the users a ToDataFrame. Just give them the dictionary and the utility functions, plus a documentation page on how to write their own utility. That way the class API stays minimal and sticks to basic Python data structures, without having to explain some aggfunc API. We just have to explain to users that they need to learn Python or use AI coding or something to work with a dict[str, dict[int, np.ndarray]].
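For the documentation page, a user-written utility over the plain dictionary might look like the sketch below. It has no pandas dependency, and `to_records` is a hypothetical name chosen for this example:

```python
import numpy as np


def to_records(results, func=lambda x: x):
    """Flatten {variable: {t: value}} into one dict per time step."""
    timesteps = sorted({t for by_t in results.values() for t in by_t})
    return [
        {name: func(by_t[t]) for name, by_t in results.items() if t in by_t}
        for t in timesteps
    ]


results = {
    "cashflow": {0: np.array([10.0, 32.5]), 1: np.array([9.9, 32.175])},
    "t": {0: 0, 1: 1},
}

# Keep only the first model point; scalars such as t pass through unchanged.
rows = to_records(results, lambda x: x[0] if isinstance(x, np.ndarray) else x)
```

The resulting list of dicts can be passed to `csv.DictWriter` or to `pd.DataFrame` if a user does want a table.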

@MatthewCaseres
Collaborator

What needs to happen for this ticket to get closed? Some of these items might only be done on LightModel.

Inputs

  • preparing inputs (probably a simple dataframe converter)
  • validation (would need to specify the datatype of each input somewhere in the model?)

Outputs

  • (LightModel) function to extract one model point from a vectorised run (mostly written)
  • (LightModel) function to summarise all model points (mostly written)
  • (LightModel) function to extract specific variables, optionally aggregated (pandas.DataFrame.agg style?)
  • exporter that saves to Excel, and includes function definition as a comment/note.

Examples

  • function to generate a new run folder containing demo/example models and model templates (e.g. heavylight.demo.create_sample('numpy_template', 'path/to/folder'))
