Remove the naming discrepancy (path_) of referenced to_disk=True object #212

Closed
mpvanderschelling opened this issue Oct 31, 2023 · 1 comment
@mpvanderschelling
Collaborator

Scenario

If you want to store the response of a data generator, you can do so with either to_disk=True or to_disk=False:

  • to_disk=True stores the object on disk and saves only a reference to it in the ExperimentData.output_data object
  • to_disk=False saves the object itself in the ExperimentData.output_data object

See the documentation on ExperimentSample.store() for more information.

To distinguish whether a string output is a reference or a literal string, the name of a referenced output parameter is prefixed with path_.

The problem

When accessing a to_disk=True object, the name of the object is changed from <name> to path_<name>. This is confusing: the user of the dataset should not need to know whether the output is referenced or not.
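To make the discrepancy concrete, here is a minimal sketch of the behaviour described above. It is not the project's code: `store_output`, the plain dict standing in for the output data, and the pickle file layout are all assumptions made for illustration.

```python
import pickle
import tempfile
from pathlib import Path


def store_output(output_data: dict, name: str, obj, to_disk: bool, directory: Path) -> None:
    """Toy stand-in (hypothetical): referenced outputs get a 'path_' prefix."""
    if to_disk:
        # Write the object to disk and keep only the file path as a string.
        file_path = directory / f"{name}.pkl"
        with open(file_path, "wb") as f:
            pickle.dump(obj, f)
        output_data[f"path_{name}"] = str(file_path)
    else:
        # Keep the object itself under its plain name.
        output_data[name] = obj


workdir = Path(tempfile.mkdtemp())
output_data = {}
store_output(output_data, "loss", 0.25, to_disk=False, directory=workdir)
store_output(output_data, "field", [1.0, 2.0, 3.0], to_disk=True, directory=workdir)

# The caller stored "field", but has to look it up as "path_field".
print(sorted(output_data))  # ['loss', 'path_field']
```

The caller has to remember which outputs were stored with to_disk=True just to know which key to use.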

Possible solution

In the current version (v1.4.4), the names of the output parameters are stored in the private _Data object as headers of a pandas DataFrame. Pull request #211 already changed this so that the names are stored in a private _Columns object. This object also holds information on whether an output column is a reference or a literal.

When the output parameter is created in the _Data object, its to_disk flag must be set accordingly; when the ExperimentSample is accessed, the data can then be lazily loaded.
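A rough sketch of that idea, under stated assumptions: `ColumnInfo` and `LazyOutputs` are hypothetical names and do not reflect the actual _Columns or _Data implementation. The point is only that a per-column to_disk flag removes the need for a name prefix and allows loading on access.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Dict


class ColumnInfo:
    """Hypothetical stand-in for a columns object: output name -> to_disk flag."""

    def __init__(self) -> None:
        self._to_disk: Dict[str, bool] = {}

    def add(self, name: str, to_disk: bool) -> None:
        self._to_disk[name] = to_disk

    def is_reference(self, name: str) -> bool:
        return self._to_disk[name]


class LazyOutputs:
    """Looks values up by their plain name and loads references only on access."""

    def __init__(self, raw: Dict[str, Any], columns: ColumnInfo) -> None:
        self._raw = raw
        self._columns = columns

    def __getitem__(self, name: str) -> Any:
        value = self._raw[name]
        if self._columns.is_reference(name):
            # The stored value is a file path; read the object from disk now.
            with open(value, "rb") as f:
                return pickle.load(f)
        return value


# Prepare one referenced output on disk.
workdir = Path(tempfile.mkdtemp())
field_path = workdir / "field.pkl"
with open(field_path, "wb") as f:
    pickle.dump([1.0, 2.0, 3.0], f)

columns = ColumnInfo()
columns.add("loss", to_disk=False)
columns.add("field", to_disk=True)

outputs = LazyOutputs({"loss": 0.25, "field": str(field_path)}, columns)
print(outputs["loss"])   # literal value
print(outputs["field"])  # loaded from disk, same plain name
```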

@mpvanderschelling
Collaborator Author

The following solution has been implemented in #216:

Domain object and OutputParameter

  • The Domain object now also keeps track of the output parameters, in its output_space attribute
  • For this purpose, I created an OutputParameter class, which inherits from Parameter and has one attribute, to_disk. This indicates whether the output parameter is a literal or a reference
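As a sketch only, the relationship between these classes could look like the following. The class bodies are toy re-declarations, not the code from #216, and `add_output` is a hypothetical helper added for the example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Parameter:
    """Toy base class; the real Parameter is assumed to carry more than this."""


@dataclass
class OutputParameter(Parameter):
    # True: the stored value is a reference to an object on disk.
    # False: the stored value is the literal object.
    to_disk: bool = False


@dataclass
class Domain:
    # Maps output name -> OutputParameter, alongside whatever else a domain holds.
    output_space: Dict[str, OutputParameter] = field(default_factory=dict)

    def add_output(self, name: str, to_disk: bool = False) -> None:
        """Hypothetical helper: register an output under its plain name."""
        self.output_space[name] = OutputParameter(to_disk=to_disk)


domain = Domain()
domain.add_output("loss", to_disk=False)
domain.add_output("field", to_disk=True)
print(domain.output_space["field"].to_disk)  # True
```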

Changes to ExperimentSample

  • Whenever the output of your data generator is stored with the ExperimentSample.store() method, the ExperimentSample keeps track of whether you selected to_disk=True or to_disk=False. This information is given to the ExperimentData, and the Domain object within ExperimentData is updated accordingly
  • When you retrieve an ExperimentSample with ExperimentData.get_experiment_sample(index), the information on whether each output parameter is a literal or a reference is passed on when the ExperimentSample is created
  • If you call the property ExperimentSample.output_data, the loaded object is returned (this is different from previous versions). The alias .output_data_loaded does the same and is kept for backwards compatibility
  • If you want to get the references, you can use ExperimentSample.output_data_with_references
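The accessors listed above can be illustrated with a small stand-in. This is a sketch under assumptions, not the implementation from #216: `SampleSketch`, its constructor arguments, and the in-memory `fake_disk` loader are invented for the example; only the three property names come from this thread.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class OutputParameter:
    """Toy re-declaration: only the to_disk flag matters here."""
    to_disk: bool = False


class SampleSketch:
    """Hypothetical stand-in for a sample that knows which outputs are references."""

    def __init__(
        self,
        raw: Dict[str, Any],
        output_space: Dict[str, OutputParameter],
        load: Callable[[str], Any],
    ) -> None:
        self._raw = raw
        self._output_space = output_space
        self._load = load

    @property
    def output_data_with_references(self) -> Dict[str, Any]:
        # Raw view: referenced outputs stay as references.
        return dict(self._raw)

    @property
    def output_data(self) -> Dict[str, Any]:
        # Loaded view: references are resolved, literals are passed through.
        return {
            name: self._load(value) if self._output_space[name].to_disk else value
            for name, value in self._raw.items()
        }

    # Alias kept for backwards compatibility.
    output_data_loaded = output_data


# An in-memory dict plays the role of the disk to keep the example self-contained.
fake_disk = {"field.npy": [1.0, 2.0, 3.0]}

sample = SampleSketch(
    raw={"loss": 0.25, "field": "field.npy"},
    output_space={
        "loss": OutputParameter(to_disk=False),
        "field": OutputParameter(to_disk=True),
    },
    load=fake_disk.__getitem__,
)

print(sample.output_data)                  # loaded objects under plain names
print(sample.output_data_with_references)  # references left as stored
```

In both views the output keeps its plain name; no path_ prefix is needed because the to_disk flag lives in the output space rather than in the name.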
