Remove the naming discrepancy (`path_`) of referenced to_disk=True object #212

mpvanderschelling · 2023-10-31T19:28:02Z

Scenario

When you store the output of a data generator, you can choose whether or not to store it on disk:

to_disk=True saves the data to disk and stores only a reference to it in the ExperimentData.output_data object.
to_disk=False stores the object itself in the ExperimentData.output_data object.

See the documentation on ExperimentSample.store() for more information.

To distinguish whether a string output is a reference or a literal string, the name of a referenced output parameter is prefixed with path_.
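The prefix convention described above can be illustrated with a small, self-contained sketch. The `store` function, the pickle file format, and the plain `outputs` dictionary are assumptions made for illustration only; they are not the library's implementation.

```python
import pickle
import tempfile
from pathlib import Path

PREFIX = "path_"  # the prefix described in this issue


def store(outputs: dict, name: str, obj, to_disk: bool, directory: Path) -> None:
    """Toy stand-in for a store call; not the library's implementation."""
    if to_disk:
        # Write the object to disk and keep only a string reference to it.
        file_path = directory / f"{name}.pkl"
        with open(file_path, "wb") as f:
            pickle.dump(obj, f)
        outputs[PREFIX + name] = str(file_path)
    else:
        # Keep the object itself under its plain name.
        outputs[name] = obj


outputs = {}
with tempfile.TemporaryDirectory() as tmp:
    store(outputs, "stress", [1.0, 2.0], to_disk=True, directory=Path(tmp))
    store(outputs, "label", "run_a", to_disk=False, directory=Path(tmp))

print(sorted(outputs))  # the to_disk=True output appears as "path_stress"
```

Note that the caller asked to store `stress`, yet has to look it up as `path_stress`.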

The problem

When accessing a to_disk=True object, the name of the object changes from <name> to path_<name>. This is confusing: the user of the dataset should not need to know whether an output is stored as a reference or as a literal.

Possible solution

In the current version (v1.4.4), the names of the output parameters are stored in the private _Data object as headers of a pandas DataFrame. Pull request #211 already changed this so that the names are stored in a private _Columns object. This object also holds information about whether each output column is a reference or a literal.

When an output parameter is created in the _Data object, its to_disk flag must be set accordingly; when the ExperimentSample is accessed, the data can then be lazily loaded.
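A rough sketch of this idea, assuming a per-column to_disk flag and pickle files on disk. `ColumnsSketch` and `get_output` are hypothetical names used only for illustration; the real _Columns object may look quite different.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Dict


class ColumnsSketch:
    """Hypothetical stand-in for the private _Columns object."""

    def __init__(self) -> None:
        # Maps each output name to whether it is stored as a reference.
        self.to_disk: Dict[str, bool] = {}

    def add(self, name: str, to_disk: bool) -> None:
        self.to_disk[name] = to_disk


def get_output(columns: ColumnsSketch, row: Dict[str, Any], name: str) -> Any:
    """Return the value for `name`, reading from disk only for references."""
    value = row[name]
    if columns.to_disk[name]:
        with open(value, "rb") as f:
            return pickle.load(f)
    return value


with tempfile.TemporaryDirectory() as tmp:
    file_path = Path(tmp) / "stress.pkl"
    with open(file_path, "wb") as f:
        pickle.dump([1.0, 2.0], f)

    columns = ColumnsSketch()
    columns.add("stress", to_disk=True)
    columns.add("label", to_disk=False)

    # Both outputs keep their plain names; no path_ prefix is needed.
    row = {"stress": str(file_path), "label": "run_a"}

    loaded_stress = get_output(columns, row, "stress")
    loaded_label = get_output(columns, row, "label")
```

Because the reference flag lives next to the column name rather than inside it, the lookup key is the same for both kinds of output.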

mpvanderschelling · 2023-11-02T21:00:29Z

The following solution has been implemented in #216:

Domain object and OutputParameter

The Domain object now also keeps track of the output parameters, in its output_space attribute.
For this purpose, I created an OutputParameter class that inherits from Parameter and has one attribute, to_disk, which indicates whether the output parameter is a literal or a reference.
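The class relationship described above might look roughly like the following. Only the names Parameter, OutputParameter, to_disk, and output_space come from this comment; the empty base class, `DomainSketch`, and the `add_output` helper are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Parameter:
    """Hypothetical, empty stand-in for the library's Parameter base class."""


@dataclass
class OutputParameter(Parameter):
    to_disk: bool  # True: stored as a reference; False: stored as a literal


@dataclass
class DomainSketch:
    """Hypothetical Domain that holds only the output_space attribute."""

    output_space: Dict[str, OutputParameter] = field(default_factory=dict)

    def add_output(self, name: str, to_disk: bool) -> None:
        # add_output is an illustrative helper name, not confirmed by the source.
        self.output_space[name] = OutputParameter(to_disk=to_disk)


domain = DomainSketch()
domain.add_output("stress", to_disk=True)
domain.add_output("label", to_disk=False)
```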

Changes to ExperimentSample

Whenever the output of your data generator is stored with the ExperimentSample.store() method, the ExperimentSample keeps track of whether you selected to_disk=True or to_disk=False. This information is passed to the ExperimentData, and the Domain object within ExperimentData is updated accordingly.
When you retrieve an ExperimentSample with ExperimentData.get_experiment_sample(index), the information on whether each output parameter is a literal or a reference is passed along when the ExperimentSample is created.
If you access the property ExperimentSample.output_data, the loaded object is returned (this differs from previous versions). The alias .output_data_loaded does the same and is kept for backwards compatibility.
If you want to get the references, use ExperimentSample.output_data_with_references.
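The difference between the three properties can be sketched as follows. `SampleSketch` is a hypothetical class that mimics only the property names mentioned above; its constructor arguments and the use of pickle files are assumptions, not the actual ExperimentSample implementation.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Dict


class SampleSketch:
    """Hypothetical stand-in that mimics only the three property names."""

    def __init__(self, stored: Dict[str, Any], to_disk: Dict[str, bool]) -> None:
        self._stored = stored    # literals, or path strings for references
        self._to_disk = to_disk  # per-output flag: literal or reference

    @property
    def output_data_with_references(self) -> Dict[str, Any]:
        # Raw view: references stay as path strings.
        return dict(self._stored)

    @property
    def output_data(self) -> Dict[str, Any]:
        # Loaded view: references are read from disk on access.
        loaded = {}
        for name, value in self._stored.items():
            if self._to_disk[name]:
                with open(value, "rb") as f:
                    loaded[name] = pickle.load(f)
            else:
                loaded[name] = value
        return loaded

    @property
    def output_data_loaded(self) -> Dict[str, Any]:
        # Alias kept for backwards compatibility.
        return self.output_data


with tempfile.TemporaryDirectory() as tmp:
    file_path = Path(tmp) / "stress.pkl"
    with open(file_path, "wb") as f:
        pickle.dump([1.0, 2.0], f)

    sample = SampleSketch(
        stored={"stress": str(file_path), "label": "run_a"},
        to_disk={"stress": True, "label": False},
    )
    loaded = sample.output_data
    alias = sample.output_data_loaded
    references = sample.output_data_with_references
```

In this sketch the output keeps the name `stress` in both views; only the value differs (the loaded list versus the path string).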

mpvanderschelling self-assigned this Nov 2, 2023

mpvanderschelling added a commit that referenced this issue Nov 2, 2023

Remove the naming discrepancy (path_) of referenced to_disk=True ob…

03b861b

…ject Fixes #212

mpvanderschelling mentioned this issue Nov 2, 2023

Remove the naming discrepancy (path_) of referenced to_disk=True object #216

Merged

mpvanderschelling closed this as completed Nov 9, 2023
