Scenario
If you want to store the response of a data generator, you can choose whether or not to store it on disk:
to_disk=True stores the data on disk and saves only a reference to it in the ExperimentData.output_data object.
to_disk=False saves the object itself in the ExperimentData.output_data object.
See the documentation on ExperimentSample.store() for more information.
To distinguish a string output that is a reference from one that is a literal string, the name of a referenced output parameter is prefixed with path_.
The problem
When accessing an object stored with to_disk=True, the name of the object is changed from <name> to path_<name>. This is confusing: the user of the dataset should not need to know whether an output is stored as a reference or not.
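The renaming can be illustrated with a toy stand-in for the storing step. This is only a sketch, not the library's code: the helper store_output, the use of pickle, and the file layout are assumptions made for illustration.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Dict


def store_output(output: Dict[str, Any], name: str, obj: Any,
                 to_disk: bool, directory: Path) -> None:
    """Hypothetical stand-in for the storing step described above."""
    if to_disk:
        # Write the object to disk and keep only a reference (the file path),
        # stored under the prefixed name path_<name>.
        path = directory / f"{name}.pkl"
        with open(path, "wb") as f:
            pickle.dump(obj, f)
        output[f"path_{name}"] = str(path)
    else:
        # Keep the object itself under its plain name.
        output[name] = obj


def demo_store_output() -> Dict[str, Any]:
    """Store one literal and one referenced output, then return the mapping."""
    output: Dict[str, Any] = {}
    with tempfile.TemporaryDirectory() as tmp:
        store_output(output, "loss", 0.25, to_disk=False, directory=Path(tmp))
        store_output(output, "field", [1, 2, 3], to_disk=True, directory=Path(tmp))
    return output
```

In this sketch the caller has to ask for "path_field" instead of "field", which is exactly the leak of storage details that the issue describes.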
Possible solution
In the current version (v1.4.4), the names of the output parameters are stored in the private _Data object as the headers of a pandas DataFrame. Pull request #211 already changed this so that the names are stored in a private _Columns object. This object also records whether each output column is a reference or a literal.
When the output parameter is created in the _Data object, its to_disk flag must be set accordingly. The data can then be lazy-loaded when the ExperimentSample is accessed.
The following solution has been implemented in #216:
Domain object and OutputParameter
The Domain object now also keeps track of the output parameters, in its output_space attribute.
For this purpose, I created an OutputParameter class that inherits from Parameter and has a single attribute, to_disk. This attribute indicates whether the output parameter is a literal or a reference.
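The relationship between these classes might look roughly like the sketch below. The class and attribute names follow the description above, but the bodies are simplified stand-ins rather than the code from #216, and the add_output helper is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Parameter:
    """Simplified stand-in for the base class; the real one is not shown here."""


@dataclass
class OutputParameter(Parameter):
    """Output parameter with a single attribute, as described above."""
    to_disk: bool  # True: the stored value is a reference; False: a literal


@dataclass
class Domain:
    """Simplified stand-in that only models the output_space attribute."""
    output_space: Dict[str, OutputParameter] = field(default_factory=dict)

    def add_output(self, name: str, to_disk: bool) -> None:
        # Hypothetical helper: register an output under its plain name,
        # without any path_ prefix.
        self.output_space[name] = OutputParameter(to_disk=to_disk)


def demo_domain() -> Domain:
    domain = Domain()
    domain.add_output("loss", to_disk=False)
    domain.add_output("field", to_disk=True)
    return domain
```

Keeping the flag on the parameter rather than in the name means the name stays <name> in both cases.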
Changes to ExperimentSample
Whenever the output of your data generator is stored with the ExperimentSample.store() method, the ExperimentSample keeps track of whether you selected to_disk=True or to_disk=False. This information is passed to the ExperimentData, and the Domain object within the ExperimentData is updated accordingly.
When you retrieve an ExperimentSample with ExperimentData.get_experiment_sample(index), the information about whether each output parameter is a literal or a reference is passed along when the ExperimentSample is created.
If you call the property ExperimentSample.output_data, the loaded object is returned (this is different from previous versions). The alias .output_data_loaded does the same and is kept for backwards compatibility.
If you want to get the references instead, you can use ExperimentSample.output_data_with_references.
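The difference between the three properties can be sketched with a toy class. The property names are taken from the description above; the class ExperimentSampleSketch, its constructor, and the pickle-based loading are assumptions made for illustration and do not reproduce the actual implementation.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Dict, Tuple


class ExperimentSampleSketch:
    """Toy model of the accessors described above (not the real class)."""

    def __init__(self, stored: Dict[str, Any], to_disk: Dict[str, bool]) -> None:
        self._stored = dict(stored)    # literals, or paths for referenced outputs
        self._to_disk = dict(to_disk)  # name -> whether the value is a reference

    @property
    def output_data_with_references(self) -> Dict[str, Any]:
        # Return the values as stored: references stay references.
        return dict(self._stored)

    @property
    def output_data(self) -> Dict[str, Any]:
        # Load referenced outputs from disk only when they are requested.
        loaded = {}
        for name, value in self._stored.items():
            if self._to_disk[name]:
                with open(value, "rb") as f:
                    loaded[name] = pickle.load(f)
            else:
                loaded[name] = value
        return loaded

    @property
    def output_data_loaded(self) -> Dict[str, Any]:
        # Alias kept for backwards compatibility.
        return self.output_data


def demo_sample() -> Tuple[Dict[str, Any], Dict[str, Any], Dict[str, Any]]:
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "field.pkl"
        with open(path, "wb") as f:
            pickle.dump([1, 2, 3], f)
        sample = ExperimentSampleSketch(
            stored={"loss": 0.25, "field": str(path)},
            to_disk={"loss": False, "field": True},
        )
        # Read everything while the temporary directory still exists.
        return (sample.output_data,
                sample.output_data_loaded,
                sample.output_data_with_references)
```

Both outputs are addressed by their plain names here; only the accessor decides whether the caller sees the loaded object or the reference.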