Creating variants of the _Data object to allow gradient tracking and avoid unnecessary casting to numpy #208

mpvanderschelling · 2023-10-30T14:57:11Z

The problem

At the moment, the ExperimentData object consists of input_data, output_data, jobs, and domain. These are all custom objects, and all of them are private except the Domain object:

domain: f3dasm.design.Domain (public!)
input_data: f3dasm._src.design._data._Data
output_data: f3dasm._src.design._data._Data
jobs: f3dasm._src.design._jobqueue._JobQueue

Focusing on input_data: any data (e.g. a pd.DataFrame, numpy array, or csv file) that is given to ExperimentData is converted to a _Data object. The _Data back-end is pandas, so internally the data is cast to something compatible with pandas data storage, i.e. numpy.

For automatic differentiation tools this can be problematic, since the gradient needs to be 'tracked' through every operation. Any cast to numpy breaks the chain.
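To illustrate why a cast breaks the chain, here is a minimal sketch using a toy forward-mode differentiation type. The `Dual` class and both functions are hypothetical and are not part of f3dasm, autograd, or jax; the "storage step" stands in for any conversion that keeps only the raw numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """Toy forward-mode AD value: a number together with its derivative."""
    value: float
    deriv: float = 0.0

    def __mul__(self, other: "Dual") -> "Dual":
        # Product rule: (u * v)' = u' * v + u * v'
        return Dual(
            self.value * other.value,
            self.deriv * other.value + self.value * other.deriv,
        )


def square(x: Dual) -> Dual:
    # The tracked value is used directly, so the derivative propagates.
    return x * x


def square_via_plain_float(x: Dual) -> Dual:
    # Hypothetical storage step: only the raw number is kept, as a cast
    # to a plain array would do, so the derivative information is lost.
    stored = Dual(float(x.value))
    return stored * stored


x = Dual(3.0, 1.0)  # seed with dx/dx = 1
tracked = square(x)
broken = square_via_plain_float(x)

print(tracked.value, tracked.deriv)
print(broken.value, broken.deriv)
```

Both calls return the same function value, but only the first one still carries the derivative with respect to `x`.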

In v1.4.3, we use autograd.numpy to track these gradients, and for tensorflow optimizers a conversion function provides a 'custom gradient' so that they still work despite the cast to numpy.

Additionally, optimized libraries incur overhead when converting back and forth between e.g. jax arrays and numpy arrays.

Proposal

Because the ExperimentData object depends only on _Data and not directly on a pandas DataFrame, we can create a variant of the _Data object for any underlying datatype (e.g. a dictionary of tensorflow tensors).
Each variant needs to implement all the methods of the _Data object for that particular datatype.
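A minimal sketch of what such a variant could look like. The interface name `DataBackend`, its four methods, and the `ColumnDictData` stand-in are assumptions for illustration only; the actual method set of `_Data` is larger and may be shaped differently. A real variant would wrap a pandas DataFrame or a dictionary of jax arrays instead of plain lists.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Sequence


class DataBackend(ABC):
    """Hypothetical common interface that every _Data variant implements."""

    @abstractmethod
    def __len__(self) -> int:
        """Number of rows (experiments)."""

    @abstractmethod
    def names(self) -> List[str]:
        """Column (parameter) names."""

    @abstractmethod
    def get_row(self, index: int) -> Dict[str, Any]:
        """Return one row as a mapping from column name to value."""

    @abstractmethod
    def add_row(self, row: Dict[str, Any]) -> None:
        """Append one row."""


class ColumnDictData(DataBackend):
    """Stand-in variant: one list per column, values are never cast."""

    def __init__(self, columns: Dict[str, Sequence[Any]]) -> None:
        if len({len(v) for v in columns.values()}) > 1:
            raise ValueError("all columns must have the same length")
        self._columns = {k: list(v) for k, v in columns.items()}

    def __len__(self) -> int:
        return len(next(iter(self._columns.values()), []))

    def names(self) -> List[str]:
        return list(self._columns)

    def get_row(self, index: int) -> Dict[str, Any]:
        return {k: v[index] for k, v in self._columns.items()}

    def add_row(self, row: Dict[str, Any]) -> None:
        if set(row) != set(self._columns):
            raise KeyError("row keys must match the column names")
        for key, value in row.items():
            self._columns[key].append(value)


data = ColumnDictData({"x0": [0.1, 0.2], "x1": [1.0, 2.0]})
data.add_row({"x0": 0.3, "x1": 3.0})
```

Code that only talks to the `DataBackend` methods does not need to know which storage sits underneath, which is the property the proposal relies on.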

Then, when creating the ExperimentData object, the user can choose whether to use the 'normal' backend (e.g. pandas/numpy) or a specialized backend (e.g. tensorflow, pytorch, jax).

This could also be inferred automatically from the initial input_data that is provided.
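One possible way to do the inference without importing every optional library is to look at the module in which the object's type is defined. This is only a sketch: the function name `infer_backend`, the backend labels, and the module-to-backend table are assumptions, and the exact module names reported by each library can differ between versions.

```python
# Hypothetical table: top-level module of the data's type -> backend label.
_BACKEND_BY_MODULE = {
    "pandas": "pandas",
    "numpy": "pandas",
    "jax": "jax",
    "jaxlib": "jax",
    "torch": "pytorch",
    "tensorflow": "tensorflow",
}


def infer_backend(data: object, default: str = "pandas") -> str:
    """Guess a backend label from the module that defines type(data)."""
    root_module = type(data).__module__.split(".")[0]
    # Anything unrecognized (lists, paths to csv files, ...) falls back
    # to the default pandas/numpy backend.
    return _BACKEND_BY_MODULE.get(root_module, default)


print(infer_backend([0.1, 0.2, 0.3]))
```

An explicit backend argument given by the user would take precedence over this guess.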

First steps

This issue will investigate whether this is feasible, starting with a _Data variant that works with a jax data format.

mpvanderschelling · 2023-10-30T14:58:02Z

@SNMS95, I created an issue that might be relevant for your application of f3dasm. Feel free to add anything here that might help address it!

mpvanderschelling self-assigned this Oct 30, 2023

mpvanderschelling mentioned this issue Oct 31, 2023

Mpvanderschelling/issue208 #211

Merged

mpvanderschelling linked a pull request Oct 31, 2023 that will close this issue

Mpvanderschelling/issue208 #211

Merged
