Originally posted by LeonardoEssence September 21, 2023
I was running some of the AutoML examples from the documentation here, and the code for all of the time series examples kept failing with a pandas KeyError. See below:
Traceback (most recent call last):
  File "/mnt/uni_variate_time_series_flaml.py", line 30, in <module>
    automl.fit(dataframe=train_df, # training data
  File "/opt/conda/lib/python3.9/site-packages/flaml/automl/automl.py", line 1663, in fit
    task.validate_data(
  File "/opt/conda/lib/python3.9/site-packages/flaml/automl/task/time_series_task.py", line 167, in validate_data
    data = TimeSeriesDataset(
  File "/opt/conda/lib/python3.9/site-packages/flaml/automl/time_series/ts_data.py", line 57, in __init__
    self.frequency = pd.infer_freq(train_data[time_col].unique())
  File "/opt/conda/lib/python3.9/site-packages/pandas/core/frame.py", line 3505, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/opt/conda/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3623, in get_loc
    raise KeyError(key) from err
KeyError: 'index'
I dug into the code and found what I believe is a small bug in the TimeSeriesTask class, where validate_data constructs a TimeSeriesDataset at line 167 of time_series_task.py.
TimeSeriesDataset expects a data frame that contains both the training data and the timestamp column. However, the code at line 165 concatenates only Xt and yt, leaving out the time column, so the later lookup train_data[time_col] raises the KeyError.
I propose changing line 165 from df_t = pd.concat([Xt, yt], axis=1) to df_t = pd.concat([pre_data.all_data[pre_data.time_col], Xt, yt], axis=1). That worked for me. I'm not 100% sure that is the intended functionality, but as it stands the code does not work.
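For anyone who wants to check the mechanism without installing FLAML, here is a minimal pandas-only sketch of the failure and of the proposed change. The names all_data, Xt, yt and time_col are stand-ins for what validate_data holds internally, and the toy data is invented for illustration. The real FLAML objects may well differ, so treat this as a sketch of the idea rather than the library's actual code.

```python
import pandas as pd

# Hypothetical stand-ins for the data inside validate_data (assumption:
# the real FLAML objects are richer than these plain pandas frames).
time_col = "index"
all_data = pd.DataFrame(
    {
        time_col: pd.date_range("2023-01-01", periods=6, freq="D"),
        "feature": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
        "target": [10, 11, 12, 13, 14, 15],
    }
)
Xt = all_data[["feature"]]  # features only, no time column
yt = all_data["target"]     # target series


def infer_frequency(train_data: pd.DataFrame, col: str):
    """Mirror the column lookup done in TimeSeriesDataset.__init__."""
    # Wrapping in DatetimeIndex keeps this working across pandas versions.
    return pd.infer_freq(pd.DatetimeIndex(train_data[col].unique()))


# Current behaviour: only Xt and yt are concatenated, so the time column
# is missing and the lookup raises KeyError.
df_without_time = pd.concat([Xt, yt], axis=1)
try:
    infer_frequency(df_without_time, time_col)
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Proposed change: put the time column back in front of Xt and yt.
df_with_time = pd.concat([all_data[time_col], Xt, yt], axis=1)
inferred = infer_frequency(df_with_time, time_col)

print("lookup failed without time column:", lookup_failed)
print("columns after the fix:", list(df_with_time.columns))
print("inferred frequency:", inferred)
```

With the time column included, the frequency lookup has a column to read from. Whether prepending it at that point is the right place in FLAML's pipeline is still an open question.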
Is anybody else seeing the same thing, or can anyone offer suggestions?
Discussed in #1224