Inhomogeneous arrays from data_generator.py #35

Open
charlottem-rna opened this issue Nov 22, 2024 · 1 comment

Comments

@charlottem-rna

There is an error when I try to test training on TS1 and TS2. It seems like data_list[0].pairs and data_list[1].pairs can't be concatenated correctly:

$ python ufold_train.py --train_files TS1 TS2
#####Stage 1#####
Loading dataset:  TS1
Loading dataset:  TS2
Data Loading Done!!!
Traceback (most recent call last):
  File "/data/RUDD/RUDD/2d-prediction/charlotte/UFold/ufold_train.py", line 220, in <module>
    main()
  File "/data/RUDD/RUDD/2d-prediction/charlotte/UFold/ufold_train.py", line 189, in main
    train_merge = Dataset_FCN_merge(train_data_list)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/RUDD/RUDD/2d-prediction/charlotte/UFold/ufold/data_generator.py", line 612, in __init__
    self.data = self.merge_data(data_list)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/RUDD/RUDD/2d-prediction/charlotte/UFold/ufold/data_generator.py", line 625, in merge_data
    self.data2.pairs = np.concatenate((data_list[0].pairs,data_list[1].pairs),axis=0)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<__array_function__ internals>", line 200, in concatenate
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (60,) + inhomogeneous part.

Can this be resolved?

@sperfu
Contributor

sperfu commented Nov 23, 2024

Hi,

The error most likely comes from mismatched dimensions in the self.data_x and self.data_pairs attributes of the data generator class: the pairs arrays in data_list[0] and data_list[1] appear to have different dimensions or structures, so they cannot be concatenated directly.
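
For context, here is a minimal sketch of how this kind of error can arise. It assumes that pairs holds one list of base pairs per sequence and that those lists differ in length; the sample values are made up and not taken from TS1 or TS2. np.concatenate first converts each input to an array, and since NumPy 1.24 a ragged nested list is no longer converted implicitly, so the call raises a ValueError:

```python
import numpy as np

# Made-up stand-ins for data_list[0].pairs and data_list[1].pairs:
# one list of [i, j] base pairs per sequence, with a different
# number of pairs per sequence (a "ragged" nested list).
pairs_a = [[[0, 5], [1, 4]], [[2, 9]]]
pairs_b = [[[3, 8], [4, 7], [5, 6]]]

def try_concatenate(a, b):
    """Return the concatenated array, or the ValueError if it fails."""
    try:
        return np.concatenate((a, b), axis=0)
    except ValueError as exc:
        return exc

result = try_concatenate(pairs_a, pairs_b)
print(type(result).__name__)
```

If the real pairs attributes look like this, the failure is caused by the ragged structure within each dataset, not only by a difference between TS1 and TS2.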

To resolve this, I recommend checking the dimensions and structure of self.data_x and self.data_pairs for both datasets (TS1 and TS2) and making sure that preprocessing produces compatible shapes. You may need to modify the preprocessing pipeline or the merging logic to unify them.
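
As a rough sketch of that check and of one possible change to the merging logic, the snippet below inspects per-sequence lengths and then joins the lists into a 1-D object array instead of calling np.concatenate on ragged input. The helper names pair_counts and merge_ragged and the sample data are hypothetical and are not part of UFold. Whether the rest of Dataset_FCN_merge accepts an object array is an untested assumption:

```python
import numpy as np

# Made-up stand-ins for data_list[0].pairs and data_list[1].pairs.
pairs_ts1 = [[[0, 5], [1, 4]], [[2, 9]]]
pairs_ts2 = [[[3, 8], [4, 7], [5, 6]]]

def pair_counts(pairs):
    """Number of base pairs recorded for each sequence."""
    return [len(p) for p in pairs]

def merge_ragged(*pair_lists):
    """Join several per-sequence lists into one 1-D object array."""
    items = [p for pairs in pair_lists for p in pairs]
    merged = np.empty(len(items), dtype=object)
    for i, item in enumerate(items):
        merged[i] = item  # each element keeps its own length
    return merged

# Differing counts confirm that the data is ragged.
print(pair_counts(pairs_ts1), pair_counts(pairs_ts2))

merged_pairs = merge_ragged(pairs_ts1, pairs_ts2)
print(merged_pairs.shape)
```

Filling a preallocated object array element by element avoids NumPy trying to infer a rectangular shape from the nested lists.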

We are currently revising this part of the code to enhance compatibility with inputs of varying dimensions. Hopefully, these updates will address such issues in future iterations.

Thanks!
