
Conversation

@AVHopp AVHopp commented May 7, 2025

Replaces the custom IndexKernel construction with BoTorch's MultiTaskGP (which became possible due to the added all_tasks argument).
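
For reference, here is a minimal standalone sketch (not the actual code of this PR) of what the all_tasks argument enables: constructing a MultiTaskGP whose task covariance also covers tasks that have no observations in the training data.

import torch
from botorch.models import MultiTaskGP

# Toy data: two features plus a task-index column (the last column).
# Tasks 0 and 1 are observed; task 2 has no data yet, which is what
# the all_tasks argument makes expressible.
X = torch.rand(8, 2, dtype=torch.double)
tasks = torch.tensor([0, 0, 0, 0, 1, 1, 1, 1], dtype=torch.double).unsqueeze(-1)
train_X = torch.cat([X, tasks], dim=-1)
train_Y = torch.rand(8, 1, dtype=torch.double)

model = MultiTaskGP(
    train_X=train_X,
    train_Y=train_Y,
    task_feature=-1,      # the task index sits in the last column
    all_tasks=[0, 1, 2],  # declare tasks beyond those present in train_X
)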

@AVHopp AVHopp marked this pull request as draft May 7, 2025 07:40
@AVHopp AVHopp changed the title from "Tl benchmarking investigation" to "Use Botorch MultiTaskGP for transfer learning" May 7, 2025
@Hrovatin Hrovatin force-pushed the tl_benchmarking_investigation branch 2 times, most recently from 8fee382 to 88e1dfe June 4, 2025 11:18
@Hrovatin Hrovatin marked this pull request as ready for review June 5, 2025 10:39
@AdrianSosic AdrianSosic left a comment

Hi @Hrovatin, here is the first batch of comments.

@Hrovatin Hrovatin requested a review from AdrianSosic June 6, 2025 14:40
Scienfitz commented Aug 15, 2025

@Hrovatin would you consider abandoning this PR? I think if this topic is picked up again, it's better to start afresh (and only open a PR after investigations have concluded).

@Hrovatin
@Scienfitz I would keep it open, as the main blocker for this was randomness in the benchmarks. Since that may be solved now, I would suggest running the benchmarks again on the new HPC (we need to confirm it is also reproducible there).

@Scienfitz
@Hrovatin any update?

Hrovatin commented Sep 9, 2025

No, I first need to set up testing on oneHPC to benchmark reproducibly, as that seems to be the only option to make the benchmarks fully reproducible. I will post an update here once I have the results. @Scienfitz

@Copilot Copilot AI review requested due to automatic review settings September 12, 2025 11:22
@Hrovatin Hrovatin force-pushed the tl_benchmarking_investigation branch from 8ce5fba to bee32aa September 12, 2025 11:22
@Copilot Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Hrovatin commented Sep 17, 2025

@AdrianSosic @Scienfitz @AVHopp Update on the comparison of BoTorch's MultiTaskGP and the current kernel:

  • The results are not identical, but very close, except for Michalewicz (though the variation there is likely not significant either).
  • A concern: when using the BoTorch MultiTaskGP, the Hartmann TL benchmark always fails due to OOM (when using 0.05, but not 0.01, of the source data). I have not yet figured out why. Before investigating this, we should probably decide whether we are OK with accepting some deviation from the current main (named benchmarks-reproducibility-beforeBug in the plot). If we decide we need 100% reproducibility anyway, it also does not make sense to investigate any other issues further.
[image: benchmark comparison plot]

@AVHopp AVHopp left a comment

First round of comments, but we should discuss some of the points (in particular the one regarding multiple active values) internally first.

@Hrovatin Hrovatin force-pushed the tl_benchmarking_investigation branch from de81707 to 68a9c24 September 25, 2025 07:13
@AVHopp AVHopp left a comment

I would be willing to approve; however, since this is technically my PR, I can't.

Hrovatin commented Oct 2, 2025

Results after rebase:
Note:

  • The Hartmann TL benchmark did not run for the new branch due to issues in my local setup (the plot shows only the main branch), but I verified that it runs successfully in the CI actions.
  • Reproducibility is in general not 100% (even when not using the TL code that was changed).
[image: benchmark results after rebase]

@AdrianSosic AdrianSosic force-pushed the tl_benchmarking_investigation branch 4 times, most recently from 5cfb366 to 7bb49d9 October 6, 2025 08:50
@Scienfitz Scienfitz left a comment

Let's do the final check referenced here and merge if everything is alright.

Hrovatin and others added 23 commits October 9, 2025 11:26
@AdrianSosic AdrianSosic force-pushed the tl_benchmarking_investigation branch from 953b609 to 6c3dd93 October 9, 2025 09:26
@Hrovatin
@AdrianSosic even after the new rebase, this branch and main still differ, even in the naive case (when no TaskParam/kernel is used).
Here is the info about the branches I used (both from 13.10.2025):
Main last commit:

commit 823558afa173b5cfaecb23c6cb5431ef7e28088b (HEAD -> main, origin/main, origin/HEAD)
Merge: 48046272c 4206a24da
Author: Martin Fitzner <[email protected]>
Date:   Thu Oct 9 15:54:14 2025 +0200

This PR branch's last commit:

commit 075c33322bb36051cdf5efd6d7ee672621edc5a7 (HEAD -> tl_benchmarking_investigation, origin/tl_benchmarking_investigation)
Author: AdrianSosic <[email protected]>
Date:   Thu Oct 9 13:16:08 2025 +0200

[image: benchmark comparison after rebase]

Hrovatin commented Oct 15, 2025

@AVHopp and @AdrianSosic the issue is indeed the handling of active dimensions in the base kernel (as speculated here).

Changing the active dims as in this branch ensures that in the naive case the outcome exactly matches the main branch, while in the non-naive case the outcome stays as it was on this branch.

These are the equality analyses for Michalewicz (as that one was usually the most obviously discrepant); an output of True means the results are equal:

tl-benchmarking-investigation-activeDims and main in naive True:
True
tl-benchmarking-investigation-activeDims and main in naive False:
False
tl-benchmarking-investigation-activeDims and tl-benchmarking-investigation in naive True:
False
tl-benchmarking-investigation-activeDims and tl-benchmarking-investigation in naive False:
True
main and tl-benchmarking-investigation in naive True:
False
main and tl-benchmarking-investigation in naive False:
False
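
For reference, a minimal sketch of how such an equality check could look, assuming the benchmark results are stored as CSVs with a CumBest column (file names and layout are hypothetical):

import pandas as pd

# True iff the two result files have bit-identical CumBest traces.
def cumbest_equal(path_a: str, path_b: str) -> bool:
    a = pd.read_csv(path_a)
    b = pd.read_csv(path_b)
    return a["CumBest"].equals(b["CumBest"])

# E.g., comparing the naive runs of two branches:
print(cumbest_equal("michalewicz_main_naive.csv",
                    "michalewicz_activeDims_naive.csv"))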

covar_module = kernel.to_gpytorch(
    ard_num_dims=kernel_num_dims,
    batch_shape=batch_shape,
    active_dims=tuple(range(kernel_num_dims)),
)
Interesting 🤔 Can you explain how exactly I have to understand these prints?

Branches

  • tl-benchmarking-investigation-activeDims - the branch where I specify the active dims directly as tuple(range(train_x.shape[-1] - context.n_task_dimensions)).
  • tl-benchmarking-investigation - the version of this branch before the last commit (i.e., before I added active_dims). To my understanding of GPyTorch, the two should be equivalent, as inputting None just uses all dimensions (see the sketch at the end of this comment). Note that in BoTorch's MultiTaskGP the data is split into non-task and task parts, so both in the TL case and in the naive case (with SingleTaskGP and no task kernel) one would just use the integers [0:n-1] for the base_kernel.
  • main - the main branch with the last commit from 9.10.

Naive

  • naive=True uses no task, i.e., no TL
  • naive=False uses a task, i.e., TL

The True/False printed below each line indicates whether the two branches had exactly the same CumBest result in the naive or TL setting, respectively.

All comparisons were done on the full Michalewicz domain benchmark.

You can see that tl-benchmarking-investigation-activeDims retains the performance of main in the naive setting while matching the current tl-benchmarking-investigation branch in the TL setting. I am not sure why setting the active_dims makes a difference only in the naive setting (and not in TL), but at least we now have full reproducibility for the naive setting. The TL setting is still not 100% reproducible against main (in general, due to using MultiTaskGP), but I think we already decided this is acceptable.
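
To illustrate the equivalence claim about active_dims, here is a minimal standalone GPyTorch sketch (assuming a recent GPyTorch; this is not the code of this PR):

import torch
from gpytorch.kernels import MaternKernel

x = torch.randn(10, 3, dtype=torch.double)

# active_dims=None lets the kernel act on all input dimensions ...
k_none = MaternKernel(nu=2.5, ard_num_dims=3)
# ... which should be equivalent to listing every dimension explicitly.
k_expl = MaternKernel(nu=2.5, ard_num_dims=3, active_dims=(0, 1, 2))

with torch.no_grad():
    same = torch.equal(k_none(x).to_dense(), k_expl(x).to_dense())
print(same)  # expected: True, i.e., the covariance matrices match

If this prints True, the remaining differences presumably enter elsewhere in the model construction rather than in the kernel evaluation itself.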
