sktime/sklearn integration? #38

Open
fkiraly opened this issue Oct 17, 2023 · 8 comments

Comments

@fkiraly

fkiraly commented Oct 17, 2023

@anniegbryant, @benfulcher, I would like to congratulate you on this nice package. I really like the concept, and it is quite nicely designed! There are also a lot of useful methods collected. Nice.

Now, imo, the next "big" question is integrability with the wider modelling ecosystem, e.g., can I use the pairwise time series metrics as components in sktime or sklearn? By "I", of course, I mean the wider user community.

Currently, I think there are a few blockers, but would you be interested in resolving them together?

Two main points imo from the codebase review:

  • sklearn-interoperable interfaces expect a few things, such as a conforming __init__ signature and the availability of get_params and set_params. You can get this for free by inheriting from scikit-base base classes, although that is not the only way to satisfy the interface requirements.
  • sktime has related classes which you could adopt or adapt, e.g., BasePairwiseTransformerPanel. Options include: writing an adapter in sktime; using the class directly in pyspi, which would give you testing for free via check_estimator; or writing your own base class template based on scikit-base that marries the current interface definition with sklearn and sktime expectations.
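
To make the first point concrete, here is a minimal sketch of the parameter convention, using only the standard library. `ParamsMixin`, `PearsonSPI` and the local `clone` helper are hypothetical names for illustration, not pyspi, sklearn or scikit-base code; in practice, inheriting from a scikit-base or sklearn base class would supply `get_params` and `set_params` instead of the hand-written mixin.

```python
import inspect
import math


class ParamsMixin:
    """Hypothetical stand-in for what a scikit-base / sklearn base class provides."""

    @classmethod
    def _param_names(cls):
        # Parameters are read off the __init__ signature, so __init__ must
        # name every hyper-parameter explicitly (no *args / **kwargs).
        sig = inspect.signature(cls.__init__)
        return [name for name in sig.parameters if name != "self"]

    def get_params(self, deep=True):
        # `deep` is accepted for signature compatibility but ignored here.
        return {name: getattr(self, name) for name in self._param_names()}

    def set_params(self, **params):
        valid = self._param_names()
        for name, value in params.items():
            if name not in valid:
                raise ValueError(f"Invalid parameter {name!r}")
            setattr(self, name, value)
        return self


class PearsonSPI(ParamsMixin):
    """Illustrative pairwise statistic; not an actual pyspi class."""

    def __init__(self, squared=False):
        # Convention: store each argument unmodified under the same name.
        self.squared = squared

    def transform(self, x, y):
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sx = math.sqrt(sum((a - mx) ** 2 for a in x))
        sy = math.sqrt(sum((b - my) ** 2 for b in y))
        r = cov / (sx * sy)
        return r ** 2 if self.squared else r


def clone(est):
    # Simplified local helper: rebuilds an object from its parameters, which
    # only works because get_params mirrors the __init__ signature.
    return type(est)(**est.get_params())
```

The point of the convention is that generic tooling (cloning, grid search, pipelines) can rebuild or reconfigure any object without knowing its class, as long as `__init__` only stores its arguments.
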

Side points, but synergistic ones:

  • testing could, and should, be more systematic for reliable use, e.g., CI across combinations of operating system and Python version. Happy to help set this up if we set aside some time. Of course, the "sktime interface" option would take care of this as part of sktime, although bugfixing could become clunkier, as we would have to push bug reports upstream (as with pycatch22).
  • a good object/estimator search utility might be nice for the user, as there are a lot of implemented objects! We could lift some components from sktime or skbase here.
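
As a rough illustration of the search-utility idea, the sketch below walks a class hierarchy and filters by tags. Everything in it is an assumption for illustration: `BaseSPI`, the three example classes, the `tags` dictionaries and `all_spis` are hypothetical and do not reflect how pyspi currently organises its statistics. sktime and scikit-base ship their own lookup utilities, which are not used here.

```python
class BaseSPI:
    """Hypothetical common base class carrying descriptive tags."""

    tags = {}


class Covariance(BaseSPI):
    tags = {"directed": False, "family": "basic"}


class TransferEntropy(BaseSPI):
    tags = {"directed": True, "family": "infotheory"}


class GrangerCausality(BaseSPI):
    tags = {"directed": True, "family": "causal"}


def _walk_subclasses(cls):
    # Recursively yield every direct and indirect subclass of `cls`.
    for sub in cls.__subclasses__():
        yield sub
        yield from _walk_subclasses(sub)


def all_spis(filter_tags=None):
    """Return (name, class) pairs whose tags match every entry in filter_tags."""
    filter_tags = filter_tags or {}
    found = [
        (sub.__name__, sub)
        for sub in _walk_subclasses(BaseSPI)
        if all(sub.tags.get(key) == value for key, value in filter_tags.items())
    ]
    # Sort by name so the listing is stable regardless of definition order.
    return sorted(found, key=lambda pair: pair[0])
```

For example, `all_spis({"directed": True})` would list only the directed statistics, and `all_spis()` would list everything registered under the base class.
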
@benfulcher
Collaborator

Thanks @fkiraly for the kind words and enthusiasm! The compliments are best directed at @olivercliff who did the software dev for this project.

I personally don't have the time or python expertise to contribute much to software expansion efforts, but @olivercliff may be able to weigh in on this point. It's possible @anniegbryant may be able to help somewhat but will leave to her…

Ultimately would be great to have a student or keen software dev join the team—e.g., could be a good Google Summer of Code project. Will keep you posted…

@olivercliff
Collaborator

Hi @fkiraly, glad to hear you like it! In fact, I designed the code with future integration of the sktime/sklearn framework in mind, which is probably why certain parts of it feel familiar (and hopefully the integration would not be too much of a hassle).

Your two main points, imo, would not only allow integration with sklearn/sktime, but also significantly improve the readability and usability of the standalone package. My thoughts after having a quick look at the code you referenced:

  • The sklearn-base classes might be the more difficult aspect to implement, as it looks like they require pyspi to handle data differently. Is that correct? Many methods store certain results directly in the data object in order to extract statistics from these results later on; otherwise the computation time blows out significantly. I imagine there is a simpler way to achieve this using the sklearn framework, but I have not come across it yet.
  • Adopting the BasePairwiseTransformerPanel sounds achievable in a shorter period of time. Moreover, the arguments cover all cases that the methods in pyspi require (e.g., multivariate or bivariate) and extend in useful directions (e.g., handles NaN or not).

I am unfortunately quite short on time these days and don't work directly on the codebase anymore, so I think the idea of a GSoC project, as @benfulcher suggests, is a great way forward.

@bruAristimunha

Hey @fkiraly, @benfulcher, @olivercliff!

Has there been any progress on the Google Summer of Code? I might be interested in doing the sklearn integration, but I didn't find the project in the sktime projects list.

@fkiraly
Author

fkiraly commented Apr 14, 2024

@bruAristimunha, apologies, I did not see this post!

Yes, we have been selected for GSoC 2024, and this would have been an excellent topic!

Unfortunately, the application deadline was April 2.

We could still work on this though?
We have a great (unpaid) mentoring programme!
https://github.com/sktime/mentoring/tree/main

Or perhaps @benfulcher has an academic internship available?

@fkiraly
Author

fkiraly commented Apr 14, 2024

@benfulcher, @olivercliff, apologies, I missed the more recent discussion in my inbox.

Let us know if further collaboration here is of interest. We are going to kick off our summer workstreams in May.

@bruAristimunha

Hi @fkiraly,

Unfortunately, doing unpaid work this way is not very interesting for me, but I appreciate the answer. It would be a "hard" project, with a lot of code, and a lot of time commitment.

Maybe next year if sktime is selected.

@fkiraly
Author

fkiraly commented Apr 16, 2024

@bruAristimunha, we did get selected for 2024; getting paid would have required an application by April 2. Sorry that I did not see this.

How about an alternative idea then, @benfulcher: you (or someone from your team) could present pyspi at one of the sktime meet-ups, which are on Fridays at 4pm UTC at the moment. There is one free slot on April 26, and most of June is also available.

The aim would be to present pyspi and a potential integration project. I'm sure many members of the community and adjacent listeners would find this interesting, and someone might take it up.

@benfulcher
Collaborator

OK, sounds good, thanks for the invite. I would be happy to present pyspi. @jmoo2880 has done a bunch of work on it recently, getting it into a nice format (e.g., it is now pip installable). The trouble is that 4pm UTC seems to be 2am Sydney time, so that timing is not going to work.
