sktime/sklearn integration? #38

Open
fkiraly opened this issue Oct 17, 2023 · 8 comments

Comments

@fkiraly

fkiraly commented Oct 17, 2023

@anniegbryant, @benfulcher, I would like to congratulate you on this nice package. I really like the concept and it is quite nicely designed! There are also a lot of useful methods collected! Nice.

Now imo the next "big" question is integrability with the wider modelling ecosystem, e.g., can I use the pairwise time series metrics as components in sktime or sklearn? Where with "I", of course, I mean the wider user ecosystem.

Currently, I think there are a few blockers, but would you be interested in resolving them together?

Two main points imo from the codebase review:

  • sklearn interoperable interfaces expect a few things, such as a conforming __init__ signature and the availability of get_params and set_params. You can get this for free by inheriting from scikit-base base classes, though that is not the only way to satisfy the interface requirements.
  • sktime has related classes which you could adopt or adapt, e.g., the BasePairwiseTransformerPanel. Options could involve writing an adapter in sktime, or using the class in pyspi; the latter would give you testing for free by using check_estimator. Or, writing your own base class template based on scikit-base that marries the current interface definition with sklearn and sktime expectations.
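
To make the first point concrete, here is a minimal, hand-rolled sketch of the get_params / set_params convention. It is not the sklearn or scikit-base implementation, only an illustration of what the interface expects; `ParamsMixin`, `LaggedCovariance`, and its `lag` and `normalise` arguments are hypothetical names and not part of pyspi.

```python
import inspect


class ParamsMixin:
    """Hand-rolled sketch of the get_params/set_params convention (assumption:
    this mimics the idea, not the actual sklearn or scikit-base code)."""

    def get_params(self):
        # Parameter names are read off the __init__ signature, which is why
        # __init__ must store each argument under an attribute of the same name.
        sig = inspect.signature(type(self).__init__)
        names = [name for name in sig.parameters if name != "self"]
        return {name: getattr(self, name) for name in names}

    def set_params(self, **params):
        valid = self.get_params()
        for name, value in params.items():
            if name not in valid:
                raise ValueError(f"invalid parameter {name!r}")
            setattr(self, name, value)
        return self

    def clone(self):
        # A fresh, unfitted copy is rebuilt from the parameters alone.
        return type(self)(**self.get_params())


class LaggedCovariance(ParamsMixin):
    """Hypothetical pairwise statistic, used for illustration only."""

    def __init__(self, lag=0, normalise=True):
        # Only store the arguments: no validation and no computation here.
        self.lag = lag
        self.normalise = normalise


spi = LaggedCovariance(lag=2)
params = spi.get_params()
copy = spi.clone().set_params(normalise=False)
```

The design constraint this illustrates is that everything needed to rebuild an object must be recoverable from its constructor arguments, which is what lets generic tooling clone and tune objects without knowing their internals.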

Side points, but synergistic ones:

  • testing could - and should - be more systematic for reliable use, e.g., CI on operating system and python version combinations. Happy to help set this up if we set aside some time. Of course, the "sktime interface" option would take care of this as part of sktime, although bugfixing could become more clunky as we would have to push bug reports upstream (like in pycatch22).
  • a good object/estimator search utility might be nice for the user, there are a lot of implemented objects! We could lift some components from sktime or skbase here.
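
As a rough illustration of what such a search utility could look like, the sketch below filters registered classes by tags. Everything in it is an assumption made for illustration: `register`, `find_objects`, the tag names, and the three placeholder classes are hypothetical and do not reflect the sktime, skbase, or pyspi APIs.

```python
REGISTRY = []


def register(**tags):
    """Class decorator that records a class together with its tags."""
    def decorator(cls):
        cls._tags = tags
        REGISTRY.append(cls)
        return cls
    return decorator


# Placeholder classes standing in for implemented statistics.
@register(family="basic", directed=False)
class Covariance:
    pass


@register(family="information", directed=True)
class TransferEntropy:
    pass


@register(family="information", directed=False)
class MutualInformation:
    pass


def find_objects(**filter_tags):
    """Return sorted names of registered classes whose tags match every filter."""
    return sorted(
        cls.__name__
        for cls in REGISTRY
        if all(cls._tags.get(key) == value for key, value in filter_tags.items())
    )


found = find_objects(family="information", directed=False)
```

Calling `find_objects()` with no filters lists every registered class, so the same function serves both as a catalogue and as a search.
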
@benfulcher
Collaborator

Thanks @fkiraly for the kind words and enthusiasm! The compliments are best directed at @olivercliff who did the software dev for this project.

I personally don't have the time or python expertise to contribute much to software expansion efforts, but @olivercliff may be able to weigh in on this point. It's possible @anniegbryant may be able to help somewhat but will leave to her…

Ultimately would be great to have a student or keen software dev join the team—e.g., could be a good Google Summer of Code project. Will keep you posted…

@olivercliff
Collaborator

Hi @fkiraly, glad to hear you like it! In fact, I designed the code with future integration of the sktime/sklearn framework in mind, which is probably why certain parts of it feel familiar (and hopefully the integration would not be too much of a hassle).

Your two main points, imo, would not only allow integration with sklearn/sktime, but also significantly improve the readability and usability of the standalone package. My thoughts after having a quick look at the code you referenced:

  • The sklearn-base classes might be the more difficult aspect to implement, as it looks like it requires pyspi to handle data differently - is that correct? Many methods store certain results directly in the data object in order to extract statistics from these results later on; otherwise the computation time blows out significantly. I imagine there is a simpler way to achieve this using the sklearn framework but I have not come across it yet.
  • Adopting the BasePairwiseTransformerPanel sounds achievable in a shorter period of time. Moreover, the arguments cover all cases that the methods in pyspi require (e.g., multivariate or bivariate) and extend in useful directions (e.g., handles NaN or not).
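
One possible way to keep shared intermediate results without storing them on the data object is a memoised helper that several otherwise stateless statistics call. The sketch below is an assumption about how that could look, not how pyspi or sklearn handles it; `covariance`, `Covariance`, `Correlation`, and the `calls` counter are hypothetical names used only for illustration.

```python
from functools import lru_cache

# Counts how often the expensive helper actually runs (illustration only).
calls = {"covariance": 0}


@lru_cache(maxsize=8)
def covariance(x, y):
    """Sample covariance of two equal-length series given as tuples."""
    calls["covariance"] += 1
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


class Covariance:
    """Stateless statistic: nothing is written back to the data."""

    def transform(self, x, y):
        return covariance(tuple(x), tuple(y))


class Correlation:
    """Reuses the cached covariance instead of recomputing it."""

    def transform(self, x, y):
        x, y = tuple(x), tuple(y)
        return covariance(x, y) / (covariance(x, x) * covariance(y, y)) ** 0.5


x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 9.0]
cov = Covariance().transform(x, y)
corr = Correlation().transform(x, y)
```

Here the cross-covariance is computed once and reused by the second statistic. Hashing whole series as cache keys has its own cost, so a real implementation would likely key the cache on something cheaper, such as a dataset identifier.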

I am unfortunately quite short on time these days and don't work directly on the codebase anymore, so I think the idea of a GSoC project, as @benfulcher suggests, is a great way forward.

@bruAristimunha

Hey @fkiraly, @benfulcher, @olivercliff!

Has there been any progress on the Google Summer of Code? I might be interested in doing the sklearn integration, but I didn't find the project in the sktime projects list.

@fkiraly
Author

fkiraly commented Apr 14, 2024

@bruAristimunha, apologies, I did not see this post!

Yes, we have been selected for GSoC 2024, and this would have been an excellent topic!

Unfortunately, the application deadline was April 2.

We could still work on this though?
We have a great (unpaid) mentoring programme!
https://github.com/sktime/mentoring/tree/main

Or perhaps @benfulcher has an academic internship available?

@fkiraly
Author

fkiraly commented Apr 14, 2024

@benfulcher, @olivercliff, apologies, I missed the more recent discussion in my inbox.

Let us know if further collaboration here is of interest; we are going to kick off our summer workstreams in May.

@bruAristimunha

Hi @fkiraly,

Unfortunately, doing unpaid work this way is not very interesting for me, but I appreciate the answer. It would be a "hard" project, with a lot of code, and a lot of time commitment.

Maybe next year if sktime is selected.

@fkiraly
Author

fkiraly commented Apr 16, 2024

@bruAristimunha, we did get selected for 2024; getting paid would have required an application by April 2. Sorry that I did not see this.

How about an alternative idea then, @benfulcher: you (or someone from your team) could present pyspi in one of the sktime meet-ups, these are Fridays 4pm UTC at the moment. There is one free slot on April 26, and most of June is also available.

The aim would be to present pyspi and a potential integration project. I'm sure many members of the community and adjacent listeners would find this interesting, and someone might take that up.

@benfulcher
Collaborator

Ok, sounds good, thanks for the invite. Would be happy to present pyspi. @jmoo2880 has done a bunch of work on it recently, getting it into a nice format (e.g., now pip installable). Trouble is that 4pm UTC seems to be 2am Sydney time, so it's not going to work at that timing.
