Upcoming events
===============

We are organizing a coding sprint after the `NIPS 2011 <http://nips.cc/>`_ conference.
For this sprint, we are trying to gather funding for contributors to fly in. Please list your name and who is funding your trip.

- Gael Varoquaux: Funding: INRIA_
- Bertrand Thirion: Funding: INRIA_
- Fabian Pedregosa: Funding: INRIA_
- Alex Gramfort: Funding: INRIA_
- Olivier Grisel: Funding: Google_ + tinyclues_
- Jake Vanderplas: Funding: Google_ + tinyclues_
- David Warde-Farley: Funding: LISA_
- Gilles Louppe: Funding: `University of Liège`_
- Lars Buitinck: Funding: Google_ + tinyclues_
- Vlad Niculae: Funding: Google_ + tinyclues_
- Andreas Mueller: Funding: Google_ + tinyclues_
- Mathieu Blondel: Funding: Google_ + tinyclues_ + private
- Nicolás Della Penna: Funding: private

.. _Google: http://google.com
.. _tinyclues: http://www.tinyclues.com
.. _INRIA: http://inria.fr/en/
.. _LISA: http://www.iro.umontreal.ca/~lisa
.. _University of Liège: http://www.ulg.ac.be

Venue: `Granada University, Instituto de la Paz y los Conflictos <http://www.ugr.es/~eirene/main.html>`_, Campoamor classroom (`map <http://maps.google.com/maps/place?q=granada,+instituto+de+la+paz+y+los+conflictos&hl=en&cid=9774673672502768790>`_).


Contributors may find the `coding guidelines <http://scikit-learn.org/dev/developers/index.html#coding-guidelines>`_ useful.

Top priorities are merging pull requests, fixing easyfix issues and improving documentation consistency.

In addition to the tasks listed below, any issue in the `issue tracker <https://github.com/scikit-learn/scikit-learn/issues>`_ is worth considering.

- Merge in randomized linear models (branch ``randomized_lasso`` on GaelVaroquaux's GitHub fork; Gael Varoquaux and Alex Gramfort are working on this).
- Improve test coverage: run ``make test-coverage`` after installing the ``coverage`` module, find low-hanging fruit, and add tests. Try to test the logic rather than simply increasing the number of lines covered.
- Py3k support: first test joblib on Python 3, then scikit-learn. Both generate Python 3-compatible sources, but these have not been tested.
- Improving and merging existing `pull requests <https://github.com/scikit-learn/scikit-learn/pulls>`_ is the number one priority. There is a lot of very good code lying there; it often just needs a small amount of polishing.
- Rationalize images in the documentation: we have 56 MB of images generated in the documentation (``doc/_build/html/_images``). First, we should save JPEGs instead of PNGs: this shrinks the directory to 45 MB (not a huge gain, granted). Second, the same file is often saved several times. I need to understand what is going on and fix it.
- Affinity propagation using sparse matrices: the affinity propagation algorithm (``scikits.learn.cluster.affinity_propagation``) should be able to work on sparse input affinity matrices without converting them to dense arrays. A good implementation should make this efficient on very large data.
- Improve the documentation: if you understand some aspects of machine learning, you can help make the scikit rock without writing a line of code (see http://scikit-learn.org/dev/developers/index.html#documentation). See also the documentation-related issues in the issue tracker.
- Text feature extraction (refactoring / API simplification) + hashing vectorizer: Olivier Grisel
- Nearest neighbors classification/regression: allow more flexible Bayesian priors (currently only a flat prior is used) and implement alternative distance metrics: Jake Vanderplas
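
The sparse affinity propagation item above could look roughly like the following minimal sketch. It runs Frey and Dueck's damped responsibility/availability updates, but stores the messages only on the entries of a SciPy sparse similarity matrix, so memory grows with the number of stored similarities rather than with n². It assumes the preferences are stored explicitly on the diagonal, that every row has at least one off-diagonal entry, and a fixed number of iterations with no convergence check; ``sparse_affinity_propagation`` is a hypothetical name, not an existing scikit-learn function.

```python
import numpy as np
import scipy.sparse as sp


def sparse_affinity_propagation(S, damping=0.5, max_iter=200):
    """Hypothetical sketch: affinity propagation on the stored entries of S.

    S[i, k] is the similarity of point i to candidate exemplar k and S[k, k]
    is the preference of k. Returns, for each point, the index of its exemplar.
    """
    S = sp.csr_matrix(S, dtype=float, copy=True)
    S.sum_duplicates()  # merges duplicates and sorts column indices per row
    n = S.shape[0]
    indptr, cols, s = S.indptr, S.indices, S.data
    rows = np.repeat(np.arange(n), np.diff(indptr))
    diag = rows == cols
    if diag.sum() != n or np.any(np.diff(indptr) < 2):
        raise ValueError("need an explicit diagonal and a neighbour in every row")

    r = np.zeros_like(s)  # responsibilities r(i, k), one per stored entry
    a = np.zeros_like(s)  # availabilities a(i, k), one per stored entry
    starts = indptr[:-1]
    for _ in range(max_iter):
        # r(i, k) = s(i, k) - max over stored k' != k of a(i, k') + s(i, k')
        v = a + s
        row_max = np.maximum.reduceat(v, starts)
        is_max = v == row_max[rows]
        n_max = np.add.reduceat(is_max.astype(int), starts)
        second = np.maximum.reduceat(np.where(is_max, -np.inf, v), starts)
        # Only an entry that is the unique row maximum must use the runner-up.
        unique_max = is_max & (n_max[rows] == 1)
        other_max = np.where(unique_max, second[rows], row_max[rows])
        r = damping * r + (1 - damping) * (s - other_max)

        # Column sums of positive responsibilities, keeping r(k, k) as is.
        rp = np.where(diag, r, np.maximum(r, 0))
        col_sum = np.bincount(cols, weights=rp, minlength=n)
        a_new = col_sum[cols] - rp  # leave out the entry's own contribution
        a_new = np.where(diag, a_new, np.minimum(a_new, 0))
        a = damping * a + (1 - damping) * a_new

    # Each point picks the stored k maximizing a(i, k) + r(i, k).
    e = a + r
    labels = np.empty(n, dtype=int)
    for i in range(n):
        lo, hi = indptr[i], indptr[i + 1]
        labels[i] = cols[lo + np.argmax(e[lo:hi])]
    return labels
```

The row-wise maximum and runner-up needed by the responsibility update come from ``np.maximum.reduceat`` over the CSR row boundaries, so no dense n x n array is ever formed; the final labelling loop is plain Python only for readability.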
Participants: @mblondel

- Code clean-up
- Speed improvements: don't reallocate clusters, track clusters that didn't change, use the triangle inequality
- L1 distance: use the L1 distance in the E-step and the median (instead of the mean) in the M-step
- Fuzzy k-means: k-means with fuzzy cluster membership (not the same as GMM)
- Move the argmin and average operators to the pairwise module (for L1/L2)
- Support a chunk size argument in the argmin operator
- Merge @ogrisel's branch
- Add a score function (the opposite of the k-means objective)
- Sparse matrix support
- ``fit_transform``
- More output options in ``transform`` (hard, soft, dense)
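
The L1 item amounts to the k-medians variant of Lloyd's algorithm: assign each sample to its nearest center under the L1 distance, then move each center to the coordinate-wise median of its cluster, which minimizes the summed absolute deviations. A minimal NumPy sketch follows, with the assignment step written as a chunked argmin in the spirit of the chunk size item; ``l1_argmin`` and ``l1_kmeans`` are hypothetical names, not scikit-learn functions.

```python
import numpy as np


def l1_argmin(X, centers, chunk_size=256):
    """Index of the nearest center under the L1 distance, for each row of X.

    Rows are processed chunk by chunk, so only a
    (chunk_size, n_clusters, n_features) array is held in memory at a time.
    """
    labels = np.empty(X.shape[0], dtype=int)
    for start in range(0, X.shape[0], chunk_size):
        block = X[start:start + chunk_size]
        dist = np.abs(block[:, None, :] - centers[None, :, :]).sum(axis=2)
        labels[start:start + chunk_size] = dist.argmin(axis=1)
    return labels


def l1_kmeans(X, n_clusters, n_iter=100, chunk_size=256, seed=0):
    """Hypothetical k-medians: L1 assignment (E-step), median update (M-step)."""
    rng = np.random.RandomState(seed)
    init = rng.choice(X.shape[0], n_clusters, replace=False)
    centers = X[init].astype(float)
    for _ in range(n_iter):
        labels = l1_argmin(X, centers, chunk_size)
        new_centers = centers.copy()
        for k in range(n_clusters):
            members = X[labels == k]
            if len(members):  # an empty cluster keeps its previous center
                new_centers[k] = np.median(members, axis=0)
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    # Recompute labels so they always match the returned centers.
    return centers, l1_argmin(X, centers, chunk_size)
```

Compared with the mean, the median update makes the centers less sensitive to outliers, which is the usual motivation for pairing it with the L1 distance.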

Participants: @mblondel

- Merge the random SVD PR
- Merge the sparse random projection PR
- Cython utilities for fast and memory-efficient projection
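
For context on the sparse random projection item, here is a pure NumPy/SciPy sketch of the idea, not the code of the PR. Each entry of the projection matrix is nonzero with probability ``density``, with value plus or minus sqrt(1 / (density * n_components)), so squared Euclidean norms are preserved in expectation; the default density of 1 / sqrt(n_features) is an assumption taken from Li, Hastie and Church's "very sparse" random projections. The names ``sparse_random_matrix`` and ``project`` are hypothetical.

```python
import numpy as np
import scipy.sparse as sp


def sparse_random_matrix(n_components, n_features, density=None, seed=0):
    """Hypothetical sparse projection matrix of shape (n_components, n_features).

    Entries are +v or -v with probability density / 2 each and 0 otherwise,
    with v = sqrt(1 / (density * n_components)).
    """
    if density is None:
        density = 1.0 / np.sqrt(n_features)
    rng = np.random.RandomState(seed)
    indices, indptr = [], [0]
    for _ in range(n_components):
        # Draw this row's nonzero columns without building a dense row.
        nnz = rng.binomial(n_features, density)
        indices.append(rng.choice(n_features, nnz, replace=False))
        indptr.append(indptr[-1] + nnz)
    indices = np.concatenate(indices)
    signs = rng.choice([-1.0, 1.0], size=indices.size)
    data = signs * np.sqrt(1.0 / (density * n_components))
    return sp.csr_matrix((data, indices, np.array(indptr)),
                         shape=(n_components, n_features))


def project(X, R):
    """Project the rows of a dense X with the sparse matrix R (X @ R.T)."""
    return np.asarray(R @ X.T).T
```

Only about ``n_components * sqrt(n_features)`` values are stored, which is where the memory savings come from; a Cython version would mainly speed up the generation and the sparse-dense product.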
Participants: @amueller

- Move to random projection module
- Patch liblinear to support warm restarts, and add ``LogisticRegressionCV``. Comment (by Fabian): I tried this; take a look at the liblinear fork.
- Decision trees (support for boosted trees, a loss matrix, multivariate regression)
- Ensemble classifiers. Comment (by Gilles): I plan to review @pprett's PR on gradient boosted trees. I also want to implement parallel tree construction and prediction in the current forest-of-trees implementation.
- Locality-sensitive hashing (talk to Brian Holt)
- Fused Lasso
- Group Lasso (talk to Alex Gramfort by email)
- Manifold learning: MDS, t-SNE (talk to DWF)
- Bayesian classification (e.g. RVM)
- Sparse matrix support in dictionary learning module
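
As an illustration of the Group Lasso item, one common solver is proximal gradient descent (ISTA), whose proximal step is block soft-thresholding: each group of coefficients is shrunk toward zero as a whole and set exactly to zero when its norm falls below the threshold. This is a minimal sketch of that technique, not the planned scikit-learn implementation; ``group_lasso_ista``, ``group_soft_threshold`` and the integer ``groups`` labelling are hypothetical.

```python
import numpy as np


def group_soft_threshold(w, groups, threshold):
    """Prox of threshold * sum_g ||w_g||_2 (block soft-thresholding)."""
    out = np.zeros_like(w)
    for g in np.unique(groups):
        idx = groups == g
        norm = np.linalg.norm(w[idx])
        if norm > threshold:  # otherwise the whole group stays exactly zero
            out[idx] = (1 - threshold / norm) * w[idx]
    return out


def group_lasso_ista(X, y, groups, alpha, n_iter=500):
    """Hypothetical solver for 1/(2n) ||y - Xw||^2 + alpha * sum_g ||w_g||_2."""
    groups = np.asarray(groups)
    n_samples = X.shape[0]
    # Step 1/L, with L = sigma_max(X)^2 / n the Lipschitz constant of the gradient.
    step = n_samples / np.linalg.norm(X, ord=2) ** 2
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n_samples
        w = group_soft_threshold(w - step * grad, groups, step * alpha)
    return w
```

Because whole groups are zeroed at once, the selected features come in blocks; a plain Lasso on the same data would zero individual coefficients instead.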

Some of us are planning to stay at a guest house in Granada to reduce hotel costs. If you are interested, add your name and your arrival and departure dates below:

========================== ================== ===================
Name                       From               To
========================== ================== ===================
Olivier Grisel             Dec. 11            Dec. 21
Gael Varoquaux             Dec. 11            Dec. 21
David Warde-Farley         Dec. 18            Dec. 21
Alex Gramfort              Dec. 11            Dec. 21
Jake Vanderplas            Dec. 15            Dec. 22
Bertrand Thirion           Dec. 12            Dec. 20
Gilles Louppe              Dec. 18            Dec. 21
Mathieu Blondel            Dec. 18            Dec. 22
Lars Buitinck              Dec. 18            Dec. 22
Vlad Niculae               Dec. 18            Dec. 22
Andreas Mueller            Dec. 11            Dec. 22
Nicolás Della Penna        Dec. 18            Dec. 22
(add your name here)
========================== ================== ===================