docs/src/index.md (+6 -6)
@@ -78,11 +78,11 @@ pkg> free ParallelKMeans
 -[X] Implementation of [Coresets](http://proceedings.mlr.press/v51/lucic16-supp.pdf).
 -[X] Support for weighted K-means.
 -[X] Support of MLJ Random generation hyperparameter.
+-[X] Implementation of [Mini-batch KMeans variant](https://www.eecs.tufts.edu/~dsculley/papers/fastkmeans.pdf)
 -[ ] Support for other distance metrics supported by [Distances.jl](https://github.com/JuliaStats/Distances.jl#supported-distances).
 -[ ] Implementation of [Geometric methods to accelerate k-means algorithm](http://cs.baylor.edu/~hamerly/papers/sdm2016_rysavy_hamerly.pdf).
 -[ ] Native support for tabular data inputs outside of MLJModels' interface.
--[ ] Refactoring and finalization of API design.
--[ ] GPU support.
+-[ ] GPU support?
 -[ ] Distributed calculations support.
 -[ ] Optimization of code base.
 -[ ] Improved Documentation
@@ -127,7 +127,7 @@ r.converged # whether the procedure converged
 -[Elkan()](https://www.aaai.org/Papers/ICML/2003/ICML03-022.pdf) - Recommended for high dimensional data.
 -[Yinyang()](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/ding15.pdf) - Recommended for large dimensions and/or large number of clusters.
 -[Coreset()](http://proceedings.mlr.press/v51/lucic16-supp.pdf) - Recommended for very fast clustering of very large datasets, when extreme accuracy is not important.
--[MiniBatch()](https://www.eecs.tufts.edu/~dsculley/papers/fastkmeans.pdf) - Recommended for extremely large datasets.
+-[MiniBatch()](https://www.eecs.tufts.edu/~dsculley/papers/fastkmeans.pdf) - Recommended for extremely large datasets, when extreme accuracy is not important.
 Currently, the benchmark speed tests are based on the search for the optimal number of clusters using the [Elbow Method](https://en.wikipedia.org/wiki/Elbow_method_(clustering)), since this is a practical use case for most practitioners employing the K-Means algorithm.