-make_blobs | Generate isotropic Gaussian blobs for clustering. | [link](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_moons.html)
-make_moons | Make two interleaving half circles | [link](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_blobs.html)
-make_s_curve | Generate an S curve dataset. | [link](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_s_curve.html)
-make_regression | Generate a random regression problem. | [link](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_regression.html])
-make_classification | Generate a random n-class classification problem. | [link](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html])
+make_blobs | Generate isotropic Gaussian blobs for clustering. | [link](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_blobs.html)
+make_moons | Make two interleaving half circles. | [link](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_moons.html)
+make_s_curve | Generate an S curve dataset. | [link](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_s_curve.html)
+make_regression | Generate a random regression problem. | [link](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_regression.html)
+make_classification | Generate a random n-class classification problem. | [link](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html)
+make_friedman1 | Generate the “Friedman #1” regression problem. | [link](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_friedman1.html)
+make_friedman2 | Generate the “Friedman #2” regression problem. | [link](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_friedman2.html)
+make_friedman3 | Generate the “Friedman #3” regression problem. | [link](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_friedman3.html)
+make_circles | Make a large circle containing a smaller circle in 2d. | [link](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_circles.html)
+make_low_rank_matrix | Generate a mostly low rank matrix with bell-shaped singular values. | [link](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_low_rank_matrix.html)
+make_swiss_roll | Generate a swiss roll dataset. | [link](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_swiss_roll.html)

 **Disclaimer**: SyntheticDatasets.jl borrows code and documentation from
 [scikit-learn](https://scikit-learn.org/stable/modules/classes.html#samples-generator) in the dataset module, but *it is not an official part
src/sklearn.jl
Lines changed: 154 additions & 5 deletions
@@ -85,6 +85,37 @@ function generate_s_curve(; n_samples::Int = 100,
     return convert(features, labels)
 end
 
+"""
+function generate_circles(; n_samples::Int = 100,
+                            shuffle::Bool = true,
+                            noise::Float64 = 0.0,
+                            random_state::Union{Int, Nothing} = nothing,
+                            factor::Float64 = 0.8)::DataFrame
+Make a large circle containing a smaller circle in 2d. Sklearn interface to make_circles.
+# Arguments
+- `n_samples::Int = 100`: The total number of points generated. For odd numbers, the inner circle will have one point more than the outer circle.
+- `shuffle::Bool = true`: Whether to shuffle the samples.
+- `noise::Float64 = 0.0`: Standard deviation of Gaussian noise added to the data.
+- `random_state::Union{Int, Nothing} = nothing`: Determines random number generation for dataset shuffling and noise. Pass an int for reproducible output across multiple function calls.
+- `factor::Float64 = 0.8`: Scale factor between inner and outer circle.
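To make the `n_samples`, `noise` and `factor` arguments concrete, here is a minimal pure-Python sketch of the geometry they describe. It is not the code path used by `generate_circles`, which delegates to scikit-learn; the name `make_circles_sketch` and its `seed` parameter are hypothetical, the labels (0 for the outer circle, 1 for the inner one) are an assumption, and shuffling is left out for brevity.

```python
import math
import random


def make_circles_sketch(n_samples=100, noise=0.0, factor=0.8, seed=None):
    """Return (points, labels) for two concentric circles (illustrative only)."""
    if not 0.0 <= factor < 1.0:
        raise ValueError("factor must be in [0, 1)")
    rng = random.Random(seed)
    n_outer = n_samples // 2
    n_inner = n_samples - n_outer  # an odd total gives the inner circle the extra point

    points, labels = [], []
    # The outer circle has radius 1; the inner circle is the same shape scaled by `factor`.
    for count, radius, label in ((n_outer, 1.0, 0), (n_inner, factor, 1)):
        for i in range(count):
            angle = 2.0 * math.pi * i / count
            # `noise` is the standard deviation of the Gaussian jitter on each coordinate.
            x = radius * math.cos(angle) + rng.gauss(0.0, noise)
            y = radius * math.sin(angle) + rng.gauss(0.0, noise)
            points.append((x, y))
            labels.append(label)
    return points, labels


points, labels = make_circles_sketch(n_samples=101, noise=0.0, factor=0.5, seed=0)
```

With `noise=0.0` every point lies exactly on one of the two circles, so the effect of `factor` can be checked by measuring each point's distance from the origin.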
+Generate the “Friedman #1” regression problem. Sklearn interface to make_friedman1.
+# Arguments
+- `n_samples::Int = 100`: The number of samples.
+- `n_features::Int = 10`: The number of features. Should be at least 5.
+- `noise::Union{Nothing, Float64} = nothing`: The standard deviation of the Gaussian noise applied to the output.
+- `random_state::Union{Int, Nothing} = nothing`: Determines random number generation for dataset noise. Pass an int for reproducible output across multiple function calls.
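The "at least 5" constraint on `n_features` follows from the Friedman #1 formula, which reads only the first five columns; any further columns are independent of the target. The sketch below is a pure-Python illustration of that formula as documented by scikit-learn for `make_friedman1`, not the wrapper's implementation. The name `friedman1_sketch` and the `seed` parameter are hypothetical.

```python
import math
import random


def friedman1_sketch(n_samples=100, n_features=10, noise=0.0, seed=None):
    """Return (X, y) following the Friedman #1 formula (illustrative only)."""
    if n_features < 5:
        raise ValueError("n_features must be at least 5")
    rng = random.Random(seed)
    # Inputs are drawn uniformly from [0, 1).
    X = [[rng.random() for _ in range(n_features)] for _ in range(n_samples)]
    # Only columns 0..4 enter the target; the rest are pure distractors.
    y = [
        10.0 * math.sin(math.pi * row[0] * row[1])
        + 20.0 * (row[2] - 0.5) ** 2
        + 10.0 * row[3]
        + 5.0 * row[4]
        + noise * rng.gauss(0.0, 1.0)
        for row in X
    ]
    return X, y


X, y = friedman1_sketch(n_samples=20, n_features=7, noise=0.0, seed=1)
```

Setting `noise=0.0` gives a deterministic target, which is convenient when checking whether a feature-selection method recovers the five informative columns.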
+Generate the “Friedman #2” regression problem. Sklearn interface to make_friedman2.
+# Arguments
+- `n_samples::Int = 100`: The number of samples.
+- `noise::Union{Nothing, Float64} = nothing`: The standard deviation of the Gaussian noise applied to the output.
+- `random_state::Union{Int, Nothing} = nothing`: Determines random number generation for dataset noise. Pass an int for reproducible output across multiple function calls.
+Generate the “Friedman #3” regression problem. Sklearn interface to make_friedman3.
+# Arguments
+- `n_samples::Int = 100`: The number of samples.
+- `noise::Union{Nothing, Float64} = nothing`: The standard deviation of the Gaussian noise applied to the output.
+- `random_state::Union{Int, Nothing} = nothing`: Determines random number generation for dataset noise. Pass an int for reproducible output across multiple function calls.
+function generate_low_rank_matrix(; n_samples::Int = 100,
+                                    n_features::Int = 100,
+                                    effective_rank::Int = 10,
+                                    tail_strength::Float64 = 0.5,
+                                    random_state::Union{Int, Nothing} = nothing)
+Generate a mostly low rank matrix with bell-shaped singular values.
+# Arguments
+- `n_samples::Int = 100`: The number of samples.
+- `n_features::Int = 100`: The number of features.
+- `effective_rank::Int = 10`: The approximate number of singular vectors required to explain most of the data by linear combinations.
+- `tail_strength::Float64 = 0.5`: The relative importance of the fat noisy tail of the singular values profile.
+- `random_state::Union{Int, Nothing} = nothing`: Determines random number generation for dataset creation. Pass an int for reproducible output across multiple function calls.
+    features = datasets.make_low_rank_matrix(n_samples = n_samples,
+                                             n_features = n_features,
+                                             effective_rank = effective_rank,
+                                             tail_strength = tail_strength,
+                                             random_state = random_state)
+    return features
+end
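The roles of `effective_rank` and `tail_strength` are easier to see from the singular value profile itself. The sketch below computes only that profile in plain Python, following the two terms given in the scikit-learn documentation for `make_low_rank_matrix` as I understand them: a bell-shaped low-rank part plus a slowly decaying tail. The function name `singular_value_profile` is hypothetical, and the random orthogonal factors that scikit-learn combines with this profile are omitted.

```python
import math


def singular_value_profile(n_samples=100, n_features=100,
                           effective_rank=10, tail_strength=0.5):
    """Return the singular values s_0, s_1, ... of the profile (illustrative only)."""
    n = min(n_samples, n_features)
    profile = []
    for i in range(n):
        # Bell-shaped part: dominates for indices below roughly `effective_rank`.
        low_rank = (1.0 - tail_strength) * math.exp(-((i / effective_rank) ** 2))
        # Fat tail: decays much more slowly, weighted by `tail_strength`.
        tail = tail_strength * math.exp(-0.1 * i / effective_rank)
        profile.append(low_rank + tail)
    return profile


profile = singular_value_profile()
```

With `tail_strength = 0.0` the profile is a pure Gaussian bell in the index, while values closer to 1.0 leave more weight in the tail, which makes the matrix less compressible.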
+
+"""
+function generate_swiss_roll(; n_samples::Int = 100,
+                               noise::Float64 = 0.0,
+                               random_state::Union{Int, Nothing} = nothing)
+Generate a swiss roll dataset.
+# Arguments
+- `n_samples::Int = 100`: The number of samples.
+- `noise::Float64 = 0.0`: Standard deviation of Gaussian noise added to the data.
+- `random_state::Union{Int, Nothing} = nothing`: Determines random number generation for dataset creation. Pass an int for reproducible output across multiple function calls.