Commit ecf7441

Merge pull request #14 from ATISLabs/feature/generate_friendman_functions
[#1] - Feature/generate friendman functions
2 parents d6dcc86 + 75ba12c commit ecf7441

3 files changed: +102 -10 lines changed
README.md

Lines changed: 15 additions & 10 deletions
@@ -25,16 +25,21 @@ The package has an interface for the dataset generator of the [ScikitLearn](http
 ### ScikitLearn
 List of package datasets:
 
-Dataset | Title | Reference
-----------------|------------------------------------------------------------------------|--------------------------------------------------
-make_blobs | Generate isotropic Gaussian blobs for clustering. | [link](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_moons.html)
-make_moons | Make two interleaving half circles | [link](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_blobs.html)
-make_s_curve | Generate an S curve dataset. | [link](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_s_curve.html)
-make_circles | Make a large circle containing a smaller circle in 2d | [link](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_circles.html)
-make_regression | Generate a random regression problem. | [link](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_regression.html)
-make_classification | Generate a random n-class classification problem. | [link](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html)
-make_low_rank_matrix | Generate a mostly low rank matrix with bell-shaped singular values.| [link](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_low_rank_matrix.html)
-make_swiss_roll | Generate a swiss roll dataset. | [link](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_swiss_roll.html)
+Dataset | Title | Reference
+---------------------|-------------------------------------------------------------------------|--------------------------------------------------
+make_blobs | Generate isotropic Gaussian blobs for clustering. | [link](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_blobs.html)
+make_moons | Make two interleaving half circles | [link](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_moons.html)
+make_s_curve | Generate an S curve dataset. | [link](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_s_curve.html)
+make_regression | Generate a random regression problem. | [link](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_regression.html)
+make_classification | Generate a random n-class classification problem. | [link](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html)
+make_friedman1 | Generate the “Friedman #1” regression problem. | [link](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_friedman1.html)
+make_friedman2 | Generate the “Friedman #2” regression problem. | [link](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_friedman2.html)
+make_friedman3 | Generate the “Friedman #3” regression problem. | [link](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_friedman3.html)
+make_circles | Make a large circle containing a smaller circle in 2d | [link](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_circles.html)
+make_regression | Generate a random regression problem. | [link](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_regression.html)
+make_classification | Generate a random n-class classification problem. | [link](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html)
+make_low_rank_matrix | Generate a mostly low rank matrix with bell-shaped singular values. | [link](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_low_rank_matrix.html)
+make_swiss_roll | Generate a swiss roll dataset. | [link](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_swiss_roll.html)
 
 **Disclaimer**: SyntheticDatasets.jl borrows code and documentation from
 [scikit-learn](https://scikit-learn.org/stable/modules/classes.html#samples-generator) in the dataset module, but *it is not an official part

src/sklearn.jl

Lines changed: 71 additions & 0 deletions
@@ -240,6 +240,77 @@ function generate_classification(; n_samples::Int = 100,
     return convert(features, labels)
 end
 
+"""
+    function generate_friedman1(; n_samples::Int = 100,
+                                  n_features::Int = 10,
+                                  noise::Float64 = 0.0,
+                                  random_state::Union{Int, Nothing} = nothing)::DataFrame
+Generate the “Friedman #1” regression problem. Sklearn interface to make_friedman1.
+# Arguments
+- `n_samples::Int = 100`: The number of samples.
+- `n_features::Int = 10`: The number of features. Should be at least 5.
+- `noise::Float64 = 0.0`: The standard deviation of the Gaussian noise applied to the output.
+- `random_state::Union{Int, Nothing} = nothing`: Determines random number generation for dataset noise. Pass an int for reproducible output across multiple function calls.
+Reference: [link](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_friedman1.html)
+"""
+function generate_friedman1(; n_samples::Int = 100,
+                              n_features::Int = 10,
+                              noise::Float64 = 0.0,
+                              random_state::Union{Int, Nothing} = nothing)::DataFrame
+
+    (features, labels) = datasets.make_friedman1(n_samples = n_samples,
+                                                 n_features = n_features,
+                                                 noise = noise,
+                                                 random_state = random_state)
+
+    return convert(features, labels)
+end
+
+"""
+    function generate_friedman2(; n_samples::Int = 100,
+                                  noise::Float64 = 0.0,
+                                  random_state::Union{Int, Nothing} = nothing)::DataFrame
+Generate the “Friedman #2” regression problem. Sklearn interface to make_friedman2.
+# Arguments
+- `n_samples::Int = 100`: The number of samples.
+- `n_features::Int = 10`: The number of features. Should be at least 5.
+- `noise::Float64 = 0.0`: The standard deviation of the Gaussian noise applied to the output.
+- `random_state::Union{Int, Nothing} = nothing`: Determines random number generation for dataset noise. Pass an int for reproducible output across multiple function calls.
+Reference: [link](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_friedman2.html)
+"""
+function generate_friedman2(; n_samples::Int = 100,
+                              noise::Float64 = 0.0,
+                              random_state::Union{Int, Nothing} = nothing)::DataFrame
+
+    (features, labels) = datasets.make_friedman2(n_samples = n_samples,
+                                                 noise = noise,
+                                                 random_state = random_state)
+
+    return convert(features, labels)
+end
+
+"""
+    function generate_friedman3(; n_samples::Int = 100,
+                                  noise::Float64 = 0.0,
+                                  random_state::Union{Int, Nothing} = nothing)::DataFrame
+Generate the “Friedman #3” regression problem. Sklearn interface to make_friedman3.
+# Arguments
+- `n_samples::Int = 100`: The number of samples.
+- `noise::Float64 = 0.0`: The standard deviation of the Gaussian noise applied to the output.
+- `random_state::Union{Int, Nothing} = nothing`: Determines random number generation for dataset noise. Pass an int for reproducible output across multiple function calls.
+Reference: [link](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_friedman3.html)
+"""
+function generate_friedman3(; n_samples::Int = 100,
+                              noise::Float64 = 0.0,
+                              random_state::Union{Int, Nothing} = nothing)::DataFrame
+
+    (features, labels) = datasets.make_friedman3(n_samples = n_samples,
+                                                 noise = noise,
+                                                 random_state = random_state)
+
+    return convert(features, labels)
+end
+
 """
     function generate_low_rank_matrix(; n_samples::Int =100,
                                         n_features::Int =100,
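
For readers unfamiliar with the dataset that `generate_friedman1` wraps, the following is a minimal pure-Julia sketch of the “Friedman #1” formula as described in the scikit-learn reference linked in the docstring. It is not the package's implementation (which delegates to `datasets.make_friedman1`), the helper name `friedman1_sketch` and its `seed` keyword are hypothetical, and its random draws will not match scikit-learn's for the same seed.

```julia
using Random

# Hypothetical helper, not part of SyntheticDatasets.jl or scikit-learn.
# Sketches y = 10 sin(pi x1 x2) + 20 (x3 - 0.5)^2 + 10 x4 + 5 x5 + noise * N(0, 1).
function friedman1_sketch(; n_samples::Int = 100,
                            n_features::Int = 10,
                            noise::Float64 = 0.0,
                            seed::Int = 0)
    n_features >= 5 || throw(ArgumentError("n_features must be at least 5"))
    rng = MersenneTwister(seed)

    # Inputs are uniform on [0, 1); only the first five columns influence y,
    # the remaining columns are pure distractors.
    X = rand(rng, n_samples, n_features)

    y = 10 .* sin.(pi .* X[:, 1] .* X[:, 2]) .+
        20 .* (X[:, 3] .- 0.5) .^ 2 .+
        10 .* X[:, 4] .+
        5 .* X[:, 5] .+
        noise .* randn(rng, n_samples)

    return X, y
end

X, y = friedman1_sketch(n_samples = 50, n_features = 7)
println(size(X))    # (50, 7)
println(length(y))  # 50
```

This also illustrates why the docstring says `n_features` should be at least 5: the formula reads five input columns.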

test/runtests.jl

Lines changed: 16 additions & 0 deletions
@@ -47,6 +47,22 @@ using Test
     @test size(data)[1] == samples
     @test size(data)[2] == features + 1
 
+    data = SyntheticDatasets.generate_friedman1(n_samples = samples,
+                                                n_features = features)
+
+    @test size(data)[1] == samples
+    @test size(data)[2] == features + 1
+
+    data = SyntheticDatasets.generate_friedman2(n_samples = samples)
+
+    @test size(data)[1] == samples
+    @test size(data)[2] == 5
+
+    data = SyntheticDatasets.generate_friedman3(n_samples = samples)
+
+    @test size(data)[1] == samples
+    @test size(data)[2] == 5
+
     data = SyntheticDatasets.generate_low_rank_matrix(n_samples = samples,
                                                       n_features = features,
                                                       effective_rank = 10,
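
The tests above expect five columns for Friedman #2 and #3. A likely reading is that scikit-learn fixes these problems at four input features, and the package appends one label column. The sketch below reproduces that shape in pure Julia, following the formulas and input ranges given in the scikit-learn reference pages. All helper names and the `seed` keyword are hypothetical, and the random draws will not match scikit-learn's.

```julia
using Random

# Hypothetical helper: draw the four inputs in the documented ranges
# 0 <= x1 <= 100, 40*pi <= x2 <= 560*pi, 0 <= x3 <= 1, 1 <= x4 <= 11.
function friedman_inputs(rng::AbstractRNG, n_samples::Int)
    X = rand(rng, n_samples, 4)
    X[:, 1] .= X[:, 1] .* 100
    X[:, 2] .= X[:, 2] .* (520 * pi) .+ 40 * pi
    X[:, 4] .= X[:, 4] .* 10 .+ 1
    return X
end

# Term shared by both problems: x2 * x3 - 1 / (x2 * x4).
inner_term(X) = X[:, 2] .* X[:, 3] .- 1 ./ (X[:, 2] .* X[:, 4])

# Friedman #2: y = sqrt(x1^2 + inner^2) + noise * N(0, 1)
function friedman2_sketch(; n_samples::Int = 100, noise::Float64 = 0.0, seed::Int = 0)
    rng = MersenneTwister(seed)
    X = friedman_inputs(rng, n_samples)
    y = sqrt.(X[:, 1] .^ 2 .+ inner_term(X) .^ 2) .+ noise .* randn(rng, n_samples)
    return X, y
end

# Friedman #3: y = atan(inner / x1) + noise * N(0, 1)
function friedman3_sketch(; n_samples::Int = 100, noise::Float64 = 0.0, seed::Int = 0)
    rng = MersenneTwister(seed)
    X = friedman_inputs(rng, n_samples)
    y = atan.(inner_term(X) ./ X[:, 1]) .+ noise .* randn(rng, n_samples)
    return X, y
end

X2, y2 = friedman2_sketch(n_samples = 20)
X3, y3 = friedman3_sketch(n_samples = 20)

# Four feature columns plus one label column gives the 5 checked in the tests.
println(size(hcat(X2, y2)))  # (20, 5)
println(size(hcat(X3, y3)))  # (20, 5)
```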
