Commit c0c556d

Try a 40K dataset (4x data)
1 parent 06ea489 commit c0c556d

3 files changed, +25 -24 lines changed

Diff for: data/data.xml.dvc

+5 -4
@@ -1,12 +1,13 @@
-md5: f5ea021eddd7b1df6de80b904cba1da6
+md5: 25c4a84510a41557840c61692dd14c11
 frozen: true
 deps:
 - path: get-started/data.xml
   repo:
     url: https://github.com/iterative/dataset-registry
-    rev_lock: f59388cd04276e75d70b2136597aaa27e7937cc3
+    rev_lock: cf6481baf56f156aa0876709cc231aaf3f3a3c29
+    rev: get-started-40K
 outs:
-- md5: 22a1a2931c8370d3aeedd7183606fd7f
-  size: 14445097
+- md5: 4bd325a30d5f1d5ea1a451d98767ddde
+  size: 59918667
   hash: md5
   path: data.xml
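
The `outs` entry above pins `data.xml` by MD5 and byte size. A minimal sketch of how a local copy could be checked against that pair, assuming `hash: md5` means a plain MD5 over the raw file bytes; the helper names `md5_and_size` and `matches_out_entry` are hypothetical and not part of DVC:

```python
import hashlib


def md5_and_size(path, chunk_size=1 << 20):
    """Return (hex MD5 digest, size in bytes) of the file at `path`."""
    digest = hashlib.md5()
    size = 0
    with open(path, "rb") as fh:
        # Read in chunks so a ~60 MB file is never held in memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return digest.hexdigest(), size


def matches_out_entry(path, expected_md5, expected_size):
    """Compare a local file with the md5/size pair of an `outs` entry."""
    return md5_and_size(path) == (expected_md5, expected_size)
```

For the new revision this would be called as `matches_out_entry("data/data.xml", "4bd325a30d5f1d5ea1a451d98767ddde", 59918667)`, with the path adjusted to wherever the file was pulled.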

Diff for: dvc.lock

+19 -19
@@ -5,8 +5,8 @@ stages:
     deps:
     - path: data/data.xml
       hash: md5
-      md5: 22a1a2931c8370d3aeedd7183606fd7f
-      size: 14445097
+      md5: 4bd325a30d5f1d5ea1a451d98767ddde
+      size: 59918667
     - path: src/prepare.py
       hash: md5
       md5: f54d670ac8a4f63206781fc31d1f2651
@@ -18,38 +18,38 @@ stages:
     outs:
     - path: data/prepared
       hash: md5
-      md5: 153aad06d376b6595932470e459ef42a.dir
-      size: 8437363
+      md5: f8934609be51496ee500f80eea539c6f.dir
+      size: 35339221
       nfiles: 2
   featurize:
     cmd: python src/featurization.py data/prepared data/features
     deps:
     - path: data/prepared
       hash: md5
-      md5: 153aad06d376b6595932470e459ef42a.dir
-      size: 8437363
+      md5: f8934609be51496ee500f80eea539c6f.dir
+      size: 35339221
       nfiles: 2
     - path: src/featurization.py
       hash: md5
       md5: e22789fc9581cad11ef7a6fa3aa3f17b
       size: 4158
     params:
       params.yaml:
-        featurize.max_features: 200
+        featurize.max_features: 500
         featurize.ngrams: 2
     outs:
     - path: data/features
       hash: md5
-      md5: f35d4cc2c552ac959ae602162b8543f3.dir
-      size: 2232588
+      md5: 121056a1b192b22d31e15f61c8376928.dir
+      size: 12597137
       nfiles: 2
   train:
     cmd: python src/train.py data/features model.pkl
     deps:
     - path: data/features
       hash: md5
-      md5: f35d4cc2c552ac959ae602162b8543f3.dir
-      size: 2232588
+      md5: 121056a1b192b22d31e15f61c8376928.dir
+      size: 12597137
       nfiles: 2
     - path: src/train.py
       hash: md5
@@ -63,27 +63,27 @@ stages:
     outs:
     - path: model.pkl
       hash: md5
-      md5: d1f6e055f7f5e2827fcfae68d9b64d4c
-      size: 1958115
+      md5: 0af1d96a26c6bdaca6094842c4bc45f3
+      size: 3365729
   evaluate:
     cmd: python src/evaluate.py model.pkl data/features
     deps:
     - path: data/features
       hash: md5
-      md5: f35d4cc2c552ac959ae602162b8543f3.dir
-      size: 2232588
+      md5: 121056a1b192b22d31e15f61c8376928.dir
+      size: 12597137
       nfiles: 2
     - path: model.pkl
       hash: md5
-      md5: d1f6e055f7f5e2827fcfae68d9b64d4c
-      size: 1958115
+      md5: 0af1d96a26c6bdaca6094842c4bc45f3
+      size: 3365729
     - path: src/evaluate.py
       hash: md5
       md5: a1a59f55636170fb56e0c6afd3e28fa4
       size: 3315
     outs:
     - path: eval
       hash: md5
-      md5: 80a081570c800c60b9b98ca4b3c91dd7.dir
-      size: 1292342
+      md5: 53c957a76e8202d50581613c16c5ee93.dir
+      size: 4964257
       nfiles: 8
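
The commit message says "4x data", and the recorded sizes make it possible to check how that propagates through the pipeline. A small sketch of the arithmetic, using only the byte sizes that appear in this commit's diffs; the `SIZES` table and `growth_ratios` helper are illustrative names, not part of the repository:

```python
# (old_size, new_size) in bytes, copied from the data.xml.dvc and dvc.lock diffs.
SIZES = {
    "data/data.xml": (14445097, 59918667),
    "data/prepared": (8437363, 35339221),
    "data/features": (2232588, 12597137),
    "model.pkl": (1958115, 3365729),
    "eval": (1292342, 4964257),
}


def growth_ratios(sizes):
    """Map each tracked path to new_size / old_size."""
    return {path: new / old for path, (old, new) in sizes.items()}


if __name__ == "__main__":
    for path, ratio in growth_ratios(SIZES).items():
        print(f"{path}: {ratio:.2f}x")
```

The raw dataset and the prepared split both grow by a little over 4x, consistent with the commit message. The features grow by more than 5x, which may reflect the simultaneous change of `featurize.max_features` from 200 to 500, although the diff alone does not confirm the cause.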

Diff for: params.yaml

+1 -1
@@ -3,7 +3,7 @@ prepare:
   seed: 20170428
 
 featurize:
-  max_features: 200
+  max_features: 500
   ngrams: 2
 
 train:
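
`params.yaml` nests `max_features` under `featurize:`, while `dvc.lock` records the same value under the dotted key `featurize.max_features`. A minimal sketch of that correspondence, assuming a simple recursive flattening; the `flatten` function is a hypothetical illustration rather than DVC's own code, and the dictionary holds only the values visible in this diff:

```python
def flatten(params, prefix=""):
    """Flatten a nested dict into dotted keys, e.g. {'a': {'b': 1}} -> {'a.b': 1}."""
    flat = {}
    for key, value in params.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested sections, carrying the dotted prefix along.
            flat.update(flatten(value, dotted))
        else:
            flat[dotted] = value
    return flat


# Nested form, limited to the values shown in the params.yaml hunk after this commit.
params = {
    "prepare": {"seed": 20170428},
    "featurize": {"max_features": 500, "ngrams": 2},
}

flat_params = flatten(params)
```

Here `flat_params["featurize.max_features"]` is `500`, matching the entry under `params.yaml:` in the `featurize` stage of `dvc.lock`.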
