The PR should start as a draft, then move to the ready-for-review state after CI has passed and all applicable checkboxes are checked.
This approach ensures that reviewers don't spend extra time asking for routine requirements.

You can remove a checkbox as not applicable only if it doesn't relate to this PR in any way.

Checklist to comply with **before moving PR from draft**:

**PR completeness and readability**

- [ ] I have reviewed my changes thoroughly before submitting this pull request.
- [ ] I have commented my code, particularly in hard-to-understand areas.
- [ ] I have updated the documentation to reflect the changes or created a separate PR with the update and provided its number in the description, if necessary.
- [ ] The Git commit message contains an appropriate signed-off-by string _(see [CONTRIBUTING.md](https://github.com/intel/scikit-learn-intelex/blob/main/CONTRIBUTING.md#pull-requests) for details)_.
- [ ] I have added the respective label(s) to the PR if I have permission to do so.
- [ ] I have resolved any merge conflicts that might occur with the base branch.

**Testing**

- [ ] I have run it locally and tested the changes extensively.
- [ ] All CI jobs are green or I have provided justification why they aren't.
- [ ] I have extended the testing suite if new functionality was introduced in this PR.

3. Convert to requested form (data type, format, order, etc.)

Existing data sources:
- Synthetic data from sklearn
- OpenML datasets
- Custom loaders for named datasets
- User-provided datasets in a compatible format

## Data Caching

There are two levels of caching with corresponding directories: `raw cache` for files downloaded from external sources, and plain `cache` for files suited to fast loading in benchmarks.

Each dataset has a few associated files in the regular `cache`: data component files (`x`, `y`, `weights`, etc.) and a JSON file with dataset properties (number of classes, clusters, default split arguments).
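
As a rough illustration of the two cache levels, the sketch below checks the regular `cache` first, falls back to the `raw cache`, and only then fetches from the source. The function name `load_cached_dataset`, the `fetch_raw` and `convert` callbacks, the `properties.json` and `.raw` file names, and storing components as JSON are all assumptions made for this example; they are not the benchmark's actual layout or file formats.

```python
import json
import tempfile
from pathlib import Path


def load_cached_dataset(cache_dir, raw_cache_dir, name, fetch_raw, convert):
    """Return (components, properties), filling both cache levels on a miss."""
    dataset_dir = cache_dir / name
    properties_file = dataset_dir / "properties.json"  # hypothetical file name
    if not properties_file.exists():
        # Level 1 (raw cache): keep the downloaded file untouched.
        raw_file = raw_cache_dir / f"{name}.raw"  # hypothetical naming scheme
        if not raw_file.exists():
            raw_cache_dir.mkdir(parents=True, exist_ok=True)
            raw_file.write_bytes(fetch_raw(name))
        # Level 2 (cache): one file per data component plus a JSON properties file.
        components, properties = convert(raw_file.read_bytes())
        dataset_dir.mkdir(parents=True, exist_ok=True)
        for component, values in components.items():
            # JSON is used here only to keep the sketch dependency-free.
            (dataset_dir / f"{component}.json").write_text(json.dumps(values))
        properties_file.write_text(json.dumps(properties))
    # Fast path: read everything back from the regular cache.
    properties = json.loads(properties_file.read_text())
    components = {
        path.stem: json.loads(path.read_text())
        for path in dataset_dir.glob("*.json")
        if path != properties_file
    }
    return components, properties


# Stand-ins for a real download and a real parser (both hypothetical).
calls = []


def fake_fetch(name):
    calls.append(name)
    return b"1,2,0\n3,4,1\n"


def fake_convert(raw):
    rows = [[int(v) for v in line.split(",")] for line in raw.decode().splitlines()]
    components = {"x": [r[:-1] for r in rows], "y": [r[-1] for r in rows]}
    return components, {"n_classes": len(set(components["y"]))}


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    args = (root / "cache", root / "raw_cache", "toy", fake_fetch, fake_convert)
    first = load_cached_dataset(*args)   # miss: fetches, converts, and stores
    second = load_cached_dataset(*args)  # hit: served from the regular cache
```

In this sketch the second call never invokes `fake_fetch`, which is the point of keeping a fast-loading cache separate from the raw downloads.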
@@ -21,16 +29,39 @@ data_cache/
```

Cached file formats:
| Format | File extension | Associated Python types |