Commit 501b997

Revert merga main
Author: ArturoAmorQ
1 parent ed21c5a, commit 501b997

146 files changed: +1294 -1869 lines changed

Some content is hidden

.github/workflows/deploy-gh-pages.yml

+1-1
@@ -58,7 +58,7 @@ jobs:

       - name: Upload jupyter-book artifact for preview in PRs
         if: ${{ github.event_name == 'pull_request' }}
-        uses: actions/upload-artifact@v4
+        uses: actions/upload-artifact@v3
         with:
           name: jupyter-book
           path: |

.github/workflows/jupyter-book-pr-preview.yml

+4-3
@@ -19,10 +19,11 @@ jobs:
           sha: ${{ github.event.workflow_run.head_sha }}
           context: 'JupyterBook preview'

-      - uses: actions/download-artifact@v4
+      - uses: dawidd6/action-download-artifact@v2
         with:
-          github-token: ${{secrets.GITHUB_TOKEN}}
-          run-id: ${{ github.event.workflow_run.id }}
+          github_token: ${{secrets.GITHUB_TOKEN}}
+          workflow: deploy-gh-pages.yml
+          run_id: ${{ github.event.workflow_run.id }}
           name: jupyter-book

       - name: Get pull request number

.gitignore

+1
@@ -33,3 +33,4 @@ doc/_build
 .idea
 *.code-workspace
 .vscode
+

.jupyter/README.md

+1
@@ -1 +1,2 @@
 This directory is to setup jupyter on binder
+

.jupyter/jupyter_notebook_config.py

+1-1
@@ -1,2 +1,2 @@
 # To use jupytext in binder
-c.ContentsManager.preferred_jupytext_formats_read = "py:percent" # noqa
+c.ContentsManager.preferred_jupytext_formats_read = 'py:percent' # noqa

.pre-commit-config.yaml

+6-6
@@ -5,16 +5,16 @@ repos:
       - id: check-yaml
       - id: end-of-file-fixer
         exclude: notebooks
-        exclude_types: [svg]
       - id: trailing-whitespace
         exclude: notebooks
-        exclude_types: [svg]
   - repo: https://github.com/psf/black
     rev: 23.1.0
     hooks:
       - id: black
-  - repo: https://github.com/astral-sh/ruff-pre-commit
-    rev: v0.11.2
+  - repo: https://github.com/pycqa/flake8
+    rev: 4.0.1
     hooks:
-      - id: ruff
-        args: ["--fix", "--output-format=full"]
+      - id: flake8
+        entry: pflake8
+        additional_dependencies: [pyproject-flake8]
+        types: [file, python]

CITATION.cff

-8
This file was deleted.

README.md

+4-3
@@ -1,8 +1,9 @@
 # scikit-learn course

-This is the source code for the [Machine learning in Python with scikit-learn
-MOOC](https://www.fun-mooc.fr/en/courses/machine-learning-python-scikit-learn).
-Enroll for the full MOOC experience (quiz solutions, executable
+📢 📢 📢 A new session of the [Machine learning in Python with scikit-learn
+MOOC](https://www.fun-mooc.fr/en/courses/machine-learning-python-scikit-learn),
+is available starting on November 8th, 2023 and will remain open on self-paced
+mode. Enroll for the full MOOC experience (quizz solutions, executable
 notebooks, discussion forum, etc ...) !

 The MOOC is free and hosted on the [FUN-MOOC](https://fun-mooc.fr/) platform

build_tools/generate-exercise-from-solution.py

+16-49
@@ -5,10 +5,6 @@
 from jupytext.myst import myst_to_notebook
 import jupytext

-
-WRITE_YOUR_CODE_COMMENT = "# Write your code here."
-
-
 def replace_simple_text(input_py_str):
     result = input_py_str.replace("📃 Solution for", "📝")
     return result
@@ -23,62 +19,37 @@ def remove_solution(input_py_str):
     before this comment and add "# Write your code here." at the end of the
     cell.
     """
-    nb = jupytext.reads(input_py_str, fmt="py:percent")
+    nb = jupytext.reads(input_py_str, fmt='py:percent')

-    cell_tags_list = [c["metadata"].get("tags") for c in nb.cells]
-    is_solution_list = [
-        tags is not None and "solution" in tags for tags in cell_tags_list
-    ]
+    cell_tags_list = [c['metadata'].get('tags') for c in nb.cells]
+    is_solution_list = [tags is not None and 'solution' in tags
+                        for tags in cell_tags_list]
     # Completely remove cells with "solution" tags
-    nb.cells = [
-        cell
-        for cell, is_solution in zip(nb.cells, is_solution_list)
-        if not is_solution
-    ]
+    nb.cells = [cell for cell, is_solution in zip(nb.cells, is_solution_list)
+                if not is_solution]

     # Partial cell removal based on "# solution" comment
     marker = "# solution"
-    pattern = re.compile(f"^{marker}.*", flags=re.MULTILINE | re.DOTALL)
+    pattern = re.compile(f"^{marker}.*", flags=re.MULTILINE|re.DOTALL)

-    cells_to_modify = [
-        c
-        for c in nb.cells
-        if c["cell_type"] == "code" and marker in c["source"]
-    ]
+    cells_to_modify = [c for c in nb.cells if c["cell_type"] == "code" and
+                       marker in c["source"]]

     for c in cells_to_modify:
-        c["source"] = pattern.sub(WRITE_YOUR_CODE_COMMENT, c["source"])
-
-    previous_cell_is_write_your_code = False
-    all_cells_before_deduplication = nb.cells
-    nb.cells = []
-    for c in all_cells_before_deduplication:
-        if c["cell_type"] == "code" and c["source"] == WRITE_YOUR_CODE_COMMENT:
-            current_cell_is_write_your_code = True
-        else:
-            current_cell_is_write_your_code = False
-        if (
-            current_cell_is_write_your_code
-            and previous_cell_is_write_your_code
-        ):
-            # Drop duplicated "write your code here" cells.
-            continue
-        nb.cells.append(c)
-        previous_cell_is_write_your_code = current_cell_is_write_your_code
+        c["source"] = pattern.sub("# Write your code here.", c["source"])

     # TODO: we could potentially try to avoid changing the input file jupytext
     # header since this info is rarely useful. Let's keep it simple for now.
-    py_nb_str = jupytext.writes(nb, fmt="py:percent")
+    py_nb_str = jupytext.writes(nb, fmt='py:percent')
     return py_nb_str


 def write_exercise(solution_path, exercise_path):
-    print(f"Writing exercise to {exercise_path} from solution {solution_path}")
     input_str = solution_path.read_text()

     output_str = input_str
     for replace_func in [replace_simple_text, remove_solution]:
-        output_str = replace_func(output_str)
+        output_str= replace_func(output_str)
     exercise_path.write_text(output_str)


@@ -88,9 +59,7 @@ def write_all_exercises(python_scripts_folder):
     for solution_path in solution_paths:
         exercise_path = Path(str(solution_path).replace("_sol_", "_ex_"))
         if not exercise_path.exists():
-            print(
-                f"{exercise_path} does not exist, generating it from solution."
-            )
+            print(f"{exercise_path} does not exist")

         write_exercise(solution_path, exercise_path)

@@ -101,14 +70,12 @@ def write_all_exercises(python_scripts_folder):
     if path.is_dir():
         write_all_exercises(path)
     else:
-        if "_ex_" not in str(path):
+        if '_ex_' not in str(path):
             raise ValueError(
-                f"Path argument should be an exercise file. Path was {path}"
-            )
+                f'Path argument should be an exercise file. Path was {path}')
         solution_path = Path(str(path).replace("_ex_", "_sol_"))
         if not solution_path.exists():
             raise ValueError(
-                f"{solution_path} does not exist, check argument path {path}"
-            )
+                f"{solution_path} does not exist, check argument path {path}")

         write_exercise(solution_path, path)
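
Note on the `remove_solution` hunk: the pattern `^# solution.*` is compiled with both `re.MULTILINE` and `re.DOTALL`, so one substitution replaces everything from the first marker line to the end of the cell. The sketch below illustrates that behavior on a made-up cell source; the `cell_source` string is an assumption for illustration and does not come from the repository.

```python
import re

marker = "# solution"
# MULTILINE lets ^ match at the start of any line; DOTALL lets .* span newlines.
pattern = re.compile(f"^{marker}.*", flags=re.MULTILINE | re.DOTALL)

# Hypothetical cell content, for illustration only.
cell_source = "import numpy as np\n\n# solution\nx = np.arange(3)\nprint(x)"

# Everything from the marker line onward is replaced by the placeholder comment.
stripped = pattern.sub("# Write your code here.", cell_source)
print(stripped)
```

Code placed before the marker survives, which matches the docstring shown in the hunk ("before this comment").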

build_tools/generate-index.py

+5-12
@@ -41,7 +41,7 @@ def get_first_title(path):
     elif path.suffix == ".md":
         md_str = path.read_text()
     else:
-        raise ValueError(f"{path} is not a .py or a .md file")
+        raise ValueError(f"{filename} is not a .py or a .md file")

     return get_first_title_from_md_str(md_str)

@@ -96,9 +96,7 @@ def get_single_file_markdown(docname):
         # This is simpler to point to inria.github.io generated HTML otherwise
         # there are quirks (MyST in quizzes not supported, slides not working,
         # etc ...)
-        relative_url = (
-            str(target).replace("jupyter-book/", "").replace(".md", ".html")
-        )
+        relative_url = str(target).replace("jupyter-book/", "").replace(".md", ".html")
         target = f"https://inria.github.io/scikit-learn-mooc/{relative_url}"

     return f"[{title}]({target})"
@@ -142,9 +140,7 @@ def test_get_lesson_markdown():
     documents = json_info["documents"]
     print(
         get_lesson_markdown(
-            documents[
-                "predictive_modeling_pipeline/01_tabular_data_exploration_index"
-            ]
+            documents["predictive_modeling_pipeline/01_tabular_data_exploration_index"]
         )
     )

@@ -160,8 +156,7 @@ def get_module_markdown(module_dict, documents):
     module_title = module_dict["caption"]
     heading = f"# {module_title}"
     content = "\n\n".join(
-        get_lesson_markdown(documents[docname])
-        for docname in module_dict["items"]
+        get_lesson_markdown(documents[docname]) for docname in module_dict["items"]
     )
     return f"{heading}\n\n{content}"

@@ -224,9 +219,7 @@ def get_full_index_ipynb(toc_path):
     md_str = get_full_index_markdown(toc_path)
     nb = jupytext.reads(md_str, format=".md")

-    nb = nbformat.v4.new_notebook(
-        cells=[nbformat.v4.new_markdown_cell(md_str)]
-    )
+    nb = nbformat.v4.new_notebook(cells=[nbformat.v4.new_markdown_cell(md_str)])

     # nb_content = jupytext.writes(nb, fmt=".ipynb")
     # nb = json.loads(nb_content)
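
Note on the first hunk of this file: the restored error message interpolates `filename`, while the hunk header shows the signature `get_first_title(path)`. If no `filename` exists in an enclosing scope (an assumption, since the rest of the file is not visible here), reaching that branch would raise `NameError` rather than the intended `ValueError`. The sketch below reproduces that failure mode with a simplified stand-in function, not the real one.

```python
from pathlib import Path


def get_first_title_sketch(path):
    # Simplified stand-in: only the error branch of the hunk is reproduced.
    if path.suffix not in (".py", ".md"):
        # `filename` is intentionally undefined here, mirroring the diff line.
        raise ValueError(f"{filename} is not a .py or a .md file")
    return "title"


try:
    get_first_title_sketch(Path("notes.txt"))
except NameError as exc:
    # The f-string is evaluated before ValueError can be constructed.
    error_type = type(exc).__name__
    print(error_type)
```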

build_tools/generate-quizzes.py

+12-19
@@ -13,21 +13,16 @@ def remove_solution(input_myst_str):
     """
     nb = myst_to_notebook(input_myst_str)

-    cell_tags_list = [c["metadata"].get("tags") for c in nb.cells]
-    is_solution_list = [
-        tags is not None and "solution" in tags for tags in cell_tags_list
-    ]
-    nb.cells = [
-        cell
-        for cell, is_solution in zip(nb.cells, is_solution_list)
-        if not is_solution
-    ]
-
-    myst_nb_str = jupytext.writes(nb, fmt="myst")
-
-    header_pattern = re.compile(
-        r"---\njupytext.+---\s*", re.DOTALL | re.MULTILINE
-    )
+    cell_tags_list = [c['metadata'].get('tags') for c in nb.cells]
+    is_solution_list = [tags is not None and 'solution' in tags
+                        for tags in cell_tags_list]
+    nb.cells = [cell for cell, is_solution in zip(nb.cells, is_solution_list)
+                if not is_solution]
+
+    myst_nb_str = jupytext.writes(nb, fmt='myst')
+
+    header_pattern = re.compile(r"---\njupytext.+---\s*",
+                                re.DOTALL | re.MULTILINE)
     return re.sub(header_pattern, "", myst_nb_str)


@@ -44,14 +39,12 @@ def write_all_exercises(input_root_path, output_root_path):

     for input_path in input_exercises:
         # FIXME there may be a better way with the pathlib API
-        relative_path_str = re.sub(
-            str(input_root_path) + "/?", "", str(input_path)
-        )
+        relative_path_str = re.sub(str(input_root_path) + "/?", "",
+                                   str(input_path))
         output_path = Path(output_root_path).joinpath(relative_path_str)
         print(str(input_path), str(output_path))
         write_exercise_myst(input_path, output_path)

-
 if __name__ == "__main__":
     input_root_path = sys.argv[1]
     output_root_path = sys.argv[2]
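
Note on the FIXME in `write_all_exercises`: one pathlib-based alternative to the `re.sub` call is `Path.relative_to`. The sketch below assumes every input path lies under the input root; the directory and file names are hypothetical and chosen only for illustration.

```python
from pathlib import Path

# Hypothetical roots and file, for illustration only.
input_root_path = Path("quizzes_with_solutions")
output_root_path = Path("quizzes")
input_path = input_root_path / "module1" / "quiz_01.md"

# relative_to raises ValueError if input_path is not under input_root_path.
relative_path = input_path.relative_to(input_root_path)
output_path = output_root_path / relative_path
print(output_path)
```

Unlike the regular-expression version, this does not interpret the root directory string as a pattern, so characters such as `.` or `+` in a folder name are treated literally.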

build_tools/sanity-check.py

+5-6
@@ -4,8 +4,8 @@

 # TODO: we could get the list from .gitignore
 IGNORE_LIST = [
-    ".ipynb_checkpoints",
-    "__pycache__",
+    '.ipynb_checkpoints',
+    '__pycache__',
 ]

 folder1, folder2 = sys.argv[1:3]
@@ -28,7 +28,6 @@ def get_basename(folder):
     only_in_folder2 = set(basenames2) - set(basenames1)

     raise RuntimeError(
-        f"Inconsistency between folder {folder1} and {folder2}\n"
-        f"Only in folder {folder1}: {only_in_folder1}\n"
-        f"Only in folder {folder2}: {only_in_folder2}"
-    )
+        f'Inconsistency between folder {folder1} and {folder2}\n'
+        f'Only in folder {folder1}: {only_in_folder1}\n'
+        f'Only in folder {folder2}: {only_in_folder2}')

check_env.py

+1-1
@@ -66,7 +66,7 @@ def import_version(pkg, min_ver, fail_msg=""):
     "numpy": "1.16",
     "scipy": "1.2",
     "matplotlib": "3.0",
-    "sklearn": "1.6",
+    "sklearn": "1.3",
     "pandas": "1",
     "seaborn": "0.11",
     "notebook": "5.7",

datasets/penguins.csv

+1
@@ -343,3 +343,4 @@ PAL0910,65,Chinstrap penguin (Pygoscelis antarctica),Anvers,Dream,"Adult, 1 Egg
 PAL0910,66,Chinstrap penguin (Pygoscelis antarctica),Anvers,Dream,"Adult, 1 Egg Stage",N99A2,No,2009-11-21,49.6,18.2,193,3775,MALE,9.4618,-24.70615,Nest never observed with full clutch.
 PAL0910,67,Chinstrap penguin (Pygoscelis antarctica),Anvers,Dream,"Adult, 1 Egg Stage",N100A1,Yes,2009-11-21,50.8,19,210,4100,MALE,9.98044,-24.68741,NA
 PAL0910,68,Chinstrap penguin (Pygoscelis antarctica),Anvers,Dream,"Adult, 1 Egg Stage",N100A2,Yes,2009-11-21,50.2,18.7,198,3775,FEMALE,9.39305,-24.25255,NA
+

environment-dev.yml

+1-1
@@ -2,7 +2,7 @@ name: scikit-learn-course
 channels:
   - conda-forge
 dependencies:
-  - scikit-learn >= 1.6
+  - scikit-learn >= 1.3
   - pandas >= 1
   - matplotlib-base
   - seaborn >= 0.13

environment.yml

+1-1
@@ -4,7 +4,7 @@ channels:
   - conda-forge

 dependencies:
-  - scikit-learn >= 1.6
+  - scikit-learn >= 1.3
   - pandas >= 1
   - matplotlib-base
   - seaborn >= 0.13
-35.3 KB
-78.1 KB

figures/plot_iris_visualization.py

+13-12
@@ -18,19 +18,20 @@
     plt.figure(figsize=(2.5, 2))
     patches = list()
     for this_y, target_name in enumerate(iris.target_names):
-        patch = plt.hist(
-            x[y == this_y],
-            bins=np.linspace(x.min(), x.max(), 20),
-            label=target_name,
-        )
+        patch = plt.hist(x[y == this_y],
+                         bins=np.linspace(x.min(), x.max(), 20),
+                         label=target_name)
         patches.append(patch[-1][0])
     style_figs.light_axis()
-    feature_name = feature_name.replace(" ", "_")
-    feature_name = feature_name.replace("(", "")
-    feature_name = feature_name.replace(")", "")
-    plt.savefig("iris_{}_hist.svg".format(feature_name))
+    feature_name = feature_name.replace(' ', '_')
+    feature_name = feature_name.replace('(', '')
+    feature_name = feature_name.replace(')', '')
+    plt.savefig('iris_{}_hist.svg'.format(feature_name))

-plt.figure(figsize=(6, 0.25))
-plt.legend(patches, iris.target_names, ncol=3, loc=(0, -0.37), borderaxespad=0)
+plt.figure(figsize=(6, .25))
+plt.legend(patches, iris.target_names, ncol=3, loc=(0, -.37),
+           borderaxespad=0)
 style_figs.no_axis()
-plt.savefig("legend_irises.svg")
+plt.savefig('legend_irises.svg')
+
+
