
Commit 0656f89

fmohr, Deathn0t, Jan van Rijn, andreasparaskeva, and ChengYan7 authored
Deephyper (#23)
* update
* update scripts
* update
* update
* update
* moved table of datasets metadata
* updated learning curve plotter
* added *.jpg to gitignore
* recording last epoch of training
* updated running scripts for polaris
* adding liblinear results in polaris
* fixed bug in brier_score
* added json query to retrieve other key-values from anchors
* added balanced accuracy from confusion matrix
* adapted plot_learning_curves for all plots to have the same number of ranks if same length of list
* added function to retrieve hyperparameter values from row
* removed generic from preprocessing
* updated knn workflow
* fix in knn
* removed generic univariate feature selector
* updated knn
* adding xgboost scripts
* import mpi4py is now optional for cli/_run.py
* removed harver
* updated run function
* updated notebook
* added logging to run
* setting the max poly_degree to 3 for preprocessing
* do not generate bias feature for poly features
* adding comment
* adding scripts for knn
* adding logging and config to lcdb test
* updated notebooks
* default transform in baseworkflow is identity
* adding constant predictor
* updated color map for learning curve plots
* adding *.pyc to .gitignore
* adding code for lcdb 1 curves
* adding scripts for constant predictor
* running constant predictor on all datasets with different seeds
* updated plotting of learning curves
* added RandomClassifier
* added computation of OOB metrics for RFs
* some changes on dummies: ConstantWorkflow is now MajorityWorkflow; added MeanWorkflow and MedianWorkflow, and also added the option to specify a regression task in the CLI via -tt=regression
* organizing and sharing code for neurocom
* updating .gitignore
* plots of HPOBench with constant predictor
* pushed code for multi-fidelity
* added scripts for mf-hpo
* adding notebooks
* updating scripts
* updated notebooks
* added scripts for other problems of hpobench
* updated doc of lcdb run --help
* updated fetch command to show split
* added dhb/lcdb/hpo benchmark
* updating notebooks
* made random forest scripts into array jobs
* updated create script
* updated scripts
* removing notebooks, updating scripts
* updating scripts for knn
* solving issue #18
* updated densenn scripts
* solving issue on oob scores
* fixing bug in densenn scorer
* task-type was not a defined parameter of main
* lcdb run can now decide the type of schedule for anchor and epoch
* adding anchor and epoch schedule parameters to lcdb test
* densenn workflow now computes and returns the number of parameters of the network
* updated gitignore
* added scripts for 2024-neurocom
* added pfn experimental scripts
* updated experimental scripts for MF-HPO
* updated get_iteration_schedule with generic get_schedule for random forest
* adding scripts
* adding scripts
* surf snellius scripts - setup, liblinear
* adding tensorflow dependency for nn workflows
* cleaning script for polaris installation
* updating .gitignore
* Merge branch 'deephyper' of https://github.com/fmohr/lcdb.git into deephyper
* updated install script
* added scripts to run experiments in delft
* changed paths in the run
* updated run script
* update slurm config
* updated jobscript
* adding a comment
* update notebooks
* update
* Create Readme.md
* Update Readme.md
* update
* update
* working with keras 3.1.1
* updated dependencies
* Restriction of PolynomialFeatures to not increase the DB beyond 4GB
* updated scripts for surf-snellius cluster
* Update Readme.md
* Fix imputation of NaN values
* added lookahead regularization
* Update Readme.md
* Update Readme.md
* printing preprocess details
* updating memory consumption check in preprocessing workflow
* check iteration-wise curve by plotting
* adding jmespath to setup
* removing unnecessary tmp variable
* starting to integrate the memory_limit
* serial evaluator will now use 1 worker by default
* refactor
* added memory limit to run
* changing import for deephyper analysis and removing memory check in preprocessing
* replacing profile decorator by partial with terminate_on_memory_exceeded
* adding memory limit in test
* using terminate_on_memory_exceeded for lcdb test
* handling BrokenProcessPool exception when memory_limit is exhausted (see the sketch after this list)
* script cleanup and plot additions
* Update Readme.md
* fixed json serialization issue, numpy types
* update: highlight default hyperparameter in the plotting
* added support for regularizers (#21)
* added support for regularizers: SWA, Lookahead, Snapshot Ensembles
* made periodicity of snapshot ensembles a hyperparameter
* Huge changes for proper scoring and pre-processing
* changed import order in utils
* added code for SWA
* added several sklearn models (some without hyperparameters yet)
* updated workflows
* solved problem with passed pp hyperparameters
* fixing issue in lcdb run when initial configs are not passed and inactive hyperparameters exist in the default hp configuration
* cleanup snellius scripts
* Update Readme.md
* updated and added several regularization techniques for DenseNN
* changed project structure, repaired bug in SVM, added campaign logic
* added logic for plotting and moved around some CLI parameters
* update data augmentation workflows in DenseNN
* update debug for extracting the traceback
* adjusted the _run.py w.r.t. the bug reported by Andreas
* WIP submitted fix for RandomForest that may affect ExtraTrees (check max_samples and max_features arguments)
* refactor
* update README for Snellius
* Deephyper update (#22)
* WIP creating db, builder subpackages, started to revert experiments/_experiments.py
* created logic for lcdb add, and created repository logic
* resolved some bugs. Should work properly now.
* added LCDB class to ease things
* fixed some bugs in the fetching of all results (Co-authored-by: Deathn0t <[email protected]>; Co-authored-by: felix <felix@frank>)
* repaired logic to initialize a non-existing LCDB in a system
* added method to count the number of results before they are fetched; this is to avoid an overflow
* plot was creating an undesired out.json file, which has been fixed
* Update README.md
* Update README.md
* updating lcdb run command with lcdb.builder subpackage
* adding TreesEnsembleWorkflow to merge RandomForest and ExtraTrees
* setting ConfigSpace to 1.1.1
* updated functionality of repositories and lcdb init
* updated the folder logic of LCDB to always use a .lcdb folder
* Update README.md
* added campaign script, updated snellius scripts
* minor script fix
* addressed relative path issue
* temporary commit
* added logic to save output file
* refactor of script
* adds option for filetype
* developed numpy interface for results and include example notebook
* moving files to snellius/abalysis
* adding progress bar when querying results from LCDB
* passing json query to get results, adding dummy queries
* updated analysis notebook
* added generative result generation with progress bar and some utilities
* removed duplicate function
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* added processors, enabled all sorts of combinations of queries; also removed config parameters from LCDB object
* Update README.md
* updates results
* incorporated config space object
* adds hyperparameter importance
* update runtime plot and regression
* added LearningCurveExtractor
* added unit tests and fixed bug in LearningCurveExtractor
* adding "parameterized" as dev extra in setup
* applying formatting to lcdb.analysis.util
* WIP porting lcdb to deephyper 0.8.0
* update to be compatible with deephyper==0.8.0
* fixing deephyper==0.8.0 in setup
* updated Installation in readme
* improved the learning curve class and enabled grouping
* Update README.md
* added tracking of runtime and dataset size after pre-processing steps
* added folder and notebook for use cases
* added logic for padding of sample-wise curves and merging for iteration-curves
* added notebook for OOB comparison (use case 6)
* resolved bug in merge function
* fixed bug in padding
* added notebook for analysis of variance
* made adjustments in variance use case notebook
* adjustments in use case notebooks
* resolved bugs in iteration-wise learning curve of trees ensembles; also added support for timer injection
* fixed typo
* added documentation for schedules and increased forest size to 2048
* added logging to the run
* added an error message that should be thrown if the training fails
* adding logged warning when evaluation fails because of memory limit
* fixed campaign float naming issue, restructured scripts
* update use case curve fitting
* update use case runtime plot
* refactor
* Update README.md
* Update README.md
* added learning curve groups
* Merge branch 'deephyper' of https://github.com/fmohr/lcdb.git into deephyper
* fixed problems with XGBoost
* changed standard parameter value of n_estimators in ExtraTrees
* first draft for CI pipeline with GitHub Actions
* changing working dir of CI
* adding parameterized to CI install dependencies
* moving install of parameterized to tox.ini
* adding pytest mark for db requirement
* update plot for curve fitting use case
* update LCDB.debug to extract all tracebacks, error messages, and configs
* Update hyperparameter_importance.py: updates changes made locally
* Update hyperparameter_importance.py
* Update hyperparameter_importance.py: final changes to experimental setup
* update use case curves fitting
* added PCloud Repository
* using pcloud repository as the standard when initializing LCDB
* enabled uploads to the pCloud repository
* added option to limit the token lifetime
* Update runtime estimation use case
* moving standardize_run_function_output to lcdb.builder.utils because it was removed from deephyper.evaluator
* update curve fitting plot
* adjusted some unit tests
* disabled coverage and running pytest directly

---------

Co-authored-by: Deathn0t <[email protected]>
Co-authored-by: Jan van Rijn <[email protected]>
Co-authored-by: andreasparaskeva <[email protected]>
Co-authored-by: Cheng Yan <[email protected]>
Co-authored-by: janvanrijn <[email protected]>
Co-authored-by: Tom Viering <[email protected]>
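Several bullets above concern the memory_limit integration and the handling of BrokenProcessPool when an evaluation exhausts it. The sketch below is only an illustration of that idea using the standard library; it is not lcdb's actual code, the deephyper helper terminate_on_memory_exceeded is not reproduced, and the names _train_and_score, run, and MEMORY_LIMIT_BYTES are invented.

# Hypothetical sketch (not lcdb's implementation): run one evaluation in a separate
# process with a hard address-space cap and treat a killed worker as a failed job.
import resource  # POSIX only
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool

MEMORY_LIMIT_BYTES = 8 * 1024**3  # assumed cap; the commit message calls this memory_limit


def _train_and_score(config):
    # Cap the worker's address space so a runaway fit raises MemoryError or is killed.
    resource.setrlimit(resource.RLIMIT_AS, (MEMORY_LIMIT_BYTES, MEMORY_LIMIT_BYTES))
    # ... fit the workflow on the anchor schedule and compute learning-curve metrics ...
    return {"objective": 0.0}


def run(config):
    try:
        with ProcessPoolExecutor(max_workers=1) as pool:
            return pool.submit(_train_and_score, config).result()
    except (BrokenProcessPool, MemoryError):
        # The worker died or ran out of memory: log a warning and report a failed
        # evaluation instead of crashing the whole campaign.
        return {"objective": None, "error": "memory limit exceeded"}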
1 parent e105db0 commit 0656f89

File tree

293 files changed (+41089, -3333 lines)


.github/workflows/ci.yml

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+name: Continuous integration
+
+on:
+  - pull_request
+  - push
+
+
+jobs:
+
+  test:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version:
+          - "3.12"
+    defaults:
+      run:
+        working-directory: publications/2023-neurips/
+    steps:
+      - uses: actions/checkout@v3
+      - name: Set up Python ${{ matrix.python-version }}
+        uses: actions/setup-python@v3
+        with:
+          python-version: ${{ matrix.python-version }}
+      - name: Install dependencies
+        run: |
+          pip install --upgrade pip
+          pip install tox pylint black
+      # - name: Run Formatter
+      #   run: black --diff --check $(git ls-files '*.py')
+      - name: Run Linter
+        run: pylint --exit-zero $(git ls-files '*.py')
+      - name: Run tests with tox
+        run: tox -e py3
+      - name: Upload coverage report
+        if: ${{ matrix.python-version == 3.12 }} # Only upload coverage once
+        uses: codecov/codecov-action@v1
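The tox step above runs the test suite, and the commit message mentions "adding pytest mark for db requirement", i.e. tests that need database access are marked so they can be skipped where no database is reachable. A hypothetical conftest.py sketch of such a marker follows; the --run-db option name and the hook bodies are invented for illustration and are not taken from the repository.

# conftest.py -- hypothetical sketch of a "db" marker; not the repository's actual code.
import pytest


def pytest_addoption(parser):
    # Invented flag: lets a developer opt in to database-backed tests locally.
    parser.addoption("--run-db", action="store_true",
                     help="run tests that need the results database")


def pytest_configure(config):
    # Register the marker so strict-marker runs and CI do not warn about it.
    config.addinivalue_line("markers", "db: test requires access to the results database")


def pytest_collection_modifyitems(config, items):
    if config.getoption("--run-db"):
        return
    skip_db = pytest.mark.skip(reason="results database not available in this environment")
    for item in items:
        if "db" in item.keywords:
            item.add_marker(skip_db)

A test would then be decorated with @pytest.mark.db and only executed when the flag is passed.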

.gitignore

Lines changed: 6 additions & 0 deletions
@@ -12,8 +12,14 @@ publications/2023-neurips/build/
 .DS_Store
 *.egg-info/
 *.log
+*.err
 *.db
 publications/2023-neurips/lightning_logs/*
 publications/2023-neurips/MNIST/*

 .codecarbon.config
+*.jpg
+*.pyc
+pf.txt
+*.xz
+*.gz
Lines changed: 11 additions & 3 deletions
@@ -1,13 +1,21 @@
 cpu.max = 4
-mem.max = 8000
+mem.max = 55000

 keyfields = openmlid:int(5), learner:varchar(100), outer_seed, inner_seed_index:int(3)
 resultfields = result:text

 ignore.time = .*
 ignore.memory = .*

-openmlid = 1485, 1590, 1515, 1457, 1475, 1468, 1486, 1489, 23512, 23517, 4541, 4534, 4538, 4134, 4135, 40978, 40996, 41027, 40981, 40982, 40983, 40984, 40701, 40670, 40685, 40900, 1111, 42732, 42733, 42734, 40498, 41161, 41162, 41163, 41164, 41165, 41166, 41167, 41168, 41169, 41142, 41143, 41144, 41145, 41146, 41147, 41150, 41156, 41157, 41158, 41159, 41138, 54, 181, 188, 1461, 1494, 1464, 12, 23, 3, 1487, 40668, 1067, 1049, 40975, 31
-outer_seed = 0
+#openmlid = 3, 6, 11, 12, 13, 23, 30, 31, 54, 55, 60, 61, 181, 188, 201, 273, 293, 299, 336, 346, 380, 446, 1042, 1049, 1067, 1083, 1084, 1085, 1086, 1087, 1088, 1128, 1130, 1134, 1138, 1139, 1142, 1146, 1161, 1216, 1233, 1235, 1236, 1441, 1448, 1450, 1457, 1461, 1464, 1465, 1468, 1475, 1477, 1479, 1483, 1485, 1486, 1487, 1488, 1489, 1494, 1499, 1503, 1509, 1515, 1566, 1567, 1575, 1590, 1591, 1592, 1597, 4134, 4135, 4137, 4534, 4538, 4541, 23512, 23517, 40498, 40664, 40668, 40670, 40672, 40677, 40685, 40687, 40701, 40713, 40900, 40910, 40971, 40975, 40978, 40981, 40982, 40983, 40984, 40994, 40996, 41027, 41142, 41143, 41144, 41145, 41146, 41150, 41156, 41157, 41158, 41159, 41161, 41163, 41164, 41165, 41166, 41167, 41168, 41169, 41228, 41540, 41972, 42720, 42732, 42733, 42734, 42742, 42769, 42809, 42810, 42844
+
+# rest of openmlids , , , ,
+
+# too big: 1503, 1509, 1567
+
+#openmlid = 40677, 40685, 40687, 40701, 40713, 40900, 40910, 40971, 40975, 40978, 40981, 40982, 40983, 40984, 40994, 40996, 41027, 41142, 41143, 41144, 41145, 41146, 41150, 41156, 41157, 41158, 41159, 41161, 41163, 41164, 41165, 41166, 41167, 41168, 41169, 41228, 41540, 41972, 42720, 42732, 42733, 42734, 42742, 42769, 42809, 42810, 42844
+#,
+openmlid = 1509, 1567
+outer_seed = 0, 1, 2, 3, 4
 inner_seed_index = 0
 learner = SVC_linear, SVC_poly, SVC_rbf, SVC_sigmoid, sklearn.tree.DecisionTreeClassifier, sklearn.tree.ExtraTreeClassifier, sklearn.linear_model.LogisticRegression, sklearn.linear_model.PassiveAggressiveClassifier, sklearn.linear_model.Perceptron, sklearn.linear_model.RidgeClassifier, sklearn.linear_model.SGDClassifier, sklearn.neural_network.MLPClassifier, sklearn.discriminant_analysis.LinearDiscriminantAnalysis, sklearn.discriminant_analysis.QuadraticDiscriminantAnalysis, sklearn.naive_bayes.BernoulliNB, sklearn.naive_bayes.MultinomialNB, sklearn.neighbors.KNeighborsClassifier, sklearn.ensemble.ExtraTreesClassifier, sklearn.ensemble.RandomForestClassifier, sklearn.ensemble.GradientBoostingClassifier
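For orientation, the keyfields in the config above define one job per combination of openmlid, outer_seed, inner_seed_index, and learner. A minimal sketch of that expansion follows; the Cartesian-product semantics are assumed from the keyfields definition and the learner list is truncated, so this is not the actual experiment runner.

# Rough illustration of how the key/value grid expands into individual jobs;
# the real experiment runner may behave differently.
from itertools import product

grid = {
    "openmlid": [1509, 1567],
    "outer_seed": [0, 1, 2, 3, 4],
    "inner_seed_index": [0],
    "learner": ["SVC_linear", "SVC_poly", "SVC_rbf", "SVC_sigmoid"],  # list truncated
}

jobs = [dict(zip(grid, combo)) for combo in product(*grid.values())]
print(len(jobs))  # 2 openmlids * 5 outer seeds * 1 inner seed * 4 learners = 40 jobs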

publications/2023-neurips/.gitignore

Lines changed: 12 additions & 1 deletion
@@ -1,3 +1,14 @@
 .ipynb_checkpoints
 __pycache__
-.idea
+.idea
+*.json
+*.tar
+*.gz
+*.csv
+*.png
+*.yaml
+*.zip
+
+# Output I/O files from PBS scheduler
+*.sh.e*
+*.sh.o*
