List of tabular models update #1379

Open: wants to merge 16 commits into master

dmitryglhf (Collaborator) commented Mar 17, 2025

This is a 🙋 feature or enhancement.

Summary

To do:

  • update the set of candidate models
  • update the docs for the list of available models
  • test on the benchmark with the composer enabled (1 fold, 1h8c) for the updated candidate models
  • adapt the tests to the updated candidates
  • benchmark with a limited set of operations: {'resample', 'scaling', 'pca', 'normalization', 'poly_features'}
  • check the optimization history
  • check the benchmark with models: {'lgbm', 'xgboost', 'catboost', 'rf', 'logit', 'ridge', 'knn', 'treg', 'linear', 'lasso', 'adareg'}

Done:

  • new set of tabular candidate models: {'lgbm', 'xgboost', 'catboost', 'rf', 'logit', 'ridge', 'knn', 'treg'}
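For bookkeeping, the model and operation sets named in the to-do and done lists can be compared directly as Python sets. This is a minimal sketch: the variable names are hypothetical and the snippet does not call FEDOT. A combined list like `search_space` is, as far as I know, what FEDOT accepts through its `available_operations` parameter, but that is an assumption worth confirming in the FEDOT docs.

```python
# Operation names copied from the PR description; the variable names are hypothetical.
NEW_CANDIDATE_MODELS = {'lgbm', 'xgboost', 'catboost', 'rf', 'logit', 'ridge', 'knn', 'treg'}
MODELS_TO_CHECK = {'lgbm', 'xgboost', 'catboost', 'rf', 'logit', 'ridge', 'knn', 'treg',
                   'linear', 'lasso', 'adareg'}
LIMITED_DATA_OPERATIONS = {'resample', 'scaling', 'pca', 'normalization', 'poly_features'}

# Models that the extra benchmark run adds on top of the new candidate set.
extra_models = MODELS_TO_CHECK - NEW_CANDIDATE_MODELS
print(sorted(extra_models))  # ['adareg', 'lasso', 'linear']

# Limited search space: new candidate models plus the restricted data operations.
search_space = sorted(NEW_CANDIDATE_MODELS | LIMITED_DATA_OPERATIONS)
print(len(search_space))  # 13
```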

Context

Closes #1339

dmitryglhf self-assigned this Mar 17, 2025
dmitryglhf added the enhancement (New feature or request) and in progress (task in progress) labels Mar 17, 2025
github-actions bot (Contributor) commented Mar 17, 2025

All PEP8 errors have been fixed, thanks ❤️

Comment last updated at Fri, 28 Mar 2025 18:28:37

nicl-nno (Collaborator) commented:

> try to implement operations aggregations

I think this would be better as a separate PR.

codecov bot commented Mar 19, 2025

Codecov Report

Attention: Patch coverage is 60.00% with 2 lines in your changes missing coverage. Please review.

Project coverage is 80.21%. Comparing base (eae485e) to head (e5ceb44).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ot/core/composer/gp_composer/specific_operators.py | 50.00% | 2 Missing ⚠️ |

@@           Coverage Diff           @@
##           master    #1379   +/-   ##
=======================================
  Coverage   80.20%   80.21%           
=======================================
  Files         146      146           
  Lines       10597    10597           
=======================================
+ Hits         8499     8500    +1     
+ Misses       2098     2097    -1     
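The percentages in the report are hits divided by tracked lines. A quick check with the numbers from the diff above; `coverage_percent` is a hypothetical helper, and the patch line counts (5 changed lines, 3 of them hit) are inferred from "60% with 2 lines missing" rather than taken from the report.

```python
def coverage_percent(hits: int, lines: int) -> str:
    """Line coverage as a percentage string with two decimals, like the report."""
    return f"{100 * hits / lines:.2f}%"

# Project totals from the coverage diff (hits + misses == tracked lines).
print(coverage_percent(8499, 10597))  # master: 80.20%
print(coverage_percent(8500, 10597))  # this PR: 80.21%

# Patch coverage: 60% with 2 missed lines implies 5 changed lines, 3 of them hit.
print(coverage_percent(3, 5))  # 60.00%
```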


dmitryglhf (Collaborator, Author) commented Mar 22, 2025

Results on the benchmark (1 hour, 8 cores, 1 fold):

| Metric (mean) | Master | Models update | PR#1380 (operations update) |
|---|---|---|---|
| auc | 0.901339 | 0.901033 | 0.898024 |
| acc | 0.86848 | 0.865966 | 0.863784 |
| balacc | 0.854555 | 0.852153 | 0.8357 |
| logloss | 0.339388 | 0.340543 | 0.339235 |
| training_duration | 2101.31 | 1810.22 | 1930.7 |
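To make the summary easier to read, the relative change of each variant against master can be computed from the means above. This is a sketch; the `SUMMARY` layout and the `relative_change` helper are illustrative. Lower is better for logloss and training_duration, so a negative change there is an improvement.

```python
# Mean metrics from the summary table: (Master, Models update, PR#1380 operations update).
SUMMARY = {
    'auc': (0.901339, 0.901033, 0.898024),
    'acc': (0.86848, 0.865966, 0.863784),
    'balacc': (0.854555, 0.852153, 0.8357),
    'logloss': (0.339388, 0.340543, 0.339235),
    'training_duration': (2101.31, 1810.22, 1930.7),
}

def relative_change(new: float, base: float) -> float:
    """Percentage change of `new` relative to `base`."""
    return 100.0 * (new - base) / base

for metric, (master, models, operations) in SUMMARY.items():
    print(f"{metric:>17}: models {relative_change(models, master):+.2f}%, "
          f"operations {relative_change(operations, master):+.2f}%")
```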

The metrics dropped slightly.

| Dataset | Version | Pipeline | Accuracy | AUC | Balanced Accuracy | Log Loss |
|---|---|---|---|---|---|---|
| Australian | Master | logit, fast_ica, logit, pca, scaling, xgboost, lgbm, catboost, bernb | 0.855072 | 0.941426 | 0.859508 | 0.449198 |
| | PR-Models | rf, scaling, poly_features, scaling, normalization | 0.869565 | 0.944822 | 0.872666 | 0.29653 |
| | PR-Operations+Models | logit, xgboost, scaling, knn, knn, xgboost, knn | 0.855072 | 0.94652 | 0.859508 | 0.331309 |
| Blood Transfusion Service Center | Master | mlp, normalization, poly_features, normalization, pca, fast_ica, normalization | 0.76 | 0.759747 | 0.595029 | 0.486645 |
| | PR-Models | logit, poly_features, scaling, resample | 0.773333 | 0.774366 | 0.69883 | 0.59641 |
| | PR-Operations+Models | logit, logit, poly_features, normalization | 0.746667 | 0.757797 | 0.52924 | 0.489928 |
| car | Master | rf | 0.803468 | | 0.865695 | 0.386452 |
| | PR-Models | catboost, logit | 0.774566 | | 0.750098 | 0.40911 |
| | PR-Operations+Models | rf | 0.797688 | | 0.743699 | 0.387637 |
| christine | Master | catboost, scaling | 0.750923 | 0.83595 | 0.750923 | 0.505129 |
| | PR-Models | catboost, scaling | 0.750923 | 0.83595 | 0.750923 | 0.505129 |
| | PR-Operations+Models | catboost, scaling | 0.750923 | 0.83595 | 0.750923 | 0.505129 |
| cnae-9 | Master | logit, scaling | 0.962963 | | 0.962963 | 0.141408 |
| | PR-Models | logit | 0.962963 | | 0.962963 | 0.138759 |
| | PR-Operations+Models | catboost | 0.935185 | | 0.935185 | 0.191485 |
| credit-g | Master | logit, poly_features, scaling, isolation_forest_class | 0.82 | 0.850952 | 0.77619 | 0.449925 |
| | PR-Models | logit, poly_features, isolation_forest_class, scaling, poly_features | 0.81 | 0.82619 | 0.769048 | 0.465911 |
| | PR-Operations+Models | logit, pca, scaling, poly_features, scaling | 0.81 | 0.822381 | 0.75 | 0.469132 |
| fabert | Master | logit | 0.697816 | | 0.66528 | 0.834174 |
| | PR-Models | logit | 0.697816 | | 0.66528 | 0.834174 |
| | PR-Operations+Models | logit | 0.697816 | | 0.66528 | 0.834174 |
| jasmine | Master | rf, bernb, poly_features, isolation_forest_class | 0.80602 | 0.874116 | 0.805638 | 0.402396 |
| | PR-Models | logit, rf, normalization, fast_ica, isolation_forest_class, xgboost | 0.819398 | 0.880761 | 0.81906 | 0.454436 |
| | PR-Operations+Models | rf, normalization | 0.826087 | 0.873333 | 0.825749 | 0.397765 |
| kr-vs-kp | Master | logit, xgboost, scaling, catboost, lgbm, dt | 0.99375 | 0.999961 | 0.993738 | 0.0173921 |
| | PR-Models | logit, catboost, scaling, fast_ica, xgboost, lgbm | 1.0 | 1.0 | 1.0 | 0.00739224 |
| | PR-Operations+Models | logit, catboost, scaling, lgbm, xgboost, scaling | 0.99375 | 0.999961 | 0.993464 | 0.0151288 |
| phoneme | Master | mlp, logit, normalization, rf, scaling, poly_features | 0.907579 | 0.964898 | 0.886825 | 0.409138 |
| | PR-Models | rf, poly_features | 0.903882 | 0.963878 | 0.8787 | 0.227145 |
| | PR-Operations+Models | logit, catboost, poly_features, resample, poly_features, rf, rf, scaling, poly_features | 0.911275 | 0.967796 | 0.876593 | 0.29887 |
| segment | Master | catboost | 0.991342 | | 0.991342 | 0.0691071 |
| | PR-Models | logit, catboost, rf, scaling, lgbm | 0.982684 | | 0.982684 | 0.0707873 |
| | PR-Operations+Models | logit, knn, resample, logit, scaling, rf | 0.978355 | | 0.978355 | 0.0878775 |
| vehicle | Master | mlp, scaling, fast_ica, normalization, resample | 0.894118 | | 0.895022 | 0.279058 |
| | PR-Models | logit, scaling, normalization, poly_features, scaling | 0.858824 | | 0.86039 | 0.408728 |
| | PR-Operations+Models | logit, poly_features, scaling, resample | 0.870588 | | 0.872294 | 0.415623 |
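To summarize the per-dataset picture, each PR variant's accuracy can be compared against master with a simple win/tie/loss count. The accuracy values are copied from the table above; the `ACCURACY` layout and the `win_tie_loss` helper are illustrative, and the count ignores effect size.

```python
# Accuracy per dataset: (Master, PR-Models, PR-Operations+Models), copied from the table.
ACCURACY = {
    'Australian': (0.855072, 0.869565, 0.855072),
    'Blood Transfusion Service Center': (0.76, 0.773333, 0.746667),
    'car': (0.803468, 0.774566, 0.797688),
    'christine': (0.750923, 0.750923, 0.750923),
    'cnae-9': (0.962963, 0.962963, 0.935185),
    'credit-g': (0.82, 0.81, 0.81),
    'fabert': (0.697816, 0.697816, 0.697816),
    'jasmine': (0.80602, 0.819398, 0.826087),
    'kr-vs-kp': (0.99375, 1.0, 0.99375),
    'phoneme': (0.907579, 0.903882, 0.911275),
    'segment': (0.991342, 0.982684, 0.978355),
    'vehicle': (0.894118, 0.858824, 0.870588),
}

def win_tie_loss(variant: int) -> tuple[int, int, int]:
    """Count datasets where a variant (0 = PR-Models, 1 = PR-Operations+Models) beats, ties or loses to master."""
    wins = ties = losses = 0
    for master, *variants in ACCURACY.values():
        value = variants[variant]
        if value > master:
            wins += 1
        elif value == master:
            ties += 1
        else:
            losses += 1
    return wins, ties, losses

print('PR-Models vs Master:', win_tie_loss(0))             # (4, 3, 5)
print('PR-Operations+Models vs Master:', win_tie_loss(1))  # (2, 4, 6)
```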

Judging by these results, the drop in the metrics was caused not by the updated list of models but by the composer choosing data operations more often.

nicl-nno (Collaborator) commented:

Was the optimization history for credit-g saved?

nicl-nno (Collaborator) commented Mar 22, 2025

The history for car would be interesting too, of course. Isn't rf among the initial assumptions?

But this is indeed unrelated to the changes in this PR.

dmitryglhf (Collaborator, Author) commented:

> Was the optimization history for credit-g saved?

I can't find where to view the optimization history in the benchmark; I only see the regular logs and metadata. That said, I fixed the seed of the experiments, so I'll try to reproduce the run locally.

> The history for car would be interesting too, of course. Isn't rf among the initial assumptions?

Yes, the result there is interesting, especially since rf and catboost are both present as separate initial assumptions. I'll look into it as well.

nicl-nno (Collaborator) commented:

> I can't find where to view the optimization history in the benchmark

I think it's possible to save arbitrary artifacts there in addition to the required ones.

This reverts commit e81c823.
…f-tabular-models-update
Successfully merging this pull request may close these issues.

enh: Rework the list of tabular models and operations
2 participants