Description
Submitting Author: Syed Ali Mohsin Bukhari (@syedalimohsinbukhari)
Package Name: pymultifit
One-Line Description of Package: A python library for fitting data with multiple models.
Repository Link (if existing): https://github.com/syedalimohsinbukhari/pyMultiFit
EiC: Szymon Moliński (@SimonMolinsky)
Code of Conduct & Commitment to Maintain Package
- I agree to abide by pyOpenSci's Code of Conduct during the review process and in maintaining my package after should it be accepted.
- I have read and will commit to package maintenance after the review as per the pyOpenSci Policies Guidelines.
Description
- Include a brief paragraph describing what your package does:
`pymultifit` is built primarily to solve one problem: fitting multiple models (and mixture models) to a given dataset. Be it multiple Gaussians, multiple Laplacians, or a mixture of such models, this package aims to handle multi-model data fitting. The package also provides easy-to-use `BaseDistribution` and `BaseFitter` classes for the respective user-defined functions.
Community Partnerships
We partner with communities to support peer review with an additional layer of
checks that satisfy community requirements. If your package fits into an
existing community please check below:
- Astropy: My package adheres to Astropy community standards
- Pangeo: My package adheres to the Pangeo standards listed in the pyOpenSci peer review guidebook
Scope
- Please indicate which category or categories this package falls under:
- Data retrieval
- Data extraction
- Data processing/munging
- Data deposition
- Data validation and testing
- Data visualization
- Workflow automation
- Citation management and bibliometrics
- Scientific software wrappers
- Database interoperability
Domain Specific
- Geospatial
- Education
- Explain how and why the package falls under these categories (briefly, 1-2 sentences). For community partnerships, check also their specific guidelines as documented in the links above. Please note any areas you are unsure of:
This library falls under the "data processing/munging" category as it takes the given data and fits the given model(s) to it via minimization. It also lets the user extract the fitted parameters for further analysis via helper functions. The `fitter` module handles visualization of the fitted model internally, with options for separate views of the total fit and the individual fits. The `distribution` module, in turn, provides `pdf`, `cdf`, and `stats` functionality for any user-defined or pre-built distribution.
- Who is the target audience and what are the scientific applications of this package?
Researchers, data scientists, and statisticians who work with datasets requiring multi-model fitting for robust analysis and modeling.
- Are there other Python packages that accomplish similar things? If so, how does yours differ?
Apart from the general-purpose scientific packages `scipy`, `lmfit`, and `scikit-learn`, there is PyAutoFit, a Python-based probabilistic programming language built on Bayesian inference. Another notable library is Mixture-Models, which specializes in advanced optimization techniques for fitting various families of mixture models, including Gaussian mixture models and their variants. Both libraries are powerful tools for specific use cases; I came across them recently while searching for existing options.
While these libraries offer robust solutions for hierarchical modeling (`PyAutoFit`) or a diverse array of pre-defined mixture models (`Mixture-Models`), `pyMultiFit` distinguishes itself through its focus on simplicity of use. Specifically, it is designed to provide a lightweight and user-friendly framework for fitting multi-model data, including custom mixture models (for example, `gaussian` + `laplace` + `line`). `pymultifit` also provides easy-to-use base classes that can be modified for any distribution/fitter purposes.
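To illustrate what fitting a custom mixture such as `gaussian` + `laplace` + `line` involves, here is a minimal sketch written directly against `scipy.optimize.curve_fit`. It does not use the `pymultifit` API; the function names, parameter ordering, and synthetic data are assumptions made for illustration only.

```python
import numpy as np
from scipy.optimize import curve_fit


def gaussian(x, amplitude, mean, std):
    """Unnormalized Gaussian profile."""
    return amplitude * np.exp(-0.5 * ((x - mean) / std) ** 2)


def laplace(x, amplitude, mean, scale):
    """Unnormalized Laplace profile."""
    return amplitude * np.exp(-np.abs(x - mean) / scale)


def line(x, slope, intercept):
    """Straight-line background."""
    return slope * x + intercept


def mixture(x, g_amp, g_mean, g_std, l_amp, l_mean, l_scale, slope, intercept):
    """Hypothetical composite model: gaussian + laplace + line."""
    return (gaussian(x, g_amp, g_mean, g_std)
            + laplace(x, l_amp, l_mean, l_scale)
            + line(x, slope, intercept))


# Synthetic data (illustrative values, not taken from the package).
rng = np.random.default_rng(0)
x = np.linspace(-10, 10, 400)
true_params = (5.0, -3.0, 1.0, 3.0, 4.0, 0.8, 0.05, 1.0)
y = mixture(x, *true_params) + rng.normal(0, 0.05, x.size)

# A starting guess reasonably close to the truth helps the optimizer converge.
initial_guess = (4.5, -2.8, 1.1, 2.7, 3.8, 0.9, 0.0, 0.8)
fitted_params, covariance = curve_fit(mixture, x, y, p0=initial_guess)

# Root-mean-square residual of the fitted composite model.
residual_rms = np.sqrt(np.mean((y - mixture(x, *fitted_params)) ** 2))
```

Even in this small case the user has to write the composite function, track the parameter ordering, and supply starting values by hand, which is the bookkeeping a multi-model fitting framework aims to take over.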
One of the more prominent features of `pyMultiFit` is the `BaseFitter` template class, which provides custom fitting to any definable function with minimal boilerplate code. All plotting and boundary handling is done inside the template class, so the user can focus on running through multiple models quickly without worrying about how to manage multiple models of the same type or of different types.
Additionally, the `generators` template function gives the user an N-model data generator with optional added noise, to mimic real-life data for whatever distribution the user wants.
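The idea of an N-model generator with noise can be sketched in plain NumPy as follows. This is not the `pymultifit` generator; `generate_multi_gaussian` and its arguments are hypothetical names chosen for this example, and a Gaussian is used only as a stand-in for whatever distribution the user wants.

```python
import numpy as np


def gaussian(x, amplitude, mean, std):
    """Unnormalized Gaussian profile."""
    return amplitude * np.exp(-0.5 * ((x - mean) / std) ** 2)


def generate_multi_gaussian(x, params, noise_level=0.0, seed=None):
    """Sum N Gaussian components and optionally add normal noise.

    `params` is a sequence of (amplitude, mean, std) tuples, one per component.
    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    y = np.zeros_like(x, dtype=float)
    for amplitude, mean, std in params:
        y += gaussian(x, amplitude, mean, std)
    if noise_level > 0:
        y += rng.normal(0.0, noise_level, size=x.shape)
    return y


x = np.linspace(-10, 10, 200)
components = [(5.0, -3.0, 1.0), (3.0, 0.0, 0.5), (4.0, 4.0, 1.5)]

clean = generate_multi_gaussian(x, components)
noisy = generate_multi_gaussian(x, components, noise_level=0.2, seed=42)
```

Fixing the seed keeps the noisy data reproducible, which is convenient when comparing fitters on the same synthetic dataset.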
- Any other questions or issues we should be aware of:
P.S. Have feedback/comments about our review process? Leave a comment here