pymultifit presubmission #221

Closed
@syedalimohsinbukhari

Submitting Author: Syed Ali Mohsin Bukhari (@syedalimohsinbukhari)
Package Name: pymultifit
One-Line Description of Package: A Python library for fitting data with multiple models.
Repository Link (if existing): https://github.com/syedalimohsinbukhari/pyMultiFit
EiC: Szymon Moliński (@SimonMolinsky)


Code of Conduct & Commitment to Maintain Package

Description

  • Include a brief paragraph describing what your package does:

pymultifit is built primarily to solve one problem: fitting multiple models (and mixture models) to a given dataset. Be it multiple Gaussians, multiple Laplacians, or a mixture of such models, this package aims to handle multi-model data fitting. The package also provides easy-to-use BaseDistribution and BaseFitter classes for user-defined distributions and fitters, respectively.
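To make the multi-model idea concrete, here is a minimal sketch of fitting a sum of two Gaussians with scipy.optimize.curve_fit. It does not use pymultifit's own API; the helper names (gaussian, multi_gaussian), the synthetic data, and the initial guesses are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit


def gaussian(x, amplitude, mean, std):
    """Single un-normalized Gaussian."""
    return amplitude * np.exp(-0.5 * ((x - mean) / std) ** 2)


def multi_gaussian(x, *params):
    """Sum of N Gaussians; params is a flat run of (amplitude, mean, std) triples."""
    total = np.zeros_like(x, dtype=float)
    for i in range(0, len(params), 3):
        total += gaussian(x, *params[i:i + 3])
    return total


# Synthetic data (illustrative values): two Gaussians plus a little noise.
rng = np.random.default_rng(0)
x = np.linspace(-10.0, 10.0, 400)
true_params = [3.0, -3.0, 1.0, 5.0, 2.0, 1.5]
y = multi_gaussian(x, *true_params) + rng.normal(0.0, 0.05, x.size)

# curve_fit needs p0 here because the model takes a variable number of parameters.
initial_guess = [2.5, -2.5, 1.2, 4.0, 2.5, 1.0]
fitted_params, covariance = curve_fit(multi_gaussian, x, y, p0=initial_guess)
```

The number of components is set only by the length of the initial guess, which is the bookkeeping a multi-model fitter is meant to hide from the user.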

Community Partnerships

We partner with communities to support peer review with an additional layer of
checks that satisfy community requirements. If your package fits into an
existing community please check below:

Scope

  • Please indicate which category or categories this package falls under:

    • Data retrieval
    • Data extraction
    • Data processing/munging
    • Data deposition
    • Data validation and testing
    • Data visualization
    • Workflow automation
    • Citation management and bibliometrics
    • Scientific software wrappers
    • Database interoperability

Domain Specific

  • Geospatial
  • Education

  • Explain how and why the package falls under these categories (briefly, 1-2 sentences). For community partnerships, check also their specific guidelines as documented in the links above. Please note any areas you are unsure of:

This library falls under the "data processing/munging" category because it takes the given data and fits the given model(s) to it via minimization. It also lets the user extract the fitted parameters for further analysis through helper functions. The fitter module handles visualization of the fitted model internally, with options for separate views of the total fit and of the individual component fits. The distribution module, in turn, provides pdf, cdf, and stats functionality for any user-defined or pre-built distribution.
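As a rough illustration of what pdf, cdf, and stats mean for a mixture, the sketch below builds a two-component mixture from scipy.stats frozen distributions. It is not the pymultifit distribution module; the weights, components, and function names are hypothetical and chosen only for this example.

```python
import numpy as np
from scipy import stats

# Hypothetical two-component mixture; the weights must sum to 1.
weights = np.array([0.6, 0.4])
components = [
    stats.norm(loc=-2.0, scale=1.0),
    stats.laplace(loc=3.0, scale=0.5),
]


def mixture_pdf(x):
    """Weighted sum of the component densities."""
    return sum(w * c.pdf(x) for w, c in zip(weights, components))


def mixture_cdf(x):
    """Weighted sum of the component cumulative distributions."""
    return sum(w * c.cdf(x) for w, c in zip(weights, components))


def mixture_mean():
    """The mean of a mixture is the weighted mean of the component means."""
    return sum(w * c.mean() for w, c in zip(weights, components))


x = np.linspace(-15.0, 15.0, 3001)
pdf_values = mixture_pdf(x)
cdf_values = mixture_cdf(x)
```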

  • Who is the target audience and what are the scientific applications of this package?

Researchers, data scientists, and statisticians who work with datasets requiring multi-model fitting for robust analysis and modeling.

  • Are there other Python packages that accomplish similar things? If so, how does yours differ?

Apart from the general-purpose scientific packages scipy, lmfit, and scikit-learn, there is PyAutoFit, a Python-based probabilistic programming language built on Bayesian inference. Another notable library is Mixture-Models, which specializes in advanced optimization techniques for fitting various families of mixture models, including Gaussian mixture models and their variants. Both libraries are powerful tools for specific use cases, and I came across them only recently while searching for existing options.

While these libraries offer robust solutions for hierarchical modeling (PyAutoFit) or a diverse array of pre-defined mixture models (Mixture-Models), pyMultiFit distinguishes itself through its focus on simplicity of use. Specifically, it is designed to provide a lightweight and user-friendly framework for fitting multi-model data, including custom mixture models (for example, Gaussian + Laplace + line). pymultifit also provides easy-to-use base classes that can be adapted to any distribution or fitter.
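A custom mixture such as Gaussian + Laplace + line can be sketched with plain scipy as follows. This is not how pymultifit implements it; the COMPONENTS table, the composite function, and all numeric values are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit


def gaussian(x, amplitude, mean, std):
    return amplitude * np.exp(-0.5 * ((x - mean) / std) ** 2)


def laplace(x, amplitude, mean, width):
    return amplitude * np.exp(-np.abs(x - mean) / width)


def line(x, slope, intercept):
    return slope * x + intercept


# Each entry pairs a component function with its number of parameters.
COMPONENTS = [(gaussian, 3), (laplace, 3), (line, 2)]


def composite(x, *params):
    """Sum the components, slicing the flat parameter vector in order."""
    total = np.zeros_like(x, dtype=float)
    start = 0
    for func, n_params in COMPONENTS:
        total += func(x, *params[start:start + n_params])
        start += n_params
    return total


# Synthetic data (illustrative values) with a small amount of noise.
rng = np.random.default_rng(1)
x = np.linspace(-10.0, 10.0, 500)
true_params = [4.0, -4.0, 1.0, 3.0, 3.0, 0.8, 0.1, 0.5]
y = composite(x, *true_params) + rng.normal(0.0, 0.05, x.size)

initial_guess = [3.5, -3.5, 1.2, 2.5, 2.8, 1.0, 0.0, 0.0]
fitted_params, _ = curve_fit(composite, x, y, p0=initial_guess)
```

Keeping the component list separate from the summing function means a component can be added or swapped by editing one table entry, at the cost of tracking parameter order by hand.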

One of the more prominent features of pyMultiFit is the BaseFitter template class that provides custom fitting to any definable function with minimal boilerplate code. All the plotting and boundary functionalities are handled inside the template class so that the user can focus solely on running through multiple models quickly without thinking about how to manage multiple models of the same type or even of different types.

Additionally, the generators template function provides the user with an N-model data generator function with added noise capability to mimic real-life scenarios of whatever distribution the user might want.
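An N-model data generator with optional noise could look roughly like the sketch below. The function name generate_multi_model_data and its signature are hypothetical and are not taken from pymultifit's generators module.

```python
import numpy as np


def gaussian(x, amplitude, mean, std):
    return amplitude * np.exp(-0.5 * ((x - mean) / std) ** 2)


def generate_multi_model_data(x, model, params_list, noise_level=0.0, seed=None):
    """Sum model(x, *params) over params_list, then add Gaussian noise if requested."""
    rng = np.random.default_rng(seed)
    y = np.zeros_like(x, dtype=float)
    for params in params_list:
        y += model(x, *params)
    if noise_level > 0:
        y += rng.normal(0.0, noise_level, size=x.shape)
    return y


x = np.linspace(-10.0, 10.0, 200)
params_list = [(3.0, -3.0, 1.0), (5.0, 2.0, 1.5), (1.0, 6.0, 0.5)]

clean = generate_multi_model_data(x, gaussian, params_list)
noisy = generate_multi_model_data(x, gaussian, params_list, noise_level=0.1, seed=42)
```

Passing a seed keeps the noisy data reproducible, which helps when comparing fitters on the same synthetic input.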

  • Any other questions or issues we should be aware of:

P.S. Have feedback/comments about our review process? Leave a comment here
