pymultifit presubmission #221

Open
3 of 16 tasks
syedalimohsinbukhari opened this issue Dec 14, 2024 · 7 comments

Comments

@syedalimohsinbukhari

syedalimohsinbukhari commented Dec 14, 2024

Submitting Author: Syed Ali Mohsin Bukhari (@syedalimohsinbukhari)
Package Name: pymultifit
One-Line Description of Package: A Python library for fitting data with multiple models.
Repository Link (if existing): https://github.com/syedalimohsinbukhari/pyMultiFit
EiC: Szymon Moliński (@SimonMolinsky)


Code of Conduct & Commitment to Maintain Package

Description

  • Include a brief paragraph describing what your package does:

pymultifit is built primarily to solve one problem: fitting multiple models (and mixture models) to a given dataset. Whether the data call for multiple Gaussians, multiple Laplacians, or a mixture of such models, this package aims to handle multi-model data fitting. The package also provides easy-to-use BaseDistribution and BaseFitter classes for user-defined distributions and fitters, respectively.

Community Partnerships

We partner with communities to support peer review with an additional layer of
checks that satisfy community requirements. If your package fits into an
existing community please check below:

Scope

  • Please indicate which category or categories this package falls under:

    • Data retrieval
    • Data extraction
    • Data processing/munging
    • Data deposition
    • Data validation and testing
    • Data visualization
    • Workflow automation
    • Citation management and bibliometrics
    • Scientific software wrappers
    • Database interoperability

Domain Specific

  • Geospatial
  • Education

  • Explain how and why the package falls under these categories (briefly, 1-2 sentences). For community partnerships, check also their specific guidelines as documented in the links above. Please note any areas you are unsure of:

This library falls under the "data processing/munging" category because it takes the given data and fits the given model(s) to it via minimization. It also lets the user extract the fitted parameters for further analysis through helper functions. The fitter module handles visualization of the fitted model internally, with options to view the total fit and the individual component fits separately. The distribution module, in turn, provides pdf, cdf, and stats functionality for any user-defined or pre-built distribution.

  • Who is the target audience and what are the scientific applications of this package?

Researchers, data scientists, and statisticians who work with datasets requiring multi-model fitting for robust analysis and modeling.

  • Are there other Python packages that accomplish similar things? If so, how does yours differ?

Apart from the general-purpose scientific packages scipy, lmfit, and scikit-learn, there is PyAutoFit, a Python-based probabilistic programming language built on Bayesian inference. Another notable library is Mixture-Models, which specializes in advanced optimization techniques for fitting various families of mixture models, including Gaussian mixture models and their variants. Both libraries are powerful tools for specific use cases; I came across them recently while searching for existing options.

While these libraries offer robust solutions for hierarchical modeling (PyAutoFit) or a diverse array of pre-defined mixture models (Mixture-Models), pyMultiFit distinguishes itself through its focus on simplicity of use. Specifically, it is designed to provide a lightweight and user-friendly framework for fitting multi-model data, including custom mixture models (for example, gaussian + laplace + line). pymultifit also provides easy-to-use base classes that can be adapted to any distribution or fitter.
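To make the "gaussian + laplace + line" case concrete, here is a minimal sketch of such a composite fit written with plain numpy and scipy. It does not use pymultifit's API; the function names, parameter values, and initial guesses are hypothetical and chosen only for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit


def gaussian(x, amp, mu, sigma):
    # Unnormalized Gaussian component.
    return amp * np.exp(-0.5 * ((x - mu) / sigma) ** 2)


def laplace(x, amp, mu, b):
    # Unnormalized Laplace component.
    return amp * np.exp(-np.abs(x - mu) / b)


def line(x, slope, intercept):
    # Linear baseline.
    return slope * x + intercept


def composite(x, g_amp, g_mu, g_sigma, l_amp, l_mu, l_b, slope, intercept):
    # Sum of the three components; all eight parameters are fitted at once.
    return (gaussian(x, g_amp, g_mu, g_sigma)
            + laplace(x, l_amp, l_mu, l_b)
            + line(x, slope, intercept))


# Synthetic data: known (made-up) parameters plus Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(-10.0, 10.0, 400)
true_params = (5.0, -3.0, 1.0, 4.0, 3.0, 0.8, 0.1, 1.0)
y = composite(x, *true_params) + rng.normal(0.0, 0.05, x.size)

# Rough initial guesses, one per parameter of the composite model.
p0 = (4.0, -2.5, 1.5, 3.0, 2.5, 1.0, 0.0, 0.5)
fitted_params, covariance = curve_fit(composite, x, y, p0=p0)
```

Writing the composite function, its parameter ordering, and the initial guesses by hand is the boilerplate that grows with every added component.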

One of the more prominent features of pyMultiFit is the BaseFitter template class that provides custom fitting to any definable function with minimal boilerplate code. All the plotting and boundary functionalities are handled inside the template class so that the user can focus solely on running through multiple models quickly without thinking about how to manage multiple models of the same type or even of different types.

Additionally, the generators template function provides the user with an N-model data generator function with added noise capability to mimic real-life scenarios of whatever distribution the user might want.
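The idea of an N-model generator with added noise can be sketched in a few lines of standard-library Python. This is only an illustration of the concept: `generate_multi_gaussian`, its arguments, and the sample parameters are hypothetical and are not taken from pymultifit's generators module.

```python
import math
import random


def gaussian(x, amp, mu, sigma):
    # Unnormalized Gaussian component.
    return amp * math.exp(-0.5 * ((x - mu) / sigma) ** 2)


def generate_multi_gaussian(xs, params, noise_level=0.0, seed=None):
    """Sum N Gaussian components over xs and add zero-mean Gaussian noise.

    params is a list of (amp, mu, sigma) tuples, one per component.
    """
    rng = random.Random(seed)
    ys = []
    for x in xs:
        clean = sum(gaussian(x, amp, mu, sigma) for amp, mu, sigma in params)
        ys.append(clean + rng.gauss(0.0, noise_level))
    return ys


# Two components on a grid from -5 to 5, with mild noise.
xs = [i * 0.1 for i in range(-50, 51)]
params = [(2.0, -2.0, 0.5), (1.0, 1.5, 1.0)]
noisy = generate_multi_gaussian(xs, params, noise_level=0.05, seed=42)
```

Passing a fixed seed makes the noisy data reproducible, which is convenient when comparing fitters on the same synthetic input.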

  • Any other questions or issues we should be aware of:

P.S. Have feedback/comments about our review process? Leave a comment here

@SimonMolinsky
Collaborator

Hi @syedalimohsinbukhari

Thanks for submitting your package! It's a great tool, but at this point, I wonder if it is within the scope of pyOpenSci. You have written that pymultifit overlaps with other packages in the ecosystem, but the differences are unclear. The package must fulfill one of the following conditions:

- More open in licensing or development practices
- Broader in functionality (e.g., providing access to more data sets, providing a greater suite of functions), but not only by duplicating additional packages
- Better in usability and performance
- Actively maintained while alternatives are poorly or no longer actively maintained

You have stated that your package has better usability and performance. I've checked your package's documentation (https://pymultifit.readthedocs.io/index.html#), and it doesn't provide examples of how to use the package, so I wasn't able to compare functionalities. Could you prepare some comparisons of the overlapping functionalities as code examples? They could become part of your README in the future. Then we can decide whether pymultifit can be accepted to pyOpenSci.

@syedalimohsinbukhari
Author

Hi @SimonMolinsky

Thank you for reaching out. I apologize for submitting with incomplete documentation. The package documentation is currently being developed in the docs branch and will be up ASAP, including proper usage examples and comparisons.

@SimonMolinsky
Collaborator

@syedalimohsinbukhari

In this situation, please let me know when the docs build is ready!

@syedalimohsinbukhari
Author

I will, and once again, thank you for your time and consideration @SimonMolinsky

@syedalimohsinbukhari
Author

Hi @SimonMolinsky

Thank you for waiting; the documentation is now up with API references, tutorials, and speed and accuracy benchmarks against scipy.

@SimonMolinsky
Collaborator

@syedalimohsinbukhari

Thanks for the updates. Give me a few days, and I will come back with the final decision!

@syedalimohsinbukhari
Author

@SimonMolinsky

That'd be great.

Projects
Status: pre-submission
Status: No status

2 participants