How to contribute to Pandas-Profiling

Pandas-profiling aims to ease exploratory data analysis for structured datasets. Our focus is to provide users with useful and robust statistics for such datasets encountered in industry, academia and elsewhere. Pandas-profiling is open source and welcomes contributions from passionate community users.

Themes to contribute

In line with our aim, we identify the following themes:

  • Exploratory data analysis: The core of the package is a summary of a dataset by its main characteristics, complemented with warnings on data issues and visualisations.

    Suggestions for contribution: Extend support to more data types (think of paths, location or GPS coordinates and ordinal data types), text data (e.g. encoding, vocabulary size, spelling errors, language detection), time series analysis, or even images (e.g. dimensions, EXIF).

    Related: #7, #129, #190, #204 or create one.

  • Stability, Performance and Restricted environment compatibility: Data exploration takes place in all kinds of conditions, from the latest machine learning platforms with enormous datasets to managed environments in large corporations. pandas-profiling helps analysts, researchers and engineers alike in these cases. We do this by fixing bugs, improving performance on big datasets and adding environment compatibility.

    Suggestions for contribution (Performance): Perform concurrency analysis or profile execution times and use the insights gained to improve performance (e.g. multiprocessing, cython, numba), or test the performance of pandas-profiling with big datasets and the data formats commonly used for them (such as parquet).

    Suggestions for contribution (Stability): Either review the code and add tests, or watch the issues page and the Stack Overflow tag to find current issues.

    Related: #98, #122 or create one.

  • Interaction, presentation and user experience: As pandas-profiling eases exploratory data analysis, working with the package should reflect that. Interaction and user experience play a central role in working with the package. Working on interactive and static features is possible through the modular nature of the package: the user can configure which features to use.

    Suggestions for contribution (interactivity): Interactivity allows for more user-friendly applications, including but not limited to on-demand analysis (don't compute what you don't want to see) and interactive histograms and correlations. This is ideal for smaller datasets, where we can compute this on the fly. ipywidgets would be a great place to start (e.g. a widget-based view).

    Suggestions for contribution (presentation): Support forms of distribution other than HTML (for example PDF, or packaged as a GUI application via PyQt). Users should be able to share reports (improve the size of labels in graphs, add explanations to correlation matrices and allow for styling/branding).

    Related: #161, #175, #191 or create one.

  • Community: The success of this package demonstrates the power of sharing and working together. You are welcome as part of this community.

    Suggestions for contribution: If this package is of value to you, let us know at this address. We are interested in how you use pandas-profiling in your work. Furthermore, we are always looking for contributions to the documentation, issue templates and discussions. You can also help as an advocate or ambassador by sharing the package with others.

    Related: #87 or create one.

  • Machine learning: pandas-profiling is not a machine learning package, even though many of our users use EDA as a step prior to developing their models. Our focus lies in exploratory data analysis. Any functionality that enables machine learning applications through more effective data profiling is welcome. Future work might include an extension to pandas-profiling specifically for profiling target variables and machine learning predictions.

    Related: #124, #173, #198 or create one.
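
For the performance suggestion above (profiling execution times), a minimal sketch using only Python's standard library could look like the following. The helper `profile_call` and the stand-in workload `summarize` are hypothetical names used for illustration and are not part of pandas-profiling; in practice you would pass the code path you want to study instead.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile; return (result, text report of the slowest calls).

    Hypothetical helper for illustration, not part of pandas-profiling.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    # Collect the statistics into a string instead of printing to stdout.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def summarize(values):
    """Hypothetical stand-in workload; replace with the code path under study."""
    ordered = sorted(values)
    return {"min": ordered[0], "max": ordered[-1], "mean": sum(ordered) / len(ordered)}


if __name__ == "__main__":
    result, report = profile_call(summarize, list(range(100_000)))
    print(report)
```

Sorting by cumulative time tends to surface the high-level functions that dominate a run, which is usually a good first place to look before trying multiprocessing, cython or numba.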

Support

Maintaining the package takes up quite some time, all of which is donated voluntarily. We would like to provide new, highly requested features in the area of machine learning. Unfortunately, we do not have the capacity to actively develop new features. If you are willing to support us in this (as an industry partner or through sponsorship), drop us a line. Another low-threshold way to support us is to connect your company's logo to the package (e.g. let us place it in a "Used at" section; we know from contributions that people at IBM, Microsoft and various other companies use pandas-profiling).

Did you find a bug?

  • Ensure the bug was not already reported by searching on GitHub under Issues.

  • If you're unable to find an open issue addressing the problem, open a new one. If possible, use the relevant bug report templates to create the issue.

Did you write a patch that fixes a bug?

  • Open a new GitHub pull request with the patch.

  • Ensure the PR description clearly describes the problem and solution. Include the relevant issue number if applicable.

Acknowledgements

We would like to thank everyone who has helped us get to where we are now.

See the Contributor Graph