A visualization tool designed to help data scientists better examine their data sets.

Install with `pip install dshelper`, then pass a dataframe to `dshelp`:

    import dshelper
    dshelper.dshelp(df)

- ✅ Default view with raw data and its statistics info
- ✅ Drag on the header to re-arrange columns
- ✅ Left click on the right panel to show/hide columns
- ✅ Plots: histogram, heatmap, correlation, scatter, box, violin, pair
- ✅ Bottom right buttons to hide panels and focus on data set
- ✅ Easy to see memory usage and logs in bottom status bar
- ✅ Easy to use in command line, jupyter notebook and docker
- ✅ Histogram
- ✅ Heatmap
- ✅ Correlation
- ✅ Scatter Plot
- ✅ Box Plot
- ✅ Violin Plot
- ✅ Pair Plot

In the default view, the main panel displays the dataset and the bottom panel displays its statistics. The right panel has two tabs: the first displays the stats for all the columns, and the second displays the system logs.
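
To make concrete what a per-column statistics panel typically contains, here is a minimal sketch using only the Python standard library. This README does not list the exact statistics dshelper displays, so the chosen fields (count, mean, standard deviation, minimum, median, maximum), the function name `summarize_column`, and the sample data are assumptions for illustration, not the tool's implementation.

```python
import statistics


def summarize_column(values):
    """Return basic summary statistics for one numeric column.

    Missing entries (None) are skipped. The set of statistics is an
    assumption chosen for illustration, not dshelper's actual output.
    """
    present = [v for v in values if v is not None]
    if not present:
        return {"count": 0}
    return {
        "count": len(present),
        "mean": statistics.fmean(present),
        # Sample standard deviation is undefined for a single value.
        "std": statistics.stdev(present) if len(present) > 1 else None,
        "min": min(present),
        "median": statistics.median(present),
        "max": max(present),
    }


# Hypothetical data set: column name -> list of values.
dataset = {
    "age": [23, 35, None, 41],
    "income": [42000.0, 58000.0, 61000.0, 39000.0],
}
summary = {name: summarize_column(col) for name, col in dataset.items()}
```

Each entry of `summary` then holds one row of the kind a statistics panel might show for that column.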
The bottom and right panels can be hidden by clicking the buttons at the bottom right of the window, which lets data scientists focus on the dataset and plots.
You can also drag and drop column headers to re-arrange the column order, and click on the column tab in the right panel to hide columns in the main view.

The project relies on the following packages:
- wxpython
- matplotlib
- seaborn
- pandas
- numpy
- scikit-learn
- scipy
- statsmodels

To run from source:
- `git clone git@github.com:zmcddn/Data-Science-Helper.git`
- `conda create -n py36 python=3.6`, or use virtualenv or pipenv
- `activate py36` (windows) or `source activate py36` (mac, linux)
- `conda install --yes --file requirements.txt` or `pip install -r requirements.txt`
  - In case `PyPubSub` is not installed with conda, you can do `pip install PyPubSub`
- `cd dshelper`
- `python main_gui.py` (windows, linux) or `pythonw main_gui.py` (mac)

For help with any dataframe, follow these steps:

    import dshelper
    dshelper.dshelp(df)

- For running in Jupyter Notebook, you need to add `%gui wx` at the top of the file for the GUI to display properly
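
The usage steps above can be sketched end to end as follows. `dshelper.dshelp(df)` is the call given in this README; the sample data and the helper names `demo_columns` and `launch_viewer` are made up for illustration. The imports sit inside `launch_viewer` so that the data-building part can run on a machine without a display or the GUI dependencies.

```python
def demo_columns():
    """Build a small hypothetical data set as column name -> values."""
    return {
        "name": ["a", "b", "c"],
        "height_cm": [170.0, 182.5, 165.2],
        "weight_kg": [65, 80, 58],
    }


def launch_viewer(columns):
    """Open the dshelper window for the given columns.

    Requires pandas, dshelper and a working display. In a Jupyter
    notebook, run `%gui wx` at the top of the notebook first.
    """
    import pandas as pd
    import dshelper

    df = pd.DataFrame(columns)
    dshelper.dshelp(df)  # the entry point shown in this README
```

Calling `launch_viewer(demo_columns())` from a script or the command line should then open the main window with the three columns.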

The project can also be built and run with `make`:
- `make build` to build the project
- `make runlinux` to run in Linux
- WIP for mac

Roadmap:
- next version
  - Sort by columns
  - Import file (csv, excel)
  - Add menu
  - export file
  - ability to change cells
  - standalone version
- next big version
  - correlation analysis
  - feature importance
  - support large file (sampling)
- next next big version
  - Support for multiple index
  - Time series analysis
  - Optimization

If you like this project, please share it and star it so more people can see it. Suggestions and contributions are very welcome.
ALL RIGHTS RESERVED