jlehrer1/InstantEDA

Instant EDA

Instantly generate common exploratory data plots without having to worry about cleaning your DataFrame.

The package is hosted on PyPI, the Python Package Index.

It can be installed by running

pip install quickplotter==1.0

To set up the development environment, run

conda env create -f environment.yml
conda update pip

To run the test suite, run pytest.

1. Usage:

import quickplotter

plotter = quickplotter.QuickPlotter(df)  # creates a QuickPlotter object from the given pandas DataFrame, df

plotter.common(subset=['correlation', 'percent_nan'])  # plots the correlation between features and the percentage of NaN values in each column

plotter.distribution(column_subset=df.columns[0:4])  # plots distributions for the first four columns in the DataFrame

plotter.common(column_subset=['body_mass_index', 'blood_type'])  # generates the common plots for the given columns only

Remember, this is meant to be a quick-and-dirty tool for exploration, not one that treats each data entry delicately. Therefore, if NaN values make up 5% or less of the total values in the DataFrame, the rows containing NaN are dropped and the plots are generated without them.
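
The 5% rule above can be sketched in a few lines of pandas. This is only an illustration of the stated behavior, not the package's actual implementation: the function name drop_sparse_nans is hypothetical, and it assumes that "total values" means the total number of cells in the DataFrame.

```python
import numpy as np
import pandas as pd

NAN_THRESHOLD = 0.05  # the 5% cutoff described above


def drop_sparse_nans(df: pd.DataFrame, threshold: float = NAN_THRESHOLD) -> pd.DataFrame:
    """Hypothetical helper: drop rows with NaN only when NaN cells are rare.

    Assumes the NaN share is measured over all cells in the DataFrame.
    """
    if df.size == 0:
        return df  # nothing to measure in an empty DataFrame
    nan_fraction = df.isna().sum().sum() / df.size
    if nan_fraction <= threshold:
        return df.dropna()  # few NaNs: drop the affected rows
    return df  # too many NaNs: leave the data untouched


# 1 NaN out of 40 cells (2.5%), so the row containing it is dropped
df = pd.DataFrame({"a": [1.0] * 19 + [np.nan], "b": range(20)})
cleaned = drop_sparse_nans(df)
```

Above the threshold the sketch returns the DataFrame unchanged; how the package itself handles that case is not described here.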

2. subset & diff lists

The quickplot module works mainly with two specifications, subset and diff.

For any subset-like list, the items in the list will be used. For any diff-like list, all items except those in the list will be used.

The options are as follows:

  • subset: Use only the plots specified in the list
  • diff: Use all plots except those specified in the list
  • subset_columns: Use only the columns specified in the list. Columns can be given either as a df.columns slice or by name
  • diff_columns: Use all columns except those specified in the list. Columns can be given either as a df.columns slice or by name
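
The subset and diff semantics described above amount to a simple include/exclude filter. The sketch below shows that idea in plain Python. The helper select_items is hypothetical and is not part of the package; the plot name 'distribution' is used only for illustration, and raising an error when both lists are given is an assumption.

```python
from typing import Iterable, List, Optional


def select_items(available: Iterable[str],
                 subset: Optional[Iterable[str]] = None,
                 diff: Optional[Iterable[str]] = None) -> List[str]:
    """Hypothetical helper: filter `available` subset-style or diff-style."""
    available = list(available)
    if subset is not None and diff is not None:
        # Assumption: the two specifications are mutually exclusive
        raise ValueError("pass either subset or diff, not both")
    if subset is not None:
        wanted = set(subset)
        return [item for item in available if item in wanted]  # keep only listed items
    if diff is not None:
        excluded = set(diff)
        return [item for item in available if item not in excluded]  # keep everything else
    return available  # no specification: use everything


plots = ["correlation", "percent_nan", "distribution"]
only = select_items(plots, subset=["correlation", "percent_nan"])
rest = select_items(plots, diff=["correlation"])
```

The same filter applies to column names, for example by passing list(df.columns) as the available items.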

3. Contributing

If you have read this far, I hope you've found this tool useful. I am always looking to learn more and develop as a programmer, so if you have any ideas or contributions, feel free to open a feature request or pull request.
