Want to learn useful nonlinear representations of your tabular data? Don't have time to mess with autoencoders? This library aims to simplify your life.
Currently under development.
We highly recommend installing in a virtual environment! This software has only been tested with Python 3.6.
The bare-bones requirements are installed automatically by pip. You may also want to install jupyter and matplotlib to run notebooks and the ipynb logger, but these are not requirements to install.
Install using:
pip install dfencoder
Or, you can get the latest version by cloning this repository and installing from the home directory:
pip install .
Thorough documentation is still being written, but the demo notebook is available to show some of the features of this library.
The adult.csv dataset is used in the testing script. Make sure the file (found in the root of this repo) is in the same directory as test.py when you run the script.
Contributors are welcomed! Please reach out with PRs.
We'd like to release a stable version soon, so in the meantime please submit feature requests and bug reports on this repository's issues page.
dfencoder does some manipulation to encode features to feed into the feed-forward MLP. This HLD hopefully clears up how this looks.
This library is a personal project, so progress is slow. The latest release as of this writing is v0.0.37, which introduces "inference mode" to optimize inference for single records on JSON inputs.
v0.0.36 introduced handling for timestamp data: it uses cyclical encoding for time of day, day of week, day of month, and day of year, and also scales the raw timestamp as a numeric feature to encode linear time.
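For readers unfamiliar with cyclical encoding, here is a minimal sketch of the general technique using only the standard library. It is not dfencoder's internal code: the helper names cyclical_encode and encode_timestamp, and the chosen periods (31 for day of month, 366 for day of year), are assumptions made for illustration.

```python
import math
from datetime import datetime


def cyclical_encode(value, period):
    """Map a periodic value onto the unit circle as a (sin, cos) pair."""
    angle = 2.0 * math.pi * (value % period) / period
    return math.sin(angle), math.cos(angle)


def encode_timestamp(ts):
    """Hypothetical helper: cyclically encode the periodic parts of a datetime."""
    seconds_into_day = ts.hour * 3600 + ts.minute * 60 + ts.second
    return {
        "time_of_day": cyclical_encode(seconds_into_day, 24 * 3600),
        "day_of_week": cyclical_encode(ts.weekday(), 7),
        # Periods below are illustrative choices, not taken from dfencoder.
        "day_of_month": cyclical_encode(ts.day - 1, 31),
        "day_of_year": cyclical_encode(ts.timetuple().tm_yday - 1, 366),
    }


features = encode_timestamp(datetime(2020, 1, 1, 23, 59, 0))
for name, (sin_part, cos_part) in features.items():
    print(name, round(sin_part, 4), round(cos_part, 4))
```

The point of the (sin, cos) pair is that values at the edge of a cycle end up close together: 23:59 and 00:00 map to neighbouring points on the circle, whereas a plain 0 to 24 scale would put them at opposite ends.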
Pre-process your timestamp columns with pandas using pd.to_datetime() so dfencoder can infer the datatype and handle it accordingly.
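As a rough illustration of that pre-processing step, the sketch below converts a string column to a pandas datetime dtype. The DataFrame and its column names (signup_time, age) are made up for the example; only pd.to_datetime() comes from the text above.

```python
import pandas as pd

# Hypothetical data: timestamps often arrive as plain strings (object dtype).
df = pd.DataFrame({
    "signup_time": ["2020-01-01 08:30:00", "2020-06-15 17:45:00"],
    "age": [34, 51],
})

# Convert the column so its dtype is datetime64 rather than object.
df["signup_time"] = pd.to_datetime(df["signup_time"])

print(df.dtypes)
```

After the conversion, the column's dtype is a datetime64 type rather than object, which is the signal a dtype-based check would need to tell timestamps apart from ordinary strings.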