Hic sunt leones
Latin phrase reported on many maps indicating Terra incognita, unexplored or harsh land.
Dataframes in Clojure. Through pandas. On Python.
This is very alpha: things will change fast and will break, and the API is neither complete nor settled. Since a few people have started playing with this, there's a Clojars project available. Please give feedback if you're using this; every kind of contribution is appreciated (for more info check the Contributing section). At the moment almost everything is undocumented and untested; I'm currently adding docs and tests.
Panthera uses the great libpython-clj as a backend to access Python and get pandas and numpy functionality.
If you don't usually develop in Python, a system-level install might be a good solution (though it is always discouraged). If this is your case, follow the steps below.
To get started you need python, pandas and numpy (numpy is installed along with pandas) on your path. Usually the following is enough:
sudo apt install libpython3.6-dev
pip3 install numpy pandas xlrd # the latter is for Excel files, if you don't care you can do without
If you want to have different Python environments, getting panthera to work correctly is a bit trickier.
First create your new environment with at least python=3.6, numpy and pandas. (This was tested both on GNU/Linux and WSL with conda, but there's no reason why it shouldn't work with other env management tools. On other systems, Docker is your best bet):
conda create -n panthera python=3.6 numpy pandas
Then check the path to the newly created environment:
conda activate panthera
which python
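Now you just have to add the path of the chosen environment to one of your Leiningen profiles (the example below uses the environment's root directory, that is, the path printed by which python without the trailing bin/python):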
{:dev {:resource-paths ["/home/user/miniconda3/envs/panthera"]}}
You can create different profiles with different paths according to what you need. If you want to work with panthera without having to activate your environments, you have 2 choices:
- assign the PYTHONHOME env variable to your environment
PYTHONHOME="/home/user/miniconda3/envs/panthera" lein whatever
- assign the PYTHONHOME env variable before requiring panthera
(System/setProperty "PYTHONHOME" "/home/user/miniconda3/envs/panthera")
After this you can start playing around with panthera:
(require '[panthera.panthera :as pt])
(-> (pt/read-csv "mycsv.csv")
    (pt/subset-cols "Col1" "Col2" "Col3")
    pt/median)
The above chain will read your csv file as a DataFrame, select only the given columns and then return a Series with the median of each column.
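To see how the steps of the chain feed into each other, the threaded form can be expanded without evaluating it. This is only a sketch of what Clojure's -> macro does with the example above; the form is quoted, so it runs without panthera or Python, and the names chain and expanded are made up for illustration.

```clojure
;; The chain is quoted, so nothing is evaluated: this only shows how
;; the -> macro rewrites the form, it does not read any file.
(def chain
  '(-> (pt/read-csv "mycsv.csv")
       (pt/subset-cols "Col1" "Col2" "Col3")
       pt/median))

;; Each step receives the result of the previous one as its first argument.
(def expanded (macroexpand chain))

(prn expanded)
;; prints: (pt/median (pt/subset-cols (pt/read-csv "mycsv.csv") "Col1" "Col2" "Col3"))
```

In other words, the DataFrame returned by pt/read-csv becomes the first argument of pt/subset-cols, whose result is in turn passed to pt/median.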
panthera.panthera is the home of the main API, and you can find everything there. The advice is to never :use or :refer :all the namespace, because some of its functions share a name with core Clojure functions. One example is mod, which does the same thing as the core one, but is vectorized and works only if the first argument is a Python object.
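The sketch below illustrates why an alias is safer than referring everything. It does not use panthera at all: demo.vectorized is a hypothetical stand-in namespace, defined inline, whose mod works element-wise on a plain Clojure vector (the real one expects a Python object as first argument). With an alias, both the core function and the vectorized one stay available and unambiguous.

```clojure
;; Hypothetical stand-in for a namespace that defines its own `mod`.
;; It is defined inline so the sketch runs without panthera or Python.
(ns demo.vectorized
  (:refer-clojure :exclude [mod]))

(defn mod
  "Element-wise remainder over a collection (illustrative stand-in only)."
  [xs n]
  (mapv #(clojure.core/mod % n) xs))

;; Requiring with an alias keeps clojure.core/mod untouched.
(ns demo.user
  (:require [demo.vectorized :as v]))

(mod 7 3)          ;=> 1, the core function
(v/mod [7 8 9] 3)  ;=> [1 2 0], the aliased one
```

Had the namespace been pulled in with :refer :all, the plain mod symbol would clash with the core one.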
All of NumPy is wrapped and accessible through a single interface from panthera.numpy.
(require '[panthera.numpy :refer [npy doc]])
(npy :power {:args [[1 2 3] 3]})
;=> [1 8 27]
(npy :power)
; This arity returns the actual numpy object that can be passed around to other functions as an argument
To access functions inside submodules pass to npy a sequence of keys leading to the wanted function:
; :args holds the positional arguments, so the 2x3 matrix is wrapped as a single argument
(npy [:linalg :svd] {:args [[[1 2 3] [4 5 6]]]})
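As a rough mental model, a key or a sequence of keys can be read as a dotted path inside the numpy module. The helper below is purely illustrative: key-path->dotted is a made-up name and this is not how panthera resolves functions internally, only a way to see which NumPy function a given key path is meant to point at.

```clojure
(require '[clojure.string :as str])

(defn key-path->dotted
  "Hypothetical helper: turns a keyword or a sequence of keywords
  into the dotted NumPy name it is meant to refer to."
  [k]
  (->> (if (sequential? k) k [k]) ; accept both :power and [:linalg :svd]
       (map name)
       (str/join ".")
       (str "numpy.")))

(key-path->dotted :power)         ;=> "numpy.power"
(key-path->dotted [:linalg :svd]) ;=> "numpy.linalg.svd"
```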
You can check the original docstring for every module and function with the doc helper:
(doc :power)
(doc [:linalg :eigh])
To see what is available and how everything works check the official docs online.
Please let me know about any issues, quirks or ideas, or just tell me that you're doing something cool with this! I accept issues, PRs or direct messages (you can also find me on https://clojurians.slack.com and on https://clojurians.zulipchat.com).
You can find some examples in the examples folder. At the moment that's the best way to start with panthera.
- panthera intro (nbviewer)
- basic concepts (serieses & data-frames) (nbviewer)
- general Python package wrapper - an example about how to use panthera to wrap other Python libraries
Pandas is derived from "panel data" and somehow is supposed to mean "Python data analysis library" as well. Though it shouldn't have anything to do with the cute Chinese bears, there are logos showing a bear.
Panthera doesn't pretend to be a clever wordplay because it doesn't need to. First off, panthera is Latin and it literally means "large cat". Second, though pandas are surely cute, pantherae are way cooler (and snow leopards also happen to be among the very few predators of pandas, but that's just a coincidence...).
Copyright © 2019 Alan Marazzi
This program and the accompanying materials are made available under the terms of the Eclipse Public License 2.0 which is available at http://www.eclipse.org/legal/epl-2.0.