forked from pandas-dev/pandas
-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
Showing
2 changed files
with
213 additions
and
199 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,213 @@ | ||
# pandas: powerful Python data analysis toolkit | ||
|
||
 | ||
|
||
## What is it | ||
**pandas** is a Python package providing fast, flexible, and expressive data | ||
structures designed to make working with "relational" or "labeled" data both | ||
easy and intuitive. It aims to be the fundamental high-level building block for | ||
doing practical, **real world** data analysis in Python. Additionally, it has | ||
the broader goal of becoming **the most powerful and flexible open source data | ||
analysis / manipulation tool available in any language**. It is already well on | ||
its way toward this goal. | ||
|
||
## Main Features | ||
Here are just a few of the things that pandas does well: | ||
|
||
- Easy handling of [**missing data**][missing-data] (represented as | ||
`NaN`) in floating point as well as non-floating point data | ||
- Size mutability: columns can be [**inserted and | ||
deleted**][insertion-deletion] from DataFrame and higher dimensional | ||
objects | ||
- Automatic and explicit [**data alignment**][alignment]: objects can | ||
be explicitly aligned to a set of labels, or the user can simply | ||
ignore the labels and let `Series`, `DataFrame`, etc. automatically | ||
align the data for you in computations | ||
- Powerful, flexible [**group by**][groupby] functionality to perform | ||
split-apply-combine operations on data sets, for both aggregating | ||
and transforming data | ||
- Make it [**easy to convert**][conversion] ragged, | ||
differently-indexed data in other Python and NumPy data structures | ||
into DataFrame objects | ||
- Intelligent label-based [**slicing**][slicing], [**fancy | ||
indexing**][fancy-indexing], and [**subsetting**][subsetting] of | ||
large data sets | ||
- Intuitive [**merging**][merging] and [**joining**][joining] data | ||
sets | ||
- Flexible [**reshaping**][reshape] and [**pivoting**][pivot-table] of | ||
data sets | ||
- [**Hierarchical**][mi] labeling of axes (possible to have multiple | ||
labels per tick) | ||
- Robust IO tools for loading data from [**flat files**][flat-files] | ||
(CSV and delimited), [**Excel files**][excel], [**databases**][db], | ||
and saving/loading data from the ultrafast [**HDF5 format**][hdfstore] | ||
- [**Time series**][timeseries]-specific functionality: date range | ||
generation and frequency conversion, moving window statistics, | ||
moving window linear regressions, date shifting and lagging, etc. | ||
|
||
|
||
[missing-data]: http://pandas.pydata.org/pandas-docs/stable/missing_data.html#working-with-missing-data | ||
[insertion-deletion]: http://pandas.pydata.org/pandas-docs/stable/dsintro.html#column-selection-addition-deletion | ||
[alignment]: http://pandas.pydata.org/pandas-docs/stable/dsintro.html?highlight=alignment#intro-to-data-structures | ||
[groupby]: http://pandas.pydata.org/pandas-docs/stable/groupby.html#group-by-split-apply-combine | ||
[conversion]: http://pandas.pydata.org/pandas-docs/stable/dsintro.html#dataframe | ||
[slicing]: http://pandas.pydata.org/pandas-docs/stable/indexing.html#slicing-ranges | ||
[fancy-indexing]: http://pandas.pydata.org/pandas-docs/stable/indexing.html#advanced-indexing-with-ix | ||
[subsetting]: http://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing | ||
[merging]: http://pandas.pydata.org/pandas-docs/stable/merging.html#database-style-dataframe-joining-merging | ||
[joining]: http://pandas.pydata.org/pandas-docs/stable/merging.html#joining-on-index | ||
[reshape]: http://pandas.pydata.org/pandas-docs/stable/reshaping.html#reshaping-and-pivot-tables | ||
[pivot-table]: http://pandas.pydata.org/pandas-docs/stable/reshaping.html#pivot-tables-and-cross-tabulations | ||
[mi]: http://pandas.pydata.org/pandas-docs/stable/indexing.html#hierarchical-indexing-multiindex | ||
[flat-files]: http://pandas.pydata.org/pandas-docs/stable/io.html#csv-text-files | ||
[excel]: http://pandas.pydata.org/pandas-docs/stable/io.html#excel-files | ||
[db]: http://pandas.pydata.org/pandas-docs/stable/io.html#sql-queries | ||
[hdfstore]: http://pandas.pydata.org/pandas-docs/stable/io.html#hdf5-pytables | ||
[timeseries]: http://pandas.pydata.org/pandas-docs/stable/timeseries.html#time-series-date-functionality | ||
|
||
## Where to get it | ||
The source code is currently hosted on GitHub at: | ||
http://github.com/pydata/pandas | ||
|
||
Binary installers for the latest released version are available at the Python | ||
package index | ||
|
||
http://pypi.python.org/pypi/pandas/ | ||
|
||
And via `easy_install`: | ||
|
||
```sh | ||
easy_install pandas | ||
``` | ||
|
||
or `pip`: | ||
|
||
```sh | ||
pip install pandas | ||
``` | ||
|
||
## Dependencies | ||
- [NumPy](http://www.numpy.org): 1.6.1 or higher | ||
- [python-dateutil](http://labix.org/python-dateutil): 1.5 or higher | ||
- [pytz](http://pytz.sourceforge.net) | ||
- Needed for time zone support with ``pandas.date_range`` | ||
|
||
### Highly Recommended Dependencies | ||
- [numexpr](http://code.google.com/p/numexpr/) | ||
- Needed to accelerate some expression evaluation operations | ||
- Required by PyTables | ||
- [bottleneck](http://berkeleyanalytics.com/bottleneck) | ||
- Needed to accelerate certain numerical operations | ||
|
||
### Optional dependencies | ||
- [Cython](http://www.cython.org): Only necessary to build development version. Version 0.17.1 or higher. | ||
- [SciPy](http://www.scipy.org): miscellaneous statistical functions | ||
- [PyTables](http://www.pytables.org): necessary for HDF5-based storage | ||
- [matplotlib](http://matplotlib.sourceforge.net/): for plotting | ||
- [statsmodels](http://statsmodels.sourceforge.net/) | ||
- Needed for parts of `pandas.stats` | ||
- [openpyxl](http://packages.python.org/openpyxl/), [xlrd/xlwt](http://www.python-excel.org/) | ||
- openpyxl version 1.6.1 or higher, for writing .xlsx files | ||
- xlrd >= 0.9.0 | ||
- Needed for Excel I/O | ||
- [boto](https://pypi.python.org/pypi/boto): necessary for Amazon S3 access. | ||
- One of the following combinations of libraries is needed to use the | ||
top-level [`pandas.read_html`][read-html-docs] function: | ||
- [BeautifulSoup4][BeautifulSoup4] and [html5lib][html5lib] (Any | ||
recent version of [html5lib][html5lib] is okay.) | ||
- [BeautifulSoup4][BeautifulSoup4] and [lxml][lxml] | ||
- [BeautifulSoup4][BeautifulSoup4] and [html5lib][html5lib] and [lxml][lxml] | ||
- Only [lxml][lxml], although see [HTML reading gotchas][html-gotchas] | ||
for reasons as to why you should probably **not** take this approach. | ||
|
||
#### Notes about HTML parsing libraries | ||
- If you install [BeautifulSoup4][BeautifulSoup4] you must install | ||
either [lxml][lxml] or [html5lib][html5lib] or both. | ||
`pandas.read_html` will **not** work with *only* `BeautifulSoup4` | ||
installed. | ||
- You are strongly encouraged to read [HTML reading | ||
gotchas][html-gotchas]. It explains issues surrounding the | ||
installation and usage of the above three libraries. | ||
- You may need to install an older version of | ||
[BeautifulSoup4][BeautifulSoup4]: | ||
- Versions 4.2.1, 4.1.3 and 4.0.2 have been confirmed for 64 and | ||
32-bit Ubuntu/Debian | ||
- Additionally, if you're using [Anaconda][Anaconda] you should | ||
definitely read [the gotchas about HTML parsing][html-gotchas] | ||
libraries | ||
- If you're on a system with `apt-get` you can do | ||
|
||
```sh | ||
sudo apt-get build-dep python-lxml | ||
``` | ||
|
||
to get the necessary dependencies for installation of [lxml][lxml]. | ||
This will prevent further headaches down the line. | ||
|
||
[html5lib]: https://github.com/html5lib/html5lib-python "html5lib" | ||
[BeautifulSoup4]: http://www.crummy.com/software/BeautifulSoup "BeautifulSoup4" | ||
[lxml]: http://lxml.de | ||
[Anaconda]: https://store.continuum.io/cshop/anaconda | ||
[NumPy]: http://numpy.scipy.org/ | ||
[html-gotchas]: http://pandas.pydata.org/pandas-docs/stable/gotchas.html#html-table-parsing | ||
[read-html-docs]: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.html.read_html.html#pandas.io.html.read_html | ||
|
||
## Installation from sources | ||
To install pandas from source you need Cython in addition to the normal | ||
dependencies above. Cython can be installed from pypi: | ||
|
||
```sh | ||
pip install cython | ||
``` | ||
|
||
In the `pandas` directory (same one where you found this file after | ||
cloning the git repo), execute: | ||
|
||
```sh | ||
python setup.py install | ||
``` | ||
|
||
or for installing in [development mode](http://www.pip-installer.org/en/latest/usage.html): | ||
|
||
```sh | ||
python setup.py develop | ||
``` | ||
|
||
Alternatively, you can use `pip` if you want all the dependencies pulled | ||
in automatically (the `-e` option is for installing it in [development | ||
mode](http://www.pip-installer.org/en/latest/usage.html)): | ||
|
||
```sh | ||
pip install -e . | ||
``` | ||
|
||
On Windows, you will need to install MinGW and execute: | ||
|
||
```sh | ||
python setup.py build --compiler=mingw32 | ||
python setup.py install | ||
``` | ||
|
||
See http://pandas.pydata.org/ for more information. | ||
|
||
## License | ||
BSD | ||
|
||
## Documentation | ||
The official documentation is hosted on PyData.org: http://pandas.pydata.org/ | ||
|
||
The Sphinx documentation should provide a good starting point for learning how | ||
to use the library. Expect the docs to continue to expand as time goes on. | ||
|
||
## Background | ||
Work on ``pandas`` started at AQR (a quantitative hedge fund) in 2008 and | ||
has been under active development since then. | ||
|
||
## Discussion and Development | ||
Since pandas development is related to a number of other scientific | ||
Python projects, questions are welcome on the scipy-user mailing | ||
list. Specialized discussions or design issues should take place on | ||
the pystatsmodels mailing list / Google group, where | ||
``scikits.statsmodels`` and other libraries will also be discussed: | ||
|
||
http://groups.google.com/group/pystatsmodels |
Oops, something went wrong.