Skip to content

Commit

Permalink
CLN: move README.rst to markdown
Browse files Browse the repository at this point in the history
  • Loading branch information
cpcloud committed Sep 19, 2013
1 parent ef5ca56 commit 0bc568f
Show file tree
Hide file tree
Showing 2 changed files with 213 additions and 199 deletions.
213 changes: 213 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,213 @@
# pandas: powerful Python data analysis toolkit

![Travis-CI Build Status](https://travis-ci.org/pydata/pandas.png)

## What is it
**pandas** is a Python package providing fast, flexible, and expressive data
structures designed to make working with "relational" or "labeled" data both
easy and intuitive. It aims to be the fundamental high-level building block for
doing practical, **real world** data analysis in Python. Additionally, it has
the broader goal of becoming **the most powerful and flexible open source data
analysis / manipulation tool available in any language**. It is already well on
its way toward this goal.

## Main Features
Here are just a few of the things that pandas does well:

- Easy handling of [**missing data**][missing-data] (represented as
`NaN`) in floating point as well as non-floating point data
- Size mutability: columns can be [**inserted and
deleted**][insertion-deletion] from DataFrame and higher dimensional
objects
- Automatic and explicit [**data alignment**][alignment]: objects can
be explicitly aligned to a set of labels, or the user can simply
ignore the labels and let `Series`, `DataFrame`, etc. automatically
align the data for you in computations
- Powerful, flexible [**group by**][groupby] functionality to perform
split-apply-combine operations on data sets, for both aggregating
and transforming data
- Make it [**easy to convert**][conversion] ragged,
differently-indexed data in other Python and NumPy data structures
into DataFrame objects
- Intelligent label-based [**slicing**][slicing], [**fancy
indexing**][fancy-indexing], and [**subsetting**][subsetting] of
large data sets
- Intuitive [**merging**][merging] and [**joining**][joining] data
sets
- Flexible [**reshaping**][reshape] and [**pivoting**][pivot-table] of
data sets
- [**Hierarchical**][mi] labeling of axes (possible to have multiple
labels per tick)
- Robust IO tools for loading data from [**flat files**][flat-files]
(CSV and delimited), [**Excel files**][excel], [**databases**][db],
and saving/loading data from the ultrafast [**HDF5 format**][hdfstore]
- [**Time series**][timeseries]-specific functionality: date range
generation and frequency conversion, moving window statistics,
moving window linear regressions, date shifting and lagging, etc.


[missing-data]: http://pandas.pydata.org/pandas-docs/stable/missing_data.html#working-with-missing-data
[insertion-deletion]: http://pandas.pydata.org/pandas-docs/stable/dsintro.html#column-selection-addition-deletion
[alignment]: http://pandas.pydata.org/pandas-docs/stable/dsintro.html?highlight=alignment#intro-to-data-structures
[groupby]: http://pandas.pydata.org/pandas-docs/stable/groupby.html#group-by-split-apply-combine
[conversion]: http://pandas.pydata.org/pandas-docs/stable/dsintro.html#dataframe
[slicing]: http://pandas.pydata.org/pandas-docs/stable/indexing.html#slicing-ranges
[fancy-indexing]: http://pandas.pydata.org/pandas-docs/stable/indexing.html#advanced-indexing-with-ix
[subsetting]: http://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing
[merging]: http://pandas.pydata.org/pandas-docs/stable/merging.html#database-style-dataframe-joining-merging
[joining]: http://pandas.pydata.org/pandas-docs/stable/merging.html#joining-on-index
[reshape]: http://pandas.pydata.org/pandas-docs/stable/reshaping.html#reshaping-and-pivot-tables
[pivot-table]: http://pandas.pydata.org/pandas-docs/stable/reshaping.html#pivot-tables-and-cross-tabulations
[mi]: http://pandas.pydata.org/pandas-docs/stable/indexing.html#hierarchical-indexing-multiindex
[flat-files]: http://pandas.pydata.org/pandas-docs/stable/io.html#csv-text-files
[excel]: http://pandas.pydata.org/pandas-docs/stable/io.html#excel-files
[db]: http://pandas.pydata.org/pandas-docs/stable/io.html#sql-queries
[hdfstore]: http://pandas.pydata.org/pandas-docs/stable/io.html#hdf5-pytables
[timeseries]: http://pandas.pydata.org/pandas-docs/stable/timeseries.html#time-series-date-functionality

## Where to get it
The source code is currently hosted on GitHub at:
http://github.com/pydata/pandas

Binary installers for the latest released version are available at the Python
package index

http://pypi.python.org/pypi/pandas/

And via `easy_install`:

```sh
easy_install pandas
```

or `pip`:

```sh
pip install pandas
```

## Dependencies
- [NumPy](http://www.numpy.org): 1.6.1 or higher
- [python-dateutil](http://labix.org/python-dateutil): 1.5 or higher
- [pytz](http://pytz.sourceforge.net)
- Needed for time zone support with ``pandas.date_range``

### Highly Recommended Dependencies
- [numexpr](http://code.google.com/p/numexpr/)
- Needed to accelerate some expression evaluation operations
- Required by PyTables
- [bottleneck](http://berkeleyanalytics.com/bottleneck)
- Needed to accelerate certain numerical operations

### Optional dependencies
- [Cython](http://www.cython.org): Only necessary to build development version. Version 0.17.1 or higher.
- [SciPy](http://www.scipy.org): miscellaneous statistical functions
- [PyTables](http://www.pytables.org): necessary for HDF5-based storage
- [matplotlib](http://matplotlib.sourceforge.net/): for plotting
- [statsmodels](http://statsmodels.sourceforge.net/)
- Needed for parts of `pandas.stats`
- [openpyxl](http://packages.python.org/openpyxl/), [xlrd/xlwt](http://www.python-excel.org/)
- openpyxl version 1.6.1 or higher, for writing .xlsx files
- xlrd >= 0.9.0
- Needed for Excel I/O
- [boto](https://pypi.python.org/pypi/boto): necessary for Amazon S3 access.
- One of the following combinations of libraries is needed to use the
top-level [`pandas.read_html`][read-html-docs] function:
- [BeautifulSoup4][BeautifulSoup4] and [html5lib][html5lib] (Any
recent version of [html5lib][html5lib] is okay.)
- [BeautifulSoup4][BeautifulSoup4] and [lxml][lxml]
- [BeautifulSoup4][BeautifulSoup4] and [html5lib][html5lib] and [lxml][lxml]
- Only [lxml][lxml], although see [HTML reading gotchas][html-gotchas]
for reasons as to why you should probably **not** take this approach.

#### Notes about HTML parsing libraries
- If you install [BeautifulSoup4][BeautifulSoup4] you must install
either [lxml][lxml] or [html5lib][html5lib] or both.
`pandas.read_html` will **not** work with *only* `BeautifulSoup4`
installed.
- You are strongly encouraged to read [HTML reading
gotchas][html-gotchas]. It explains issues surrounding the
installation and usage of the above three libraries.
- You may need to install an older version of
[BeautifulSoup4][BeautifulSoup4]:
- Versions 4.2.1, 4.1.3 and 4.0.2 have been confirmed for 64 and
32-bit Ubuntu/Debian
- Additionally, if you're using [Anaconda][Anaconda] you should
definitely read [the gotchas about HTML parsing][html-gotchas]
libraries
- If you're on a system with `apt-get` you can do

```sh
sudo apt-get build-dep python-lxml
```

to get the necessary dependencies for installation of [lxml][lxml].
This will prevent further headaches down the line.

[html5lib]: https://github.com/html5lib/html5lib-python "html5lib"
[BeautifulSoup4]: http://www.crummy.com/software/BeautifulSoup "BeautifulSoup4"
[lxml]: http://lxml.de
[Anaconda]: https://store.continuum.io/cshop/anaconda
[NumPy]: http://numpy.scipy.org/
[html-gotchas]: http://pandas.pydata.org/pandas-docs/stable/gotchas.html#html-table-parsing
[read-html-docs]: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.html.read_html.html#pandas.io.html.read_html

## Installation from sources
To install pandas from source you need Cython in addition to the normal
dependencies above. Cython can be installed from pypi:

```sh
pip install cython
```

In the `pandas` directory (same one where you found this file after
cloning the git repo), execute:

```sh
python setup.py install
```

or for installing in [development mode](http://www.pip-installer.org/en/latest/usage.html):

```sh
python setup.py develop
```

Alternatively, you can use `pip` if you want all the dependencies pulled
in automatically (the `-e` option is for installing it in [development
mode](http://www.pip-installer.org/en/latest/usage.html)):

```sh
pip install -e .
```

On Windows, you will need to install MinGW and execute:

```sh
python setup.py build --compiler=mingw32
python setup.py install
```

See http://pandas.pydata.org/ for more information.

## License
BSD

## Documentation
The official documentation is hosted on PyData.org: http://pandas.pydata.org/

The Sphinx documentation should provide a good starting point for learning how
to use the library. Expect the docs to continue to expand as time goes on.

## Background
Work on ``pandas`` started at AQR (a quantitative hedge fund) in 2008 and
has been under active development since then.

## Discussion and Development
Since pandas development is related to a number of other scientific
Python projects, questions are welcome on the scipy-user mailing
list. Specialized discussions or design issues should take place on
the pystatsmodels mailing list / Google group, where
``scikits.statsmodels`` and other libraries will also be discussed:

http://groups.google.com/group/pystatsmodels
Loading

0 comments on commit 0bc568f

Please sign in to comment.