************************
pandas 0.4 Release Notes
************************
==========
What is it
==========
**pandas** is a library of powerful labeled data structures, statistical tools,
and general code for working with time series and cross-sectional data. It was
designed with the practical needs of statistical modeling and large,
inhomogeneous data sets in mind. It is particularly well suited for, among other
things, financial data analysis applications.
===============
Where to get it
===============
Source code: http://github.com/wesm/pandas
Binary installers on PyPI: http://pypi.python.org/pypi/pandas
Documentation: http://pandas.sourceforge.net
=============
Release notes
=============
**Release date:** NOT YET RELEASED
**New features / modules**
* `pandas.core.sparse` module: "Sparse" (mostly-NA, or some other fill value)
versions of `Series`, `DataFrame`, and `WidePanel`. For low-density data, this
will result in significant performance boosts, and smaller memory
footprint. Added `to_sparse` methods to `Series`, `DataFrame`, and
`WidePanel`. See online documentation for more on these
* `Series.describe`, `DataFrame.describe`: produces an R-like table of summary
statistics about each data column
* `DataFrame.quantile`, `Series.quantile`
* Fancy indexing operator on Series / DataFrame, e.g. via the .ix operator. Both
getting and setting of values are supported; however, setting values currently
works only on homogeneously-typed DataFrame objects
  * series.ix[[d1, d2, d3]]
  * frame.ix[5:10, ['C', 'B', 'A']], frame.ix[5:10, 'A':'C']
  * frame.ix[date1:date2]
* `Series` arithmetic methods with optional fill_value for missing data,
e.g. a.add(b, fill_value=0). A location that is missing in both operands is
still missing in the result.
* Boolean indexing with `DataFrame` objects: data[data > 0.1] = 0.1
* `pytz` / tzinfo support in `DateRange`
  * `tz_localize`, `tz_normalize`, and `tz_validate` methods added
* Added `ExcelFile` class to `pandas.io.parsers` for parsing multiple sheets out
of a single Excel 2003 document
* `GroupBy` aggregations can now optionally *broadcast*, i.e. produce an object
of the same size with the aggregated value propagated
* Added `select` function in all data structures: reindex axis based on
arbitrary criterion (function returning boolean value),
e.g. frame.select(lambda x: 'foo' in x, axis=1)
* `DataFrame.consolidate` method, API function relating to redesigned internals
* `HDFStore` class in `pandas.io.pytables` has been largely rewritten using
patches from Jeff Reback and others. It now supports mixed-type `DataFrame`
and `Series` data and can store `WidePanel` objects. It also has the option to
query `DataFrame` and `WidePanel` data. Loading data from legacy `HDFStore`
files is supported explicitly in the code
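Two of the behaviors listed above, the `fill_value` rule for arithmetic and the *broadcast* option for aggregations, can be modeled in plain Python. This is only a sketch of the semantics as described in these notes, not pandas code: the helper names `add_with_fill` and `broadcast_mean` are hypothetical, dicts and lists stand in for `Series`, and `None` stands in for a missing value.

```python
def add_with_fill(a, b, fill_value=None):
    """Add two label -> value dicts; None (or an absent label) means missing."""
    result = {}
    for label in sorted(set(a) | set(b)):
        x, y = a.get(label), b.get(label)
        if x is None and y is None:
            # Missing on both sides stays missing, even with a fill_value.
            result[label] = None
            continue
        # Substitute the fill value for whichever single side is missing.
        if x is None:
            x = fill_value
        if y is None:
            y = fill_value
        result[label] = None if x is None or y is None else x + y
    return result


def broadcast_mean(values, keys):
    """Return each group's mean repeated for every member of that group."""
    totals = {}
    for value, key in zip(values, keys):
        running_sum, count = totals.get(key, (0.0, 0))
        totals[key] = (running_sum + value, count + 1)
    means = {key: s / n for key, (s, n) in totals.items()}
    # The output has the same length as the input, one value per row.
    return [means[key] for key in keys]


a = {"d1": 1.0, "d2": None, "d3": 3.0}
b = {"d1": 10.0, "d2": None, "d4": 4.0}
filled = add_with_fill(a, b, fill_value=0)
unfilled = add_with_fill(a, b)
group_means = broadcast_mean([1.0, 3.0, 10.0], ["a", "a", "b"])
```

Here `filled["d2"]` remains missing because neither operand has a value there, while `d3` and `d4` are each filled on one side only.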
**Improvements to existing features**
* The 2-dimensional `DataFrame` and `DataMatrix` classes have been extensively
redesigned internally into a single class `DataFrame`, preserving where
possible their optimal performance characteristics. This should reduce
confusion among users about which class to use.
  * Note that under the hood there is now an essentially "lazy evaluation"
    scheme with respect to adding columns to DataFrame. During some
    operations, like-typed blocks will be "consolidated", but not before.
* Column ordering for mixed type data is now completely consistent in
`DataFrame`. In prior releases, there was inconsistent column ordering in
`DataMatrix`
* Improved console / string formatting of DataMatrix with negative numbers
* Added `skiprows` and `na_values` arguments to `pandas.io.parsers` functions
for more flexible IO
* Can slice `DataFrame` and get a view of the data (when homogeneously typed),
e.g. frame.xs(idx, copy=False) or frame.ix[idx]
* Many speed optimizations throughout `Series` and `DataFrame`
* Eager evaluation of groups when calling ``groupby`` functions, so if there is
an exception with the grouping function it will be raised immediately rather
than sometime later when the groups are needed
* `datetools.WeekOfMonth` offset can be parameterized with `n` different than 1
or -1.
* `parseCSV` / `read_csv` functions and others in `pandas.io.parsers` now can
take a list of custom NA values, and also a list of rows to skip
* Statistical methods on DataFrame like `mean`, `std`, `var`, `skew` will now
ignore non-numerical data. Previously, an unhelpful error message was
generated. A flag `numeric_only` has been added to `DataFrame.sum` and
`DataFrame.count` to enable this behavior in those methods if so desired
(disabled by default)
* `DataFrame.pivot` generalized to enable pivoting multiple columns into a
`WidePanel`
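The meaning of the custom NA values and skipped rows mentioned above can be sketched with the standard library `csv` module. This is a plain-Python model under stated assumptions, not the `pandas.io.parsers` implementation: the function name `read_rows` is hypothetical, rows are identified by zero-based position, and non-NA cells are left as strings.

```python
import csv
import io
import math


def read_rows(text, skiprows=(), na_values=()):
    """Parse CSV text, dropping listed row positions and mapping NA markers to NaN."""
    skip = set(skiprows)
    na_markers = set(na_values)
    rows = []
    for position, row in enumerate(csv.reader(io.StringIO(text))):
        if position in skip:
            continue  # this row was listed in skiprows
        # Replace any cell that matches a custom NA marker with NaN.
        rows.append([math.nan if cell in na_markers else cell for cell in row])
    return rows


text = "# exported on Monday\na,b\n1,NULL\n2,3\n"
rows = read_rows(text, skiprows=[0], na_values=["NULL"])
```

The comment line at position 0 is dropped, and the `NULL` cell becomes NaN while the other cells are untouched.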
**API Changes**
* The `DataMatrix` variable now refers to `DataFrame` and will be removed within
two releases
* A `copy` argument has been added to the `DataFrame` constructor to avoid
unnecessary copying of data. Data is no longer copied by default when passed
into the constructor
* Handling of boolean dtype in `DataFrame` has been improved to support storage
of boolean data with NA / NaN values. Previously it was converted to float64,
so this should not (in theory) cause API breakage
* When boolean indexing with a Series, the indexer and the indexed object must
now have the same indices (labels)
* Backwards compatibility support for begin/end/nPeriods keyword arguments in
DateRange class has been removed
* More intuitive / shorter filling aliases `ffill` (for `pad`) and `bfill` (for
`backfill`) have been added to the functions that use them: `reindex`,
`asfreq`, `fillna`.
* `pandas.core.mixins` code moved to `pandas.core.generic`
* `buffer` keyword arguments (e.g. `DataFrame.toString`) renamed to `buf` to
avoid using Python built-in name
* `DataFrame.rows()` removed (use `DataFrame.index`)
* Added deprecation warning to `DataFrame.cols()`, to be removed in next release
* `DataFrame` deprecations and de-camelCasing: `merge`, `asMatrix`,
`toDataMatrix`, `_firstTimeWithValue`, `_lastTimeWithValue`, `toRecords`,
`fromRecords`
* `pandas.io.parsers` method deprecations
  * `parseCSV` is now `read_csv` and keyword arguments have been de-camelCased
  * `parseText` is now `read_table`
  * `parseExcel` is replaced by the `ExcelFile` class and its `parse` method
* `fillMethod` arguments (deprecated in prior release) removed, should be
replaced with `method`
* `Series.fill`, `DataFrame.fill`, and `WidePanel.fill` removed, use `fillna`
instead
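The fill methods behind the new `ffill` and `bfill` aliases can be illustrated on a plain list. This is a minimal sketch rather than the pandas implementation: `None` marks a missing value, and the two function names simply reuse the alias names for readability.

```python
def ffill(values):
    """Forward fill ("pad"): carry the last seen value forward over gaps."""
    filled, last = [], None
    for value in values:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def bfill(values):
    """Backward fill ("backfill"): forward fill applied to the reversed list."""
    return ffill(values[::-1])[::-1]


data = [None, 1, None, None, 4]
padded = ffill(data)      # the leading gap has nothing to carry forward
backfilled = bfill(data)  # every gap takes the next value that follows it
```

A leading gap survives a forward fill and a trailing gap survives a backward fill, since there is no neighboring value in the fill direction.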
**Bug fixes**
* Column ordering in `pandas.io.parsers.parseCSV` will match CSV in the presence
of mixed-type data
* Fixed handling of Excel 2003 dates in `pandas.io.parsers`
* `DateRange` caching was happening with high resolution `DateOffset` objects,
e.g. `DateOffset(seconds=1)`. This has been fixed
* Fixed `__truediv__` issue in `DataFrame`
* Fixed `DataFrame.toCSV` bug preventing IO round trips in some cases
* Fixed bug in `Series.plot` causing matplotlib to barf in exceptional cases
* Made `Index` objects unhashable, like ndarrays
* Added `__ne__` implementation to `Index` so that operations like ts[ts != idx]
will work
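The last two fixes above concern an object that is unhashable and supports `!=`. The toy class below shows both properties. It is a hypothetical stand-in, not the pandas `Index` class, and it assumes that the comparison is elementwise, as it is for ndarrays.

```python
class ToyIndex:
    """Hypothetical label container used only for illustration."""

    # Setting __hash__ to None makes instances unhashable.
    __hash__ = None

    def __init__(self, labels):
        self.labels = list(labels)

    def __eq__(self, other):
        # Elementwise equality: one boolean per position.
        return [a == b for a, b in zip(self.labels, other.labels)]

    def __ne__(self, other):
        # Elementwise inequality, defined explicitly rather than derived.
        return [a != b for a, b in zip(self.labels, other.labels)]


left = ToyIndex(["a", "b", "c"])
right = ToyIndex(["a", "x", "c"])
mask = left != right
```

Calling `hash(left)` raises `TypeError`, so such an object cannot be used as a dict key or set member.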
Thanks
------
- Joon Ro
- Michael Pennington
- Chris Uga
- Chris Withers
- Jeff Reback
- William Ferreira
- Daniel Fortunov
- Martin Felder
************************
pandas 0.3 Release Notes
************************
=============
Release Notes
=============
This major release of pandas represents approximately 1 year of continuous
development work and brings with it many new features, bug fixes, speed
enhancements, and general quality-of-life improvements. The most significant
change from the 0.2 release has been the completion of a rigorous unit test
suite covering all of the core functionality.
==========
What is it
==========
**pandas** is a library of labeled data structures, statistical models, and
general code for working with time series and cross-sectional data. It was
designed with the practical needs of statistical modeling and large,
inhomogeneous data sets in mind.
===============
Where to get it
===============
Source code: http://github.com/wesm/pandas
Binary installers on PyPI: http://pypi.python.org/pypi/pandas
Documentation: http://pandas.sourceforge.net
pandas 0.3.0 release notes
==========================
**Release date:** February 20, 2011
**New features / modules**
* DataFrame / DataMatrix classes
  * `corrwith` function to compute column- or row-wise correlations between two
    objects
  * Can boolean-index DataFrame objects, e.g. df[df > 2] = 2, px[px > last_px] = 0
  * Added comparison magic methods (__lt__, __gt__, etc.)
  * Flexible explicit arithmetic methods (add, mul, sub, div, etc.)
  * Added `reindex_like` method
* WidePanel
  * Added `reindex_like` method
* `pandas.io`: IO utilities
  * `pandas.io.sql` module
    * Convenience functions for accessing SQL-like databases
  * `pandas.io.pytables` module
    * Added (still experimental) HDFStore class for storing pandas data
      structures using HDF5 / PyTables
* `pandas.core.datetools`
  * Added WeekOfMonth date offset
* `pandas.rpy` (experimental) module created, provides some interfacing /
conversion between rpy2 and pandas
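The idea behind a column-wise `corrwith` can be sketched in plain Python. This is not the pandas implementation: dicts of lists stand in for the two objects, the function names `pearson` and `corrwith_columns` are hypothetical, and the sketch assumes that rows are already aligned, that no values are missing, and that Pearson correlation is the statistic computed.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def corrwith_columns(left, right):
    """Correlate each column of `left` with the same-named column of `right`."""
    return {name: pearson(left[name], right[name])
            for name in left if name in right}


left = {"A": [1, 2, 3], "B": [1, 2, 3], "C": [1, 2, 3]}
right = {"A": [2, 4, 6], "B": [3, 2, 1]}
correlations = corrwith_columns(left, right)
```

Only columns present in both inputs appear in the result, so `C` is omitted here.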
**Improvements**
* Unit test coverage: 100% line coverage of core data structures
* Speed enhancement to rolling_{median, max, min}
* Column ordering between DataFrame and DataMatrix is now consistent; previously,
DataFrame would not respect column order
* Improved {Series, DataFrame}.plot methods to be more flexible (can pass
matplotlib Axis arguments, plot DataFrame columns in multiple subplots, etc.)
**API Changes**
* Exponentially-weighted moment functions in `pandas.stats.moments`
have a more consistent API and accept a min_periods argument like
their regular moving counterparts.
* **fillMethod** argument in Series, DataFrame changed to **method**,
`FutureWarning` added.
* **fill** method in Series, DataFrame/DataMatrix, WidePanel renamed to
**fillna**, `FutureWarning` added to **fill**
* Renamed **DataFrame.getXS** to **xs**, `FutureWarning` added
* Removed **cap** and **floor** functions from DataFrame; they are now named
**clip_upper** and **clip_lower**, respectively, for consistency with NumPy
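Several of the renames above keep the old name working while emitting a `FutureWarning`. One common way to do that is sketched below with the standard `warnings` module. The `ToySeries` class is hypothetical and is not how pandas itself is written; only the `fill` to `fillna` rename is taken from these notes.

```python
import warnings


class ToySeries:
    """Hypothetical container; None marks a missing value."""

    def __init__(self, values):
        self.values = list(values)

    def fillna(self, value):
        """New name: replace missing entries with `value`."""
        return ToySeries(value if v is None else v for v in self.values)

    def fill(self, value):
        """Old name: warn, then delegate to the new method."""
        warnings.warn("fill is deprecated, use fillna instead",
                      FutureWarning, stacklevel=2)
        return self.fillna(value)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is not suppressed
    result = ToySeries([1, None, 3]).fill(0)
```

Passing `stacklevel=2` attributes the warning to the caller of `fill`, which is usually the line that needs updating.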
**Bug fixes**
* Fixed bug in IndexableSkiplist Cython code that was breaking
rolling_max function
* Numerous numpy.int64-related indexing fixes
* Several NumPy 1.4.0 NaN-handling fixes
* Bug fixes to pandas.io.parsers.parseCSV
* Fixed `DateRange` caching issue with unusual date offsets
* Fixed bug in `DateRange.union`
* Fixed corner case in `IndexableSkiplist` implementation