Closed
Contributing Guidelines / Help:
https://github.com/pydata/pandas/wiki
Dev Docs
http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html
Docs:
- Doc Strings examples/links: Flesh out examples included in docstrings #3439, Ongoing: Fixing docstrings #2916, DOC: in docstrings point users to valid offset aliases #3324
- Docs on ipython startup files: DOC: add section about using python/ipython startup files to set options to FAQ #5748
- Links to API docs in the tutorials: DOC: provide links to API documentation where possible #3705, DOC: add import prefix to all pandas imports #1967
- better docs on DataFrame.apply (examples): doc: DataFrame.apply can return several columns #5299
- GA docs: google analytics docs #3508
- doc groupby NA group work-around: DOC: update groupby NA group handing / workaround #5456
- consistent imports in all documentation: DOC: add import prefix to all pandas imports #1967
- DOC: improve groupby reference docs DOC: improve groupby reference docs #6944
- documenting cython class (eg Timestamp): "cyfunction is not a python function" DOC: documenting cython class (eg Timestamp): "cyfunction is not a python function" #5218
- Redesigning/reorganising the documentation website
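
As a rough idea of the kind of docstring example the DataFrame.apply item (#5299) is asking for, here is a minimal sketch of apply returning several columns. The frame and the helper name `stats` are made up for illustration; they are not taken from the issue.

```python
import pandas as pd

# Hypothetical input frame, for illustration only.
df = pd.DataFrame({"x": [1, 2, 3]})


def stats(row):
    # Returning a Series from a row-wise function makes apply
    # expand the result into one column per Series label.
    return pd.Series({"double": row["x"] * 2, "square": row["x"] ** 2})


result = df.apply(stats, axis=1)
print(result)
```

Each label of the returned Series becomes a column of `result`, which is the behaviour the docs could spell out.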
Perf:
- vbench on different group sizes: PERF: add vbenchs for groupby functions with different group sizes #6787
- use seed in vbenches: BENCH: put in np.random.seed on vbenches #8144
- pandas + airspeed velocity Use airspeed velocity for benchmarking #8361
Tests:
- matplotlib lib to check plots: TST: Use matplotlib's compare_images to check plots #5379
- harmonize testing namespace: harmonize the testing namespace with tm.TestCase #8023
- verify timedelta algos: BUG: algos with timedelta #5986
- non_unique storage tests for HDFStore: TST: add tests for Series/Panel with non-unique index in to_hdf with fixed format #7813
Bugs:
- Make changes in numpy API for 1.7: CLN: start using numpy-1.7 API #8329
- accept scalar in Panel construction: BUG? Can construct constant Series and DataFrame, but not Panel or Panel4D #8285
- fillna bug: Panel.fillna with method='ffill' ignores the axis parameter and only fills along axis=1 #8251
- HDFStore modifying columns when passed: read_hdf / store.select modifies the passed columns parameters when multi-indexed #7212
- Plot label = None instead of provided label in line plot Plot label None in line plot #8905
- plot with kind=scatter fails when providing an array for the size BUG: plot with kind=scatter fails when checking if an array is in the DataFrame #8852
- Grouper(key='A') gives AttributeError when applying function BUG: Grouper(key='A') gives AttributeError when applying function #8795
- ValueError exception with pd.resample ValueError exception with pd.resample #8683
- to_csv issue with chunksize when large number of columns to_csv issue #8621
- EASY: pd.option_context without 'with' changes option values pd.option_context without 'with' changes option values #8514
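
For anyone picking up the option_context item (#8514), this is the intended usage as a context manager. It is a minimal sketch, with `display.max_rows` chosen arbitrarily as the option to change.

```python
import pandas as pd

# Remember the value in effect before entering the context.
original = pd.get_option("display.max_rows")

# Inside the with-block the option is temporarily overridden.
with pd.option_context("display.max_rows", 5):
    inside = pd.get_option("display.max_rows")

# On exit the previous value should be restored.
after = pd.get_option("display.max_rows")
print(original, inside, after)
```

A regression test for the bug would presumably construct `pd.option_context(...)` without entering it and check that the option is unchanged.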
Enhancements:
- ENH: support TimedeltaIndex plotting ENH/BUG: support TimedeltaIndex plotting #8711
- as suggested below, an enhancement to read_csv (or maybe read_repr/string) to allow round-tripping of the repr (can also serve as a basis for read_clipboard)
- raise on invalid compression options in HDFStore: ER: raise on invalid compression options in HDFStore #4582
- accept Period in DatetimeIndex for start/end: Cannot create DatetimeIndex using Period #6780
- don't use bare Exceptions: ERR: catchall exception should be explicity and use Exception EAFP #7948
- add more Series/Index ops: API/CLN: more common ops to integrate with Series/index OpsMixin #6382
- to_dict orient parm: DataFrame to_dict method should also provide orient parameter (like to_json) #7840
- sort_index to generic.py: CLN/TST: move consoliate sort_index to core/generic.py #8283
- get to take an axis argument: ENH: allow get to take an axis argument #6703
- axis argument to append: ENH: allow axis argument to append / move append code to generic.py #8295
- better error on invalid input to cut: qcut() should make sure the bins bounderies are unique before passing them to _bins_to_cuts #7751
- level kw to any/all: API: add level kwarg for Series.any/.all #8302
- astype accepting a dict: ENH: df.astype could accept a dict of {col: type} #7271
- clean up code by removing core/array.py: COMPAT/CLN: remove need for core/array.py #8359
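
To make the astype (#7271) and to_dict (#7840) requests concrete, here is a small sketch of the call shapes being asked for. It assumes a recent pandas release in which both are available; the column names and values are invented for the example.

```python
import pandas as pd

# Hypothetical frame whose columns start out as strings.
df = pd.DataFrame({"a": ["1", "2"], "b": ["1.5", "2.5"]})

# Per-column conversion via a {col: type} mapping.
converted = df.astype({"a": "int64", "b": "float64"})

# orient="records" yields one dict per row, similar to to_json.
records = converted.to_dict(orient="records")
print(converted.dtypes)
print(records)
```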
IO:
- to_clipboard bug/improvements: to_clipboard() locks clipboard system-wide on exception #8304
- to_html to actually create links: Improvement: DataFrame.to_html() to create hyperlinks #2679, ENH: Html table export: Add of other attributes #6488, ENH: to_html improvement #4987
- max_colwidth with groupby: set_option('max_colwidth', N) not working on groupby output #7856
- date_formatting in to_csv not being passed thru: MultiIndex DataFrame to_csv() ignores date_format #7791
- generate gbq schema: to_gbq: Allow creation of new tables from DataFrame (and generate schema) #8325
- make to_latex work with multi-index: to_latex with MI column and index names #8336
- Series.to_html not working so well: Series do not display HTML repr #5563
- to_csv issue with chunksize when large number of columns to_csv issue #8621
Excel Oriented:
- More customization of Excel input/output could be great, i.e. making it easier to specify per-column colors/formatting, float formats, etc. The code base isn't too complicated there (just a mixture of the formatter and the ExcelWriter stuff) and you could make rapid progress because it's really easy to test and create samples. I think the result would be very immediately rewarding (better looking things, easier to make reports, etc.). Plus for ENH: Excel - allow for multiple rows to be treated as hierarchical columns #4679 and when a column contains alpha numeric ending with 'e', pandas converts these to float64 #8272 you'd get a better sense of pandas internals too.
- dtypes per column in read in (ENH: read_excel dtypes and converts #8212 and when a column contains alpha numeric ending with 'e', pandas converts these to float64 #8272)
- Treat multiple rows/columns as MultiIndex/hierarchical columns ENH: Excel - allow for multiple rows to be treated as hierarchical columns #4679
- More flexible output formatting for Excel (mentioned in to_excel() float_format to accept this format string? #8191 , but I'm going to put up an issue about having something like per-column styles, also Styling in DataFrame.to_excel #1663).
- Allow writing multiple tables to same sheet and/or setting starting position for a particular sheet (mentioned by Wes in an older issue but I can't find it right now).
- extend excel writers to write to open document format Wish: Input / output for Open Document Spreadsheet (ODS) #2311
- allow ExcelWriter to automatically convert lists and dict to string Suggested improvement: allow ExcelWriter to automatically convert lists and dict to strings #8188
- Performance benchmarks for Excel writers and readers PERF: vbench for excel writers #7171
- Support BytesIO output in ExcelWriter ExcelWriter does not support BytesIO input #7074
SQL:
- ENH: specify dtype in to_sql per column: problem with to_sql with NA #8778
- BUG: to_sql fails with datetime.time values with sqlite fallback mode BUG: to_sql fails with datetime.time values with sqlite fallback mode #8341
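
A sketch of what per-column dtypes in to_sql (#8778) look like from the user side, assuming a recent pandas release where `to_sql` accepts a `dtype` mapping and, with a plain sqlite3 connection, takes the types as strings. The table and column names are made up.

```python
import sqlite3

import pandas as pd

# In-memory database so the example leaves nothing behind.
conn = sqlite3.connect(":memory:")

# Hypothetical frame with a missing value in the float column.
df = pd.DataFrame({"id": [1, 2], "score": [0.5, None]})

# Declare the SQL type of each column explicitly.
df.to_sql(
    "scores",
    conn,
    index=False,
    dtype={"id": "INTEGER", "score": "REAL"},
)

# Read back the declared column types from the table definition.
declared = {row[1]: row[2] for row in conn.execute("PRAGMA table_info(scores)")}
print(declared)
```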
More advanced:
- add write_index option to HDFStore: ENH: add option to HDFStore.put/append to optionally store the index of the object (default for back compat is True) #8319
- start/stop on HDFstore fixed: BUG: HDFStore.select ignores start and stop parameters #8287
- block splitting in HDFStore: Unable to write to HDF5 table if DataFrame has mixed object types (pd.Timestamp and str) #8284
- tz handling using HDFStore / fixed: BUG: round-trip of tz in an index using fixed-format for HDF5 #8165, BUG/ERR: raise on saving a multi-index with tz-info in fixed format for HDF5 #7775
- PeriodIndex in HDFStore: BUG: support/test of PeriodIndex in HDFStore #7796
- Write meta data as a CArray: ENH: write Table meta-data (non_index_axes) as a CArray (rather than as meta-data) #6245
- Modify methods for HDFStore: Updating HDFStore in place #6857
- Integrate multi-index reindexing a bit more: MultiIndex reindex should behave like Index. #7895
- Timedelta support in groupby: groupby.mean, etc, doesn't recognize timedelta64 #5724
- support numeric index ops: API: make '+' and '-' for Index either do numeric operations of raise TypeError (instead of setops) #8226
- allow index to referenced like a column: Allowing the index to be referenced by name, like a column #8162
Collaborative Efforts:
- Missing data support in numpy: Why does pandas work around numpy limitations with custom dtypes instead of fixing them upstream? #8350
@jorisvandenbossche @cpcloud @TomAugspurger @hayd
cc @shoyer
cc @immerrr