ENH: Added method to pandas.data.Options to download all option data for... #5602

Merged 1 commit on Jun 17, 2014
ENH: Added method to pandas.data.Options to download all option data for a ticker.

Also added a few helper functions.  These functions could be applied to refactor some of the other methods.

ENH: In Options.get_all_data: Now checking for any option tag (instead of just mini)

Changed expiry to datetime from string.
Added tests for tick functions.

BUG: Fixed no sign in change column of option download.

BUG: Fix bugs in Options class

Dealt with situation of calculating expiry when symbol contains a hyphen
Fixed bug in finding current expiry month.

BUG: Fixed Options.get_forward_data expiry date

Method assumed the expiry date is the same for all options in a given month.
That is not the case for options with weeklies. It also breaks with options
that have tags.

BUG: Fixed Option bug that didn't allow LEAP DL in January.

Option class was checking only the month to determine if the requested
month was the current month.  Changed to check year and month.  Now
allows downloads of next year's LEAPS in January.

ENH: Added option tag and underlying price to option data output.

Factored out URL parsing and error checking from individual methods.

ENH: Refactor of Option class in io.data.

 Consistently returns multi-index data frame.
 Improves speed of downloading combination of calls and puts by only accessing yahoo once per expiry month.

CLN: Fix out of date docstrings in io.data.Options

Moved _parse_row_values definition into _unpack.

CLN: Consistent capitalization in output data.

CLN: Remove Tag, leave Root in data frame output.

CLN: Remove unnecessary _tag_from_root method.

BUG: Fix different capitalizations of Rootexp in _process_data.

TST: Update tests for pandas.data.Options

TST: Remove test for helper function that no longer exists.

TST: Fix option test for change in output

TST: Changes io.data.Options tests to self.assertTrue

TST: Change tests raise nose.SkipTests on remote data errors

TST: Change nose.SkipTest on RemoteDataError instead of IndexError

ENH: Added quote time to outputs of data.Options.

DOC: Added documentation for io.data.Options

DOC: Added documentation of data.Options output.

DOC: Updated docstrings on data.io.Options

DOC: Added experimental tags to io.data.Options docstrings/documentation.

BUG: Bug fixes, added tests, cleanups on documentation

TST: Fix test_data Options tests.

TST: Add test yahoo finance option pages.

DOC: Update example to show slicing.

TST: Remove test for long for python 3 compatibility.

BUG: Fix quote time scraper

TST: Changed the error raised by no tables in data.Options

Tests were failing if the scraper got the webpage but there weren't any tables in it.  Changed from IndexError to RemoteDataError so that nose would skip it on failure.

DOC: Moved reference to new Options method to v0.14.1.txt

DOC: Updated release at 0.14.1.txt for io.data.Options
davidastephens committed Jun 17, 2014
commit 2ba5ead1e3e713fec59458ab2165959e8886e70b
4 changes: 4 additions & 0 deletions doc/source/release.rst
@@ -55,6 +55,10 @@ performance improvements along with a large number of bug fixes.

Highlights include:

Experimental Features
~~~~~~~~~~~~~~~~~~~~~
- ``pandas.io.data.Options`` has a get_all_data method and now consistently returns a multi-indexed ''DataFrame'' (:issue:`5602`)

Member

This should be moved to v0.14.1.txt (or removed if it is already there)

Contributor Author

Yes, it's in v0.14.1.txt; I will remove it here.

See the :ref:`v0.14.1 Whatsnew <whatsnew_0141>` overview or the issue tracker on GitHub for an extensive list
of all API changes, enhancements and bugs that have been fixed in 0.14.1.

37 changes: 37 additions & 0 deletions doc/source/remote_data.rst
@@ -52,6 +52,43 @@ Yahoo! Finance
f=web.DataReader("F", 'yahoo', start, end)
f.ix['2010-01-04']

.. _remote_data.yahoo_Options:

Yahoo! Finance Options
----------------------
***Experimental***

The Options class allows the download of options data from Yahoo! Finance.

The ''get_all_data'' method downloads and caches option data for all expiry months
Member

Can you use double backticks (````) instead of ''? Then it renders as code.

and provides a formatted ''DataFrame'' with a hierarchical index, so it's easy to get
to the specific option you want.

.. ipython:: python

   from pandas.io.data import Options
   aapl = Options('aapl', 'yahoo')
   data = aapl.get_all_data()
Contributor

This fails here. In general, you need to protect conversions with a try/except. You probably need to wrap all of the float conversions with a ',' replacement or, better yet, don't convert them individually and let them be object dtype.
Then, on columns that should be numeric (to avoid accidentally changing other data), use df[column].str.replace(',',''). This kind of check needs a test as well.

ipdb> l
    523 
    524 def _unpack(row, kind):
    525     def _parse_row_values(val):
    526         ret = val.text_content()
    527         if 'neg_arrow' in val.xpath('.//@class'):
--> 528             ret = float(ret)*(-1.0)
    529         return ret
    530 
    531     els = row.xpath('.//%s' % kind)
    532     return [_parse_row_values(val) for val in els]
    533 

ipdb> p ret
'2,240.10'

Contributor Author

Yeah, I had this issue in my code on the weekend. I did the replace; I'll push the update and add a test tonight.

Contributor Author

What do you suggest I do on ValueError here? Raise, or return the string with an appended '-'?

Contributor

Well, you can try to replace the commas, then convert; on failure I would make it np.nan. If some values are string-like and some are not, then you are forced to leave the column as object. However, before you go down that road, see WHY it's not converting: is it bogus data coming in, or are we misinterpreting the field? Either case should yield a missing value.
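
The approach discussed in this thread (strip thousands separators, convert, fall back to NaN) might look roughly like the sketch below. The helper name `to_float_or_nan` and the sample rows are made up for illustration; this is not the code that was eventually merged.

```python
import numpy as np
import pandas as pd


def to_float_or_nan(text):
    """Convert a scraped cell such as '2,240.10' to float; NaN if it cannot be parsed."""
    try:
        return float(str(text).replace(",", ""))
    except ValueError:
        return np.nan


# Hypothetical scraped values, all still strings at this point.
raw = pd.DataFrame({
    "Last": ["2,240.10", "258.17", "N/A"],
    "Symbol": ["AAPL160115C00330000", "AAPL7160115C00330000", "AAPL160115P00600000"],
})

# Convert only the column that should be numeric; Symbol is left untouched.
raw["Last"] = raw["Last"].map(to_float_or_nan)
```

Converting per column rather than per cell during scraping keeps string-like columns such as `Symbol` from being altered by accident.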

   data.head()

   # Show the $600 strike puts at all expiry dates:
   data.loc[(600, slice(None), 'put'),:].head()

   # Show the volume traded of $600 strike puts at all expiry dates:
   data.loc[(600, slice(None), 'put'),'Vol'].head()

If you don't want to download all the data, more specific requests can be made.

.. ipython:: python

   import datetime
   expiry = datetime.date(2016, 1, 1)
   data = aapl.get_call_data(expiry=expiry)
   data.head()

Note that if you call ''get_all_data'' first, this second call will happen much faster, as the data is cached.
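
How `Options` caches its downloads is not shown in this diff. Purely as a generic illustration of why a repeated request can be faster, here is a toy per-month cache; `MonthCache` and `fake_download` are hypothetical names and do not exist in pandas.

```python
class MonthCache:
    """Toy cache: remembers one result per (year, month) key."""

    def __init__(self, fetch):
        self._fetch = fetch   # callable that does the slow work
        self._store = {}

    def get(self, year, month):
        key = (year, month)
        if key not in self._store:
            # Only the first request for a given month reaches the fetcher.
            self._store[key] = self._fetch(year, month)
        return self._store[key]


download_log = []


def fake_download(year, month):
    """Hypothetical stand-in for a network request."""
    download_log.append((year, month))
    return {"expiry": (year, month), "rows": []}


cache = MonthCache(fake_download)
first = cache.get(2016, 1)
second = cache.get(2016, 1)   # served from the cache, no second download
```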


.. _remote_data.google:
Contributor

This works, but I think you need an example of how to slice the result, because unless Symbol is included in the index you can't slice it.

This works

In [48]: data.set_index(['Symbol'],append=True).loc[(330,slice(None),'call'),:]
Out[48]: 
                                               Last  Chg  Bid  Ask  Vol  Open Int   Root IsNonstandard Underlying  Underlying_Price          Quote_Time
Strike Expiry     Type Symbol                                                                                                                          
330    2016-01-15 call AAPL160115C00330000   258.17    0  NaN  NaN    4        43   AAPL         False       AAPL            585.54 2014-05-09 04:00:00
                       AAPL7160115C00330000  270.00    0  NaN  NaN    5        21  AAPL7          True       AAPL            585.54 2014-05-09 04:00:00

[2 rows x 11 columns]

But simply slicing will not (though using .xs on a specific level will work as well).

Contributor Author

That code doesn't work for me. I get: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (3), lexsort depth (0)'

What about data.loc[(330,slice(None), 'call')]?

Contributor

You need to call df.sortlevel() on the created frame; the index must always be sorted to do any real indexing. Furthermore, I think the index should be ['Strike','Expiry','Type','Symbol'], as it's completely unique and much more useful. Show a slicing example as well.
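
To illustrate the sorting requirement, here is a small self-contained sketch with a four-level index like the one suggested above. The rows are hypothetical sample data, not real quotes, and the sketch uses `sort_index()`, which later pandas releases provide in place of `sortlevel()`.

```python
import pandas as pd

# Hypothetical rows, deliberately in unsorted order.
rows = [
    (600, "2016-01-15", "put", "AAPL160115P00600000", 70.00),
    (330, "2016-01-15", "call", "AAPL7160115C00330000", 270.00),
    (330, "2016-01-15", "call", "AAPL160115C00330000", 258.17),
    (330, "2015-01-17", "put", "AAPL150117P00330000", 1.50),
]
df = pd.DataFrame(rows, columns=["Strike", "Expiry", "Type", "Symbol", "Last"])
df["Expiry"] = pd.to_datetime(df["Expiry"])
df = df.set_index(["Strike", "Expiry", "Type", "Symbol"])

# Sort the MultiIndex first; tuple slicing on an unsorted index can raise
# the lexsort error quoted earlier in this thread.
df = df.sort_index()

# All $330 strike calls, at every expiry.
calls_330 = df.loc[(330, slice(None), "call"), :]

# Cross-section on a single level, as mentioned above.
puts = df.xs("put", level="Type")
```

Because `Symbol` is part of the index, every row has a unique key, and the standard and non-standard contracts at the same strike remain distinguishable after slicing.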


Google Finance
18 changes: 17 additions & 1 deletion doc/source/v0.14.1.txt
@@ -148,7 +148,23 @@ Performance
Experimental
~~~~~~~~~~~~

There are no experimental changes in 0.14.1
``pandas.io.data.Options`` has a get_all_data method and now consistently returns a multi-indexed ''DataFrame'' (PR `#5602`)
See :ref:`the docs<remote_data.yahoo_Options>` ***Experimental***

.. ipython:: python

   from pandas.io.data import Options
   aapl = Options('aapl', 'yahoo')
   data = aapl.get_all_data()
   data.head()

.. ipython:: python

   from pandas.io.data import Options
   aapl = Options('aapl', 'yahoo')
   data = aapl.get_all_data()
   data.head()


.. _whatsnew_0141.bug_fixes:
