Closed
Description
I was looking into expanding the pandas.io.data functionality to read options data from Google Finance. After reading
http://pandas.pydata.org/pandas-docs/stable/gotchas.html#html-gotchas
I tried various parser combinations on
http://www.google.com/finance/option_chain?q=GOOG
without success. The page builds its content with JavaScript, so it has to be executed in a browser before it can be parsed.
selenium with phantomjs seems to be able to process the page:
$ sudo aptitude install phantomjs
$ pip install selenium
$ ipython
In [1]: from selenium import webdriver
In [2]: browser=webdriver.PhantomJS()
In [3]: browser.get('http://www.google.com/finance/option_chain?q=IBM')
In [4]: exp=browser.find_element_by_id('expirations')
In [5]: exp.find_elements_by_tag_name('option')[2].text
Could selenium/phantomjs be considered for inclusion as a parsing dependency?
Using phantomjs might also help with other HTML parsing issues encountered when using bs4/lxml/html5lib.
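
For illustration: once the browser has executed the page, the rendered markup (`browser.page_source` in selenium) is ordinary HTML and can go to any parser. Below is a minimal sketch of that second step using only the standard library. The `SAMPLE` snippet and its dates are invented stand-ins for the rendered page, and `OptionTextParser` and `expiration_labels` are hypothetical helper names; the real Google Finance markup may well differ.

```python
from html.parser import HTMLParser


class OptionTextParser(HTMLParser):
    """Collect the text of <option> tags inside the element with a given id."""

    def __init__(self, container_id):
        super().__init__()
        self.container_id = container_id
        self.container_tag = None  # tag name of the matched container, if inside it
        self.depth = 0             # nesting depth of same-named tags in the container
        self.in_option = False
        self.options = []

    def handle_starttag(self, tag, attrs):
        if self.container_tag is None:
            # Not inside the container yet: look for the matching id.
            if dict(attrs).get("id") == self.container_id:
                self.container_tag = tag
                self.depth = 1
        elif tag == self.container_tag:
            self.depth += 1
        elif tag == "option":
            self.in_option = True
            self.options.append("")

    def handle_endtag(self, tag):
        if self.container_tag is None:
            return
        if tag == "option":
            self.in_option = False
        elif tag == self.container_tag:
            self.depth -= 1
            if self.depth == 0:
                self.container_tag = None  # left the container

    def handle_data(self, data):
        if self.in_option:
            self.options[-1] += data


def expiration_labels(rendered_html, container_id="expirations"):
    """Return the stripped text of every <option> under the given element id."""
    parser = OptionTextParser(container_id)
    parser.feed(rendered_html)
    parser.close()
    return [text.strip() for text in parser.options]


# Invented sample standing in for browser.page_source (assumption, not real data).
SAMPLE = """
<select id="expirations">
  <option>Jan 18, 2014</option>
  <option>Feb 22, 2014</option>
  <option>Mar 22, 2014</option>
</select>
<select id="other"><option>ignored</option></select>
"""

labels = expiration_labels(SAMPLE)
print(labels[2])
```

With the session above, the same helper would presumably be called as `expiration_labels(browser.page_source)`. The point is that the browser dependency is only needed for rendering, and the parsing step stays independent of it.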