html parsing with phantomjs? #5404

Closed
@gliptak

Description

I was looking into expanding the pandas.io.data functionality to read options data from Google Finance. After reading

http://pandas.pydata.org/pandas-docs/stable/gotchas.html#html-gotchas

I tried various combinations for parsing

http://www.google.com/finance/option_chain?q=GOOG

without success. The page builds its content with JavaScript, so it has to be "executed" in a browser before the data can be parsed.

selenium with phantomjs seems able to process the page:

$ sudo aptitude install phantomjs
$ pip install selenium
$ ipython
In [1]: from selenium import webdriver
In [2]: browser=webdriver.PhantomJS()
In [3]: browser.get('http://www.google.com/finance/option_chain?q=IBM')
In [4]: exp=browser.find_element_by_id('expirations')
In [5]: exp.find_elements_by_tag_name('option')[2].text

Could they be considered for inclusion as a parsing dependency?

Using phantomjs might also help with other HTML parsing issues experienced when using bs4/lxml/html5lib.
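
The idea above (let a browser run the page's scripts, then hand the resulting HTML to a parser) might be sketched roughly as follows. This is only an illustration under some assumptions: the function names `rendered_html`, `option_texts` and `read_rendered_tables` are hypothetical, the driver is any already-started selenium-style object that exposes `get()` and `page_source`, and the `expirations` id is taken from the session above. Note that `webdriver.PhantomJS` and `find_element_by_id` are not available in Selenium 4, which is why the sketch does not depend on a particular driver.

```python
from html.parser import HTMLParser
from io import StringIO


def rendered_html(driver, url):
    """Load url in a selenium-style driver and return the HTML after scripts ran."""
    driver.get(url)
    return driver.page_source


class _OptionTextParser(HTMLParser):
    """Collect the text of <option> elements inside the element with a given id."""

    def __init__(self, container_id):
        super().__init__()
        self.container_id = container_id
        self.container_tag = None  # tag name of the element carrying container_id
        self.depth = 0             # > 0 while we are inside that element
        self.in_option = False
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if self.container_tag is None:
            # Still looking for the container element.
            if dict(attrs).get("id") == self.container_id:
                self.container_tag = tag
                self.depth = 1
            return
        if self.depth == 0:
            return  # container already closed, ignore the rest of the page
        if tag == self.container_tag:
            self.depth += 1
        elif tag == "option":
            self.in_option = True
            self.texts.append("")

    def handle_endtag(self, tag):
        if self.container_tag is None or self.depth == 0:
            return
        if tag == "option":
            self.in_option = False
        elif tag == self.container_tag:
            self.depth -= 1
            if self.depth == 0:
                self.in_option = False

    def handle_data(self, data):
        if self.in_option and self.depth > 0:
            self.texts[-1] += data


def option_texts(html, container_id="expirations"):
    """Return the stripped text of every <option> under the given element id."""
    parser = _OptionTextParser(container_id)
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.texts]


def read_rendered_tables(driver, url):
    """Assumption: pandas plus one of its HTML parser backends is installed."""
    import pandas as pd  # imported lazily so the helpers above stay stdlib-only

    # read_html returns a list of DataFrames, one per <table> it can parse.
    return pd.read_html(StringIO(rendered_html(driver, url)))
```

With a real driver this would be called as `option_texts(rendered_html(browser, url))`, or `read_rendered_tables(browser, url)` when the rendered page contains `<table>` elements. Keeping the browser step separate from the parsing step means the browser stays an optional dependency: the parsing code only ever sees a string of HTML.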


Labels: Enhancement; IO HTML (read_html, to_html, Styler.apply, Styler.applymap)
