html parsing with phantomjs?

I was looking into expanding the pandas.io.data functionality to read options data from Google Finance. After reading

http://pandas.pydata.org/pandas-docs/stable/gotchas.html#html-gotchas

I tried various combinations for parsing

http://www.google.com/finance/option_chain?q=GOOG

without success. The page builds its content with JavaScript, so it has to be "executed" in a browser before the tables can be parsed.

[selenium](http://seleniumhq.org/) with [phantomjs](http://phantomjs.org/) seems to make it possible to process the page:

```
$ sudo aptitude install phantomjs
$ pip install selenium
$ ipython
In [1]: from selenium import webdriver
In [2]: browser = webdriver.PhantomJS()
In [3]: browser.get('http://www.google.com/finance/option_chain?q=IBM')
In [4]: exp = browser.find_element_by_id('expirations')
In [5]: exp.find_elements_by_tag_name('option')[2].text
```

Could they be considered for inclusion as a parsing dependency?

Using phantomjs might also help with other HTML parsing issues experienced when using bs4/lxml/html5lib.
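
To make the idea concrete, here is a minimal sketch of how the rendered page could be handed over to the existing HTML table parser. The helper name `read_rendered_tables` and its `parse` parameter are hypothetical and not part of pandas; the sketch only assumes a Selenium WebDriver object, which exposes `get()` and `page_source`, and falls back to `pandas.read_html` for parsing.

```python
from io import StringIO


def read_rendered_tables(driver, url, parse=None):
    """Load `url` in a JavaScript-capable browser, then parse the rendered HTML.

    `driver` is any Selenium WebDriver instance (assumption: it provides
    `get()` and `page_source`). `parse` is a hypothetical hook that takes a
    file-like object; by default it is `pandas.read_html`.
    """
    if parse is None:
        # Imported lazily so the helper itself has no hard pandas dependency.
        import pandas as pd
        parse = pd.read_html

    driver.get(url)            # let the browser fetch the page and run its scripts
    html = driver.page_source  # serialized DOM as the browser currently sees it
    return parse(StringIO(html))


# Possible usage, not executed here (requires selenium and a browser driver):
#   from selenium import webdriver
#   browser = webdriver.PhantomJS()
#   tables = read_rendered_tables(
#       browser, 'http://www.google.com/finance/option_chain?q=GOOG')
```

With the default parser the result would be a list of DataFrames, one per `<table>` found, and `pandas.read_html` raises `ValueError` if the rendered page contains no tables. Content that loads asynchronously may additionally need an explicit wait before `page_source` is read.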


html parsing with phantomjs? #5404
