Your package works great, but I had to modify it slightly. The current code extracts each cell like this:
self._insert(row_ind, col_ind, row_span, col_span, self._transformer(cell.get_text()))
This is fine if the cell content is plain text, but it is a problem if the cell contains links you want to keep.
I have modified it to:
class Extractor(object):
    def __init__(self, table, id_=None, cell_transformer=None):
        ...
        self._cell_transformer = cell_transformer if cell_transformer else lambda x: x.get_text()

    def parse(self):
        ...
        self._insert(row_ind, col_ind, row_span, col_span, self._cell_transformer(cell))
This allows the caller to implement the cell extraction if required.
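For illustration, here is a rough sketch of a transformer that keeps link targets alongside the text. The name `text_and_links` is made up for this example, and it assumes the cell passed in behaves like a BeautifulSoup `Tag` (with `get_text()` and `find_all()`). Whether the rest of the package copes with non-string cell values is also an assumption.

```python
def text_and_links(cell):
    """Return the cell's text together with the hrefs of any links in it.

    Hypothetical helper: assumes `cell` behaves like a BeautifulSoup Tag,
    i.e. it provides get_text() and find_all(), and each <a> provides get().
    """
    text = cell.get_text(strip=True)
    # Collect link targets, skipping anchors that have no href attribute.
    hrefs = [a.get("href") for a in cell.find_all("a") if a.get("href")]
    return {"text": text, "links": hrefs}
```

With the modified constructor above, this could be passed as `Extractor(table, cell_transformer=text_and_links)`.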
Also, having to write three lines is a bit cumbersome:
ext = Extractor(html)
ext.parse()
print ext.return_list()
It would be nicer to just do:
result = Extractor().parse(html)
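Until something like that exists, a small wrapper could give the same one-call feel. This is only a sketch: `extract_table` is a hypothetical name, and it assumes just the three calls shown above (the constructor, `parse()`, and `return_list()`).

```python
def extract_table(extractor_cls, table, **kwargs):
    """Build an extractor, parse the table, and return the rows in one call.

    Hypothetical helper: `extractor_cls` is expected to be the package's
    Extractor class (or anything with the same parse()/return_list() methods).
    """
    extractor = extractor_cls(table, **kwargs)
    extractor.parse()
    return extractor.return_list()
```

Usage would then be `result = extract_table(Extractor, html)`, with any extra constructor arguments passed as keywords.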
Thanks, this package is small but useful :)