Commit

fix posts with photo and long text (#914)
Fix text extraction for posts that contain photos and long text
(and therefore show the "More" button). During get_posts, the element
lookup for such posts returns None, so the resulting text was null
instead of the real content.
Ianneee authored Oct 16, 2022
1 parent 6573722 commit e8387e9
Showing 1 changed file with 3 additions and 0 deletions.
3 changes: 3 additions & 0 deletions facebook_scraper/extractors.py
@@ -274,6 +274,9 @@ def extract_text(self) -> PartialPost:
         has_more = self.more_url_regex.search(element.html)
         if has_more and self.full_post_html:
             element = self.full_post_html.find('.story_body_container', first=True)
+            if not element and self.full_post_html.find("div.msg", first=True):
+                text = self.full_post_html.find("div.msg", first=True).text
+                return {"text": text, "post_text": text}

         nodes = element.find('p, header, span[role=presentation]')
         if nodes and len(nodes) > 1:
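
The fallback added by this commit can be illustrated in isolation. The following is a minimal sketch, not the library's code: `FakeElement`, `FakePage`, and `fallback_text` are hypothetical names introduced here, and `FakePage.find` only imitates the `find(selector, first=True)` call seen in the diff. It assumes the full-post page of a photo post has no `.story_body_container` but does carry the text in `div.msg`.

```python
from typing import Dict, List, Optional, Union


class FakeElement:
    """Hypothetical stand-in for a parsed HTML element exposing .text."""

    def __init__(self, text: str) -> None:
        self.text = text


class FakePage:
    """Hypothetical stand-in for the parsed full-post page.

    Maps CSS selectors to elements so that find() imitates the
    find(selector, first=True) call used in the diff.
    """

    def __init__(self, elements: Dict[str, FakeElement]) -> None:
        self._elements = elements

    def find(
        self, selector: str, first: bool = False
    ) -> Union[Optional[FakeElement], List[FakeElement]]:
        element = self._elements.get(selector)
        if first:
            return element
        return [element] if element is not None else []


def fallback_text(page: FakePage) -> Optional[Dict[str, str]]:
    """Return the div.msg text when the usual container is missing.

    Returns None when the usual container exists (the caller would keep
    parsing it) or when neither element is present.
    """
    container = page.find(".story_body_container", first=True)
    if container is not None:
        return None
    msg = page.find("div.msg", first=True)
    if msg is not None:
        return {"text": msg.text, "post_text": msg.text}
    return None


# A photo post with long text: no container, text lives in div.msg.
photo_post = FakePage({"div.msg": FakeElement("long caption")})
result = fallback_text(photo_post)
```

Unlike the diff, the sketch looks up `div.msg` once and reuses the result; the behaviour for the cases shown is the same.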
