This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Commit

Use bot user agent for openGraph queries
We found that some websites return different Open Graph metadata
depending on the user agent of the request.

Including (bot) in the user agent string may get such websites to
return the correct metadata. When testing against Twitter, we found
that the correct metadata is returned for the (bot) user agent but
not for the default user agent.

Signed-off-by: Andrew Ryan <andrewryanchama@clover.club>
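
As a rough illustration of the idea in the message above, the sketch below builds (but does not send) a request that identifies itself as a bot. It uses only Python's standard `urllib.request`, not Synapse's own HTTP client; the helper name `build_preview_request` and the default `accept_language` value are assumptions made for this example. Only the `Synapse (bot)` string comes from the commit.

```python
from urllib.request import Request

# The user agent string is taken from the commit; everything else in this
# sketch is hypothetical and not part of Synapse.
BOT_USER_AGENT = "Synapse (bot)"


def build_preview_request(url: str, accept_language: str = "en") -> Request:
    """Build (without sending) a GET request that announces itself as a bot."""
    return Request(
        url,
        headers={
            "Accept-Language": accept_language,
            "User-Agent": BOT_USER_AGENT,
        },
    )


req = build_preview_request("https://example.com/some/page")
# urllib stores header names via str.capitalize(), hence "User-agent".
print(req.get_header("User-agent"))  # Synapse (bot)
```

Comparing the response to such a request with the response to a request that uses the default user agent is one way to check whether a site varies its Open Graph metadata by user agent.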
AndrewRyanChama authored and tenpura-shrimp committed Feb 14, 2022
1 parent 55113dd commit bdb5e7f
Showing 3 changed files with 9 additions and 6 deletions.
1 change: 1 addition & 0 deletions changelog.d/11985.misc
@@ -0,0 +1 @@
+ Use bot user agent for openGraph queries.
4 changes: 1 addition & 3 deletions synapse/res/providers.json
@@ -5,13 +5,11 @@
"endpoints": [
{
"schemes": [
"https://twitter.com/*/status/*",
"https://*.twitter.com/*/status/*",
"https://twitter.com/*/moments/*",
"https://*.twitter.com/*/moments/*"
],
"url": "https://publish.twitter.com/oembed"
}
]
}
]
]
10 changes: 7 additions & 3 deletions synapse/rest/media/v1/preview_url_resource.py
@@ -326,8 +326,9 @@ async def _do_preview(self, url: str, user: UserID, ts: int) -> bytes:

  # Compile the Open Graph response by using the scraped
  # information from the HTML and overlaying any information
- # from the oEmbed response.
- og = {**og_from_html, **og_from_oembed}
+ # from the oEmbed response. og tags from the original html
+ # have priority over oEmbed data.
+ og = {**og_from_oembed, **og_from_html}

  await self._precache_image_url(user, media_info, og)
  else:
@@ -402,7 +403,10 @@ async def _download_url(self, url: str, output_stream: BinaryIO) -> DownloadResu
  url,
  output_stream=output_stream,
  max_size=self.max_spider_size,
- headers={"Accept-Language": self.url_preview_accept_language},
+ headers={
+     b"Accept-Language": self.url_preview_accept_language,
+     b"User-Agent": ["Synapse (bot)"],
+ },
  is_allowed_content_type=_is_previewable,
  )
  except SynapseError:
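
The change to the `og = {...}` line in `_do_preview` relies on how Python resolves duplicate keys when unpacking several dicts into one literal: the dict unpacked last wins. The sketch below shows that behaviour in isolation. The helper `merge_open_graph` and the sample values are made up for illustration and are not part of Synapse.

```python
from typing import Any, Dict


def merge_open_graph(
    og_from_oembed: Dict[str, Any], og_from_html: Dict[str, Any]
) -> Dict[str, Any]:
    """Merge two Open Graph dicts, letting HTML values override oEmbed ones."""
    # In a dict literal, later unpackings overwrite earlier ones on key
    # clashes, so og_from_html takes priority here.
    return {**og_from_oembed, **og_from_html}


# Hypothetical sample data.
oembed = {"og:title": "Title from oEmbed", "og:type": "rich"}
html = {"og:title": "Title from HTML", "og:description": "Description from HTML"}

og = merge_open_graph(oembed, html)
print(og["og:title"])  # Title from HTML
print(og["og:type"])   # rich
```

Keys present in only one source survive the merge either way; the order of unpacking matters only for keys that both sources provide.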