- implement ddgs/engines/sogou.py
@deedy5 Sogou is a well-known search engine in China. Because of the GFW, many users hope to use ddgs without a proxy, but none of the existing search backends support this. This backend addresses that gap and makes ddgs usable in China. Occasionally, the first result may be missing (since it is Sogou’s own sponsored recommendation), but this has little impact on overall functionality.
Moved the import of Mapping from collections.abc into a TYPE_CHECKING block to optimize runtime imports and improve type checking performance.
Expanded .gitignore to match all .venv* directories. Enhanced the README.md with a more detailed and nested Table of Contents for better navigation.
deedy5 left a comment
Thanks for the contribution — your other changes look good. Could you please revert the last commit that modified .gitignore and README? Those edits aren’t necessary and should be left unchanged.
This reverts commit d3daacd.
Reverted now. Thanks!
Removed redirect resolution and caching from Sogou search engine. The post_extract_results method now only normalizes hrefs to absolute URLs and filters results with valid titles and hrefs, reducing complexity.
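The simplified post-processing described in this commit could look roughly like the sketch below. It is not the code from the PR: results are modelled as plain dicts instead of TextResult, and BASE_URL is an assumed base for resolving relative hrefs.

```python
from urllib.parse import urljoin

# Assumption: relative hrefs are resolved against the Sogou host.
BASE_URL = "https://www.sogou.com"


def post_extract_results(results: list[dict[str, str]]) -> list[dict[str, str]]:
    """Make hrefs absolute and drop entries that lack a title or an href."""
    cleaned = []
    for result in results:
        title = result.get("title", "").strip()
        href = result.get("href", "").strip()
        if not title or not href:
            continue  # skip incomplete results
        # urljoin leaves absolute URLs untouched and resolves relative ones.
        cleaned.append({**result, "title": title, "href": urljoin(BASE_URL, href)})
    return cleaned
```

No network requests are made here, which is the point of dropping redirect resolution: each result is handled with string operations only.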
@scarletkc
Enhanced the Sogou search engine to extract real target URLs from the data-url attribute, avoiding unnecessary requests to resolve wrapper links. Added helper methods for XPath extraction and updated post-processing to filter out unresolved wrapper links. Included tests to verify correct behavior when data-url is present or missing.
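As an illustration of the data-url approach, here is a minimal sketch using only the standard library. The markup in SAMPLE is invented for the example, the helper names are hypothetical, and xml.etree.ElementTree with element iteration stands in for the XPath `.//*[@data-url]/@data-url` that the engine would evaluate on the real page.

```python
import xml.etree.ElementTree as ET

# Invented, well-formed stand-in for two Sogou result blocks.
SAMPLE = """
<div>
  <div class="vrwrap"><h3><a href="/link?url=abc" data-url="https://example.com/page">Example</a></h3></div>
  <div class="vrwrap"><h3><a href="/link?url=def">No target</a></h3></div>
</div>
""".strip()


def is_wrapper_link(href: str) -> bool:
    """True for Sogou redirect wrappers such as /link?url=..."""
    return "/link?url=" in href


def resolve_href(item: ET.Element) -> str:
    """Prefer the first non-empty data-url; otherwise fall back to the anchor href."""
    for element in item.iter():
        target = (element.get("data-url") or "").strip()
        if target:
            return target
    for anchor in item.iter("a"):
        return (anchor.get("href") or "").strip()
    return ""


def extract_hrefs(markup: str) -> list[str]:
    """Collect real target URLs and drop wrapper links that stay unresolved."""
    root = ET.fromstring(markup)
    hrefs = []
    for item in root.iter("div"):
        if "vrwrap" not in (item.get("class") or ""):
            continue
        href = resolve_href(item)
        if href and not is_wrapper_link(href):
            hrefs.append(href)
    return hrefs
```

With the sample above, the first block yields its data-url value, while the second block has only a wrapper link and is filtered out.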
Please remove the tests: they don't make sense in their current form.
@@ -0,0 +1,94 @@
"""Sogou search engine implementation."""

from __future__ import annotations
please delete from __future__ import annotations
    items_xpath = "//div[contains(@class, 'vrwrap') and not(contains(@class, 'hint'))]"
    elements_xpath: ClassVar[Mapping[str, str]] = {
        "title": ".//h3//a//text()",
        "href": ".//h3//a/@href",
"href": use xpath from _data_url_xpath
        return payload

    @staticmethod
    def _xpath_join(item: Any, xpath: str) -> str:  # noqa: ANN401
        return " ".join(x.strip() for x in item.xpath(xpath) if x and x.strip())

    @staticmethod
    def _xpath_first(item: Any, xpath: str) -> str:  # noqa: ANN401
        return ""

    @staticmethod
    def _is_wrapper_link(href: str) -> bool:
        return "/link?url=" in href or "sogou.com/link?url=" in href

    def extract_results(self, html_text: str) -> list[TextResult]:
def extract_results is unnecessary
        "body": ".//div[contains(@class, 'space-txt')]//text()",
    }

    _data_url_xpath = ".//*[@data-url]/@data-url"
            results.append(TextResult(title=title, href=href, body=body))
        return results

    def post_extract_results(self, results: list[TextResult]) -> list[TextResult]:
validate href in post_extract_results
Everything works—thanks!
Could you optimize the code and rename the branch containing your changes to a name other than 'main'?
You only need to implement the correct XPaths and the build_payload and post_extract_results functions.
Check Contributing.md and how the BaseSearchEngine class (the one Sogou inherits from) works:
Lines 19 to 122 in 7962c03
Feature
- ddgs/engines/sogou.py implementing text search with timelimit + pagination support
- ... (ddgs text -b sogou) and list it in README’s engines table

Notes
- https://www.sogou.com/web (GET)
- items_xpath = //div[contains(@class,'vrwrap') and not(contains(@class,'hint'))]
- _normalize_href follows redirects via inline JS/meta refresh and caches the decoded hrefs

Tests
- pytest tests/ → 17/18 (pre-existing test_books_command failure)
- pre-commit run --all-files
- DDGS().text('python', backend='sogou', max_results=5)