frinkleko
Member

Description

This pull request introduces a new SearchSource that fetches and processes content from the DuckDuckGo search engine. It is designed to be a robust and flexible way to incorporate web search results into the QuantMind framework. This PR completes part of the request in #35.

The key features of this implementation are:

  • A SearchSource class that queries the DuckDuckGo search engine.
  • A corresponding SearchContent data model to standardize the search results.
  • Support for advanced search operators, including site:, filetype:, and date-range filtering, which can be configured or passed directly to the search method.
  • Comprehensive integration tests to ensure the reliability of both basic and advanced search functionalities.
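
As a rough illustration of the advanced operators listed above, the sketch below shows one way a base query could be combined with `site:` and `filetype:` operators before being sent to the search engine. The helper name `build_query` and its parameters are hypothetical and not taken from this PR; date-range filtering is left out because the description does not say how it is applied.

```python
from typing import Optional


def build_query(
    query: str,
    site: Optional[str] = None,
    filetype: Optional[str] = None,
) -> str:
    """Append site: and filetype: operators to a base query.

    Hypothetical helper for illustration only; the PR's actual
    SearchSource may compose its queries differently.
    """
    parts = [query.strip()]
    if site:
        # Restrict results to a single domain.
        parts.append(f"site:{site}")
    if filetype:
        # Restrict results to one file extension.
        parts.append(f"filetype:{filetype}")
    return " ".join(parts)


print(build_query("factor investing", site="arxiv.org", filetype="pdf"))
# prints: factor investing site:arxiv.org filetype:pdf
```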

Checklist

  • The PR title starts with $CATEGORY(xx): xxx (such as feat(tool): xxx, fix(source): xxx, docs(README): xxx)
  • The related issue is referenced in this PR
  • The Markdown and LaTeX are rendered correctly.
  • The code in the PR is well documented.
  • [x] The PR is complete and small. Read the Google eng practices guide (a CL is equivalent to a PR) (https://google.github.io/eng-practices/review/developer/small-cls.html) to understand more about small PRs.

Member

wanghaoxue0 left a comment

looks good from my side

snippet: str
source: str = "search"
query: Optional[str] = None
meta_info: Dict[str, Any] = {}
Member

What is the function of meta_info?

the value in the config.
site: Restrict search to a specific domain.
filetype: Search for specific file types.
start_date: Start date for search results (YYYY-MM-DD).
Member

What if the datetime format is wrong?

Add something like this?

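# Requires: import re; from pydantic import field_validator
@field_validator("start_date", "end_date")
@classmethod
def _validate_date(cls, v):
    # Skip validation when the date is not set.
    if v is None:
        return v
    # fullmatch also rejects a trailing newline, which re.match with "$" allows.
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", v):
        raise ValueError("date must be YYYY-MM-DD")
    return v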
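
One caveat with a regex-only check is that it validates the shape of the string but not the calendar, so a value such as 2024-13-45 would still pass. A minimal sketch of a stricter check using only the standard library follows; `parse_search_date` is a hypothetical name, not something from this PR, and it could be called from the body of the validator.

```python
import re
from datetime import date, datetime
from typing import Optional

# Exactly four, two, and two ASCII digits separated by hyphens.
_DATE_SHAPE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def parse_search_date(value: Optional[str]) -> Optional[date]:
    """Return a date for a YYYY-MM-DD string, or None if no value is given.

    Hypothetical helper for illustration; raises ValueError on bad input.
    """
    if value is None:
        return None
    # Shape check first: strptime alone would accept "2024-1-5".
    if not _DATE_SHAPE.fullmatch(value):
        raise ValueError(f"date must be YYYY-MM-DD, got {value!r}")
    try:
        # Calendar check: rejects month 13, day 45, Feb 29 in non-leap years.
        return datetime.strptime(value, "%Y-%m-%d").date()
    except ValueError as exc:
        raise ValueError(f"not a valid calendar date: {value!r}") from exc
```

Returning a `date` object rather than the raw string would also make it straightforward to check that start_date does not fall after end_date, if that constraint is wanted.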

4 participants