Conversation

@blefo blefo commented Aug 20, 2025

Refactors the API’s web search integration by migrating from DuckDuckGo to Brave Search and adding async support

  • Backend Migration: Replaces DuckDuckGo with Brave Search API; adds aiolimiter for rate limiting; updates config and dependencies.
  • Async & Refactor: Uses async HTTP requests via httpx; robust error handling; parses results into new SearchResult model.
  • API Model: Introduces SearchResult for structured output; updates public API exports.
  • Query Generation: Improves LLM-based query prompts, enforcing clarity and length; adds error handling.

@blefo blefo force-pushed the feat-update-web-search-browser branch from 836fb6c to e1f9180 on August 20, 2025 16:04
@blefo blefo requested a review from jcabrero August 20, 2025 17:17
@jcabrero jcabrero left a comment


I know we're in a bit of a rush today. The PR's general functionality seems to be working, though there are some cases where the rate limits don't behave as expected.

The changes to the unit tests are not essential now.

If it is tremendously urgent, we can merge it, but you should come back and address all the comments.

Comment on lines 29 to 32
brave_rate_limiter = AsyncLimiter(WEB_SEARCH_API_RPS, 1)

_http_client: httpx.AsyncClient | None = None
_client_lock = asyncio.Lock()
Member
If the maximum concurrency limit for Brave API is WEB_SEARCH_API_RPS, then you should consider that there will be a single brave_rate_limiter per gunicorn worker. This means that, if there are 20 gunicorn workers and each has a WEB_SEARCH_API_RPS of 30, there may be up to 20 * 30 = 600 requests per second, and 20 happening concurrently from the different processes.

I would consider adding this to the request or the app state in FastAPI so that it is shared among the different processes.
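For illustration, here is a minimal per-process token bucket (a stdlib-only sketch, not the PR's code) of the kind that could be attached to `app.state` as suggested. Note that `app.state` is still per-worker, so a truly global cap across gunicorn workers needs an external store shared between processes:

```python
import asyncio
import time


class TokenBucket:
    """Minimal per-process token bucket: roughly `rate` acquisitions per second."""

    def __init__(self, rate: int):
        self.rate = rate
        self.tokens = float(rate)       # start full (allows an initial burst)
        self.updated = time.monotonic()
        self._lock = asyncio.Lock()     # serializes refill/consume; simple, not fair

    async def acquire(self) -> None:
        async with self._lock:
            while True:
                now = time.monotonic()
                # refill proportionally to elapsed time, capped at the burst size
                self.tokens = min(
                    self.rate, self.tokens + (now - self.updated) * self.rate
                )
                self.updated = now
                if self.tokens >= 1:
                    self.tokens -= 1
                    return
                await asyncio.sleep((1 - self.tokens) / self.rate)


# Hypothetical FastAPI wiring (one bucket per worker process):
#   app.state.brave_bucket = TokenBucket(WEB_SEARCH_API_RPS)
# and inside a handler:
#   await request.app.state.brave_bucket.acquire()
```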

Member Author

I now manage this with a Redis bucket. It also performs the check directly at the chat.completion level rather than per web search query, as this is easier to manage. We ensure that the number of in-flight queries with web search enabled does not exceed max_concurrent_requests // count, so that we don't exceed the 20 queries/s allowed by the Brave API.
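A sketch of that admission check, with an in-memory stand-in for the Redis counter (the names `try_acquire_web_search_slot` and `web_search_inflight` are illustrative, not the merged code):

```python
class RedisLikeCounter:
    """In-memory stand-in for Redis INCR/DECR, for illustration only."""

    def __init__(self):
        self.counts: dict[str, int] = {}

    def incr(self, key: str) -> int:
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key]

    def decr(self, key: str) -> int:
        self.counts[key] = self.counts.get(key, 0) - 1
        return self.counts[key]


def try_acquire_web_search_slot(store, max_concurrent_requests: int, count: int) -> bool:
    """Admit a web-search-enabled completion only while in-flight requests stay
    under max_concurrent_requests // count; the caller must decr() when done.
    With real Redis you would also set a TTL as a safety net against leaked slots."""
    limit = max_concurrent_requests // count
    if store.incr("web_search_inflight") <= limit:
        return True
    store.decr("web_search_inflight")  # over the limit: roll back and reject
    return False
```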


_http_client: httpx.AsyncClient | None = None
Member

Suggested change
_http_client: httpx.AsyncClient | None = None
_http_client: Optional[httpx.AsyncClient] = None

Comment on lines 31 to 43
# Brave Search API configuration
BRAVE_SEARCH_API: str | None = os.getenv("BRAVE_SEARCH_API")

# Web Search API configuration
WEB_SEARCH_API_PATH: Dict[str, str] = {
"web": "https://api.search.brave.com/res/v1/web/search",
}
WEB_SEARCH_COUNT: int = 3
WEB_SEARCH_LANG: str = "en"
WEB_SEARCH_COUNTRY: str = "us"
WEB_SEARCH_TIMEOUT: float = 20.0
WEB_SEARCH_API_MAX_CONCURRENT_REQUESTS: int = 20
WEB_SEARCH_API_RPS: int = 20
Member

  • I'd love this to be a single type called WebSearchSettings with each field being one member.
  • WEB_SEARCH_API_PATH shouldn't be a Dict[str, str]: it holds just one element and you don't seem to use more than one.
Suggested change
# Brave Search API configuration
BRAVE_SEARCH_API: str | None = os.getenv("BRAVE_SEARCH_API")
# Web Search API configuration
WEB_SEARCH_API_PATH: Dict[str, str] = {
"web": "https://api.search.brave.com/res/v1/web/search",
}
WEB_SEARCH_COUNT: int = 3
WEB_SEARCH_LANG: str = "en"
WEB_SEARCH_COUNTRY: str = "us"
WEB_SEARCH_TIMEOUT: float = 20.0
WEB_SEARCH_API_MAX_CONCURRENT_REQUESTS: int = 20
WEB_SEARCH_API_RPS: int = 20
# Brave Search API configuration
BRAVE_SEARCH_API: Optional[str] = os.getenv("BRAVE_SEARCH_API")
# Web Search API configuration
WEB_SEARCH_API_PATH: Dict[str, str] = {
"web": "https://api.search.brave.com/res/v1/web/search",
}
WEB_SEARCH_COUNT: int = 3
WEB_SEARCH_LANG: str = "en"
WEB_SEARCH_COUNTRY: str = "us"
WEB_SEARCH_TIMEOUT: float = 20.0
WEB_SEARCH_API_MAX_CONCURRENT_REQUESTS: int = 20
WEB_SEARCH_API_RPS: int = 20

raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail="Web search requested with an empty query",
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
Member

I think this should be 503 Service Unavailable, because it is not an internal server error but rather that the web search service is not available.

Comment on lines 72 to 83
params = {
    "q": q,
    "summary": 1,
    "count": WEB_SEARCH_COUNT,
    "country": WEB_SEARCH_COUNTRY,
    "lang": WEB_SEARCH_LANG,
}
headers = {
    "X-Subscription-Token": BRAVE_SEARCH_API,
    "Api-Version": "2023-10-11",
    "Accept": "application/json",
}
Member

nit: With q being the only thing you need to produce at runtime (the rest is known beforehand), you could cache params and headers as module-level constants, though the speedup wouldn't be tremendous.

Comment on lines 35 to 50
async def _get_http_client() -> httpx.AsyncClient:
    """Get or create a shared HTTP client instance to avoid creating it on every request.

    Returns:
        An AsyncClient configured with timeouts and connection limits
    """
    global _http_client
    async with _client_lock:
        if _http_client is None:
            _http_client = httpx.AsyncClient(
                timeout=WEB_SEARCH_TIMEOUT,
                limits=httpx.Limits(
                    max_connections=WEB_SEARCH_API_MAX_CONCURRENT_REQUESTS
                ),
            )
    return _http_client
Member

nit: This is not bad, but instead of using global, which can be considered poor practice, I would do something like @lru_cache(maxsize=1), which guarantees a single http_client exists. That simplifies the function's behaviour to just creating the client once.
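A sketch of that pattern, using a dummy stand-in for httpx.AsyncClient so the snippet runs without httpx installed. One caveat worth noting: lru_cache should wrap a synchronous factory, since caching the coroutine returned by an async function would fail on the second await.

```python
from functools import lru_cache


class _DummyAsyncClient:
    """Stand-in for httpx.AsyncClient so the sketch is runnable without httpx."""


@lru_cache(maxsize=1)
def get_http_client() -> _DummyAsyncClient:
    # The real version would construct the shared client once, e.g.:
    #   return httpx.AsyncClient(
    #       timeout=WEB_SEARCH_TIMEOUT,
    #       limits=httpx.Limits(max_connections=WEB_SEARCH_API_MAX_CONCURRENT_REQUESTS),
    #   )
    return _DummyAsyncClient()
```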

else:
print("All retries failed.")
raise last_exception
pytest.skip("web_search could not be validated due to environment")
Member

Can it ever get here? Otherwise, what is the purpose? I can't really make full sense of it.

Member Author

The Brave API key was not set in the CI pipeline and the test was breaking. Now that it's integrated, I've removed it

Comment on lines 105 to 106
assert "[1]" in ctx.prompt
assert "[2]" in ctx.prompt
Member

This is testing for the substrings "[1]" and "[2]" being in the prompt? I don't really understand the intent.

Comment on lines 72 to 106
@pytest.mark.asyncio
async def test_perform_web_search_async_success():
    """Test successful web search with mock response"""
    mock_data = {
        "web": {
            "results": [
                {
                    "title": "Latest AI Developments",
                    "description": "OpenAI announces GPT-5 with improved capabilities and better performance across various tasks.",
                    "url": "https://example.com/ai1",
                },
                {
                    "title": "AI Breakthrough in Robotics",
                    "description": "New neural network architecture improves robot learning efficiency by 40% in recent studies.",
                    "url": "https://example.com/ai2",
                },
            ]
        }
    }

    with (
        patch("nilai_api.handlers.web_search.BRAVE_SEARCH_API", "test-key"),
        patch(
            "nilai_api.handlers.web_search._make_brave_api_request",
            return_value=mock_data,
        ),
    ):
        ctx = await perform_web_search_async("AI developments")

    assert ctx.sources is not None
    assert len(ctx.sources) > 0
    assert "GPT-5" in ctx.prompt
    assert "40%" in ctx.prompt
    assert ctx.sources[0].source == "https://example.com/ai1"
    assert ctx.sources[1].source == "https://example.com/ai2"
    assert "[1]" in ctx.prompt
    assert "[2]" in ctx.prompt
Member

I am not fully sure this test does much, as you're not even keeping the format of what the web search would actually return: it returns an api_model result.

Member

I would check all or most of these changes again. Some of the tests mock values that have nothing to do with the SearchResult API model, so I'm not even sure they make sense. It's like saying: "I mock a sum function to always return the string '3', then I check that it returns 3." By itself, that checks pretty much nothing. What you should mock instead are the Brave API calls you make, and have them return something you control. That way, you make sure the functionality you plug in is doing what it should do.

@blefo blefo force-pushed the feat-update-web-search-browser branch from 35a32bd to 83b6d46 on August 22, 2025 10:38
@blefo blefo requested a review from jcabrero August 22, 2025 11:03
@jcabrero jcabrero left a comment


Just one last comment, that you may decide to keep. The AsyncLimiter is still in web_search.py. Can it conflict in any way with the Redis rate limiter? Is it really required?

In any case, I think the PR looks much much better now. Great job! 💪

@blefo blefo merged commit 0320aa9 into main Aug 22, 2025
5 checks passed
