Skip to content

Conversation

alzarei
Copy link

@alzarei alzarei commented Sep 28, 2025

Modernize GoogleTextSearch connector with ITextSearch interface

Problem Statement

The GoogleTextSearch connector currently only implements the legacy ITextSearch interface, forcing users to use clause-based TextSearchFilter instead of modern type-safe LINQ expressions. This creates runtime errors from property name typos and lacks compile-time validation for Google search operations.

Technical Approach

This PR modernizes the GoogleTextSearch connector to implement the generic ITextSearch interface alongside the existing legacy interface. The implementation provides LINQ-to-Google-API conversion with support for equality, contains, NOT operations, FileFormat filtering, and compound AND expressions.

Implementation Details

Core Changes

  • Implement ITextSearch interface with full generic method support
  • Add LINQ expression analysis supporting equality, contains, NOT operations, and compound AND expressions
  • Map LINQ expressions to Google Custom Search API parameters (exactTerms, orTerms, excludeTerms, fileType, siteSearch)
  • Support advanced filtering patterns with type-safe property access

Property Mapping Strategy
The Google Custom Search API supports substantial filtering through predefined parameters:

  • exactTerms: Exact title/content match
  • siteSearch: Site/domain filtering
  • fileType: File extension filtering
  • excludeTerms: Negation filtering
  • Additional parameters: country restrict, language, date filtering

Code Examples

Before (Legacy Interface)

var options = new TextSearchOptions
{
    Filter = new TextSearchFilter().Equality("siteSearch", "microsoft.com")
};

After (Generic Interface)

// Simple filtering
var options = new TextSearchOptions<GoogleWebPage>
{
    Filter = page => page.DisplayLink.Contains("microsoft.com")
};

// Complex filtering
var complexOptions = new TextSearchOptions<GoogleWebPage>
{
    Filter = page => page.DisplayLink.Contains("microsoft.com") &&
                    page.Title.Contains("AI") &&
                    page.FileFormat == "pdf" &&
                    !page.Snippet.Contains("deprecated")
};

Implementation Benefits

Type Safety & Developer Experience

  • Compile-time validation of GoogleWebPage property access
  • IntelliSense support for all GoogleWebPage properties
  • Eliminates runtime errors from property name typos in filters

Enhanced Filtering Capabilities

  • Equality filtering: page.Property == "value"
  • Contains filtering: page.Property.Contains("text")
  • NOT operations: !page.Property.Contains("text")
  • FileFormat filtering: page.FileFormat == "pdf"
  • Compound AND expressions with multiple conditions

Validation Results

Build Verification

  • Command: dotnet build --configuration Release --interactive
  • Result: Build succeeded in 3451.8s (57.5 minutes) - all projects compiled successfully
  • Status: ✅ PASSED (0 errors, 0 warnings)

Test Results
Full Test Suite:

  • Passed: 7,177 (core functionality tests)
  • Failed: 2,421 (external API configuration issues)
  • Skipped: 31
  • Duration: 4 minutes 57 seconds

Core Unit Tests:

  • Semantic Kernel unit tests: 1,574/1,574 tests passed (100%)
  • Google Connector Tests: 29 tests passed (23 legacy + 6 generic)

Test Failure Analysis
The 2,421 test failures are infrastructure/configuration issues, not code defects:

  • Azure OpenAI API Configuration: Missing API keys for external service integration tests
  • AWS Bedrock Configuration: Integration tests requiring live AWS services
  • Docker Dependencies: Vector database containers not available in development environment
  • External Service Dependencies: Integration tests requiring live API services (Bing, Google, etc.)

These failures are expected in development environments without external API configurations.

Method Ambiguity Resolution
Fixed compilation issues when both legacy and generic interfaces are implemented:

// Before (ambiguous):
await textSearch.SearchAsync("query", new() { Top = 4, Skip = 0 });

// After (explicit):
await textSearch.SearchAsync("query", new TextSearchOptions { Top = 4, Skip = 0 });

Files Modified

dotnet/src/Plugins/Plugins.Web/Google/GoogleWebPage.cs (NEW)
dotnet/src/Plugins/Plugins.Web/Google/GoogleTextSearch.cs (MODIFIED)
dotnet/samples/Concepts/TextSearch/Google_TextSearch.cs (ENHANCED)
dotnet/samples/GettingStartedWithTextSearch/Step1_Web_Search.cs (FIXED)

Breaking Changes

None. All existing GoogleTextSearch functionality preserved. Method ambiguity issues resolved through explicit typing.

Multi-PR Context

This is PR 4 of 6 in the structured implementation approach for Issue #10456. This PR extends LINQ filtering support to the GoogleTextSearch connector, following the established pattern from BingTextSearch modernization.

@moonbox3 moonbox3 added the .NET Issue or Pull requests regarding .NET code label Sep 28, 2025
…c interface tests

- Fix CS0121 compilation errors by explicitly specifying TextSearchOptions instead of new()
- Add 3 comprehensive tests for ITextSearch<GoogleWebPage> generic interface:
  * GenericSearchAsyncReturnsSuccessfullyAsync
  * GenericGetTextSearchResultsReturnsSuccessfullyAsync
  * GenericGetSearchResultsReturnsSuccessfullyAsync
- All 22 Google tests now pass (19 legacy + 3 generic)
- Validates both backward compatibility and new type-safe functionality
- Add Contains() operation support for string properties (Title, Snippet, Link)
- Implement intelligent mapping: Contains() -> orTerms for flexible matching
- Add 2 new test methods to validate LINQ filtering with Contains and equality
- Fix method ambiguity (CS0121) in GoogleTextSearchTests by using explicit TextSearchOptions types
- Fix method ambiguity in Google_TextSearch.cs sample by specifying explicit option types
- Enhance error messages with clear guidance on supported LINQ patterns and properties

This enhancement extends the basic LINQ filtering (equality only) to include
string Contains operations, providing more natural and flexible filtering
patterns while staying within Google Custom Search API capabilities.

All tests passing: 25/25 Google tests (22 existing + 3 new)
- Add ITextSearch<GoogleWebPage> interface implementation
- Support equality, contains, NOT operations, and compound AND expressions
- Map LINQ expressions to Google Custom Search API parameters
- Add GoogleWebPage strongly-typed model for search results
- Support FileFormat filtering via Google's fileType parameter
- Add comprehensive test coverage (29 tests) for all filtering patterns
- Include practical examples demonstrating enhanced filtering capabilities
- Maintain backward compatibility with existing ITextSearch interface

Resolves enhanced LINQ filtering requirements for Google Text Search plugin.
- Add UsingGoogleTextSearchWithEnhancedLinqFilteringAsync method to Google_TextSearch.cs
  * Demonstrates 6 practical LINQ filtering patterns
  * Includes equality, contains, NOT operations, FileFormat, compound AND examples
  * Shows real-world usage of ITextSearch<GoogleWebPage> interface

- Fix method ambiguity in Step1_Web_Search.cs
  * Explicitly specify TextSearchOptions type instead of target-typed new()
  * Resolves CS0121 compilation error when both legacy and generic interfaces implemented
  * Maintains tutorial clarity for getting started guide

These enhancements complete the sample code demonstrating the new LINQ filtering
capabilities while ensuring all existing tutorials continue to compile correctly.
@alzarei alzarei marked this pull request as draft October 3, 2025 07:58
@alzarei alzarei force-pushed the feature-text-search-linq-pr4 branch from 87cc4ba to 7576d94 Compare October 4, 2025 08:01
@alzarei alzarei marked this pull request as ready for review October 4, 2025 08:25
{
Top = 4,
Skip = 0,
Filter = page => page.FileFormat == "pdf" && page.Title != null && page.Title.Contains("AI") && page.Snippet != null && !page.Snippet.Contains("deprecated")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some tests to verify the filter url that is created from the different linq expressions would be good.
I'm assuming these tests are just checking that the code doesn't fail, but doesn't actually verify the output filter query is correct?

}

/// <summary>
/// The title of the webpage.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These property summaries should start with Gets or sets the to conform to the documentation standard.

#region ITextSearch<GoogleWebPage> Implementation

/// <inheritdoc/>
public async Task<KernelSearchResults<object>> GetSearchResultsAsync(string query, TextSearchOptions<GoogleWebPage>? searchOptions = null, CancellationToken cancellationToken = default)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to have the return type here be Task<KernelSearchResults<GoogleWebPage>>.
So on ITextSearch<TRecord> it would be Task<KernelSearchResults<TRecord>>

}

// Generate helpful error message with supported patterns
var supportedPatterns = new[]
Copy link
Contributor

@westey-m westey-m Oct 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider making this a static field on the class. No need to allocate a new array of strings for each failed invocation.

"page.Prop1 == \"val1\" && page.Prop2.Contains(\"val2\") (compound AND)"
};

var supportedProperties = s_queryParameters.Select(p =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this one be a static field too?

}

// Handle string Contains: record.PropertyName.Contains("value")
if (linqExpression.Body is MethodCallExpression methodCall &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code seems very similar to that in CollectAndCombineFilters. Can this be consolidated?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

.NET Issue or Pull requests regarding .NET code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants