feat: Complete LinkedIn Profile Fetching Integration by alvin-reyes · Pull Request #116 · gopher-lab/tee-worker

alvin-reyes · 2025-06-22T15:27:43Z

🔗 LinkedIn Profile Fetching Integration

This PR adds comprehensive LinkedIn profile search and fetching capabilities to the tee-worker, integrating with the new linkedin-scraper SDK v1.0.0.

📋 What's New

LinkedIn Job Types

searchbyquery - Search LinkedIn profiles by keywords with advanced filtering
getprofile - Fetch detailed LinkedIn profile information by public identifier

Rich Profile Data

Complete profile information - Name, headline, location, summary
Work experience - Full employment history with dates and descriptions
Education history - Schools, degrees, and academic background
Skills - Professional skills and competencies
Profile pictures - High-quality profile image URLs

Smart Capability Detection

Auto-detection - LinkedIn capabilities automatically detected when credentials are present
Credential validation - Requires all three LinkedIn credentials (li_at_cookie, csrf_token, jsessionid)
Graceful fallback - Workers operate normally without LinkedIn credentials

🛠️ Technical Implementation

Dependencies

Updated to linkedin-scraper v1.0.0
Updated to tee-types v1.0.0
Uses new LinkedInArguments and LinkedInFullProfileResult structures

Error Handling

Authentication errors - Proper handling of expired/invalid credentials
Rate limiting - Graceful handling of LinkedIn API limits
Not found errors - Clean error messages for invalid profiles
Stats tracking - Comprehensive metrics for all operation types

Job Arguments

{
  "type": "linkedin-scraper",
  "arguments": {
    "type": "searchbyquery",
    "query": "software engineer",
    "network_filters": ["F", "S", "O"],
    "max_results": 10
  }
}

{
  "type": "linkedin-scraper", 
  "arguments": {
    "type": "getprofile",
    "public_identifier": "john-doe-123"
  }
}

🧪 Testing

5/5 LinkedIn tests passing with real API integration
Comprehensive test coverage for both search and profile fetching
Error scenario testing - Invalid credentials, timeouts, not found cases
Integration tests - End-to-end workflow validation

🔧 Configuration

Environment Variables

LINKEDIN_LI_AT_COOKIE=your_li_at_cookie
LINKEDIN_CSRF_TOKEN=your_csrf_token  
LINKEDIN_JSESSIONID=your_jsessionid

Capability Detection

LinkedIn capabilities (searchbyquery, getprofile) are automatically detected when all three credentials are present.

📊 Statistics

New LinkedIn-specific metrics:

linkedin_scrapes - Total LinkedIn operations
linkedin_returned_profiles - Profiles successfully returned
linkedin_errors - General errors
linkedin_auth_errors - Authentication failures
linkedin_ratelimit_errors - Rate limit errors

🚀 Benefits

Expands data collection - Access to professional LinkedIn profiles
High-quality data - Comprehensive profile information
Reliable operation - Robust error handling and credential validation
Scalable architecture - Follows existing job patterns and conventions
No breaking changes - Fully backward compatible

📈 Performance

Efficient profile fetching - Optimized API calls
Timeout handling - Configurable timeouts for all operations
Memory efficient - Streams large profile datasets
Stats tracking - Real-time monitoring of operation success rates

This implementation provides a solid foundation for LinkedIn data collection while maintaining the tee-worker's reliability and performance standards.

Fixes https://github.com/masa-finance/tee-indexer/issues/226

This puts the ground of the main tee-worker component. It is composed of a simple http server which acts as a job server, a client to interact with it, and the scaffolding required to run tests and build signed binaries. Signed-off-by: mudler <mudler@localai.io>