Commit 60fce88

Merge pull request #12 from vzucher/master
🚀 feat: Batch Operations Fix, Amazon Search, Auto-Zones, and Comprehensive Improvements
2 parents 4108b23 + 928b2b4 commit 60fce88

27 files changed: 512 additions & 1473 deletions

README.md

Lines changed: 99 additions & 33 deletions
@@ -1,15 +1,54 @@
 # Bright Data Python SDK 🐍

-[![Tests](https://img.shields.io/badge/tests-502%2B%20passing-brightgreen)](https://github.com/vzucher/brightdata-sdk-python)
+[![Tests](https://img.shields.io/badge/tests-502%2B%20passing-brightgreen)](https://github.com/brightdata/sdk-python)
 [![Python](https://img.shields.io/badge/python-3.9%2B-blue)](https://www.python.org/)
 [![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
-[![Code Quality](https://img.shields.io/badge/quality-enterprise--grade-gold)](https://github.com/vzucher/brightdata-sdk-python)
+[![Code Quality](https://img.shields.io/badge/quality-enterprise--grade-gold)](https://github.com/brightdata/sdk-python)
 [![Notebooks](https://img.shields.io/badge/jupyter-5%20notebooks-orange)](notebooks/)

 Modern async-first Python SDK for [Bright Data](https://brightdata.com) APIs with **dataclass payloads**, **Jupyter notebooks**, comprehensive platform support, and **CLI tool** - built for data scientists and developers.

 ---

+## 📑 Table of Contents
+
+- [✨ Features](#-features)
+- [📓 Jupyter Notebooks](#-jupyter-notebooks-new)
+- [📦 Installation](#-installation)
+- [🚀 Quick Start](#-quick-start)
+  - [Authentication](#authentication)
+  - [Simple Web Scraping](#simple-web-scraping)
+  - [Using Dataclass Payloads](#using-dataclass-payloads-type-safe-)
+  - [Pandas Integration](#pandas-integration-for-data-scientists-)
+  - [Platform-Specific Scraping](#platform-specific-scraping)
+  - [Search Engine Results (SERP)](#search-engine-results-serp)
+  - [Async Usage](#async-usage)
+- [🆕 What's New in v2.0.0](#-whats-new-in-v2-200)
+- [🏗️ Architecture](#️-architecture)
+- [📚 API Reference](#-api-reference)
+  - [Client Initialization](#client-initialization)
+  - [Connection Testing](#connection-testing)
+  - [Zone Management](#zone-management)
+  - [Result Objects](#result-objects)
+- [🖥️ CLI Usage](#️-cli-usage)
+- [🐼 Pandas Integration](#-pandas-integration)
+- [🎨 Dataclass Payloads](#-dataclass-payloads)
+- [🔧 Advanced Usage](#-advanced-usage)
+- [🧪 Testing](#-testing)
+- [🏛️ Design Philosophy](#️-design-philosophy)
+- [📖 Documentation](#-documentation)
+- [🔧 Troubleshooting](#-troubleshooting)
+- [🤝 Contributing](#-contributing)
+- [📊 Project Stats](#-project-stats)
+- [📝 License](#-license)
+- [🔗 Links](#-links)
+- [💡 Examples](#-examples)
+- [🎯 Roadmap](#-roadmap)
+- [🙏 Acknowledgments](#-acknowledgments)
+- [🌟 Why Choose This SDK?](#-why-choose-this-sdk)
+
+---
+
 ## ✨ Features

 ### 🎯 **For Data Scientists**
@@ -44,11 +83,11 @@ Modern async-first Python SDK for [Bright Data](https://brightdata.com) APIs wit

 Perfect for data scientists! Interactive tutorials with examples:

-1. **[01_quickstart.ipynb](notebooks/01_quickstart.ipynb)** - Get started in 5 minutes [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/vzucher/brightdata-sdk-python/blob/master/notebooks/01_quickstart.ipynb)
-2. **[02_pandas_integration.ipynb](notebooks/02_pandas_integration.ipynb)** - Work with DataFrames [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/vzucher/brightdata-sdk-python/blob/master/notebooks/02_pandas_integration.ipynb)
-3. **[03_amazon_scraping.ipynb](notebooks/03_amazon_scraping.ipynb)** - Amazon deep dive [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/vzucher/brightdata-sdk-python/blob/master/notebooks/03_amazon_scraping.ipynb)
-4. **[04_linkedin_jobs.ipynb](notebooks/04_linkedin_jobs.ipynb)** - Job market analysis [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/vzucher/brightdata-sdk-python/blob/master/notebooks/04_linkedin_jobs.ipynb)
-5. **[05_batch_processing.ipynb](notebooks/05_batch_processing.ipynb)** - Scale to 1000s of URLs [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/vzucher/brightdata-sdk-python/blob/master/notebooks/05_batch_processing.ipynb)
+1. **[01_quickstart.ipynb](notebooks/01_quickstart.ipynb)** - Get started in 5 minutes [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/brightdata/sdk-python/blob/master/notebooks/01_quickstart.ipynb)
+2. **[02_pandas_integration.ipynb](notebooks/02_pandas_integration.ipynb)** - Work with DataFrames [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/brightdata/sdk-python/blob/master/notebooks/02_pandas_integration.ipynb)
+3. **[03_amazon_scraping.ipynb](notebooks/03_amazon_scraping.ipynb)** - Amazon deep dive [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/brightdata/sdk-python/blob/master/notebooks/03_amazon_scraping.ipynb)
+4. **[04_linkedin_jobs.ipynb](notebooks/04_linkedin_jobs.ipynb)** - Job market analysis [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/brightdata/sdk-python/blob/master/notebooks/04_linkedin_jobs.ipynb)
+5. **[05_batch_processing.ipynb](notebooks/05_batch_processing.ipynb)** - Scale to 1000s of URLs [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/brightdata/sdk-python/blob/master/notebooks/05_batch_processing.ipynb)

 ---

@@ -61,8 +100,8 @@ pip install brightdata-sdk
 Or install from source:

 ```bash
-git clone https://github.com/vzucher/brightdata-sdk-python.git
-cd brightdata-sdk-python
+git clone https://github.com/brightdata/sdk-python.git
+cd sdk-python
 pip install -e .
 ```

@@ -198,6 +237,21 @@ result = client.scrape.amazon.reviews(
 result = client.scrape.amazon.sellers(
     url="https://amazon.com/sp?seller=AXXXXXXXXX"
 )
+
+# NEW: Search Amazon by keyword and filters
+result = client.search.amazon.products(
+    keyword="laptop",
+    min_price=50000,      # $500 in cents
+    max_price=200000,     # $2000 in cents
+    prime_eligible=True,
+    condition="new"
+)
+
+# Search by category
+result = client.search.amazon.products(
+    keyword="wireless headphones",
+    category="electronics"
+)
 ```

 #### LinkedIn Data
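
The Amazon search example in the hunk above passes `min_price` and `max_price` as integer cents (`50000` for $500). A minimal sketch of a conversion helper, assuming callers start from dollar amounts; `dollars_to_cents` is a hypothetical name used for illustration and is not part of the SDK:

```python
from decimal import Decimal, ROUND_HALF_UP


def dollars_to_cents(amount):
    """Convert a dollar amount (str, int, or Decimal) to integer cents.

    Hypothetical helper, not part of the SDK. Decimal is used so that
    values such as "19.99" do not pick up binary floating-point error.
    """
    cents = (Decimal(str(amount)) * 100).quantize(
        Decimal("1"), rounding=ROUND_HALF_UP
    )
    return int(cents)


# Price filters matching the README example: $500 to $2000, expressed in cents
price_filters = {
    "min_price": dollars_to_cents("500"),
    "max_price": dollars_to_cents("2000"),
}
print(price_filters)
```

The resulting dictionary could then be unpacked into the search call shown in the README, for example `client.search.amazon.products(keyword="laptop", **price_filters)`.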
@@ -235,8 +289,8 @@ result = client.search.linkedin.profiles(

 result = client.search.linkedin.posts(
     profile_url="https://linkedin.com/in/johndoe",
-    start_date="2024-01-01",
-    end_date="2024-12-31"
+    start_date="2025-01-01",
+    end_date="2025-12-31"
 )
 ```

@@ -264,8 +318,8 @@ result = client.scrape.chatgpt.prompts(
 result = client.scrape.facebook.posts_by_profile(
     url="https://facebook.com/profile",
     num_of_posts=10,
-    start_date="01-01-2024",
-    end_date="12-31-2024",
+    start_date="01-01-2025",
+    end_date="12-31-2025",
     timeout=240
 )

@@ -286,8 +340,8 @@ result = client.scrape.facebook.posts_by_url(
 result = client.scrape.facebook.comments(
     url="https://facebook.com/post/123456",
     num_of_comments=100,
-    start_date="01-01-2024",
-    end_date="12-31-2024",
+    start_date="01-01-2025",
+    end_date="12-31-2025",
     timeout=240
 )

@@ -330,8 +384,8 @@ result = client.scrape.instagram.reels(
 result = client.search.instagram.posts(
     url="https://instagram.com/username",
     num_of_posts=10,
-    start_date="01-01-2024",
-    end_date="12-31-2024",
+    start_date="01-01-2025",
+    end_date="12-31-2025",
     post_type="reel",
     timeout=240
 )
@@ -340,8 +394,8 @@ result = client.search.instagram.posts(
 result = client.search.instagram.reels(
     url="https://instagram.com/username",
     num_of_posts=50,
-    start_date="01-01-2024",
-    end_date="12-31-2024",
+    start_date="01-01-2025",
+    end_date="12-31-2025",
     timeout=240
 )
 ```
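
The date strings in the hunks above follow two different layouts: the LinkedIn example uses `YYYY-MM-DD` (`"2025-01-01"`), while the Facebook and Instagram examples use `MM-DD-YYYY` (`"12-31-2025"`). A minimal sketch of a helper that formats a `datetime.date` per platform; the mapping is inferred from the sample values in this README rather than from a documented contract, and `format_date` and `year_range` are hypothetical names:

```python
from datetime import date

# Layouts inferred from the README examples (an assumption, not a documented contract)
DATE_FORMATS = {
    "linkedin": "%Y-%m-%d",   # e.g. "2025-01-01"
    "facebook": "%m-%d-%Y",   # e.g. "01-01-2025"
    "instagram": "%m-%d-%Y",  # e.g. "12-31-2025"
}


def format_date(platform, value):
    """Format a date using the layout the README examples show for a platform."""
    return value.strftime(DATE_FORMATS[platform])


def year_range(platform, year):
    """Build start_date/end_date keyword arguments covering a full calendar year."""
    return {
        "start_date": format_date(platform, date(year, 1, 1)),
        "end_date": format_date(platform, date(year, 12, 31)),
    }


print(year_range("linkedin", 2025))
print(year_range("instagram", 2025))
```

Generating the strings from `date` objects keeps the two layouts from being mixed up when the same date window is reused across platforms.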
@@ -403,7 +457,16 @@ asyncio.run(scrape_multiple())

 ---

-## 🆕 What's New in v26.11.24
+## 🆕 What's New in v2 2.0.0
+
+### 🆕 **Latest Updates (December 2025)**
+- **Amazon Search API** - NEW parameter-based product discovery
+- **LinkedIn Job Search Fixed** - Now builds URLs from keywords internally
+- **Trigger Interface** - Manual trigger/poll/fetch control for all platforms
+- **Auto-Create Zones** - Now enabled by default (was opt-in)
+- **Improved Zone Names** - `sdk_unlocker`, `sdk_serp`, `sdk_browser`
+- **26 Sync Wrapper Fixes** - All platform scrapers now work without context managers
+- **Zone Manager Tests Fixed** - All 22 tests passing

 ### 🎓 **For Data Scientists**
 - **5 Jupyter Notebooks** - Complete interactive tutorials
@@ -422,17 +485,18 @@ asyncio.run(scrape_multiple())

 ### 🖥️ **CLI Tool**
 - **`brightdata` command** - Use SDK from terminal
-- **Scrape operations** - `brightdata scrape amazon products --url ...`
-- **Search operations** - `brightdata search linkedin jobs --keyword ...`
+- **Scrape operations** - `brightdata scrape amazon products ...`
+- **Search operations** - `brightdata search amazon products --keyword ...`
 - **Output formats** - JSON, pretty-print, minimal

 ### 🏗️ **Architecture Improvements**
 - **Single AsyncEngine** - Shared across all scrapers (8x efficiency)
 - **Resource Optimization** - Reduced memory footprint
 - **Enhanced Error Messages** - Clear, actionable error messages
-- **502+ Tests** - Comprehensive test coverage
+- **500+ Tests Passing** - Comprehensive test coverage (99.4%)

-### 🆕 **New Platforms**
+### 🆕 **Platforms & Features**
+- **Amazon Search** - Keyword-based product discovery
 - **Facebook Scraper** - Posts (profile/group/URL), Comments, Reels
 - **Instagram Scraper** - Profiles, Posts, Comments, Reels
 - **Instagram Search** - Posts and Reels discovery with filters
@@ -456,6 +520,7 @@ client.scrape.instagram.profiles(url="...")
 client.scrape.generic.url(url="...")

 # Parameter-based discovery (search namespace)
+client.search.amazon.products(keyword="...", min_price=..., max_price=...)
 client.search.linkedin.jobs(keyword="...", location="...")
 client.search.instagram.posts(url="...", num_of_posts=10)
 client.search.google(query="...")
@@ -490,11 +555,11 @@ client = BrightDataClient(
     token="your_token",                  # Auto-loads from BRIGHTDATA_API_TOKEN if not provided
     customer_id="your_customer_id",      # Auto-loads from BRIGHTDATA_CUSTOMER_ID (optional)
     timeout=30,                          # Default timeout in seconds
-    web_unlocker_zone="sdk_unlocker",    # Web Unlocker zone name
-    serp_zone="sdk_serp",                # SERP API zone name
-    browser_zone="sdk_browser",          # Browser API zone name
-    auto_create_zones=False,             # Auto-create missing zones
-    validate_token=False                 # Validate token on init
+    web_unlocker_zone="sdk_unlocker",    # Web Unlocker zone name (default)
+    serp_zone="sdk_serp",                # SERP API zone name (default)
+    browser_zone="sdk_browser",          # Browser API zone name (default)
+    auto_create_zones=True,              # Auto-create missing zones (default: True)
+    validate_token=False                 # Validate token on init (default: False)
 )
 ```
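
The constructor comments above describe environment-variable fallbacks (`BRIGHTDATA_API_TOKEN`, `BRIGHTDATA_CUSTOMER_ID`) and a set of defaults, including `auto_create_zones=True` after this commit. A minimal sketch of how such resolution could work, assuming explicit arguments take precedence over the environment; `ClientSettings` and `load_settings` are hypothetical names and do not describe the SDK's actual internals:

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ClientSettings:
    """Hypothetical container mirroring the defaults documented in the README."""
    token: str
    customer_id: Optional[str] = None
    timeout: int = 30
    web_unlocker_zone: str = "sdk_unlocker"
    serp_zone: str = "sdk_serp"
    browser_zone: str = "sdk_browser"
    auto_create_zones: bool = True   # default changed from False in this commit
    validate_token: bool = False


def load_settings(token=None, customer_id=None, env=None, **overrides):
    """Resolve settings: explicit arguments first, then environment variables.

    The precedence order is an assumption made for this sketch.
    """
    env = os.environ if env is None else env
    token = token or env.get("BRIGHTDATA_API_TOKEN")
    if not token:
        raise ValueError("No API token: pass token= or set BRIGHTDATA_API_TOKEN")
    customer_id = customer_id or env.get("BRIGHTDATA_CUSTOMER_ID")
    return ClientSettings(token=token, customer_id=customer_id, **overrides)


# Token comes from the (simulated) environment; zone auto-creation is opted out
settings = load_settings(
    env={"BRIGHTDATA_API_TOKEN": "example-token"},
    auto_create_zones=False,
)
print(settings)
```

Passing `env` as a plain dictionary keeps the sketch testable without touching the real process environment.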
@@ -639,6 +704,7 @@ brightdata scrape generic \
 - `brightdata scrape generic url`

 **Search Operations:**
+- `brightdata search amazon products`
 - `brightdata search linkedin jobs/profiles/posts`
 - `brightdata search instagram posts/reels`
 - `brightdata search google/bing/yandex`
@@ -1079,8 +1145,8 @@ Contributions are welcome! Please see [CONTRIBUTING.md](docs/contributing.md) fo
 ### Development Setup

 ```bash
-git clone https://github.com/vzucher/brightdata-sdk-python.git
-cd brightdata-sdk-python
+git clone https://github.com/brightdata/sdk-python.git
+cd sdk-python

 # Install with dev dependencies
 pip install -e ".[dev]"
@@ -1120,8 +1186,8 @@ MIT License - see [LICENSE](LICENSE) file for details.

 - [Bright Data](https://brightdata.com) - Get your API token
 - [API Documentation](https://docs.brightdata.com)
-- [GitHub Repository](https://github.com/vzucher/brightdata-sdk-python)
-- [Issue Tracker](https://github.com/vzucher/brightdata-sdk-python/issues)
+- [GitHub Repository](https://github.com/brightdata/sdk-python)
+- [Issue Tracker](https://github.com/brightdata/sdk-python/issues)

 ---
