Feat: Lightpanda extension #2192
Open
nrigaudiere wants to merge 12 commits into triggerdotdev:main from nrigaudiere:feat/lightpanda-extension
b3c5b9b feat: add lightpanda structure (nrigaudiere)
85198b8 chore: add lightpanda doc links (nrigaudiere)
29dbf66 fix: lightpanda extension instructions (nrigaudiere)
33ffbcf feat: add Lightpanda guide and examples (nrigaudiere)
d5505b5 feat: lightpanda - add 3rd example (nrigaudiere)
f43f8af feat: add lightpandaTask (nrigaudiere)
11efe5c fix: lightpanda 3rd example (nrigaudiere)
806aeff fix: lightpanda 1st example (nrigaudiere)
76d936c chore: add changeset (nrigaudiere)
72225e7 add v4 tag to guide (nicktrn)
fcc50a6 fix: merge lightpanda docker instructions (nrigaudiere)
60f3357 fix: add failsafes (nrigaudiere)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
--- | ||
"@trigger.dev/build": patch | ||
"trigger.dev": patch | ||
"@trigger.dev/core": patch | ||
"@trigger.dev/sdk": patch | ||
--- | ||
|
||
Adding Lightpanda extension |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
--- | ||
title: "Lightpanda" | ||
sidebarTitle: "lightpanda" | ||
description: "Use the lightpanda build extension to run Lightpanda Browser in your project" | ||
--- | ||
|
||
<ScrapingWarning /> | ||
|
||
To use Lightpanda in your project, add these build settings to your `trigger.config.ts` file: | ||
|
||
```ts trigger.config.ts | ||
import { defineConfig } from "@trigger.dev/sdk/v3"; | ||
import { lightpanda } from "@trigger.dev/build/extensions/lightpanda"; | ||
|
||
export default defineConfig({ | ||
project: "<project ref>", | ||
// Your other config settings... | ||
build: { | ||
extensions: [lightpanda()], | ||
}, | ||
}); | ||
``` | ||
|
||
Then add the following environment variable on the Environment Variables page of your Trigger.dev dashboard: | ||
|
||
```bash | ||
LIGHTPANDA_BROWSER_PATH="/usr/bin/lightpanda" | ||
``` | ||
|
||
Follow [this example](/guides/examples/lightpanda) to get set up with Trigger.dev and Lightpanda in your project. |
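The environment variable above can be checked early, before a task tries to run the binary. The following is a minimal sketch, not part of the extension itself: `resolveLightpandaPath` is a hypothetical helper name, and it only assumes that `LIGHTPANDA_BROWSER_PATH` should point at an executable file.

```ts
import { accessSync, constants } from "node:fs";

// Hypothetical helper (not provided by @trigger.dev/build): returns the
// configured binary path, or throws a descriptive error.
export function resolveLightpandaPath(
  env: Record<string, string | undefined> = process.env
): string {
  const path = env.LIGHTPANDA_BROWSER_PATH;
  if (!path) {
    throw new Error("LIGHTPANDA_BROWSER_PATH is not set");
  }
  try {
    // X_OK checks that the current user may execute the file.
    accessSync(path, constants.X_OK);
  } catch {
    throw new Error(`Lightpanda binary is not executable at ${path}`);
  }
  return path;
}
```

Calling such a helper at the top of a task's `run` function would make a missing or misconfigured binary fail with a clear message instead of a spawn error.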
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,245 @@ | ||
--- | ||
title: "Get a webpage's content using Lightpanda browser" | ||
sidebarTitle: "Lightpanda" | ||
description: "In these examples, we will show you how to crawl using Lightpanda browser and Trigger.dev." | ||
tag: "v4" | ||
--- | ||
|
||
## Overview | ||
|
||
Lightpanda is a purpose-built browser for AI and automation workflows. It is 10x faster and uses 10x less RAM than headless Chrome. | ||
|
||
Below are a few examples of how to use Lightpanda with Trigger.dev. | ||
|
||
<Warning> | ||
When using Lightpanda, we recommend that you respect robots.txt files and avoid sending requests to websites at a high frequency. | ||
Small infrastructures can quickly be overwhelmed, with the same effect as a DDoS attack. | ||
</Warning> | ||
|
||
## Prerequisites | ||
|
||
- A project with [Trigger.dev initialized](/quick-start) | ||
- A [Lightpanda](https://lightpanda.io/) cloud token (for the 1st example) | ||
|
||
## Example \#1 - Get links from a website using Lightpanda cloud & Puppeteer | ||
|
||
In this task, we use Lightpanda browser to get links from a provided URL. | ||
You will have to pass the URL as a payload when triggering the task. | ||
|
||
Make sure to add `$LIGHTPANDA_TOKEN` to your Trigger.dev dashboard on the Environment Variables page: | ||
```bash | ||
LIGHTPANDA_TOKEN="<your-token>" | ||
``` | ||
|
||
```ts trigger/lightpanda-cloud-puppeteer.ts | ||
import { logger, task } from '@trigger.dev/sdk/v3' | ||
import puppeteer from 'puppeteer' | ||
|
||
export const lightpandaCloudPuppeteer = task({ | ||
id: 'lightpanda-cloud-puppeteer', | ||
machine: { | ||
preset: 'micro', | ||
}, | ||
run: async (payload: { url: string }, { ctx }) => { | ||
logger.log("Let's get a page's links with Lightpanda!", { payload, ctx }) | ||
if (!payload.url) { | ||
logger.warn('Please define the payload url') | ||
throw new Error('payload.url is undefined') | ||
} | ||
|
||
|
||
if (typeof process.env.LIGHTPANDA_TOKEN === 'undefined') { | ||
// Do not log process.env here: it would write every secret to the run logs | ||
logger.warn('Please define the env variable $LIGHTPANDA_TOKEN') | ||
throw new Error('$LIGHTPANDA_TOKEN is undefined') | ||
} | ||
|
||
// Connect to Lightpanda's cloud | ||
const browser = await puppeteer.connect({ | ||
browserWSEndpoint: `wss://cloud.lightpanda.io/ws?browser=lightpanda&token=${process.env.LIGHTPANDA_TOKEN}`, | ||
}) | ||
const context = await browser.createBrowserContext() | ||
const page = await context.newPage() | ||
|
||
// Dump all the links from the page. | ||
await page.goto(payload.url) | ||
|
||
const links = await page.evaluate(() => { | ||
return Array.from(document.querySelectorAll('a')).map(row => { | ||
return row.getAttribute('href') | ||
}) | ||
}) | ||
|
||
logger.info('Processing done') | ||
logger.info('Shutting down…') | ||
|
||
await page.close() | ||
await context.close() | ||
await browser.disconnect() | ||
|
||
logger.info('✅ Completed') | ||
|
||
return { | ||
links, | ||
} | ||
}, | ||
}) | ||
``` | ||
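The task above returns raw `href` attribute values, which may be `null`, relative, or non-HTTP (for example `mailto:`). A minimal post-processing sketch follows; `normalizeLinks` is a hypothetical helper, and the choice to drop fragments and keep only `http`/`https` links is an assumption rather than something the guide prescribes.

```ts
// Hypothetical helper: turns raw href values into unique absolute HTTP(S) URLs.
export function normalizeLinks(
  hrefs: Array<string | null>,
  baseUrl: string
): string[] {
  const seen = new Set<string>();
  for (const href of hrefs) {
    if (!href) continue; // skip anchors without an href
    try {
      // Resolve relative links against the page URL.
      const url = new URL(href, baseUrl);
      if (url.protocol === "http:" || url.protocol === "https:") {
        url.hash = ""; // treat "#section" variants as the same page
        seen.add(url.toString());
      }
    } catch {
      // Ignore values that cannot be parsed as a URL.
    }
  }
  return [...seen];
}
```

In the task, this could be applied as `normalizeLinks(links, payload.url)` before returning.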
### Proxies | ||
|
||
Proxies can be used with your browser via the `proxy` query string parameter. By default, the proxy used is `datacenter`, a pool of shared datacenter IPs. | ||
`datacenter` accepts an optional `country` query string parameter, which takes an [ISO 3166-1 alpha-2](https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2) country code. | ||
|
||
_Example using a German IP:_ | ||
|
||
```wss://cloud.lightpanda.io/ws?proxy=datacenter&country=de&token=TOKEN``` | ||
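The endpoint string can also be assembled programmatically so that parameter values are URL-encoded. This is a sketch under assumptions: `buildCloudEndpoint` is a hypothetical helper, only the `proxy`, `country`, and `token` parameters shown above are covered, and lowercasing the country code simply mirrors the `de` example.

```ts
// Hypothetical helper: builds the Lightpanda cloud WebSocket endpoint.
export function buildCloudEndpoint(opts: {
  token: string;
  proxy?: string;
  country?: string;
}): string {
  const url = new URL("wss://cloud.lightpanda.io/ws");
  if (opts.proxy) {
    url.searchParams.set("proxy", opts.proxy);
  }
  if (opts.country) {
    // ISO 3166-1 alpha-2 codes are exactly two letters.
    if (!/^[a-z]{2}$/i.test(opts.country)) {
      throw new Error(`Invalid country code: ${opts.country}`);
    }
    url.searchParams.set("country", opts.country.toLowerCase());
  }
  url.searchParams.set("token", opts.token);
  return url.toString();
}
```

The result can be passed as `browserWSEndpoint` to `puppeteer.connect`, as in Example #1.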
|
||
|
||
### Session | ||
A session stays alive until you close it or the connection is closed. The maximum duration of a session is 15 minutes. | ||
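Because a session is capped at 15 minutes, long-running scraping work may benefit from a client-side deadline that fails first with a clear error. The sketch below is one generic way to do that; `withDeadline` is a hypothetical helper, and it only rejects the wrapper promise, so the caller is still responsible for closing the page and disconnecting.

```ts
// Hypothetical helper: rejects if `work` has not settled within `ms` milliseconds.
export async function withDeadline<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
  });
  try {
    // Whichever promise settles first decides the outcome.
    return await Promise.race([work, deadline]);
  } finally {
    // Always clear the timer so it does not keep the process alive.
    clearTimeout(timer);
  }
}
```

For example, `await withDeadline(page.goto(payload.url), 14 * 60 * 1000)` would give up one minute before the session limit; the 14-minute margin is an arbitrary choice.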
|
||
|
||
## Example \#2 - Get a webpage using Lightpanda | ||
|
||
Using the Lightpanda binary, we will dump the HTML for a provided URL. | ||
You will have to pass the URL as a payload when triggering the task. | ||
|
||
|
||
### Prerequisites | ||
- Set up the [Lightpanda build extension](/config/extensions/lightpanda) | ||
|
||
### Task | ||
```ts trigger/lightpanda-fetch.ts | ||
import { logger, task } from '@trigger.dev/sdk/v3' | ||
import { execFileSync } from 'node:child_process' | ||
|
||
export const lightpandaFetch = task({ | ||
id: 'lightpanda-fetch', | ||
machine: { | ||
preset: "micro", | ||
}, | ||
run: async (payload: { url: string }, { ctx }) => { | ||
logger.log("Let's get a page's content with Lightpanda!", { payload, ctx }) | ||
|
||
if (!payload.url) { | ||
logger.warn('Please define the payload url') | ||
throw new Error('payload.url is undefined') | ||
} | ||
|
||
if (typeof process.env.LIGHTPANDA_BROWSER_PATH === 'undefined') { | ||
// Do not log process.env here: it would write every secret to the run logs | ||
logger.warn('Please define the env variable $LIGHTPANDA_BROWSER_PATH') | ||
throw new Error('$LIGHTPANDA_BROWSER_PATH is undefined') | ||
} | ||
|
||
// Pass the arguments as an array so the URL is never interpreted by a shell | ||
const e = execFileSync(process.env.LIGHTPANDA_BROWSER_PATH, ['fetch', '--dump', payload.url]) | ||
|
||
|
||
logger.info('✅ Completed') | ||
|
||
return { | ||
message: e.toString(), | ||
} | ||
}, | ||
}) | ||
``` | ||
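Since the payload URL is handed to an external binary, it may be worth validating it first. The following is a minimal sketch; `parseHttpUrl` is a hypothetical helper, and restricting input to `http` and `https` is an assumption about what this task should accept.

```ts
// Hypothetical helper: returns a normalized http(s) URL string or throws.
export function parseHttpUrl(raw: string): string {
  let url: URL | undefined;
  try {
    url = new URL(raw);
  } catch {
    url = undefined; // not parseable as an absolute URL
  }
  if (!url) {
    throw new Error(`Invalid URL: ${raw}`);
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new Error(`Unsupported protocol: ${url.protocol}`);
  }
  return url.toString();
}
```

In the task, `parseHttpUrl(payload.url)` could replace the plain `!payload.url` check, so that values such as `file:///etc/passwd` are rejected before the binary runs.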
|
||
## Example \#3 - Launch and use a Lightpanda CDP server | ||
|
||
This task initialises a Lightpanda CDP server to allow you to scrape directly via Trigger.dev. | ||
|
||
### Prerequisites | ||
- Set up the [Lightpanda build extension](/config/extensions/lightpanda) | ||
|
||
### Task | ||
Your task will have to launch a child process in order to have the websocket available to scrape using Puppeteer. | ||
|
||
```ts trigger/lightpandaCDP.ts | ||
import { logger, task } from '@trigger.dev/sdk/v3' | ||
import { spawn, type ChildProcessWithoutNullStreams } from 'node:child_process' | ||
import puppeteer from 'puppeteer' | ||
|
||
const spawnLightpanda = async (log: typeof logger) => | ||
new Promise<ChildProcessWithoutNullStreams>((resolve, reject) => { | ||
const child = spawn(process.env.LIGHTPANDA_BROWSER_PATH as string, [ | ||
'serve', | ||
'--host', | ||
'127.0.0.1', | ||
'--port', | ||
'9222', | ||
'--log_level', | ||
'info', | ||
]) | ||
|
||
child.on('spawn', async () => { | ||
log.info("Running Lightpanda's CDP server…", { | ||
pid: child.pid, | ||
}) | ||
|
||
await new Promise(resolve => setTimeout(resolve, 250)) | ||
resolve(child) | ||
}) | ||
child.on('error', e => reject(e)) | ||
}) | ||
|
||
export const lightpandaCDP = task({ | ||
id: 'lightpanda-cdp', | ||
machine: { | ||
preset: 'micro', | ||
}, | ||
run: async (payload: { url: string }, { ctx }) => { | ||
logger.log("Let's get a page's links with Lightpanda!", { payload, ctx }) | ||
|
||
if (!payload.url) { | ||
logger.warn('Please define the payload url') | ||
throw new Error('payload.url is undefined') | ||
} | ||
|
||
if (typeof process.env.LIGHTPANDA_BROWSER_PATH === 'undefined') { | ||
// Do not log process.env here: it would write every secret to the run logs | ||
logger.warn('Please define the env variable $LIGHTPANDA_BROWSER_PATH') | ||
throw new Error('$LIGHTPANDA_BROWSER_PATH is undefined') | ||
} | ||
|
||
try { | ||
// Launch Lightpanda's CDP server | ||
const lpProcess = await spawnLightpanda(logger) | ||
|
||
const browser = await puppeteer.connect({ | ||
browserWSEndpoint: 'ws://127.0.0.1:9222', | ||
}) | ||
const context = await browser.createBrowserContext() | ||
const page = await context.newPage() | ||
|
||
// Dump all the links from the page. | ||
await page.goto(payload.url) | ||
|
||
const links = await page.evaluate(() => { | ||
return Array.from(document.querySelectorAll('a')).map(row => { | ||
return row.getAttribute('href') | ||
}) | ||
}) | ||
|
||
logger.info('Processing done') | ||
logger.info('Shutting down…') | ||
|
||
// Close Puppeteer instance | ||
await browser.close() | ||
|
||
// Stop Lightpanda's CDP Server | ||
lpProcess.stdout.destroy() | ||
lpProcess.stderr.destroy() | ||
lpProcess.kill() | ||
|
||
|
||
logger.info('✅ Completed') | ||
|
||
return { | ||
links, | ||
} | ||
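} catch (e) { | ||
// Rethrow the original error so its message and stack trace are preserved | ||
throw e instanceof Error ? e : new Error(String(e)) | ||
} | ||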
}, | ||
}) | ||
``` |
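The task above waits a fixed 250 ms after spawning the server before connecting. An alternative is to poll the TCP port until it accepts connections. This is a sketch only: `waitForPort` is a hypothetical helper, the timeout and interval defaults are arbitrary, and an open port is assumed to mean the CDP server is ready.

```ts
import { connect } from "node:net";

// Hypothetical helper: resolves once a TCP connection to host:port succeeds,
// and rejects if that has not happened within `timeoutMs`.
export function waitForPort(
  port: number,
  host = "127.0.0.1",
  timeoutMs = 5000,
  intervalMs = 50
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  return new Promise((resolve, reject) => {
    const attempt = () => {
      const socket = connect({ port, host });
      socket.once("connect", () => {
        socket.destroy(); // the probe connection is no longer needed
        resolve();
      });
      socket.once("error", () => {
        socket.destroy();
        if (Date.now() >= deadline) {
          reject(new Error(`Port ${port} on ${host} not ready after ${timeoutMs} ms`));
        } else {
          setTimeout(attempt, intervalMs); // retry after a short pause
        }
      });
    };
    attempt();
  });
}
```

In `spawnLightpanda`, `await waitForPort(9222)` could stand in for the `setTimeout` delay before resolving the child process.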