Feat: Lightpanda extension #2192

Open
wants to merge 12 commits into
base: main
8 changes: 8 additions & 0 deletions .changeset/rare-mails-fail.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
---
"@trigger.dev/build": patch
"trigger.dev": patch
"@trigger.dev/core": patch
"@trigger.dev/sdk": patch
---

Adding Lightpanda extension
@@ -715,6 +715,7 @@ function HelpfulInfoHasTasks({ onClose }: { onClose: () => void }) {
isExternal
/>
<LinkWithIcon to={docsPath("/examples/puppeteer")} description="Puppeteer" isExternal />
<LinkWithIcon to={docsPath("/examples/lightpanda")} description="Lightpanda" isExternal />
<LinkWithIcon to={docsPath("/examples/react-pdf")} description="React to PDF" isExternal />
<LinkWithIcon
to={docsPath("/examples/resend-email-sequence")}
4 changes: 4 additions & 0 deletions docs/config/config-file.mdx
@@ -428,6 +428,10 @@ See the [syncEnvVars documentation](/config/extensions/syncEnvVars) for more inf

See the [puppeteer documentation](/config/extensions/puppeteer) for more information.

#### lightpanda

See the [Lightpanda documentation](/config/extensions/lightpanda) for more information.

#### ffmpeg

See the [ffmpeg documentation](/config/extensions/ffmpeg) for more information.
30 changes: 30 additions & 0 deletions docs/config/extensions/lightpanda.mdx
@@ -0,0 +1,30 @@
---
title: "Lightpanda"
sidebarTitle: "lightpanda"
description: "Use the lightpanda build extension to run Lightpanda Browser in your project"
---

<ScrapingWarning />

To use Lightpanda in your project, add these build settings to your `trigger.config.ts` file:

```ts trigger.config.ts
import { defineConfig } from "@trigger.dev/sdk/v3";
import { lightpanda } from "@trigger.dev/build/extensions/lightpanda";

export default defineConfig({
  project: "<project ref>",
  // Your other config settings...
  build: {
    extensions: [lightpanda()],
  },
});
```

Then add the following environment variable on the Environment Variables page of your Trigger.dev dashboard:

```bash
LIGHTPANDA_BROWSER_PATH="/usr/bin/lightpanda"
```

Follow [this example](/guides/examples/lightpanda) to get set up with Trigger.dev and Lightpanda in your project.
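
Because tasks read `LIGHTPANDA_BROWSER_PATH` at runtime, it can help to check it once at the start of a run and fail with a clear message. The following is a minimal sketch; `resolveLightpandaPath` is a hypothetical helper and not part of the extension:

```ts
// Hypothetical helper (not part of @trigger.dev/build): reads the binary path
// from an env-like object and throws a descriptive error when it is missing.
function resolveLightpandaPath(env: Record<string, string | undefined>): string {
  const path = env.LIGHTPANDA_BROWSER_PATH?.trim();
  if (!path) {
    throw new Error(
      "LIGHTPANDA_BROWSER_PATH is not set; add it on the Environment Variables page"
    );
  }
  return path;
}
```

Inside a task you would call it as `resolveLightpandaPath(process.env)`.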
1 change: 1 addition & 0 deletions docs/config/extensions/overview.mdx
@@ -52,6 +52,7 @@ Trigger.dev provides a set of built-in extensions that you can use to customize
| [pythonExtension](/config/extensions/pythonExtension) | Execute Python scripts in your project |
| [playwright](/config/extensions/playwright) | Use Playwright in your Trigger.dev tasks |
| [puppeteer](/config/extensions/puppeteer) | Use Puppeteer in your Trigger.dev tasks |
| [lightpanda](/config/extensions/lightpanda) | Use Lightpanda in your Trigger.dev tasks |
| [ffmpeg](/config/extensions/ffmpeg) | Use FFmpeg in your Trigger.dev tasks |
| [aptGet](/config/extensions/aptGet) | Install system packages in your build image |
| [additionalFiles](/config/extensions/additionalFiles) | Copy additional files to your build image |
2 changes: 2 additions & 0 deletions docs/docs.json
@@ -78,6 +78,7 @@
"config/extensions/pythonExtension",
"config/extensions/playwright",
"config/extensions/puppeteer",
"config/extensions/lightpanda",
"config/extensions/ffmpeg",
"config/extensions/aptGet",
"config/extensions/additionalFiles",
@@ -358,6 +359,7 @@
"guides/examples/fal-ai-realtime",
"guides/examples/ffmpeg-video-processing",
"guides/examples/firecrawl-url-crawl",
"guides/examples/lightpanda",
"guides/examples/libreoffice-pdf-conversion",
"guides/examples/open-ai-with-retrying",
"guides/examples/pdf-to-image",
245 changes: 245 additions & 0 deletions docs/guides/examples/lightpanda.mdx
@@ -0,0 +1,245 @@
---
title: "Get a webpage's content using Lightpanda browser"
sidebarTitle: "Lightpanda"
description: "These examples show how to crawl webpages using Lightpanda browser and Trigger.dev."
tag: "v4"
---

## Overview

Lightpanda is a purpose-built browser for AI and automation workflows. It is 10x faster and uses 10x less RAM than headless Chrome.

Below are a few examples of how to use Lightpanda with Trigger.dev.

<Warning>
When using Lightpanda, we recommend that you respect robots.txt files and avoid sending requests to a website at high frequency.
Small infrastructures can be overwhelmed quickly, which amounts to a DDoS.
</Warning>

## Prerequisites

- A project with [Trigger.dev initialized](/quick-start)
- A [Lightpanda](https://lightpanda.io/) cloud token (for the first example)

## Example \#1 - Get links from a website using Lightpanda cloud & Puppeteer

In this task, we use Lightpanda browser to get links from a provided URL.
You will have to pass the URL as a payload when triggering the task.
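
Since the URL arrives in the payload, it may be worth validating it before handing it to the browser. A minimal sketch, assuming you only want absolute `http` or `https` URLs; `parseTargetUrl` is a hypothetical helper that the examples on this page do not use:

```ts
// Hypothetical helper: accept only absolute http(s) URLs from the payload.
function parseTargetUrl(input: unknown): URL {
  if (typeof input !== 'string' || input.trim() === '') {
    throw new Error('payload.url must be a non-empty string')
  }

  let url: URL
  try {
    url = new URL(input.trim())
  } catch {
    throw new Error(`payload.url is not a valid absolute URL: ${input}`)
  }

  // Reject schemes such as file: or javascript: that a crawler should not follow.
  if (url.protocol !== 'http:' && url.protocol !== 'https:') {
    throw new Error(`Unsupported protocol: ${url.protocol}`)
  }
  return url
}
```

In a task, `page.goto(parseTargetUrl(payload.url).href)` would then fail early on malformed input.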

Make sure to add `LIGHTPANDA_TOKEN` on the Environment Variables page of your Trigger.dev dashboard:

```bash
LIGHTPANDA_TOKEN="<your-token>"
```

```ts trigger/lightpanda-cloud-puppeteer.ts
import { logger, task } from '@trigger.dev/sdk/v3'
import puppeteer from 'puppeteer'

export const lightpandaCloudPuppeteer = task({
  id: 'lightpanda-cloud-puppeteer',
  machine: {
    preset: 'micro',
  },
  run: async (payload: { url: string }, { ctx }) => {
    logger.log("Let's get a page's links with Lightpanda!", { payload, ctx })

    if (!payload.url) {
      logger.warn('Please define the payload url')
      throw new Error('payload.url is undefined')
    }

    const token = process.env.LIGHTPANDA_TOKEN
    if (!token) {
      // Do not log process.env here: it would write every secret to the run logs.
      logger.warn('Please define the env variable $LIGHTPANDA_TOKEN')
      throw new Error('$LIGHTPANDA_TOKEN is undefined')
    }

    // Connect to Lightpanda's cloud
    const browser = await puppeteer.connect({
      browserWSEndpoint: `wss://cloud.lightpanda.io/ws?browser=lightpanda&token=${token}`,
    })

    try {
      const context = await browser.createBrowserContext()
      const page = await context.newPage()

      // Collect the href of every link on the page.
      await page.goto(payload.url)

      const links = await page.evaluate(() => {
        return Array.from(document.querySelectorAll('a')).map(row => {
          return row.getAttribute('href')
        })
      })

      logger.info('Processing done')
      logger.info('Shutting down…')

      await page.close()
      await context.close()

      logger.info('✅ Completed')

      return {
        links,
      }
    } finally {
      // Always release the cloud session, even when navigation fails.
      await browser.disconnect()
    }
  },
})
```
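
The task above returns raw `href` attribute values, which can be relative, empty, duplicated, or non-web schemes such as `mailto:`. If you need absolute URLs, one option is to post-process the list. This is a sketch only; `normalizeLinks` is a hypothetical helper and its filtering rules are assumptions, not something Lightpanda or Puppeteer does for you:

```ts
// Hypothetical post-processing step: resolve hrefs against the page URL,
// keep only http(s) links, strip fragments, and de-duplicate.
function normalizeLinks(hrefs: Array<string | null>, baseUrl: string): string[] {
  const seen = new Set<string>()

  for (const href of hrefs) {
    if (!href) continue
    try {
      const url = new URL(href, baseUrl)
      if (url.protocol === 'http:' || url.protocol === 'https:') {
        url.hash = ''
        seen.add(url.href)
      }
    } catch {
      // Ignore hrefs that cannot be resolved against the base URL.
    }
  }

  return Array.from(seen)
}
```

For example, `normalizeLinks(links, payload.url)` could be returned instead of `links`.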
### Proxies

You can route your browser through a proxy with the `proxy` query string parameter. The default proxy is `datacenter`, a pool of shared datacenter IPs.
`datacenter` accepts an optional `country` query string parameter, an [ISO 3166-1 alpha-2](https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2) country code.

_Example using a German IP:_

```wss://cloud.lightpanda.io/ws?proxy=datacenter&country=de&token=TOKEN```
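
Rather than concatenating the query string by hand, the endpoint can be assembled with the standard `URL` API, which also takes care of escaping. A minimal sketch: the base URL and the `proxy`, `country`, and `token` parameters come from the example above, while `buildLightpandaEndpoint` and its option names are hypothetical:

```ts
// Hypothetical helper; only the query parameter names come from the docs above.
interface EndpointOptions {
  token: string
  proxy?: string // e.g. 'datacenter'
  country?: string // ISO 3166-1 alpha-2 code, e.g. 'de'
}

function buildLightpandaEndpoint({ token, proxy, country }: EndpointOptions): string {
  if (country && !/^[a-z]{2}$/i.test(country)) {
    throw new Error(`Expected a two-letter country code, got "${country}"`)
  }

  const url = new URL('wss://cloud.lightpanda.io/ws')
  if (proxy) url.searchParams.set('proxy', proxy)
  if (country) url.searchParams.set('country', country.toLowerCase())
  url.searchParams.set('token', token)
  return url.toString()
}
```

The result can be passed as `browserWSEndpoint` to `puppeteer.connect`, for example `buildLightpandaEndpoint({ token, proxy: 'datacenter', country: 'de' })`.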


### Session
A session stays alive until you close it or the connection is closed. The maximum duration of a session is 15 minutes.
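
Because a session ends after 15 minutes, long-running work may benefit from its own deadline so that it fails with a clear error instead of a dropped connection. A sketch under stated assumptions: `withSessionDeadline` is a hypothetical helper, and the one-minute safety margin is an arbitrary choice:

```ts
// The 15-minute ceiling is the session limit stated above; the margin is illustrative.
const SESSION_LIMIT_MS = 15 * 60 * 1000

// Hypothetical guard: rejects if `work` has not settled within `limitMs`.
async function withSessionDeadline<T>(
  work: Promise<T>,
  limitMs: number = SESSION_LIMIT_MS - 60000
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Session exceeded ${limitMs} ms`)), limitMs)
  })

  try {
    return await Promise.race([work, deadline])
  } finally {
    // Clear the timer so it does not keep the process alive after `work` settles.
    clearTimeout(timer)
  }
}
```

Note that this only stops waiting; it does not cancel the underlying browser work, so the caller still has to disconnect.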


## Example \#2 - Get a webpage using Lightpanda

Using the Lightpanda binary, we will dump the HTML for a provided URL.
You will have to pass the URL as a payload when triggering the task.


### Prerequisites
- Set up the [Lightpanda build extension](/config/extensions/lightpanda)

### Task
```ts trigger/lightpanda-fetch.ts
import { logger, task } from '@trigger.dev/sdk/v3'
import { execFileSync } from 'node:child_process'

export const lightpandaFetch = task({
  id: 'lightpanda-fetch',
  machine: {
    preset: 'micro',
  },
  run: async (payload: { url: string }, { ctx }) => {
    logger.log("Let's get a page's content with Lightpanda!", { payload, ctx })

    if (!payload.url) {
      logger.warn('Please define the payload url')
      throw new Error('payload.url is undefined')
    }

    const browserPath = process.env.LIGHTPANDA_BROWSER_PATH
    if (!browserPath) {
      // Do not log process.env here: it would write every secret to the run logs.
      logger.warn('Please define the env variable $LIGHTPANDA_BROWSER_PATH')
      throw new Error('$LIGHTPANDA_BROWSER_PATH is undefined')
    }

    // Pass the URL as an argument instead of interpolating it into a shell
    // command, so a crafted payload cannot inject extra commands.
    const output = execFileSync(browserPath, ['fetch', '--dump', payload.url])

    logger.info('✅ Completed')

    return {
      message: output.toString(),
    }
  },
})
```
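
The task above runs the binary synchronously with no time limit, so a page that never finishes loading would block the run. One way to bound it is a small wrapper around Node's `execFileSync` with its `timeout` and `maxBuffer` options. This is a sketch; `runCli` is a hypothetical helper and the default values are illustrative, not Lightpanda recommendations:

```ts
import { execFileSync } from 'node:child_process'

// Hypothetical wrapper: runs a binary without a shell, bounded by a timeout
// and an output cap. Throws if the process fails, times out, or overflows.
function runCli(
  binary: string,
  args: string[],
  { timeoutMs = 30000, maxBufferBytes = 10 * 1024 * 1024 } = {}
): string {
  return execFileSync(binary, args, {
    encoding: 'utf8',
    timeout: timeoutMs,
    maxBuffer: maxBufferBytes,
  })
}
```

In the task, the call would become `runCli(browserPath, ['fetch', '--dump', payload.url])`, where `browserPath` holds the value of `LIGHTPANDA_BROWSER_PATH`.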

## Example \#3 - Launch and use a Lightpanda CDP server

This task initialises a Lightpanda CDP server to allow you to scrape directly via Trigger.dev.

### Prerequisites
- Set up the [Lightpanda build extension](/config/extensions/lightpanda)

### Task
Your task has to launch Lightpanda as a child process so that Puppeteer can connect to its WebSocket endpoint.

```ts trigger/lightpandaCDP.ts
import { logger, task } from '@trigger.dev/sdk/v3'
import { spawn, type ChildProcessWithoutNullStreams } from 'node:child_process'
import puppeteer from 'puppeteer'

const spawnLightpanda = async (browserPath: string, log: typeof logger) =>
  new Promise<ChildProcessWithoutNullStreams>((resolve, reject) => {
    const child = spawn(browserPath, [
      'serve',
      '--host',
      '127.0.0.1',
      '--port',
      '9222',
      '--log_level',
      'info',
    ])

    child.on('spawn', async () => {
      log.info("Running Lightpanda's CDP server…", {
        pid: child.pid,
      })

      // Give the server a moment to start listening before connecting.
      await new Promise(resolve => setTimeout(resolve, 250))
      resolve(child)
    })
    child.on('error', e => reject(e))
  })

export const lightpandaCDP = task({
  id: 'lightpanda-cdp',
  machine: {
    preset: 'micro',
  },
  run: async (payload: { url: string }, { ctx }) => {
    logger.log("Let's get a page's links with Lightpanda!", { payload, ctx })

    if (!payload.url) {
      logger.warn('Please define the payload url')
      throw new Error('payload.url is undefined')
    }

    const browserPath = process.env.LIGHTPANDA_BROWSER_PATH
    if (!browserPath) {
      // Do not log process.env here: it would write every secret to the run logs.
      logger.warn('Please define the env variable $LIGHTPANDA_BROWSER_PATH')
      throw new Error('$LIGHTPANDA_BROWSER_PATH is undefined')
    }

    // Launch Lightpanda's CDP server
    const lpProcess = await spawnLightpanda(browserPath, logger)

    try {
      const browser = await puppeteer.connect({
        browserWSEndpoint: 'ws://127.0.0.1:9222',
      })
      const context = await browser.createBrowserContext()
      const page = await context.newPage()

      // Collect the href of every link on the page.
      await page.goto(payload.url)

      const links = await page.evaluate(() => {
        return Array.from(document.querySelectorAll('a')).map(row => {
          return row.getAttribute('href')
        })
      })

      logger.info('Processing done')
      logger.info('Shutting down…')

      // Close Puppeteer instance
      await browser.close()

      logger.info('✅ Completed')

      return {
        links,
      }
    } finally {
      // Stop Lightpanda's CDP server, even when scraping fails.
      lpProcess.stdout.destroy()
      lpProcess.stderr.destroy()
      lpProcess.kill()
    }
  },
})
```
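
The task above waits a fixed 250 ms after spawning the server before connecting. On a slow machine that may not be enough, so an alternative is to poll the port until it accepts a TCP connection. A minimal sketch using Node's `net` module; `waitForPort` is a hypothetical helper and its timeout and retry interval are illustrative defaults:

```ts
import { connect } from 'node:net'

// Hypothetical readiness check: retries a TCP connection until it succeeds
// or the deadline passes, then rejects with a timeout error.
function waitForPort(
  host: string,
  port: number,
  timeoutMs = 5000,
  retryMs = 100
): Promise<void> {
  const deadline = Date.now() + timeoutMs

  return new Promise<void>((resolve, reject) => {
    const attempt = () => {
      const socket = connect({ host, port })

      socket.once('connect', () => {
        socket.destroy()
        resolve()
      })

      socket.once('error', () => {
        socket.destroy()
        if (Date.now() >= deadline) {
          reject(new Error(`Timed out waiting for ${host}:${port}`))
        } else {
          setTimeout(attempt, retryMs)
        }
      })
    }

    attempt()
  })
}
```

With this in place, `await waitForPort('127.0.0.1', 9222)` could replace the fixed delay before calling `puppeteer.connect`.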
1 change: 1 addition & 0 deletions docs/guides/introduction.mdx
@@ -69,6 +69,7 @@ Task code you can copy and paste to use in your project. They can all be extende
| [FFmpeg video processing](/guides/examples/ffmpeg-video-processing) | Use FFmpeg to process a video in various ways and save it to Cloudflare R2. |
| [Firecrawl URL crawl](/guides/examples/firecrawl-url-crawl) | Learn how to use Firecrawl to crawl a URL and return LLM-ready markdown. |
| [LibreOffice PDF conversion](/guides/examples/libreoffice-pdf-conversion) | Convert a document to PDF using LibreOffice. |
| [Lightpanda](/guides/examples/lightpanda) | Use Lightpanda browser (or cloud version) to get a webpage's content. |
| [OpenAI with retrying](/guides/examples/open-ai-with-retrying) | Create a reusable OpenAI task with custom retry options. |
| [PDF to image](/guides/examples/pdf-to-image) | Use `MuPDF` to turn a PDF into images and save them to Cloudflare R2. |
| [Puppeteer](/guides/examples/puppeteer) | Use Puppeteer to generate a PDF or scrape a webpage. |
Binary file added docs/images/intro-lightpanda.jpg
2 changes: 2 additions & 0 deletions docs/introduction.mdx
@@ -83,6 +83,7 @@ We provide everything you need to build and manage background tasks: a CLI and S
<Card title="Supabase" img="/images/intro-supabase.jpg" href="/guides/examples/supabase-database-operations"/>
<Card title="DALL•E" img="/images/intro-openai.jpg" href="/guides/examples/dall-e3-generate-image"/>
<Card title="Firecrawl" img="/images/intro-firecrawl.jpg" href="/guides/examples/firecrawl-url-crawl"/>
<Card title="Lightpanda" img="/images/intro-lightpanda.jpg" href="/guides/examples/lightpanda"/>
</CardGroup>

## Explore by build extension
@@ -92,6 +93,7 @@ We provide everything you need to build and manage background tasks: a CLI and S
| prismaExtension | Use Prisma with Trigger.dev | [Learn more](/config/extensions/prismaExtension) |
| pythonExtension | Execute Python scripts in Trigger.dev | [Learn more](/config/extensions/pythonExtension) |
| puppeteer | Use Puppeteer with Trigger.dev | [Learn more](/config/extensions/puppeteer) |
| lightpanda | Use Lightpanda Browser with Trigger.dev | [Learn more](/config/extensions/lightpanda) |
| ffmpeg | Use FFmpeg with Trigger.dev | [Learn more](/config/extensions/ffmpeg) |
| aptGet | Install system packages with aptGet | [Learn more](/config/extensions/aptGet) |
| additionalFiles | Copy additional files to the build directory | [Learn more](/config/extensions/additionalFiles) |
17 changes: 16 additions & 1 deletion packages/build/package.json
@@ -30,7 +30,8 @@
"./extensions/audioWaveform": "./src/extensions/audioWaveform.ts",
"./extensions/typescript": "./src/extensions/typescript.ts",
"./extensions/puppeteer": "./src/extensions/puppeteer.ts",
"./extensions/playwright": "./src/extensions/playwright.ts"
"./extensions/playwright": "./src/extensions/playwright.ts",
"./extensions/lightpanda": "./src/extensions/lightpanda.ts"
},
"sourceDialects": [
"@triggerdotdev/source"
@@ -61,6 +62,9 @@
],
"extensions/playwright": [
"dist/commonjs/extensions/playwright.d.ts"
],
"extensions/lightpanda": [
"dist/commonjs/extensions/lightpanda.d.ts"
]
}
},
@@ -188,6 +192,17 @@
"types": "./dist/commonjs/extensions/playwright.d.ts",
"default": "./dist/commonjs/extensions/playwright.js"
}
},
"./extensions/lightpanda": {
"import": {
"@triggerdotdev/source": "./src/extensions/lightpanda.ts",
"types": "./dist/esm/extensions/lightpanda.d.ts",
"default": "./dist/esm/extensions/lightpanda.js"
},
"require": {
"types": "./dist/commonjs/extensions/lightpanda.d.ts",
"default": "./dist/commonjs/extensions/lightpanda.js"
}
}
},
"main": "./dist/commonjs/index.js",