fix[SSRF]: Arbitrary File Read Vulnerability #5541

Draft
wants to merge 1 commit into
base: main

gkhngyk
Contributor

@gkhngyk gkhngyk commented May 25, 2024

  • Check if the URL has a valid HTTP or HTTPS protocol

Vulnerability: huntr
Found By: evrenyal

Vulnerability Details

Description
LangChain's “PlaywrightWebBaseLoader” allows reading arbitrary files from the server when it is given a “file://” URL.

Proof Of Concept

import { PlaywrightWebBaseLoader } from "langchain/document_loaders/web/playwright";

/**
 * Loader uses `page.content()`
 * as default evaluate function
 **/
const loader = new PlaywrightWebBaseLoader("file://");

const docs = await loader.load();
console.log(docs);

Response :

[
  Document {
    pageContent: '<html><head><script>start("/");</script>\n' +
      '<script>addRow("bin","bin",1,4096,"4.0 kB",1715852092,"5/16/24, 9:34:52 AM");</script>\n' +
      '<script>addRow("boot","boot",1,4096,"4.0 kB",1650277739,"4/18/22, 10:28:59 AM");</script>\n' +
      '<script>addRow("dev","dev",1,360,"360 B",1715851806,"5/16/24, 9:30:06 AM");</script>\n' +
      '<script>addRow("etc","etc",1,4096,"4.0 kB",1715852092,"5/16/24, 9:34:52 AM");</script>\n' +
      '<script>addRow("home","home",1,4096,"4.0 kB",1715040473,"5/7/24, 12:07:53 AM");</script>\n' +
      '<script>addRow("lib","lib",1,4096,"4.0 kB",1715040501,"5/7/24, 12:08:21 AM");</script>\n' +
      '<script>addRow("lib32","lib32",1,4096,"4.0 kB",1714183354,"4/27/24, 2:02:34 AM");</script>\n' +
      '<script>addRow("lib64","lib64",1,4096,"4.0 kB",1714183549,"4/27/24, 2:05:49 AM");</script>\n' +
      '<script>addRow("libx32","libx32",1,4096,"4.0 kB",1714183354,"4/27/24, 2:02:34 AM");</script>\n' +
      '<script>addRow("media","media",1,4096,"4.0 kB",1714183355,"4/27/24, 2:02:35 AM");</script>\n' +
      '<script>addRow("mnt","mnt",1,4096,"4.0 kB",1714183355,"4/27/24, 2:02:35 AM");</script>\n' +
      '<script>addRow("ms-playwright","ms-playwright",1,4096,"4.0 kB",1715853455,"5/16/24, 9:57:35 AM");</script>\n' +
      '<script>addRow("opt","opt",1,4096,"4.0 kB",1714183355,"4/27/24, 2:02:35 AM");</script>\n' +
      '<script>addRow("proc","proc",1,0,"0 B",1715851806,"5/16/24, 9:30:06 AM");</script>\n' +
      '<script>addRow("root","root",1,4096,"4.0 kB",1715852343,"5/16/24, 9:39:03 AM");</script>\n' +
      '<script>addRow("run","run",1,4096,"4.0 kB",1715040489,"5/7/24, 12:08:09 AM");</script>\n' +
      '<script>addRow("sbin","sbin",1,4096,"4.0 kB",1715040501,"5/7/24, 12:08:21 AM");</script>\n' +
      '<script>addRow("srv","srv",1,4096,"4.0 kB",1714183355,"4/27/24, 2:02:35 AM");</script>\n' +
      '<script>addRow("sys","sys",1,0,"0 B",1715851806,"5/16/24, 9:30:06 AM");</script>\n' +
      '<script>addRow("tmp","tmp",1,4096,"4.0 kB",1715853458,"5/16/24, 9:57:38 AM");</script>\n' +
      '<script>addRow("usr","usr",1,4096,"4.0 kB",1714183355,"4/27/24, 2:02:35 AM");</script>\n' +
      '<script>addRow("var","var",1,4096,"4.0 kB",1714183554,"4/27/24, 2:05:54 AM");</script>\n' +
      '<script>addRow(".dockerenv",".dockerenv",0,0,"0 B",1715851806,"5/16/24, 9:30:06 AM");</script>\n' +
      '</head><body></body></html>',
    metadata: { source: 'file://' }
  }
]

Impact
An attacker who controls the URL passed to the loader can use the “file://” URL scheme to retrieve the contents of arbitrary files on the system, which leads to sensitive information exposure.

Vulnerability: https://huntr.com/bounties/23f45984-7336-48d8-a373-75b39bcd6367
Vulnerability Reporter: https://github.com/evrenyal
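
The commit itself is not shown in this thread, so the following is only a minimal sketch of what a "valid HTTP or HTTPS protocol" check could look like. The helper name `assertAllowedProtocol` and the `allowedProtocols` parameter are hypothetical and are not part of LangChain or Playwright; the sketch relies only on the standard WHATWG `URL` class.

```typescript
// Hypothetical helper for illustration; not the code from this PR.
const DEFAULT_ALLOWED_PROTOCOLS: readonly string[] = ["http:", "https:"];

function assertAllowedProtocol(
  rawUrl: string,
  allowedProtocols: readonly string[] = DEFAULT_ALLOWED_PROTOCOLS
): URL {
  let parsed: URL;
  try {
    // The URL constructor throws on input it cannot parse.
    parsed = new URL(rawUrl);
  } catch {
    throw new Error(`Invalid URL: ${rawUrl}`);
  }
  // `protocol` is normalized to lowercase and includes the trailing colon.
  if (!allowedProtocols.includes(parsed.protocol)) {
    throw new Error(
      `Protocol "${parsed.protocol}" is not allowed for URL: ${rawUrl}`
    );
  }
  return parsed;
}
```

A caller could run this on untrusted input before constructing the loader. Passing an explicit list such as `["http:", "https:", "file:"]` would keep local-file loading available as an opt-in for anyone who relies on it.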
@dosubot dosubot bot added the size:S (This PR changes 10-29 lines, ignoring generated files.) label May 25, 2024

@jacoblee93
Collaborator

jacoblee93 commented May 25, 2024

I responded on the original bug - this is a thin proxy over Playwright, and this issue feels like a feature of the URI spec?

What if someone is using this feature?

Wouldn't it be more suitable to open a PR on Playwright?

@jacoblee93 jacoblee93 added the question (Further information is requested) label May 25, 2024
@gkhngyk
Contributor Author

gkhngyk commented May 26, 2024

> I responded on the original bug - this is a thin proxy over Playwright, and this issue feels like a feature of the URI spec?
>
> What if someone is using this feature?
>
> Wouldn't it be more suitable to open a PR on Playwright?

We offer Playwright as a document loader in LangChain.
PlaywrightWebBaseLoader is our class, and our class can expose sensitive files on the server to users. So I think it's right to open the PR here.

I think we should take security measures for the code we write, at least as far as we can.

@jacoblee93
Collaborator

jacoblee93 commented May 26, 2024

Seems like a bug on Playwright to me. Perhaps there's a setting on their end?

@gkhngyk
Contributor Author

gkhngyk commented May 26, 2024

SSRF is mentioned in the LangChain Python library. Instead of a URL check like the one in the commit, it would be useful to document the risk, as in the link.

[Image: Screenshot 2024-05-26 at 16 58 03]

@jacoblee93
Collaborator

Yeah I'm fine with adding a docstring, slightly more wary about making a code change (again, this is a proxy/translation layer on top of Playwright).

@gkhngyk
Contributor Author

gkhngyk commented May 26, 2024

Great, I'm going to delete the code in the commit and add docstring.
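
As a rough illustration of the docstring approach agreed on above, a security note could be written as a JSDoc comment along these lines. The wording and the stand-in class `ExampleWebLoader` are hypothetical; this is not the text that was eventually committed, and it is not the real `PlaywrightWebBaseLoader`.

```typescript
/**
 * Hypothetical stand-in class, used only to show where a note could live.
 *
 * Security note (illustrative wording): the URL is handed to the browser
 * as given. Non-HTTP schemes such as `file://` can therefore expose files
 * on the host machine. Validate or restrict URLs that come from untrusted
 * input before passing them to the loader.
 */
class ExampleWebLoader {
  readonly webPath: string;

  constructor(webPath: string) {
    // Deliberately no validation here: the docstring-only approach leaves
    // the loader's behavior unchanged and documents the risk instead.
    this.webPath = webPath;
  }
}
```

The trade-off matches the discussion: behavior stays unchanged for anyone who intentionally loads local files, and responsibility for validating untrusted URLs is made explicit to callers.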

@gkhngyk gkhngyk marked this pull request as draft May 28, 2024 10:44
Labels
auto:bug (Related to a bug, vulnerability, unexpected error with an existing feature)
question (Further information is requested)
size:S (This PR changes 10-29 lines, ignoring generated files.)