Support GCS Objects with / in GCS Loaders (langchain-ai#3356)
This fixes the same problem as langchain-ai#1517, but for GCS.

### Problem
When `GCSFileLoader` loads a GCS object with `/` in its key (e.g.
`folder/some-document.txt`), it downloads the object into a temporary
directory and saves it as a file at a path built from the key.

The download fails when the key's parent directory does not exist inside
the temporary directory.
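
The failure can be reproduced without GCS. The sketch below is not the loader's code: `save_without_parents` is a hypothetical helper, and a plain local file write stands in for `blob.download_to_filename`, on the assumption that the download opens the destination path for writing.

```python
import tempfile


def save_without_parents(key: str, data: bytes) -> str:
    """Hypothetical stand-in for the download step.

    Writes straight to <temp_dir>/<key> without creating parent directories,
    which mirrors the behaviour described in the problem statement.
    """
    with tempfile.TemporaryDirectory() as temp_dir:
        file_path = f"{temp_dir}/{key}"
        try:
            with open(file_path, "wb") as f:
                f.write(data)
        except FileNotFoundError:
            # The directory <temp_dir>/folder was never created.
            return "missing parent directory"
        return "ok"


print(save_without_parents("some-document.txt", b"x"))
print(save_without_parents("folder/some-document.txt", b"x"))
```

A key without `/` is written directly into the temporary directory, so only keys with a folder-like prefix hit the error.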

### What this PR does
Creates the parent directories implied by the object key before downloading.

This also works with deeply nested keys such as
`folder/subfolder/some-document.txt`.
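
A minimal sketch of that approach, using only the standard library. `save_with_parents` is a hypothetical helper, and the local file write is an assumed stand-in for the GCS download; only the `os.makedirs(..., exist_ok=True)` call is taken from this commit.

```python
import os
import tempfile


def save_with_parents(key: str, data: bytes) -> bytes:
    """Hypothetical helper: write data to <temp_dir>/<key>, creating parents first."""
    with tempfile.TemporaryDirectory() as temp_dir:
        file_path = f"{temp_dir}/{key}"
        # Same call as in the commit: create every missing directory in the path.
        # exist_ok=True makes this a no-op when the key has no "/" at all.
        os.makedirs(os.path.dirname(file_path), exist_ok=True)
        with open(file_path, "wb") as f:
            f.write(data)
        # Read the file back to show that the write succeeded.
        with open(file_path, "rb") as f:
            return f.read()


print(save_with_parents("folder/subfolder/some-document.txt", b"hello"))
```

Because `os.makedirs` creates all intermediate directories in one call, the nesting depth of the key does not matter.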
vieiralucas authored Apr 25, 2023
1 parent a4d85f7 commit e6c1c32
Showing 2 changed files with 6 additions and 0 deletions.
4 changes: 4 additions & 0 deletions langchain/document_loaders/gcs_directory.py
@@ -27,6 +27,10 @@ def load(self) -> List[Document]:
         client = storage.Client(project=self.project_name)
         docs = []
         for blob in client.list_blobs(self.bucket, prefix=self.prefix):
+            # we shall just skip directories since GCSFileLoader creates
+            # intermediate directories on the fly
+            if blob.name.endswith("/"):
+                continue
             loader = GCSFileLoader(self.project_name, self.bucket, blob.name)
             docs.extend(loader.load())
         return docs
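
The directory check in this hunk can be illustrated in isolation. The sketch assumes the listing may contain placeholder objects whose names end in `/`; `file_keys` and the sample names are hypothetical and stand in for iterating over `client.list_blobs(...)`.

```python
from typing import Iterable, List


def file_keys(blob_names: Iterable[str]) -> List[str]:
    """Hypothetical helper: keep only names that refer to files.

    Names ending in "/" are treated as directory placeholders and skipped,
    matching the `blob.name.endswith("/")` check in the diff.
    """
    return [name for name in blob_names if not name.endswith("/")]


names = ["folder/", "folder/a.txt", "folder/sub/", "folder/sub/b.txt"]
print(file_keys(names))
```

Skipping the placeholders loses nothing, since the file loader creates `folder/` and `folder/sub/` itself when it downloads the files beneath them.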
2 changes: 2 additions & 0 deletions langchain/document_loaders/gcs_file.py
@@ -1,4 +1,5 @@
 """Loading logic for loading documents from a GCS file."""
+import os
 import tempfile
 from typing import List

@@ -34,6 +35,7 @@ def load(self) -> List[Document]:
         blob = bucket.blob(self.blob)
         with tempfile.TemporaryDirectory() as temp_dir:
             file_path = f"{temp_dir}/{self.blob}"
+            os.makedirs(os.path.dirname(file_path), exist_ok=True)
             # Download the file to a destination
             blob.download_to_filename(file_path)
             loader = UnstructuredFileLoader(file_path)