Commit 0185c7f

feat: limit batch size to 1!
As a temporary fix for an issue with how llama-index loads data into the vectorstore, we limit the batch size to 1. The issue: when the llama-index pipeline loads documents, it writes them to the docstore first and then to the vectorstore. If any problem is raised while loading into the docstore, the data is never loaded into the vectorstore. So we limit the batch size to 1, meaning documents are loaded one at a time into the docstore and then the vectorstore.
1 parent f6f3d49 commit 0185c7f

1 file changed (+1, -1 lines changed)

hivemind_etl/mediawiki/etl.py

Lines changed: 1 addition & 1 deletion
@@ -103,7 +103,7 @@ def load(self, documents: list[Document]) -> None:
         )
 
         # Process batches in parallel using ThreadPoolExecutor
-        batch_size = 1000
+        batch_size = 1
         batches = [documents[i:i + batch_size] for i in range(0, len(documents), batch_size)]
 
         with ThreadPoolExecutor(max_workers=10) as executor:
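
To illustrate why the smaller batch size narrows the impact of a failure, here is a minimal, simplified sketch. It reuses the slicing expression from the diff, but `make_batches`, `ingest_batch`, and this standalone `load` are hypothetical stand-ins, not the code in `etl.py` and not llama-index APIs. It assumes that a batch which raises during ingestion is lost as a whole, which is only a rough model of the docstore/vectorstore behaviour described in the commit message.

```python
from concurrent.futures import ThreadPoolExecutor


def make_batches(documents, batch_size):
    """Split documents into consecutive slices of at most batch_size items."""
    return [documents[i:i + batch_size] for i in range(0, len(documents), batch_size)]


def ingest_batch(batch):
    """Hypothetical stand-in for the pipeline call: fails if any document is 'bad'."""
    if any(doc == "bad" for doc in batch):
        raise ValueError("ingestion failed")
    return list(batch)


def load(documents, batch_size, max_workers=10):
    """Ingest all batches in parallel and return the documents that were loaded."""
    loaded = []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [
            executor.submit(ingest_batch, batch)
            for batch in make_batches(documents, batch_size)
        ]
        # Iterate in submission order so the result order is deterministic.
        for future in futures:
            try:
                loaded.extend(future.result())
            except ValueError:
                # Assumption: a failed batch is skipped as a whole.
                pass
    return loaded


docs = ["a", "b", "bad", "c"]
print(load(docs, batch_size=1000))  # [] (one batch, and it contains the bad document)
print(load(docs, batch_size=1))     # ['a', 'b', 'c'] (only the bad document is lost)
```

The trade-off is that with `batch_size = 1` the number of batches equals the number of documents, so each document becomes its own task submitted to the executor.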
