Description
Hi, thank you for your great app. Unfortunately, it does not work properly for me.
The local folder adapter successfully uploads PDF files during the first sync, but skips them on all subsequent syncs, classifying them as "binary files".
Steps to Reproduce
- Configure local folder adapter with a folder containing PDF files
- Start the container - PDFs are uploaded successfully ✅
- Restart the container or wait for next scheduled sync
- Check logs - PDFs are now skipped ❌
Expected Behavior
PDF files should be uploaded consistently on every sync, as they are valid document types for knowledge bases and Open WebUI's RAG system can process them.
Actual Behavior
time="2025-11-15T19:19:16Z" level=debug msg="Skipping binary file: /sync-folder/document.pdf"
time="2025-11-15T19:19:16Z" level=debug msg="Skipping binary file: /sync-folder/contract.pdf"
PDFs are skipped due to the isBinaryFile() check in internal/adapter/local.go (lines 126-129 and 237-242).
Root Cause
The isBinaryFile() function checks for null bytes (0x00) in file content. PDF files naturally contain null bytes as part of their binary structure, causing them to be incorrectly classified as "binary files to skip".
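The mechanism can be reproduced in isolation. The sketch below is a minimal stand-alone copy of the null-byte heuristic in `isBinaryFile()`, written as a free function rather than a method on `LocalFolderAdapter`. It assumes the elided tail of the original function (marked `// ...` in the source) returns `false` when no null byte is found, and the PDF-like byte slice is a hypothetical sample, not a real file from the sync folder.

```go
package main

import "fmt"

// isBinaryFile mirrors the null-byte heuristic from local.go: it reports
// true if any of the first 1024 bytes is 0x00.
// Assumption: the elided remainder of the original returns false otherwise.
func isBinaryFile(content []byte) bool {
	for i := 0; i < len(content) && i < 1024; i++ {
		if content[i] == 0 {
			return true
		}
	}
	return false
}

func main() {
	// Hypothetical PDF-like prefix: a "%PDF-" header and binary comment line,
	// followed by stream bytes that happen to include 0x00.
	pdfLike := append([]byte("%PDF-1.7\n%\xe2\xe3\xcf\xd3\n"), 0x78, 0x9c, 0x00, 0x01)
	plainText := []byte("# Notes\nJust some markdown.\n")

	fmt.Println("pdf-like flagged as binary:", isBinaryFile(pdfLike))     // prints: pdf-like flagged as binary: true
	fmt.Println("plain text flagged as binary:", isBinaryFile(plainText)) // prints: plain text flagged as binary: false
}
```

A single 0x00 anywhere in the first 1024 bytes is enough to classify the whole file as binary, so any PDF with a null byte near its start is skipped regardless of whether Open WebUI could index it.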
```go
// internal/adapter/local.go:237
func (l *LocalFolderAdapter) isBinaryFile(content []byte) bool {
	for i := 0; i < len(content) && i < 1024; i++ {
		if content[i] == 0 { // PDFs contain null bytes!
			return true
		}
	}
	// ...
}
```
Environment
- Version: ghcr.io/castai/openwebui-content-sync:latest (as of 2025-11-15)
- Docker: Latest
- OS: Synology DSM
Proposed Solution
Option 1 (Recommended): Remove the binary file check entirely, as Open WebUI can handle all document types:
```go
// Remove lines 126-129 in local.go
```
Option 2: Whitelist common document formats:
allowedExtensions := []string{".pdf", ".docx", ".doc", ".txt", ".md", ".csv"}
if l.isBinaryFile(content) && !hasAllowedExtension(path, allowedExtensions) {
skip()
}Option 3: Check file extension instead of content:
skipExtensions := []string{".exe", ".zip", ".tar", ".gz", ".jpg", ".png", ".gif"}
if hasSkipExtension(path, skipExtensions) {
skip()
}Impact
This bug prevents automatic synchronization of PDF documents, which are essential for legal, technical, and business knowledge bases.
Additional Context
The first sync works because the file index is empty. Subsequent syncs read from the index but still run the binary check during file scanning, which causes the PDFs to be skipped.
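Options 2 and 3 under Proposed Solution call helpers (`hasAllowedExtension`, `hasSkipExtension`) that the proposal names but does not define. The sketch below is one possible shape for them, not the project's code: a single hypothetical `hasExtension` helper serves both roles, extensions are compared case-insensitively via `path/filepath.Ext` (so `Contract.PDF` matches `.pdf`), and the `shouldSkipOption2`/`shouldSkipOption3` wrappers are illustrative names standing in for the `skip()` branches.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

var (
	// Option 2: document formats uploaded even if their content looks binary.
	allowedExtensions = []string{".pdf", ".docx", ".doc", ".txt", ".md", ".csv"}
	// Option 3: formats skipped purely by extension, regardless of content.
	skipExtensions = []string{".exe", ".zip", ".tar", ".gz", ".jpg", ".png", ".gif"}
)

// hasExtension is a hypothetical stand-in for both hasAllowedExtension and
// hasSkipExtension. Only the last extension counts, so "a.tar.gz" -> ".gz".
func hasExtension(path string, exts []string) bool {
	ext := strings.ToLower(filepath.Ext(path))
	for _, e := range exts {
		if ext == e {
			return true
		}
	}
	return false
}

// isBinaryFile is the same null-byte heuristic as in local.go
// (assumed to return false when no null byte is found).
func isBinaryFile(content []byte) bool {
	for i := 0; i < len(content) && i < 1024; i++ {
		if content[i] == 0 {
			return true
		}
	}
	return false
}

// shouldSkipOption2 keeps the content check but exempts known document types.
func shouldSkipOption2(path string, content []byte) bool {
	return isBinaryFile(content) && !hasExtension(path, allowedExtensions)
}

// shouldSkipOption3 ignores content and skips only listed extensions.
func shouldSkipOption3(path string) bool {
	return hasExtension(path, skipExtensions)
}

func main() {
	pdf := append([]byte("%PDF-1.7\n"), 0x00) // hypothetical PDF-like content

	fmt.Println(shouldSkipOption2("/sync-folder/document.pdf", pdf)) // false
	fmt.Println(shouldSkipOption2("/sync-folder/tool.bin", pdf))     // true
	fmt.Println(shouldSkipOption3("/sync-folder/Contract.PDF"))      // false
	fmt.Println(shouldSkipOption3("/sync-folder/archive.tar.gz"))    // true
}
```

The trade-off: Option 2 still rejects unknown binary formats by content, while Option 3 never reads the bytes and lets through any extension not on the skip list.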