-
Notifications
You must be signed in to change notification settings - Fork 0
feat: Add file filtering with glob patterns (include/exclude) #2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Implements configurable file filtering to exclude or include specific DBF files from processing based on glob patterns. Features: - Whitelist (include_patterns): Only process files matching patterns - Blacklist (exclude_patterns): Skip files matching patterns - Case-insensitive glob matching (*.dbf matches FILE.DBF) - Patterns support wildcards: *, ?, [abc], etc. - Validates patterns at config load time - Auto-reload when config changes - Detailed logging of filtered files Configuration example in config.toml: ```toml [src] source_dir = "C:\data" exclude_patterns = ["nsfcli.DBF", "NsfMod.dbf", "temp_*.dbf"] # include_patterns = ["nsf*.DBF"] # Optional whitelist ``` Changes: - Added globset dependency for pattern matching - Extended SourceConfig with include_patterns/exclude_patterns - Created filter module with FileFilter struct - Updated scanner to apply filtering during directory walk - Updated all tests to include new optional fields - Added comprehensive filter tests (6 test cases) Resolves issue with problematic DBF files (nsfcli.DBF, NsfMod.dbf). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Provides comprehensive example configuration including: - File filtering patterns (include/exclude) - Pattern syntax examples and best practices - Comments explaining all configuration options 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| // Clear old cache and set new one | ||
| // Note: OnceLock doesn't support clearing, so we just overwrite | ||
| // This is called once per batch, so it's acceptable | ||
| let _ = CACHED_FILTER.set(filter); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Allow filter cache to refresh after config reloads
The global filter cache is stored in a OnceLock, and set_global_filter calls CACHED_FILTER.set(filter) on every scan while discarding the result. OnceLock::set succeeds only once, so after the first batch all subsequent scans keep using the initial include/exclude patterns even if the config is reloaded (e.g., via BatchScheduler::reload_config for the next scheduled run). This leaves updated patterns ignored for the rest of the process lifetime, contradicting the intent to rebuild filters each batch.
Useful? React with 👍 / 👎.
Apply rustfmt's preferred formatting for chained method calls in glob pattern validation (map_err on new line). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
globset 0.4.18 requires edition2024 which is not available in Rust 1.82.0. Pin to 0.4.15 to maintain compatibility. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
c38761e to
ee60748
Compare
Описание
Добавлена возможность фильтрации DBF файлов с использованием glob паттернов.
Функциональность
Whitelist (включающий список)
include_patterns- обрабатывать ТОЛЬКО файлы, соответствующие паттернамBlacklist (исключающий список)
exclude_patterns- пропускать файлы, соответствующие паттернамОсобенности
✅ Case-insensitive matching -
*.DBFсоответствуетfile.dbf,FILE.DBF✅ Glob patterns - поддержка
*,?,[abc]и других wildcards✅ Валидация - проверка синтаксиса паттернов при загрузке конфига
✅ Автоматическая перезагрузка - изменения применяются при следующем батче
✅ Подробное логирование - количество отфильтрованных файлов
Пример использования
Тестирование
Изменённые файлы
Cargo.toml- добавлена зависимостьglobsetsrc/models/config.rs- расширенSourceConfigsrc/processor/filter.rs- новый модуль фильтрацииsrc/processor/scanner.rs- применение фильтров при сканированииconfig.example.toml- пример конфигурацииРешает проблему
Позволяет исключить проблемные файлы
nsfcli.DBFиNsfMod.dbf, которые вызывали ошибки при обработке.