@plissb (Contributor) commented Dec 14, 2025

Description

Adds the ability to filter DBF files using glob patterns.

Functionality

Whitelist (include list)

  • include_patterns - process ONLY files that match the patterns
  • If not specified, all DBF files are processed

Blacklist (exclude list)

  • exclude_patterns - skip files that match the patterns
  • Applied AFTER include_patterns

Highlights

  • Case-insensitive matching - *.DBF matches both file.dbf and FILE.DBF
  • Glob patterns - supports *, ?, [abc], and other wildcards
  • Validation - pattern syntax is checked when the config is loaded
  • Automatic reload - changes take effect on the next batch
  • Detailed logging - reports the number of files filtered out
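
The include/exclude semantics above can be sketched as follows. This is only an illustration under stated assumptions: it uses a hand-rolled matcher instead of globset so that it has no dependencies, `wildcard_match` and `should_process` are hypothetical names rather than the functions in src/processor/filter.rs, and the matcher handles only `*` and `?` (no `[abc]` classes).

```rust
/// Minimal wildcard matcher (a stand-in for a real glob library):
/// `*` matches any run of characters, `?` matches exactly one.
/// Character classes such as `[abc]` are NOT handled here.
fn wildcard_match(pattern: &str, name: &str) -> bool {
    // Lowercase both sides to get case-insensitive matching.
    let p: Vec<char> = pattern.to_lowercase().chars().collect();
    let n: Vec<char> = name.to_lowercase().chars().collect();
    let (mut pi, mut ni) = (0usize, 0usize);
    // Index of the last `*` in the pattern and the name index it was tried at.
    let mut star: Option<(usize, usize)> = None;

    while ni < n.len() {
        if pi < p.len() && p[pi] == '*' {
            // Remember the star and first try matching it against nothing.
            star = Some((pi, ni));
            pi += 1;
        } else if pi < p.len() && (p[pi] == '?' || p[pi] == n[ni]) {
            pi += 1;
            ni += 1;
        } else if let Some((sp, sn)) = star {
            // Backtrack: let the last `*` absorb one more character.
            pi = sp + 1;
            ni = sn + 1;
            star = Some((sp, sn + 1));
        } else {
            return false;
        }
    }
    // Trailing `*`s in the pattern may match the empty string.
    p[pi..].iter().all(|&c| c == '*')
}

/// Hypothetical helper: whitelist first, then blacklist.
fn should_process(name: &str, include: &[&str], exclude: &[&str]) -> bool {
    // An empty whitelist means "accept every file".
    let included = include.is_empty() || include.iter().any(|p| wildcard_match(p, name));
    // The blacklist is applied after the whitelist, so it always wins.
    included && !exclude.iter().any(|p| wildcard_match(p, name))
}

fn main() {
    let exclude = ["nsfcli.DBF", "NsfMod.dbf"];
    assert!(!should_process("NSFMOD.DBF", &[], &exclude));
    assert!(should_process("orders.dbf", &[], &exclude));

    // A file that matches both lists is skipped.
    assert!(!should_process("nsfcli.dbf", &["nsf*.DBF"], &["nsfcli.DBF"]));
    assert!(should_process("nsfmod.dbf", &["nsf*.DBF"], &["nsfcli.DBF"]));
}
```

Lowercasing both the pattern and the file name is the simplest way to get case-insensitive behavior in a sketch; a real glob library would normally expose this as a builder option instead.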

Usage example

[src]
source_dir = 'C:\data'

# Exclude problematic files
exclude_patterns = ["nsfcli.DBF", "NsfMod.dbf"]

# Or process only nsf* files
# include_patterns = ["nsf*.DBF"]

# Or exclude temporary files
# exclude_patterns = ["temp_*.dbf", "*.bak"]

Testing

  • ✅ 56 unit tests pass
  • ✅ 6 new tests for filtering
  • ✅ Clippy reports no warnings
  • ✅ Release build succeeds

Changed files

  • Cargo.toml - added the globset dependency
  • src/models/config.rs - extended SourceConfig
  • src/processor/filter.rs - new filtering module
  • src/processor/scanner.rs - applies the filters during scanning
  • config.example.toml - example configuration

Problem solved

Makes it possible to exclude the problematic files nsfcli.DBF and NsfMod.dbf, which caused errors during processing.

plissb and others added 2 commits December 14, 2025 15:27
Implements configurable file filtering to exclude or include specific
DBF files from processing based on glob patterns.

Features:
- Whitelist (include_patterns): Only process files matching patterns
- Blacklist (exclude_patterns): Skip files matching patterns
- Case-insensitive glob matching (*.dbf matches FILE.DBF)
- Patterns support wildcards: *, ?, [abc], etc.
- Validates patterns at config load time
- Auto-reload when config changes
- Detailed logging of filtered files

Configuration example in config.toml:
```toml
[src]
source_dir = 'C:\data'
exclude_patterns = ["nsfcli.DBF", "NsfMod.dbf", "temp_*.dbf"]
# include_patterns = ["nsf*.DBF"]  # Optional whitelist
```

Changes:
- Added globset dependency for pattern matching
- Extended SourceConfig with include_patterns/exclude_patterns
- Created filter module with FileFilter struct
- Updated scanner to apply filtering during directory walk
- Updated all tests to include new optional fields
- Added comprehensive filter tests (6 test cases)

Resolves issue with problematic DBF files (nsfcli.DBF, NsfMod.dbf).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Provides comprehensive example configuration including:
- File filtering patterns (include/exclude)
- Pattern syntax examples and best practices
- Comments explaining all configuration options

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Comment on lines +108 to +111
// Clear old cache and set new one
// Note: OnceLock doesn't support clearing, so we just overwrite
// This is called once per batch, so it's acceptable
let _ = CACHED_FILTER.set(filter);

P1: Allow filter cache to refresh after config reloads

The global filter cache is stored in a OnceLock, and set_global_filter calls CACHED_FILTER.set(filter) on every scan while discarding the result. OnceLock::set succeeds only once, so after the first batch all subsequent scans keep using the initial include/exclude patterns even if the config is reloaded (e.g., via BatchScheduler::reload_config for the next scheduled run). This leaves updated patterns ignored for the rest of the process lifetime, contradicting the intent to rebuild filters each batch.
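
The behavior described above can be reproduced with the standard library alone. The sketch below first shows that a second `OnceLock::set` is rejected and the first value is kept, then shows one possible replacement: a `RwLock<Option<Arc<...>>>` that can be overwritten on every batch. The `FileFilter` struct here is a simplified, hypothetical stand-in, and `set_global_filter` / `global_filter` are illustrative signatures, not necessarily what the PR uses.

```rust
use std::sync::{Arc, OnceLock, RwLock};

/// Simplified, hypothetical stand-in for the PR's FileFilter.
#[derive(Debug)]
struct FileFilter {
    exclude: Vec<String>,
}

impl FileFilter {
    fn new(exclude: &[&str]) -> Self {
        Self { exclude: exclude.iter().map(|s| s.to_string()).collect() }
    }
}

/// Shows the problem: the second `set` fails and the first value stays.
/// Returns (second set was rejected, patterns still stored in the cell).
fn once_lock_demo() -> (bool, Vec<String>) {
    let cell: OnceLock<FileFilter> = OnceLock::new();
    let _ = cell.set(FileFilter::new(&["old.dbf"]));
    let second = cell.set(FileFilter::new(&["new.dbf"]));
    (second.is_err(), cell.get().unwrap().exclude.clone())
}

/// One possible alternative: a cache that can be replaced on each batch.
static CACHED_FILTER: RwLock<Option<Arc<FileFilter>>> = RwLock::new(None);

fn set_global_filter(filter: FileFilter) {
    // Overwrites whatever was cached before.
    *CACHED_FILTER.write().unwrap() = Some(Arc::new(filter));
}

fn global_filter() -> Option<Arc<FileFilter>> {
    // Cloning the Arc is cheap and releases the read lock immediately.
    CACHED_FILTER.read().unwrap().clone()
}

fn main() {
    let (rejected, kept) = once_lock_demo();
    assert!(rejected);
    assert_eq!(kept, vec!["old.dbf".to_string()]);

    set_global_filter(FileFilter::new(&["old.dbf"]));
    set_global_filter(FileFilter::new(&["new.dbf"]));
    assert_eq!(global_filter().unwrap().exclude, vec!["new.dbf".to_string()]);
}
```

Passing the freshly built filter to the scanner as an argument, with no global at all, would avoid the caching question entirely; which option fits depends on how the scanner is called.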

plissb and others added 2 commits December 14, 2025 16:20
Apply rustfmt's preferred formatting for chained method calls in
glob pattern validation (map_err on new line).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

globset 0.4.18 requires edition2024 which is not available in
Rust 1.82.0. Pin to 0.4.15 to maintain compatibility.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
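
For reference, an exact pin in Cargo.toml could look like the fragment below. This is an assumption about how the pin might be written, not a copy of the PR's manifest. A bare `"0.4.15"` is a caret requirement, so Cargo may still resolve it to a newer 0.4.x release unless Cargo.lock holds it back; the `=` prefix requests that exact version.

```toml
[dependencies]
# Exact-version requirement; assumed form, the PR's actual manifest line is not shown here.
globset = "=0.4.15"
```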
@plissb force-pushed the feature/file-filtering branch from c38761e to ee60748 on December 14, 2025 14:30
@plissb merged commit 7a1c360 into develop on Dec 14, 2025
1 check passed
@plissb deleted the feature/file-filtering branch on December 14, 2025 14:48