Refactor scheduler retriever#1
Conversation
…iever - Split SchedulerRetriever into SchedulerSearchService and MemoryPostProcessor - SchedulerSearchService: Unified search interface delegating to Searcher - MemoryPostProcessor: Handles all post-retrieval processing (enhance, filter, rerank) - Fix: Use memory_type='All' to prevent 2*top_k bug in search results - Add robust search_method conversion with logging for unknown methods - Update all scheduler call sites (base, general, optimized, eval) - Add comprehensive unit tests for both new services Breaking changes: None (backward compatible) Fixes: Search result count bug (was returning up to 2*top_k)
Problem: - enhance_memories_with_query and recall_for_missing_memories were in MemoryPostProcessor - This caused SearchHandler to depend on Scheduler (cross-layer coupling) - These are general search enhancement features, not scheduler-specific Solution: - Move both methods from MemoryPostProcessor to AdvancedSearcher - Update SingleCubeView to use self.searcher instead of self.mem_scheduler.post_processor - Simplify MemoryPostProcessor to only handle filtering and reranking Benefits: - Better separation of concerns (search enhancement belongs to Searcher) - Removes cross-layer dependency (API layer no longer depends on Scheduler) - Enables reuse (both Scheduler and SearchHandler share same implementation) - Improves testability (AdvancedSearcher can be tested independently) Changes: - advanced_searcher.py: +291 lines (added enhancement methods) - post_processor.py: -320 lines (removed enhancement, kept filter/rerank) - single_cube.py: 6 changes (updated call sites) - test_post_processor.py: -57 lines (removed obsolete tests) Type safety: Verified via 'AdvancedSearcher as Searcher' alias in tree.py
Fixes 3 critical issues identified in code review: 1. Configuration behavior change (HIGH RISK) Problem: enhance_memories_with_query called without batch_size/retries Impact: Behavior inconsistent with historical settings Fix: Pass DEFAULT_SCHEDULER_RETRIEVER_BATCH_SIZE and RETRIES explicitly 2. Module dependency inversion (ARCHITECTURE REGRESSION) Problem: AdvancedSearcher depended on memos.mem_scheduler.utils Impact: Violated decoupling goal (memories should not depend on scheduler) Fix: Move extract_json_obj/extract_list_items_in_answer to memos.utils Changes: - src/memos/utils.py: +163 lines (add utility functions) - misc_utils.py: -163 lines, +3 lines (re-export for backward compat) - Update imports in 6 files 3. Test coverage migration (NOTED) Note: _split_batches tests removed but not migrated Action: Defer to future test enhancement Benefits: - Maintains historical batch/retry behavior - Restores proper layering (no memories→scheduler dependency) - Backward compatible (misc_utils re-exports from memos.utils) Verification: ✓ No scheduler imports in AdvancedSearcher ✓ Config params passed to enhance_memories_with_query ✓ All existing imports work via re-export
Critical fix for misc_utils.py: - Removed empty 'def extract_json_obj(text: str):' declaration - This was leftover from sed deletion and caused IndentationError - Functions are properly re-exported from memos.utils Verification: python3 -m py_compile passes
- Fix test_search_service.py: call_count assertion (2 -> 1) SearchService now uses memory_type='All' for single search call - Fix test_post_processor.py: remove incorrect 'with Mock()' usage - Fix ruff E712: use truthiness check instead of '== True' - Apply ruff formatting to all modified files All 23 tests now pass successfully.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 3 potential issues.
Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.
| prompt_name = "memory_rewrite_enhancement" | ||
| else: | ||
| text_memories = "\n".join([f"- {mem}" for i, mem in enumerate(batch_texts)]) | ||
| prompt_name = "memory_recreate_enhancement" |
There was a problem hiding this comment.
Missing prompt templates in AdvancedSearcher will cause KeyError
High Severity
The enhance_memories_with_query and recall_for_missing_memories methods were moved to AdvancedSearcher, but the required prompt templates are missing from advanced_search_prompts.py. The PROMPT_MAPPING in that file doesn't include "memory_rewrite_enhancement" (used when FINE_STRATEGY == FineStrategy.REWRITE) or "enlarge_recall". When load_template attempts PROMPT_MAPPING[template_name] for these keys, a KeyError is raised at runtime.
Additional Locations (1)
| finally: | ||
| # Restore original setting | ||
| if original_manual_close is not None: | ||
| self.searcher.manual_close_internet = original_manual_close |
There was a problem hiding this comment.
Race condition when modifying shared searcher state
Medium Severity
The _search_with_searcher method modifies self.searcher.manual_close_internet and restores it in a finally block. Since the Searcher instance is shared across all concurrent requests, two threads calling search() with different internet_search values can interfere with each other. One thread's setting could affect another thread's search, causing incorrect internet search behavior. The old code path via TreeTextMemory.search() created a new Searcher per call, avoiding this issue.
| session_id=target_session_id, | ||
| internet_search=search_req.internet_search, | ||
| chat_history=search_req.chat_history, | ||
| mem_cube_id=user_context.mem_cube_id, |
There was a problem hiding this comment.
Search fails silently for NaiveTextMemory in optimized scheduler
Medium Severity
The search_memories method accepts NaiveMemCube (which may use NaiveTextMemory), but NaiveTextMemory lacks get_searcher, so self.searcher is None. When search_service.search() falls back to _search_with_text_mem, an assertion requires TreeTextMemory, causing AssertionError. The exception is caught and returns an empty list, silently failing searches that previously worked when the old code called mem_cube.text_mem.search() directly.
Additional Locations (1)
…nsor#1004) * refactor(scheduler): modularize handlers and search - extract scheduler handlers into dedicated modules - split retriever into pipelines (search/enhance/rerank/filter) - centralize text search logic for API and scheduler Refs MemTensor#1003 * fix: resolve ruff lint errors and address Copilot review feedback - Fix TC001/TC002/TC003: move type-only imports into TYPE_CHECKING blocks - Fix RUF059: prefix unused variables with underscore - Fix typos: "Memorires" -> "Memories", "exeption" -> "exception" - Remove self-assignment: `text_mem_base = text_mem_base` - Remove unused `user_context` param from `build_search_context` - Restore original `QUERY_TASK_LABEL` in activation memory update - Apply ruff format to all modified files * fix(redis): serialize schedule messages for streams - json-encode list/dict fields for Redis XADD - decode chat_history safely when reading from streams * fix(examples): use scheduler handlers - avoid private _memory_update_consumer call - delegate mem update handler to built-in handler * style(ruff): fix isinstance union syntax - apply X | Y form to satisfy UP038 * style(ruff): format message schemas - align datetime line breaks with ruff format * refactor: integrate architectural improvements from refactor-scheduler-stage2 This commit merges key modularization benefits and bug fixes from the refactor-scheduler-stage2 branch into fancy-scheduler: Modularize activation memory logic into ActivationMemoryManager Introduce SchedulerSearchService for unified memory search coordination Extract filtering and reranking logic into MemoryPostProcessor Maintain intentional search scope of LongTermMemory and UserMemory in SearchPipeline and SchedulerSearchService Update BaseScheduler to initialize and manage lifecycle of new modules Refactor BaseSchedulerMemoryMixin to delegate tasks to specialized managers * refactor: extract common logic to BaseSchedulerHandler and fix import paths * refactor: rename ctx to scheduler_context in mem_scheduler Rename abbreviation 'ctx' to 'scheduler_context' in GeneralScheduler and SchedulerHandlerRegistry to improve code readability and clarity. * refactor: sync orchestrator config and fix scheduler imports - Sync orchestrator config removal when unregistering handlers in dispatcher - Fix missing TaskPriorityLevel import in dispatcher - Fix register_handlers signature in BaseSchedulerQueueMixin - Fix handler registry imports and initialization map - Fix relative imports in handlers package * fix: resolve PR #1 review issues (P0 imports, P3 types/init) * chore(ruff): fix unused unpacked vars - prefix unused unpacked vars with underscore - apply ruff format changes * perf(search): include embeddings for mmr * fix: Pass user_context in mem_read and pref_add handlers - Update MemReadMessageHandler to extract user_context from message and pass it to _process_memories_with_reader and transfer_mem. - Update PrefAddMessageHandler to extract user_context from message and pass it to pref_mem.add. - This ensures user context information is available during memory reading and preference adding operations. --------- Co-authored-by: chentang <travistang@foxmail.com> Co-authored-by: glin1993@outlook.com <>
Description
Summary: (summary)
Fix: #(issue)
Docs Issue/PR: (docs-issue-or-pr-link)
Reviewer: @(reviewer)
Checklist:
Note
Major refactor to decouple retrieval and post-processing, and to standardize search.
SchedulerRetrieverwithSchedulerSearchService(unified search) andMemoryPostProcessor(filter + rerank); update schedulers to usepost_processorandsearch_servicesearch_method→SearchMode) and route searches throughsearch_service(with NaiveTextMemory fallback)memory_type="All"in one call (avoids returning2*top_k); preserve searcher internet toggle stateenhance_memories_with_query,recall_for_missing_memories) intoAdvancedSearchermemos.utils; update imports accordinglyoptimized_scheduler,general_scheduler, and eval path to new services; minor signature cleanuptest_search_service.py,test_search_service_bug_fix.py,test_post_processor.pyWritten by Cursor Bugbot for commit 1663831. This will update automatically on new commits. Configure here.