fix: BioEngineDatasets lazily re-resolves data server URL (0.9.3)#71
Merged
Conversation
BioEngineDatasets resolved the data server URL exactly once, at construction time, by reading ~/.bioengine/datasets/bioengine_current_server. If the file did not exist when the instance was created (typical for a Ray Serve deployment that starts before the data server), service_url was set to None and every subsequent dataset operation short-circuited — even after the data server later wrote the file. The only escape was to recreate the client, which a long-lived injected deployment cannot do. Fix: * Remember the user's intent (``"auto"`` / explicit URL / ``None``) on ``self._requested_data_server_url`` instead of collapsing it at init. * Add ``_resolve_service_url()`` — a cheap "are we attached yet?" check that re-reads the discovery file the first time service_url is still None on any call. ``"auto"`` keeps re-trying until the file appears; explicit URLs are attached once; ``None`` stays disabled. * Call ``_resolve_service_url()`` at the top of every public method that uses ``service_url`` (``ping_data_server``, ``list_datasets``, ``list_files``, ``save_file``, ``list_saved_files``, ``get_saved_file``; ``get_file`` benefits transitively via ``list_datasets``). * Add ``async refresh()`` — explicit re-discovery for the second scenario the original code couldn't handle: the data server has restarted on a different URL. Closes the old httpx.AsyncClient before re-attaching. * Move the discovery file path to ``_CURRENT_SERVER_FILE`` class attribute so the new code and tests can target it uniformly (monkeypatchable). Adds tests/test_datasets_lazy_resolution.py — six focused unit tests that exercise the lazy path without touching the network. Bumps version to 0.9.3 (0.9.2 is reserved by PR #70 — the Ray Client stale-handle fix). Both PRs are independent; whichever lands first takes the lower version, the other rebases. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Resolves the reported issue where
BioEngineDatasetspermanently locks inservice_url = Noneif the discovery file~/.bioengine/datasets/bioengine_current_serveris missing at construction time, and never re-checks even after the data server starts and writes the file. Long-lived injected clients in Ray Serve deployments (e.g.chiron-manager) could not see datasets that became available after deploy.Approach
The issue suggested two options: lazy re-read in
list_datasets(), or an explicitrefresh()method. I implemented both because they cover different scenarios:_resolve_service_url) handles the common case the issue describes: the client was constructed before the data server. No caller change required. Every public method now calls_resolve_service_url()first; the fast path is a single attribute check, the slow path reads one small file.refresh()handles the second-class case the lazy path can't: the data server has restarted on a new URL. The lazy path doesn't fire whileservice_urlis already set, so it can't detect a server change.refresh()closes the oldhttpx.AsyncClientand re-runs the resolution.Changes
bioengine/datasets/datasets.pyself._requested_data_server_url("auto"/ explicit URL /None) instead of collapsing it at init._resolve_service_url(*, initial=False)— lazy attach. Re-read the discovery file whenservice_url is Noneand the request was"auto". Initial-call warning only at construction so a never-attached client doesn't spam logs.async refresh()— close old client, clear state, re-resolve. Returns the new URL orNone._CURRENT_SERVER_FILEclass attribute so the new code and tests can target it uniformly (monkeypatchable)._resolve_service_url()at the top of every public method touchingservice_url:ping_data_server,list_datasets,list_files,save_file,list_saved_files,get_saved_file.get_filebenefits transitively vialist_datasets.tests/test_datasets_lazy_resolution.py— six focused unit tests, no network, no data server required. Cover the lazy path, the explicit-None-stays-disabled invariant, the explicit-URL-doesn't-re-resolve invariant, andrefresh()swapping URLs.pyproject.toml:0.9.1→0.9.3.0.9.2is reserved by PR fix: auto-reconnect Ray Client on stale Serve handle (0.9.2) #70 (Ray Client stale-handle reconnect, still open). The two PRs are independent; whichever lands first takes the lower version, the other rebases.Out of scope
BioEngineDatasetsis single-event-loop; concurrent calls from within one loop cannot race in the sync resolve method. Adding a lock is straightforward if a future caller introduces threads.Test plan
python -m pytest tests/test_datasets_lazy_resolution.py -v --noconftest— 6 passed.BioEngineDatasets-using app with no data server, start the data server, calllist_datasets()— should now return the populated dictionary without redeploying.🤖 Generated with Claude Code