Skip to content

Conversation

valentin-pinkau
Copy link
Member

@valentin-pinkau valentin-pinkau commented Aug 26, 2025

Description:

  • Bumps API version to 11
  • moves dataset_properties one level up as it is now used by the dataset and the client api
  • They share the same schema as we now receive and structure the datasource-properties.json from webknossos.
  • add publish_to_webknossos method to the dataset.
  • various imports changed, users need to update their imports.
  • Adds new use_zarr_streaming flag to open_remote

Issues:

Todos:

Make sure to delete unnecessary points or to check all before merging:

  • Updated Changelog
  • Updated Documentation
  • Added / Updated Tests
  • Considered adding this to the Examples

@valentin-pinkau valentin-pinkau self-assigned this Aug 26, 2025
@valentin-pinkau valentin-pinkau changed the title [WIP] making the path optional in dataset, and layers Read directly from remote paths, upload Dataset with reserve_manual_upload Sep 2, 2025
@fm3 fm3 mentioned this pull request Sep 11, 2025
4 tasks
Copy link

github-actions bot commented Sep 17, 2025

☂️ Python Coverage

current status: ✅

Overall Coverage

Lines Covered Coverage Threshold Status
9483 7899 83% 80% 🟢

New Files

File Coverage Status
webknossos/webknossos/dataset_properties/init.py 100% 🟢
webknossos/webknossos/dataset_properties/dataset_properties.py 96% 🟢
webknossos/webknossos/dataset_properties/structuring.py 98% 🟢
webknossos/webknossos/ssl_context.py 100% 🟢
TOTAL 99% 🟢

Modified Files

File Coverage Status
webknossos/webknossos/_init_.py 69% 🟢
webknossos/webknossos/_nml/parameters.py 98% 🟢
webknossos/webknossos/annotation/annotation.py 81% 🟢
webknossos/webknossos/cli/convert.py 90% 🟢
webknossos/webknossos/cli/convert_knossos.py 37% 🟢
webknossos/webknossos/cli/convert_raw.py 40% 🟢
webknossos/webknossos/cli/convert_zarr.py 37% 🟢
webknossos/webknossos/cli/copy_dataset.py 100% 🟢
webknossos/webknossos/client/_init_.py 100% 🟢
webknossos/webknossos/client/_download_dataset.py 95% 🟢
webknossos/webknossos/client/_upload_dataset.py 86% 🟢
webknossos/webknossos/client/api_client/_abstract_api_client.py 92% 🟢
webknossos/webknossos/client/api_client/_serialization.py 96% 🟢
webknossos/webknossos/client/api_client/datastore_api_client.py 90% 🟢
webknossos/webknossos/client/api_client/models.py 100% 🟢
webknossos/webknossos/client/api_client/wk_api_client.py 84% 🟢
webknossos/webknossos/dataset/_init_.py 100% 🟢
webknossos/webknossos/dataset/_array.py 88% 🟢
webknossos/webknossos/dataset/_downsampling_utils.py 85% 🟢
webknossos/webknossos/dataset/_upsampling_utils.py 92% 🟢
webknossos/webknossos/dataset/_utils/segmentation_recognition.py 91% 🟢
webknossos/webknossos/dataset/attachments.py 85% 🟢
webknossos/webknossos/dataset/dataset.py 78% 🟢
webknossos/webknossos/dataset/defaults.py 100% 🟢
webknossos/webknossos/dataset/layer.py 87% 🟢
webknossos/webknossos/dataset/mag_view.py 91% 🟢
webknossos/webknossos/dataset/ome_metadata.py 95% 🟢
webknossos/webknossos/dataset/view.py 87% 🟢
webknossos/webknossos/skeleton/skeleton.py 92% 🟢
webknossos/webknossos/utils.py 71% 🟢
TOTAL 84% 🟢

updated for commit: 2a1ffc2 by action🐍

@@ -0,0 +1,32 @@
import webknossos as wk
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

import as wk or import directly. I wouldn't mix both.

@@ -0,0 +1,32 @@
import webknossos as wk
from webknossos import LayerToLink
from webknossos.dataset.remote_folder import RemoteFolder
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
from webknossos.dataset.remote_folder import RemoteFolder
from webknossos import RemoteFolder

)
copytree(attachment.path, new_path)
self._add_attachment(new_attachment)
if self._layer.path is None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we define a bool for virtual/remote datasets?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you considered making subclasses for remote layers?

_UNSET = make_sentinel("UNSET", var_name="_UNSET")


class LayerToLink(NamedTuple):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

separate file?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would lead to cyclic imports as LayerToLink requires dataset and dataset requires LayerToLink

def announce_manual_upload(
cls,
def publish_to_preliminary_dataset(
self,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the naming a bit confusing. a) I wouldn't use publish, because that is not really used in wklibs. Better upload. b) Is this announcing a dataset upload or moving files after announcing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree the name could be better. What we do here is, announcing the dataset properties to wk, so that it can reserve the necessary paths. We get back the paths and upload/copy/symlink the data. Finally, we let wk now that the data is now available.

"""
Iterates over the mags and attachments and copies or symlinks them to the target location.
"""
# This needs to be set to make sure the encoding is not chunked when uploading
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit surprising here. We can do this in the lib init.

fm3 added a commit to scalableminds/webknossos that referenced this pull request Sep 22, 2025
…8844)

### Description

This PR changes how clients with direct access to the underlying data
storage can get datasets registered into WK.

Corresponding libs client PR:
scalableminds/webknossos-libs#1359

<details>
<summary>New ReserveDatasetUploadToPaths Protocol</summary>

- Client sends reserveDatasetUploadToPaths request to wk side, including
a preliminary datasource plus layersToLink.
- Server generates id and paths for all mags and attachments where the
client must put the stuff. This is returned to the client.
- Server also validates layersToLink, and creates a dataset in the
database, marked as not yet fully uploaded to the target paths.
- After the client has actually put the data there, it calls
finishDatasetUploadToPath, marking the dataset as complete. It can now
be used.
- Note that layersToLink here are by datasetId+layerName, but for normal
dataset upload it’s still datasetDirectoryName+layerName (we’ll change
that to id later)

</details>

<details>
<summary>New ReserveAttachmentUploadToPath Protocol</summary>

- Client sends reserveAttachmentUploadToPath request with dataset id,
attachment name, type etc. WK returns the path where the client must put
the attachment. Wk inserts this path in the DB, marked as not yet fully
uploaded
- After the client has actually put the data there, it calls
finishAttachmentUploadToPath with the same parameters (as there is no
attachment id), marking the attachment as complete. It can now be used.

</details>

<details>
<summary>New ReserveUploadToPathForPreliminary Protocol</summary>
This extra route supports a new workflow, which will be used only by the
convert_to_wkw worker job, which webknossos starts when a user uploads a
dataset that is not yet in wkw/zarr format. When starting this upload, a
dataset is already inserted in the database (via reserveUpload – not
manual). Then after finishUpload, the conversion worker job is started.
Now after the worker has converted the dataset to wkw/zarr, it needs to
call this new reserveUploadToPathsForPreliminary route, which works only
on datasets in this particular state. It returns paths, just like the
normal, user-facing reserveUploadToPaths described above. But it is
different in that it needs to operate on the datasetId of the
already-created, preliminary dataset from the original reserveUpload.

</details>

<details>
<summary>Changes to Editing DataSources</summary>

 - Dataset settings view no longer allows changing layer names
- It also no longer allows editing the json directly (textarea is hidden
for now, to be removed in follow-up)
- The backend now only applies the allowed changes on save. (category,
boundingBox, coordinatesTransformations, defaultViewConfiguration,
adminViewConfiguration, largestSegmentId, voxelSize, deleted layers)

</details>

<details>
<summary>Misc Changes</summary>

- localDirectoryWhitelist is no longer checked in data loading, only in
add/exploreAndAdd/upload
 - exploreAndAdd now returns dataset id
- full dataSource now included in dataset publicWrites, with source data
paths if requested by includePaths

</details>

<details>
<summary>Refactoring: UPath</summary>

- New class UPath handles paths that can be either remote (s3, https,
gcs) or local (file or no schem)
- Attachments and mags have this UPath class now instead of URI and
String respectively
 - VaultPath also wraps a UPath (plus its DataVault with credentials)

</details>

<details>
<summary>Refactoring: New DataLayer Class Hierarchy</summary>

 - Simplified DataSource and DataLayer classes.
<img width="800" alt="image"
src="https://github.com/user-attachments/assets/4de47da2-ffae-4149-8e70-4653ca86d630"
/>

</details>

### Steps to test:
<details>
<summary>Test against regressions</summary>

- Browse some data, annotate some, explore remote dataset, explore local
dataset in directoryWhitelist, should all still work
- Test that both the newest and also older libs versions can still
open_remote and list existing datasets (including ones with unusual
data, like remote N5)

</details>

<details>
<summary>Test new features</summary>

- With new libs client code, go through the reserveUploadToPaths flow
scalableminds/webknossos-libs#1359
- Also test reserveAttachmentUploadToPath via libs
- Test that libs can still upload a dataset.
- Attempt to change paths of an existing dataset in wk, or add a dataset
with paths outside of itself, should not be possible (security!)
- Change the application.conf
`webKnossos.datasets.uploadToPathsPrefixes` to our managed s3 object
storage, give both wk and the client credentials, should also work with
this.

</details>

### TODOs:
<details>
<summary>Backend</summary>

- [x] Decide how the directory name should look: just the id? or
prefix-id? (might get outdated on renamings)
- [x] is the reserve response the full datasource again? or just a path
prefix?
- [x] how to configure this in the dev setup
- [x] clean up the whole DataSource mess?
  - [x] ensure we include resolutions;
  - [x] never leak credentialIds (except during explore/add flow);
- [x] settings view reads InboxDataSource, which includes credentialIds.
    - [x] Needs new Api for changing relevant fields
  - [x] compact json version? (resolutions, no paths?)
  - [x] do not write resolutions to disk?
- [x] directory scan overwrites status to No longer available on the
datastore
- [x] reserveAttachmentUploadToPaths
- [x] what’s up with the zarr streaming snapshots?
- [x] evolutions
- [x] does upload still check unique directory name?
- [x] ensure that add route still checks localDirectoryWhiteList (also
for exploreAndAdd)
- [x] move stuff to DatasetService, make createDataset private again
- [x] re-test upload with layersToLink from libs
- [x] return readable failures on now-unsupported routes? (old
datastore-based reserveManualUpload)
- [x] do not require status field in datasource in reserveUpload request
body.
- [x] introduce api version 11
- [x] re-test that editing DS in DB + reload propagates to datastore
- [x] on finishAttachmentUploadToPath, update datasource on disk too if
isVirtual=False.
- [x] datasource update call via wk backend instead of
frontend->datastore? skip datastore if isVirtual=true
- [x] more unit tests for UPath
- [x] double check that dataset info route is backwards compatible for
libs
- [x] add remote UI layer name no longer editable, but explore returns
empty, that’s a problem. test with first entry from notion remote
datasets table.

</details>
<details>
<summary>Frontend</summary>

- [x] dataset settings view: include paths, remove advanced view
- [x] updating datasource should go via wk first

</details>

### Follow-Ups
<details>
<summary>Possible Follow-Up Issues</summary>

 - #8939
 - #8940
 - #8941
 - #8942
 - #8943
 - #8944
 - #8945
 - #8946

</details>

### Issues:

- fixes #8827
- fixes #8859
- fixes #8857 
- fixes #8851
- fixes #8762
- fixes #8881 
- fixes #8929
- contributes to #8749


------
- [x] Added changelog entry (create a `$PR_NUMBER.md` file in
`unreleased_changes` or use `./tools/create-changelog-entry.py`)
- [x] Added migration guide entry if applicable (edit the same file as
for the changelog)
- [ ] Adapted [wk-libs python
client](https://github.com/scalableminds/webknossos-libs/tree/master/webknossos/webknossos/client)
if relevant API parts change
scalableminds/webknossos-libs#1359
- [x] Removed dev-only changes like prints and application.conf edits
- [x] Considered [common edge
cases](../blob/master/.github/common_edge_cases.md)
- [x] Needs datastore update after deployment

---------

Co-authored-by: valentin-pinkau <valentin.pinkau@scalableminds.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

read directly from remote paths with zarr streaming fallback
3 participants