Skip to content

Conversation

@danpoletaev
Copy link
Collaborator

Description

Currently, when opening a KV store, dataset or request queue we only store the id and name from the get() response.

This PR updates the logic to store the entire storage_object, allowing access to additional attributes without requiring extra requests.

More context in slack

P.S. Python Crawlee storage constructors are a bit different than JS storage consturctors, that's why I've added storage_object to all of the classes - KV store, dataset & request queue

Similar PR in Crawlee JS

Testing

I've added one test, which checks that KV store class has storage_object after open.

@danpoletaev danpoletaev self-assigned this Feb 18, 2025
@github-actions github-actions bot added this to the 109th sprint - Platform team milestone Feb 18, 2025
@github-actions github-actions bot added t-core-services Issues with this label are in the ownership of the core services team. tested Temporary label used only programatically for some analytics. labels Feb 18, 2025
@danpoletaev danpoletaev added the adhoc Ad-hoc unplanned task added during the sprint. label Feb 18, 2025
@vdusek vdusek requested review from Pijukatel and removed request for B4nan February 25, 2025 09:18
Copy link
Collaborator

@Pijukatel Pijukatel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more comment to the whole change. Seems to me that same thing is refereed under different names in different places. It would be nice to unify it and use only one name variant. For example one of following storage_object or storage_info or maybe storage_metadata since it is instance of _StorageMetadata.


@property
@abstractmethod
def storage_object(self) -> _StorageMetadata:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is public property, that returns private type. I guess we should expose _StorageMetadata if we use it as type hint for public property now.

@danpoletaev danpoletaev requested a review from Pijukatel March 3, 2025 10:09
Copy link
Collaborator

@Pijukatel Pijukatel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add 3 simple test cases(one for each storage class) that will just call from_storage_object constructor and assert that the created object is as expected.


@property
@abstractmethod
def storage_object(self) -> StorageMetadata:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this be more precise?

Suggested change
def storage_object(self) -> StorageMetadata:
def storage_object(self) -> DatasetMetadata | KeyValueStoreMetadata | RequestQueueMetadata:

and _StorageMetadata could remain private.

I am also ok with having StorageMetadata as a type alias for the union mentioned above. And the _StorageMetadata could be renamed to _CommonStorageMetadata, _BaseStorageMetadata, ...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what I actually did in the beginning, but I was getting type error in _creation_managment.py

Screenshot 2025-03-03 at 14 52 34

@danpoletaev danpoletaev requested review from Pijukatel and vdusek March 3, 2025 13:57
Copy link
Collaborator

@janbuchar janbuchar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues here, just make sure that the new SDK works with this.

Copy link
Collaborator

@vdusek vdusek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought, I'm OK with the common StorageMetadata model for every storage. LGTM.

@danpoletaev danpoletaev merged commit 43c5c96 into master Mar 4, 2025
23 checks passed
@danpoletaev danpoletaev deleted the feat/set-storage-object-on-kv-open branch March 4, 2025 14:37
danpoletaev added a commit that referenced this pull request Mar 7, 2025
### Description

This PR follows up on #993.
While aligning changes with the Apify SDK, I noticed that Pydantic
models only select explicitly defined fields. However, we also need
access to `urlSigningSecretKey`, so I’ve updated the `StorageMetadata`
config **to allow extra fields.**

### Testing

Added tests for all storage types (datasets, key-value stores, and
request queues) to ensure extra attributes are correctly handled.

### Checklist

- [x] CI passed
vdusek pushed a commit to apify/apify-sdk-python that referenced this pull request Mar 7, 2025
This PR is part of [Issue
#19363](apify/apify-core#19363), which updates
`KeyValueStore.getPublicUrl(recordKey)` to generate signed links using
HMAC.

**This PR**:
- creates signature and appends it to the public URL of the record in KV
store

P.S. Before merging, we need to wait for the release of Crawlee, so we
can update version of Crawlee.

P.P.S It's hard to test the changes, since master branch of SDK and
master branch of Crawlee are out of sync. Crawlee has some breaking
changes in version 6.0, which are not yet addressed in master branch of
SDK.

Previous PR that adds storageObject to Crawlee Python is
[here](apify/crawlee-python#993)

Same PR in SDK JS is
[here](apify/apify-sdk-js#358)
@fnesveda fnesveda added the validated Issues that are resolved and their solutions fulfill the acceptance criteria. label Mar 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

adhoc Ad-hoc unplanned task added during the sprint. t-core-services Issues with this label are in the ownership of the core services team. tested Temporary label used only programatically for some analytics. validated Issues that are resolved and their solutions fulfill the acceptance criteria.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants