Contributor

Copilot AI commented Nov 21, 2025

When serialization formats change (e.g., primary_keys vs primary_key), previously serialized schemas and collections become unreadable. Schema deserialization already handles this via a strict parameter, but collection deserialization did not. Additionally, when reading serialized data, schema deserialization is now non-strict in every case where the validation mode permits running validation, so unreadable metadata no longer causes a failure when the metadata is not actually needed.

Changes

  • Added strict parameter to deserialize_collection: Mirrors the existing deserialize_schema behavior. When strict=False, it returns None on deserialization errors instead of raising exceptions
  • Propagated strict through deserialization chain: Updated _deserialize_types to accept and forward the parameter
  • Connected to scan_parquet/read_parquet validation modes: Both Collection._read and Schema._validate_if_needed now pass strict=False when validation is "allow", "skip" or "warn", allowing automatic fallback to validation when old formats are detected
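
The pattern described in the list above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from this PR: `deserialize_metadata`, `strict_for`, and the JSON layout are hypothetical stand-ins, and `"forbid"` is assumed to be the one validation mode outside the three named above.

```python
import json
from typing import Optional

# Modes named in this PR for which non-strict deserialization is used.
NON_STRICT_MODES = ("allow", "skip", "warn")


def deserialize_metadata(data: str, strict: bool = True) -> Optional[dict]:
    """Hypothetical deserializer with a `strict` switch.

    Assumes the current format stores the key as "primary_key", while
    older payloads used "primary_keys".
    """
    try:
        decoded = json.loads(data)
        return {"primary_key": decoded["primary_key"]}
    except (json.JSONDecodeError, KeyError, TypeError):
        if strict:
            raise
        # Non-strict: signal "metadata unreadable" so the caller can
        # fall back to validating the data instead.
        return None


def strict_for(validation: str) -> bool:
    """Map a validation mode to the `strict` flag (hypothetical helper)."""
    return validation not in NON_STRICT_MODES


old_payload = json.dumps({"primary_keys": ["id"]})  # outdated format
new_payload = json.dumps({"primary_key": ["id"]})   # current format

current = deserialize_metadata(new_payload)
recovered = deserialize_metadata(old_payload, strict=strict_for("warn"))
```

Here `recovered` is `None`, which a caller would treat as "no usable schema metadata, run validation", whereas the same call with `strict=True` would propagate the `KeyError`.
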
Original prompt

This section details the original issue you should resolve

<issue_title>Catch errors when reading serialized schemas also for collections</issue_title>
<issue_description>If the format of serialized schemas or collections changes, previously serialized schemas might not be readable with new dataframely versions (e.g. primary_keys vs. primary_key). For schemas, this is already handled by catching errors in the dataframely.schema.deserialize_schema method if strict is False.

It would be nice if this could also be done in collections (dataframely.collection.collection._deserialize_types).

Additionally, in both cases, when these methods are called transitively via a scan_parquet call, the strict option should be set to False whenever the validation option allows that.

That would allow reading data that was serialized with a previous version (which then might require validation). </issue_description>

<agent_instructions>Extend the collection method with the strict parameter and catch exceptions if it is False. When called from scan_parquet, strict should be set to False if the validation setting is set to allow.</agent_instructions>

Comments on the Issue (you are @copilot in this section)


Copilot AI and others added 2 commits November 21, 2025 14:49
…rquet error handling

Co-authored-by: MoritzPotthoffQC <160181542+MoritzPotthoffQC@users.noreply.github.com>
Co-authored-by: MoritzPotthoffQC <160181542+MoritzPotthoffQC@users.noreply.github.com>
Copilot AI changed the title [WIP] Catch errors when reading serialized schemas for collections Add strict parameter to collection deserialization for backward compatibility Nov 21, 2025
Copilot finished work on behalf of MoritzPotthoffQC November 21, 2025 14:54
codecov bot commented Nov 21, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (1660d88) to head (037b68c).

Additional details and impacted files
@@            Coverage Diff            @@
##              main      #231   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           53        53           
  Lines         3019      3032   +13     
=========================================
+ Hits          3019      3032   +13     

@MoritzPotthoffQC MoritzPotthoffQC changed the title Add strict parameter to collection deserialization for backward compatibility feat: Recover from reading incompatible schema metadata if validation can be used Nov 24, 2025
@github-actions github-actions bot added the enhancement New feature or request label Nov 24, 2025
@MoritzPotthoffQC MoritzPotthoffQC changed the title feat: Recover from reading incompatible schema metadata if validation can be used feat: Recover from reading incompatible schema metadata if validation is allowed Nov 24, 2025
@MoritzPotthoffQC MoritzPotthoffQC marked this pull request as ready for review November 24, 2025 14:04
Labels

enhancement New feature or request

Development

Successfully merging this pull request may close these issues.

Catch errors when reading serialized schemas also for collections

2 participants