-
Notifications
You must be signed in to change notification settings - Fork 0
Update exception handling #175
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Why these changes are being introduced: * Updating exception handling for the Transformer class for more explicit flow control. How this addresses that need: * Create exceptions module * Add SkippedRecordEvent exception * Move DeletedRecordEvent to exceptions module * Update Datacite.get_optional_fields to call SkippedRecordEvent Side effects of this change: * None Relevant ticket(s): * NA
ghukill
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Would be an outright approve, but curious about two things:
- giving
SkippedRecordEventexceptions a message (optionally) and logging that here - tweak
__next__logic such that a successful record is the default, fall-through path
| except SkippedRecordEvent: | ||
| self.skipped_record_count += 1 | ||
| continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The SkippedRecordEvent looks great, slots right in.
For posterity's sake, just commenting that in our discussion, we had also talked about how more granular exceptions, e.g. invalid records, records with runtime-y errors, or otherwise, could extend SkippedRecordEvent to still trigger this code path of "skipping" a record.
Lastly, related to this comment in the PR,
"Logging can be added as necessary depending on what triggered the skip."
If within code we set a custom message on the SkippedRecordEvent, maybe this would be good place to log that? That would allow setting the exception + message deep in code, confident it would bubble up here and get logged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm unable to comment on the lines below, as they did not change, but thoughts on updating the logic to treat a successful record as the default path?
I know this will get touched on in orchestration updates, but while touching this __next__ method, thought it could be a good time to nudge it.
e.g.
if not record:
self.skipped_record_count += 1
continue
self.transformed_record_count += 1
return record There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call on the exception message and I was frustrated with the repetition in that method so thank you very much for such a simple refactor!
| fields["content_type"] = [content_type] | ||
| else: | ||
| return None | ||
| raise SkippedRecordEvent(source_record_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. But see comment above about adding messaging close to the error/situation, confident it would get logged higher up.
* Update Transformer.__next__ method for more logical flow * Add message param to SkippedRecordEvent exception * Update call of SkippedRecordEvent in Datacite class
|
@ghukill Made changes in response to your review. Also, didn't realize it's possible to change the target branch on an existing PR so I'll keep this one open |
ghukill
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good, but left some comments on the exceptions.
As noted, it feels like a time to iron out what they'll look and feel like, if they do become more central to things.
I could be persuaded that no changes are needed, but curious your thoughts!
transmogrifier/exceptions.py
Outdated
| class SkippedRecordEvent(Exception): # noqa: N818 | ||
| """Exception raised for records that should be skipped. | ||
|
|
||
| Attributes: | ||
| source_record_id: The ID for the source record. | ||
| """ | ||
|
|
||
| def __init__(self, source_record_id: str | None, message: str) -> None: | ||
| self.source_record_id = source_record_id | ||
| self.message = message |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is some nuance to consider here, but feels like the time to think about these exceptions as a) they are now going into main, and b) might be fairly central to new orchestration patterns when implemented.
I would propose two things:
- We can instantiate these exceptions just with a string message if we like, just like normal exceptions, and have this represented when you stringify the exception object
- All extra properties are optional, unless we decide they should be required
Example of how this could be done:
class SkippedRecordEvent(Exception):
""""""
def __init__(self, message:str|None=None, source_record_id:str|None=None):
super().__init__(message) # calls base Exception constructor
self.source_record_id = source_record_idBy default, the base Exception class is expecting a positional argument message that is then used in the __str__ method to stringify it.
This allows some behavior like this:
# acts like other exceptions, allows string message as first and only argument
In [19]: e = SkippedRecordEvent("Hello world!")
In [20]: str(e)
Out[20]: 'Hello world!'
In [21]: e
Out[21]: __main__.SkippedRecordEvent('Hello world!')
In [22]: e.source_record_id
# None passed
# also supports extra properties if defined
In [23]: e2 = SkippedRecordEvent("Hello world!", source_record_id="acb123")
In [24]: str(e2)
Out[24]: 'Hello world!'
# source_record_id also present if we want to use it
In [25]: e2.source_record_id
Out[25]: 'acb123'
# not ideal, but like default exceptions, can even raise it without messages
In [26]: e3 = SkippedRecordEvent()
In [27]: str(e3)
Out[27]: 'None' # string of None
In [28]: e3.source_record_id
# also None
Two additional considerations, maybe now, maybe later:
- Does this have implications for
DeletedRecordEvent? Do we want to update patterns so it has similar default, predictable behavior? - Do we want more granular exception classes like
InvalidRecordErrorthat extendSkippedRecordEvent, such that we could raise those more meaningful, granular exceptions from code, but still have a singleexcept SkippedRecordEvent: ...logic at the higher level to group skipping behavior for all those child Exception types?
Both seem like they could wait. And for that matter, the suggestions above could too. But felt like a good time to raise them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree on the proposed exception formatting!
Re: additional considerations, those feel better to address during the full orchestration discussion. It doesn't feel like we have full use cases for either yet (correct me if I'm wrong)but I would like to discuss both in the more comprehensive context of the orchestration refactor.
If that sounds OK, we can create an issue with those 2 points under something like Exception refactoring and as related items come up they could be added as comments. Open to other ways of doing it though!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me. If SkippedRecordEvent is updated now to allow for default-y Exception behavior discussed above, then the rest feels like something we could suss out in future work. There aren't that many instances of raising these from code, so would be easy to ctrl + f them and then think about what new exceptions might make sense to raise there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New commit pushed and issue created!
| else: | ||
| return None | ||
| message = f'Record skipped based on content type: "{content_type}"' | ||
| raise SkippedRecordEvent(source_record_id, message) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comments above about possible reworking of these custom exceptions.
Even if we don't go that route, I think perhaps the message should be the first argument, and positional, and then source_record_id be a named argument is optional.
|
And agreed on figuring this out however many commits it takes! |
* Shift param order for SkippedRecordEvent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! Agree with the changes that were proposed by @ghukill. Still need to review the latest discussion/comments posted above!
ghukill
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Thanks for this intricate scaffolding work that is backwards/forwards compatible. Think it's establishing a nice pattern for more granular exception raising as we progress.
| def __init__(self, message: str | None = None, source_record_id: str | None = None): | ||
| super().__init__(message) | ||
| self.source_record_id = source_record_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
| else: | ||
| return None | ||
| message = f'Record skipped based on content type: "{content_type}"' | ||
| raise SkippedRecordEvent(message, source_record_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
I might suggest as we revist raising these in the future that we have a convention of positional message and then named arguments (e.g. source_record_id=source_record_id), but not blocking now as this definitely works too!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this idea, too!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 , I agree with that pattern as well going forward
* Update Transformer.__next__ method for more logical flow * Add message param to SkippedRecordEvent exception * Update call of SkippedRecordEvent in Datacite class
* Shift param order for SkippedRecordEvent
Purpose and background context
Adding more explicit flow control through exception handling in the
Transformerclass. Previously, records were skipped whenget_optional_fieldsreturnedNone. Adding aSkippedRecordEventexception that can handle records that should be skipped for an invalidcontent_typeor records with significant structural issues that raise the risk of unknown exceptions later in the process. This is a minor update to the orchestration that should aid us with the field method refactor.As we proceed with the refactor in other transforms, use the
SkippedRecordEventexception wheneverget_optional_fieldsreturnsNone. Logging can be added as necessary depending on what triggered the skip.How can a reviewer manually see the effects of these changes?
The
Dataciteclass was updated to raise theSkippedRecordEventexception illustrating that tests such astest_zenodo_skips_records_with_invalid_content_typesstill pass.Includes new or updated dependencies?
NO
Changes expectations for external applications?
NO
What are the relevant tickets?
Developer
Code Reviewer(s)