-
Notifications
You must be signed in to change notification settings - Fork 4
TUL/Script for correcting dc.date.issued formats #239
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
0ddf7ef
6932305
edb3a9e
a5b6dad
e5f721d
8603b9d
1e7008d
5353f33
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -72,38 +72,78 @@ def input(self) -> str: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| def value(self) -> str: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| return self._d | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| def is_valid(self): | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| """Check if the given string is a valid date.""" | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| @staticmethod | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| def is_year_only(date_str: str) -> bool: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| """Check if the string represents a year-only format (YYYY).""" | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| if len(date_str) != 4 or not date_str.isdigit(): | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| return False | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| try: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| datetime.strptime(self._d, '%Y-%m-%d') | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| datetime.strptime(date_str, '%Y') | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| return True | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| except ValueError as e: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| date.invalid[self._d] += 1 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| if date.invalid[self._d] == 1: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| _logger.debug(f"[{self._d}] is not valid date. Error: {e}") | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| except ValueError: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| return False | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| def parse(self) -> bool: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| """Convert the value to a date format. Normalize date to 'YYYY-MM-DD' format, filling missing parts with '01'.""" | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| def is_valid_hybrid(self): | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| """Check if date is valid in YYYY-MM-DD, YYYY-MM, or YYYY format (all kept as-is).""" | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| # Check YYYY-MM-DD format | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| try: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| datetime.strptime(self._d, '%Y-%m-%d') | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| return True | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| except ValueError: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| pass | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| # Check YYYY-MM format (partial date) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| try: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| datetime.strptime(self._d, '%Y-%m') | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| return True | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| except ValueError: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| pass | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Comment on lines
+100
to
+101
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| # Check YYYY format (year only) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| if date.is_year_only(self._d): | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| return True | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Copilot
AI
Nov 13, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The partial date formats handling has inconsistent behavior. When parsing partial dates (YYYY-MM), the code converts them to YYYY-MM format (line 142), but the PR description states that partial dates should be "converted to year-only format" (e.g., 5/2025 → 2025). This contradicts the documented behavior.
Copilot
AI
Nov 13, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trailing whitespace on line. This should be removed to maintain code cleanliness.
Copilot
AI
Nov 13, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing docstring for the _perform_update method. Since this is a newly extracted helper method containing important update logic, it should have proper documentation explaining its parameters, return value, and behavior.
| """Common logic for updating item metadata in database.""" | |
| """ | |
| Updates the metadata of an item in the database with a converted date value. | |
| Parameters: | |
| item (dict): The item dictionary containing metadata to be updated. | |
| date_val (date): The date object representing the converted date value. | |
| uuid (str): The unique identifier of the item. | |
| id_str (str): A string used for logging, typically identifying the item and field. | |
| Returns: | |
| int: Status code indicating the result of the update operation: | |
| updater.ret_failed (1): Update failed. | |
| updater.ret_updated (2): Update succeeded. | |
| Behavior: | |
| - Logs the conversion of invalid dates. | |
| - Updates the item's metadata with the converted date value. | |
| - Attempts to update the item in the database (unless dry_run is True). | |
| - Updates internal info structures to track update status. | |
| """ |
Copilot
AI
Nov 13, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trailing whitespace on line. This should be removed to maintain code cleanliness.
Copilot
AI
Nov 13, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incomplete docstring. The docstring mentions "No null/empty handling (will crash on None)" but doesn't explain the force parameter, the return value, or other important aspects of the method's behavior.
| """Update existing metadata with hybrid rules: | |
| - No null/empty handling (will crash on None) | |
| - YYYY formats kept as-is | |
| - Partial dates (YYYY-MM) kept as-is (normalized) | |
| - Invalid dates logged as ANOMALY | |
| """ | |
| """ | |
| Update existing metadata for an item using hybrid date rules. | |
| Args: | |
| item (dict): The item whose metadata is to be updated. | |
| date_str (str): The date string to validate and update. | |
| force (bool, optional): If True, forces the update even if the date is invalid or not in a preferred format. | |
| If False, only updates if the date is not already valid or in a preferred format. | |
| Behavior: | |
| - No null/empty handling (will crash on None). | |
| - Dates in YYYY format are kept as-is. | |
| - Partial dates (YYYY-MM) are kept as-is (normalized). | |
| - Invalid dates are logged as ANOMALY and not updated unless force=True. | |
| - If force=True, attempts to update the metadata regardless of validity. | |
| Returns: | |
| int: Status code indicating the result: | |
| - updater.ret_already_ok: Metadata was already valid and no update was performed. | |
| - updater.ret_invalid_meta: The date was invalid and could not be updated. | |
| - updater.ret_updated: Metadata was successfully updated. | |
| - updater.ret_failed: An error occurred during the update. | |
| """ |
Copilot
AI
Nov 13, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trailing whitespace on line. This should be removed to maintain code cleanliness.
Copilot
AI
Nov 13, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment "No null/empty handling - let it crash if needed" indicates intentional lack of defensive programming. However, this could lead to unclear error messages when None or empty values are encountered. Consider at least validating that date_str is not None/empty and providing a clear error message if it is, rather than allowing a generic crash.
Copilot
AI
Nov 13, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hardcoded check for YYYY-MM format is fragile. This checks for exactly 7 characters with a dash at position 4, which could match invalid dates like "abcd-ef". Consider using a regex pattern or the datetime parsing logic already available to validate the format more robustly.
Copilot
AI
Nov 13, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inconsistent logging behavior. Year-only and partial date formats log info messages (lines 262, 265), but valid full dates in YYYY-MM-DD format do not (line 267). This creates inconsistent verbosity where some "valid" cases are logged and others are silent, making debugging and monitoring more difficult.
| self._info["valid"].append((uuid, date_val.input)) | |
| self._info["valid"].append((uuid, date_val.input)) | |
| _logger.info(f"{id_str}: full date format [{date_str}] - keeping as-is") |
Copilot
AI
Nov 13, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment "Always use standard validation/parsing (no relaxed mode)" is unclear and adds no value. Either expand it to explain what it means and why it's important, or remove it.
| # Always use standard validation/parsing (no relaxed mode) | |
| # Use strict validation/parsing for metadata values to ensure data integrity. | |
| # Relaxed parsing is intentionally avoided to prevent accepting malformed or ambiguous data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trailing whitespace on line. This should be removed to maintain code cleanliness.