Functions for checks and cleaning in parse_xml #285

@AdrianDAlessandro

I'd split all these little checks and bits for cleaning the text into separate functions for readability + testability. For example, this one could be something like this:

from bs4 import BeautifulSoup


def get_year(soup: BeautifulSoup) -> str:
    """Return the year of the 'accepted' date, or an empty string if it is missing."""
    # Look for a 'date' element with date-type 'accepted' that contains a 'year' tag
    if date := soup.find("date", {"date-type": "accepted"}):
        if year := date.find("year"):
            # Note: the text is returned as-is, with no check for
            # unicode or hex character codes or XML tags
            return year.text

    # If the 'accepted' date or its 'year' tag is missing, return an empty string
    return ""

(I'm not saying you have to do that on this PR -- just food for thought)

Originally posted by @alexdewar in #263 (comment)
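
In the same spirit, the text cleaning that the comment in get_year notes is missing could live in its own small function. The sketch below is only an illustration: clean_text is a hypothetical name, not something that exists in parse_xml, and it assumes that "unicode or hexacode or XML tags" means leftover markup, character references such as &#x2009;, and unusual unicode whitespace. It uses only the standard library, so it can be tested without building a soup.

```python
import html
import re
import unicodedata

# Matches anything that looks like an XML/HTML tag, e.g. <italic> or </sup>
_TAG_RE = re.compile(r"<[^>]+>")


def clean_text(raw: str) -> str:
    """Hypothetical helper: strip tags, decode entities, tidy whitespace."""
    # Remove tags first, so that an escaped "&lt;" is not mistaken for markup later
    text = _TAG_RE.sub("", raw)
    # Decode named, decimal and hex character references (&amp;, &#8211;, &#x2009;)
    text = html.unescape(text)
    # Fold compatibility characters (e.g. thin spaces) into their plain forms
    text = unicodedata.normalize("NFKC", text)
    # Collapse runs of whitespace and trim both ends
    return " ".join(text.split())
```

With something like this in place, get_year could return clean_text(year.text) instead of year.text, and the cleaning rules could be unit-tested on plain strings, separately from the XML lookups.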
