Replace non-breaking space with regular space #233

@kellymarchisio

Hi there -- my normal data-cleaning pipeline is normalize-punctuation.perl | remove-non-printing-char.perl | tokenizer.perl, but it doesn't handle non-breaking spaces (U+00A0), which can break things downstream. Any chance we could add a step that replaces non-breaking spaces with regular spaces? In UTF-8 the character is the two bytes C2 A0 in hex (source: https://www.cogsci.ed.ac.uk/~richard/utf-8.cgi?input=A0&mode=hex)
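
For illustration, here is a minimal sketch of the behavior being requested, written in Python rather than Perl. The function name `replace_nbsp` is hypothetical and not part of the existing scripts; it assumes the input has already been decoded from UTF-8 into a string.

```python
# NO-BREAK SPACE (U+00A0); its UTF-8 encoding is the byte pair C2 A0.
NBSP = "\u00a0"


def replace_nbsp(line: str) -> str:
    """Return `line` with every non-breaking space replaced by a regular space.

    Hypothetical helper for illustration only; not an existing pipeline script.
    """
    return line.replace(NBSP, " ")


if __name__ == "__main__":
    # Confirm the byte sequence quoted in the issue matches U+00A0.
    assert NBSP.encode("utf-8") == b"\xc2\xa0"

    sample = "hello\u00a0world"
    print(replace_nbsp(sample))
```

A step like this would presumably sit before tokenizer.perl, so that the tokenizer only ever sees regular spaces as separators.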
