Parse Catalog Faster #48

Open
jonthegeek opened this issue Nov 29, 2023 · 0 comments

@jonthegeek
Collaborator

The `all_metadata` step of `parse_rdfs.R` is very slow, which makes debugging tedious. Some of this slowness might be unavoidable (we're parsing a lot of data), but we should optimize where possible.

The Project Gutenberg docs imply that there's a single XML/RDF file available, but I don't see it. That would presumably be much faster to parse.

jonthegeek self-assigned this Aug 31, 2024
jonthegeek added the data label Sep 2, 2024