Metha-Cat: Support for Paging? #28
Comments
I ran into similar issues (XML files that were too big) in the past, and I remember that there are tools addressing this problem specifically; one I remember is xml_split. On Debian/Ubuntu it seems to be available with the
The resulting XML is valid, but slightly modified:
Does this help? PS: Thanks for using metha! I'm just curious (and collecting uses of metha): if possible, can you share the name of the project in which metha is used for data acquisition?
@miku Thanks a lot for your answer. We'll try this out! We are using
This is Go-specific, so I leave this here as a footnote: I had some success making XML processing faster by parallelizing it, with some ideas taken from here: Faster XML processing in Go. Anecdata: 5 GB of XML can be processed in a few seconds.
Hi there,
We are using `metha-sync` to harvest quite a big set. Everything went smoothly and we could create an XML using `metha-cat` containing the whole set.
The XML is quite big (2.5 GB) and we have some difficulties processing it.
Is there a way to get the records in steps of a limited size (like paging with a defined size and offset) with `metha-cat`? Setting the `from` or `until` params wouldn't help us much I think (libraries might process big batches on a single day).
Thanks a lot!
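Paging by size and offset, as asked about above, can also be approximated on the consumer side by streaming the concatenated file instead of loading it whole. The following is a minimal sketch, not a metha feature: it assumes the records are elements named `record` sitting directly under the root element, and the names `iter_records`, `page`, and `SAMPLE` are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET
from itertools import islice

# Tiny stand-in for a large harvested file (layout is an assumption).
SAMPLE = """<Records>
  <record><id>a</id></record>
  <record><id>b</id></record>
  <record><id>c</id></record>
  <record><id>d</id></record>
</Records>"""


def iter_records(source, tag="record"):
    """Yield each element whose local name is `tag`, streaming from `source`."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        # Compare local names so namespaced tags like "{ns}record" match too.
        if event == "end" and elem.tag.rsplit("}", 1)[-1] == tag:
            yield elem
            # Detach already-processed children so memory use stays bounded.
            root.clear()


def page(source, offset, limit, tag="record"):
    """Return up to `limit` serialized records after skipping `offset` records."""
    window = islice(iter_records(source, tag), offset, offset + limit)
    return [ET.tostring(rec, encoding="unicode").strip() for rec in window]


if __name__ == "__main__":
    # For a real file, pass open("records.xml", "rb") (hypothetical path).
    for rec in page(io.BytesIO(SAMPLE.encode("utf-8")), offset=1, limit=2):
        print(rec)
```

Each call still reads the file from the start up to the end of the requested page, so for many consecutive pages it is cheaper to iterate once over `iter_records` and batch the results than to call `page` repeatedly.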