Please make the parser more robust #724
Comments
hmm, i'm against changing the default behavior (it's correct to raise an error if the input format is broken)... but maybe providing an |
Sure, |
This is a good feature for the #283 clean-up, then there will be an error-callback. |
Hi, |
I have been working on this issue for quite some time now. I have incorporated handling for a few parsing errors, but I was wondering if you could explain in detail the exact errors you want us to handle, so that I can keep them in mind before sending a PR. |
I encountered some ParseError exceptions when I tried to parse N-Triples files. Some of the causes are easy to fix at runtime, such as empty lines, codec issues, etc. I hope the parser could pre-process the files and deal with these problems, or ignore the invalid records. At the very least, we need to know which lines in our data file have problems, because we cannot be sure that downloaded files strictly follow the standard format. If the package just raises the exception without correcting it, it takes more time to parse the whole file. The impact of the skipped data may be acceptable when we are processing a large data set.
In my case, I directly modified this line of code: I inserted a `continue` statement there to let the program proceed. Otherwise, I cannot get the remaining data once I encounter a ParseError. I know that skipping the exception this way is not a good solution, but it is the fastest way to continue my project. I hope this suggestion will be accepted.
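To make the request concrete, here is a minimal sketch of the behavior I have in mind. It is not the library's parser or API: `parse_line`, `parse_lenient`, `LineParseError` and the `on_error` callback are hypothetical names, and the regular expression only covers a simplified subset of the N-Triples grammar. The default stays strict (the error is raised), and skipping only happens when a callback is supplied, which then receives the line number of each bad record.

```python
import re
from typing import Callable, Iterable, Iterator, Optional, Tuple

# Hypothetical, deliberately simplified patterns (NOT the full N-Triples grammar).
_IRI = r'<[^<>\s]*>'
_BNODE = r'_:\w+'
_LITERAL = r'"(?:[^"\\]|\\.)*"(?:@[A-Za-z-]+|\^\^<[^<>\s]*>)?'
_TRIPLE = re.compile(
    rf'^\s*({_IRI}|{_BNODE})\s+({_IRI})\s+({_IRI}|{_BNODE}|{_LITERAL})\s*\.\s*(?:#.*)?$'
)

Triple = Tuple[str, str, str]


class LineParseError(ValueError):
    """Raised by this sketch when a single line is not a valid triple."""


def parse_line(line: str) -> Triple:
    """Parse one line into (subject, predicate, object) strings, or raise."""
    match = _TRIPLE.match(line)
    if match is None:
        raise LineParseError(f"invalid triple: {line.strip()!r}")
    return match.group(1), match.group(2), match.group(3)


def parse_lenient(
    lines: Iterable[str],
    on_error: Optional[Callable[[int, Exception], None]] = None,
) -> Iterator[Triple]:
    """Yield triples line by line.

    Without on_error the first bad line raises (strict default).
    With on_error the bad line is reported with its 1-based line
    number and parsing continues with the next line.
    """
    for lineno, line in enumerate(lines, start=1):
        stripped = line.strip()
        # Empty lines and comment lines carry no data; skip them silently.
        if not stripped or stripped.startswith("#"):
            continue
        try:
            yield parse_line(line)
        except LineParseError as exc:
            if on_error is None:
                raise
            on_error(lineno, exc)
```

With something like this, a caller could pass `on_error=lambda n, e: bad_lines.append(n)` to collect the numbers of the broken lines and still get every valid triple from the rest of the file, instead of patching a `continue` into the library source.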