Description
I've found at least a couple of malformed JSON-LD blocks that extruct can't read. Parsing fails with:
  File "/cygdrive/d/recipeWorkspace/python/parsers.py", line 25, in readJsonLd
    data = jslde.extract(html)
  File "/usr/lib/python2.7/site-packages/extruct/jsonld.py", line 21, in extract
    return self.extract_items(lxmldoc)
  File "/usr/lib/python2.7/site-packages/extruct/jsonld.py", line 25, in extract_items
    self._xp_jsonld(document))
  File "/usr/lib/python2.7/site-packages/extruct/jsonld.py", line 35, in _extract_items
    data = json.loads(HTML_OR_JS_COMMENTLINE.sub('', script))
  File "/usr/lib/python2.7/json/__init__.py", line 339, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 364, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.7/json/decoder.py", line 380, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Expecting , delimiter: line 20 column 778 (char 1342)
The cause is unescaped double quotes inside the text. For example:
"recipeInstructions": [
"1. blablabla two "buttons".5. Dab Snowmen!"
]
HTML allows this, but the JSON parser can't read it. Is there an easy way to correct similar issues automatically?
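
One possible workaround, as a rough sketch only: retry the parse and, each time the decoder reports a missing `,` delimiter, escape the double quote that ended the string too early. This is not part of extruct; `loads_lenient` and `max_repairs` are hypothetical names, and the code assumes Python 3.5+ (`json.JSONDecodeError` with its `pos` attribute), not the Python 2.7 from the traceback. It is a heuristic, so it can guess wrong, for instance when an inner quote is followed directly by a comma.

```python
import json


def loads_lenient(text, max_repairs=50):
    """Parse JSON, escaping stray double quotes inside strings on failure.

    Hypothetical helper, not an extruct API. Heuristic: may misrepair input.
    """
    for _ in range(max_repairs):
        try:
            return json.loads(text)
        except json.JSONDecodeError as exc:
            # Only handle the failure from the traceback: a string ended
            # early and the parser expected ',' or a closing bracket next.
            if not exc.msg.startswith("Expecting ',' delimiter"):
                raise
            # The quote that closed the string too early is assumed to be
            # the last '"' before the reported error position.
            quote = text.rfind('"', 0, exc.pos)
            if quote == -1:
                raise
            text = text[:quote] + "\\" + text[quote:]
    # Give up repairing; let any remaining error propagate.
    return json.loads(text)


broken = '{"recipeInstructions": ["1. blablabla two "buttons".5. Dab Snowmen!"]}'
data = loads_lenient(broken)
print(data["recipeInstructions"][0])
```

The repaired text would still need to be handed to whatever consumes the parsed JSON-LD; how to hook that into extruct is left open here.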