
LuceneSearcher.from_prebuilt_index returns empty contents #1250

Answered by lintool
joelrorseth asked this question in Q&A

raw stores the document in its original format; contents stores the "parsed" document. So, for example, raw might hold the original HTML doc, while contents holds what's actually indexed after tag cleanup. That means you can always reconstruct contents from raw (i.e., just re-parse the document), but not vice versa. For this reason, we only store raw in the prebuilt indexes.

In this case, you get contents from raw by parsing out the JSON and pulling out the right field.
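
As a rough sketch of that step: the helper below parses a raw JSON string and pulls out one field. The helper name contents_from_raw, the sample document, and the field name "contents" are assumptions for illustration; the actual field name depends on how the collection behind a given prebuilt index was formatted, so inspect the raw string first.

```python
import json


def contents_from_raw(raw, field="contents"):
    """Return one field from a raw JSON document string, or None if unavailable.

    `field` defaults to "contents", which is an assumption; check the raw
    string of your own index to find the right key.
    """
    if raw is None:
        return None
    record = json.loads(raw)
    return record.get(field)


# Hypothetical stand-in for the string a stored raw document might contain.
sample_raw = '{"id": "doc1", "contents": "The quick brown fox."}'
text = contents_from_raw(sample_raw)

# With Pyserini this would look roughly like the following (not run here,
# since it downloads an index; the index name and docid are placeholders):
#   from pyserini.search.lucene import LuceneSearcher
#   searcher = LuceneSearcher.from_prebuilt_index("<index-name>")
#   text = contents_from_raw(searcher.doc("<docid>").raw())
```

If the raw document is not JSON (for example, HTML), json.loads will raise an error, and you'd need to re-parse it with whatever parser matches the original format.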

If you want an index with contents but not raw, you'll have to build a fresh index yourself.

Hope this helps!

Answer selected by joelrorseth