We shouldn't index the parts of a page that are not relevant:

This is a hard feature to get right, since all of this content comes from online sources and is a bit different in every project.
Note that the default HTML parser from libzim (I don't recall whether we use it or not) ignores everything inside <!-- htdig_noindex -->...<!-- /htdig_noindex -->
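
To illustrate the marker convention, here is a minimal Python sketch that drops everything between the two comments before the text is handed to an indexer. It is not how libzim does it internally. The helper name `strip_noindex` and the sample page are made up for the example, and tolerating extra whitespace inside the comments is my assumption.

```python
import re

# Matches <!-- htdig_noindex --> ... <!-- /htdig_noindex -->.
# Assumption: whitespace inside the comment markers may vary.
NOINDEX_RE = re.compile(
    r"<!--\s*htdig_noindex\s*-->.*?<!--\s*/htdig_noindex\s*-->",
    re.DOTALL | re.IGNORECASE,
)


def strip_noindex(html: str) -> str:
    """Return html with every complete htdig_noindex block removed.

    Hypothetical helper for illustration only; an opening marker
    without a matching closing marker is left untouched.
    """
    return NOINDEX_RE.sub("", html)


page = (
    "<p>Article text</p>"
    "<!-- htdig_noindex --><nav>Menu</nav><!-- /htdig_noindex -->"
    "<p>More text</p>"
)
print(strip_noindex(page))
```

With something like this, each project would only need to wrap its non-relevant parts (navigation, footers, and so on) in the two comments at scraping time, and the indexing side stays the same for everyone.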