Skip to content

added SearchRecord custom functions for at-blocks #2672

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

Rahban1
Copy link
Contributor

@Rahban1 Rahban1 commented Mar 31, 2025

this is an attempt to solve and potentially close #1929
my thought process was that right now the SearchRecord objects are created and added to the search index which are very general so I made custom function for adding SearchRecords and add special handling for MultiOutput which is @example block, MultiCodeBlock which is the @repl block and EvalNode which is the @eval block

@Rahban1
Copy link
Contributor Author

Rahban1 commented Mar 31, 2025

I can add the changelog and will fix the linting issues once we decide what is the most optimal solution :)

@mortenpi
Copy link
Member

mortenpi commented Apr 1, 2025

Just thinking out loud here. The methods you're adding, they are getting called here:

rec = SearchRecord(ctx, navnode, node, node.element)
push!(ctx.search_index, rec)

It feels like what we want is sometimes not to write any SearchRecord here. For example, for at-docs, we kinda get duplicate entries right now, one for the at-docs source, and one for the actual docstring.

    {
      "location": "",
      "page": "Example Site",
      "title": "Example Site",
      "text": "Main.DocString",
      "category": "page"
    },
    {
      "location": "#DocString",
      "page": "Example Site",
      "title": "DocString",
      "text": "Docstring\n\n\n\n\n\n",
      "category": "constant"
    }

The second one (actual docstring content) gets pushed here:

# push to search index
rec = SearchRecord(
ctx, navnode;
fragment = Documenter.anchor_fragment(docsnode.anchor),
title = string(docsnode.object.binding),
category = Documenter.doccat(docsnode.object),
text = mdflatten(mdast_node)
)
push!(ctx.search_index, rec)

So what I think we might be able to do is change these methods to functions (searchrecord, to differentiate from the constructor), which sometimes just returns a nothing. We'll add a searchrecord method for e.g. DocsNode to return nothing, and then in the first loop at the top, we just exclude any nothings. That should give us an easy way to exclude nodes at this level.

Comment on lines 726 to 737
output_text = ""
for child in children_array[2:end]
if isa(child.element, Documenter.MultiOutputElement) && isa(child.element.element, Dict)
for mime in [MIME"text/plain"(), MIME"text/markdown"(), MIME"text/html"()]
if haskey(child.element.element, mime)
output_text *= string(child.element.element[mime]) * " "
break
end
end
else
output_text *= mdflatten(child) * " "
end
Copy link
Member

@mortenpi mortenpi Apr 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, for at-example blocks, we render both the input and the output. We probably want to put the source code into the index as well?

image

In general, though, I wonder if it's better to just add the search index entries in the actual domify methods, like here:

# Select the "best" representation for HTML output.
domify(dctx::DCtx, node::Node, ::Documenter.MultiOutput) = domify(dctx, node.children)
domify(dctx::DCtx, node::Node, moe::Documenter.MultiOutputElement) = Base.invokelatest(domify, dctx, node, moe.element)
function domify(dctx::DCtx, node::Node, d::Dict{MIME, Any})
rawhtml(code) = DOM.Tag(Symbol("#RAW#"))(code)
# Our first preference for the MIME type is 'text/html', which we can natively include
# in the HTML. But it might happen that it's too large (above example_size_threshold),
# in which case we move on to trying to write it as an image.
has_text_html = false
if haskey(d, MIME"text/html"())
if length(d[MIME"text/html"()]) < dctx.ctx.settings.example_size_threshold
# If the size threshold is met, we can just return right away
return rawhtml(d[MIME"text/html"()])
end
# We'll set has_text_html to distinguish it from the case where the 'text/html' MIME
# show() method was simply missing.
has_text_html = true
end
# If text/html failed, we try to write the output as an image, possibly to a file.
image = if haskey(d, MIME"image/svg+xml"())
@tags img
svg = d[MIME"image/svg+xml"()]
svg_tag_match = match(r"<svg[^>]*>", svg)
dom = if length(svg) >= dctx.ctx.settings.example_size_threshold
filename = write_data_file(dctx, svg; suffix = ".svg")
@assert !isnothing(filename)
img[:src => filename, :alt => "Example block output"]
elseif svg_tag_match === nothing
# There is no svg tag so we don't do any more advanced
# processing and just return the svg as HTML.
# The svg string should be invalid but that's not our concern here.
rawhtml(svg)
else
# The xmlns attribute has to be present for data:image/svg+xml
# to work (https://stackoverflow.com/questions/18467982).
# If it doesn't exist, we splice it into the first svg tag.
# This should never invalidate otherwise valid svg.
svg_tag = svg_tag_match.match
xmlns_present = occursin("xmlns", svg_tag)
if !xmlns_present
svg = replace(svg, "<svg" => "<svg xmlns=\"http://www.w3.org/2000/svg\"", count = 1)
end
# We can leave the svg as utf8, but the minimum safety precaution we need
# is to ensure the src string separator is not in the svg.
# That can be either " or ', and the svg will most likely use only one of them
# so we check which one occurs more often and use the other as the separator.
# This should leave most svg basically intact.
# Replace % with %25 and # with %23 https://github.com/jakubpawlowicz/clean-css/issues/763#issuecomment-215283553
svg = replace(svg, "%" => "%25")
svg = replace(svg, "#" => "%23")
singles = count(==('\''), svg)
doubles = count(==('"'), svg)
if singles > doubles
# Replace every " with %22 because it terminates the src=" string otherwise
svg = replace(svg, "\"" => "%22")
sep = "\""
else
# Replace every ' with %27 because it terminates the src=' string otherwise
svg = replace(svg, "\'" => "%27")
sep = "'"
end
rawhtml(string("<img src=", sep, "data:image/svg+xml;utf-8,", svg, sep, "/>"))
end
(; dom = dom, mime = "image/svg+xml")
elseif haskey(d, MIME"image/png"())
domify_show_image_binary(dctx, "png", d)
elseif haskey(d, MIME"image/webp"())
domify_show_image_binary(dctx, "webp", d)
elseif haskey(d, MIME"image/gif"())
domify_show_image_binary(dctx, "gif", d)
elseif haskey(d, MIME"image/jpeg"())
domify_show_image_binary(dctx, "jpeg", d)
end
# If image is nothing, then the object did not have any of the supported image
# representations. In that case, if the 'text/html' representation exists, we use that,
# but with a warning because it goes over the size limit. If 'text/html' was missing too,
# we carry on to additional MIME types.
if has_text_html && isnothing(image)
# The 'text/html' representation of an @example block is above the threshold, but no
# supported image representation is present as an alternative.
push!(
dctx.ctx.atexample_warnings,
AtExampleFallbackWarning(
page = dctx.navnode.page,
size_bytes = length(d[MIME"text/html"()]),
fallback = nothing,
)
)
return rawhtml(d[MIME"text/html"()])
elseif has_text_html && !isnothing(image)
# The 'text/html' representation of an @example block is above the threshold,
# falling back to '$(image.mime)' representation.
push!(
dctx.ctx.atexample_warnings,
AtExampleFallbackWarning(
page = dctx.navnode.page,
size_bytes = length(d[MIME"text/html"()]),
fallback = image.mime,
)
)
return image.dom
elseif !has_text_html && !isnothing(image)
return image.dom
end
# Check for some 'fallback' MIMEs, defaulting to 'text/plain' if we can't find any of them.
return if haskey(d, MIME"text/latex"())
latex = d[MIME"text/latex"()]
# If the show(io, ::MIME"text/latex", x) output is already wrapped in
# \[ ... \] or $$ ... $$, we unwrap it first, since when we output
# Markdown.LaTeX objects we put the correct delimiters around it anyway.
has_math, latex = _strip_latex_math_delimiters(latex)
out = if !has_math
Documenter.mdparse(latex; mode = :single)
else
[MarkdownAST.@ast MarkdownAST.DisplayMath(latex)]
end
domify(dctx, out)
elseif haskey(d, MIME"text/markdown"())
out = Markdown.parse(d[MIME"text/markdown"()])
out = convert(MarkdownAST.Node, out)
domify(dctx, out)
elseif haskey(d, MIME"text/plain"())
@tags pre
text = d[MIME"text/plain"()]
pre[".documenter-example-output"](domify_ansicoloredtext(text, "nohighlight hljs"))
else
error("this should never happen.")
end
end

The bonus is that the "rendered" output of a block is already present, so it would reduce duplication I think.

X-ref #2672 (comment) about how to remove the other entries.

@mortenpi
Copy link
Member

mortenpi commented Apr 1, 2025

Just one more note: as I was playing around with this PR, and to actually see what ends up in the search index, I put together a small toy build that also prettyfies the index, in case that might be useful: becf65c

In general (but probably not in this PR, to not scope creep this), I think we want some explicit tests for the search index generation, like with reference output etc.

@Rahban1
Copy link
Contributor Author

Rahban1 commented Apr 2, 2025

Just one more note: as I was playing around with this PR, and to actually see what ends up in the search index, I put together a small toy build that also prettyfies the index, in case that might be useful: becf65c

In general (but probably not in this PR, to not scope creep this), I think we want some explicit tests for the search index generation, like with reference output etc.

this is a great idea, I will make a new PR for tests after this PR's completion.

@Rahban1
Copy link
Contributor Author

Rahban1 commented Apr 2, 2025

this time I made a should_index_for_search function checking for at-blocks and only added the one which were not at blocks but still I don't think this issue is solved

"""
Filter function to prevent duplicate search entries.
"""
function should_index_for_search(navnode, node, element)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be sufficient to just check the type, no need to introspect into the objects? So something as simple as this should be enough probably:

exclude_from_search_index(element) = typeof(element) in (Documenter.MultiOutputElement, Documenter.MultiCodeBlock, Documenter.EvalNode)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove at-block source code from search index
2 participants