added SearchRecord custom functions for at-blocks #2672

Rahban1 · 2025-03-31T04:38:07Z

this is an attempt to solve and potentially close #1929
my thought process was that right now the SearchRecord objects are created and added to the search index which are very general so I made custom function for adding SearchRecords and add special handling for MultiOutput which is @example block, MultiCodeBlock which is the @repl block and EvalNode which is the @eval block

Rahban1 · 2025-03-31T04:40:25Z

I can add the changelog and will fix the linting issues once we decide what is the most optimal solution :)

mortenpi · 2025-04-01T02:50:15Z

Just thinking out loud here. The methods you're adding, they are getting called here:

Documenter.jl/src/html/HTMLWriter.jl

Lines 1683 to 1684 in 4328aec

    
           rec = SearchRecord(ctx, navnode, node, node.element) 
        
           push!(ctx.search_index, rec)

It feels like what we want is sometimes not to write any SearchRecord here. For example, for at-docs, we kinda get duplicate entries right now, one for the at-docs source, and one for the actual docstring.

    {
      "location": "",
      "page": "Example Site",
      "title": "Example Site",
      "text": "Main.DocString",
      "category": "page"
    },
    {
      "location": "#DocString",
      "page": "Example Site",
      "title": "DocString",
      "text": "Docstring\n\n\n\n\n\n",
      "category": "constant"
    }

The second one (actual docstring content) gets pushed here:

Documenter.jl/src/html/HTMLWriter.jl

Lines 1775 to 1783 in 4328aec

    
           # push to search index 
        
           rec = SearchRecord( 
        
               ctx, navnode; 
        
               fragment = Documenter.anchor_fragment(docsnode.anchor), 
        
               title = string(docsnode.object.binding), 
        
               category = Documenter.doccat(docsnode.object), 
        
               text = mdflatten(mdast_node) 
        
           ) 
        
           push!(ctx.search_index, rec)

So what I think we might be able to do is change these methods to functions (searchrecord, to differentiate from the constructor), which sometimes just returns a nothing. We'll add a searchrecord method for e.g. DocsNode to return nothing, and then in the first loop at the top, we just exclude any nothings. That should give us an easy way to exclude nodes at this level.

src/html/HTMLWriter.jl

mortenpi · 2025-04-01T02:38:13Z

src/html/HTMLWriter.jl

+    output_text = ""
+    for child in children_array[2:end]
+        if isa(child.element, Documenter.MultiOutputElement) && isa(child.element.element, Dict)
+            for mime in [MIME"text/plain"(), MIME"text/markdown"(), MIME"text/html"()]
+                if haskey(child.element.element, mime)
+                    output_text *= string(child.element.element[mime]) * " "
+                    break
+                end
+            end
+        else 
+            output_text *= mdflatten(child) * " "
+        end


So, for at-example blocks, we render both the input and the output. We probably want to put the source code into the index as well?

In general, though, I wonder if it's better to just add the search index entries in the actual domify methods, like here:

Documenter.jl/src/html/HTMLWriter.jl

Lines 2394 to 2527 in 4328aec

# Select the "best" representation for HTML output.

domify(dctx::DCtx, node::Node, ::Documenter.MultiOutput) = domify(dctx, node.children)

domify(dctx::DCtx, node::Node, moe::Documenter.MultiOutputElement) = Base.invokelatest(domify, dctx, node, moe.element)

function domify(dctx::DCtx, node::Node, d::Dict{MIME, Any})

rawhtml(code) = DOM.Tag(Symbol("#RAW#"))(code)

# Our first preference for the MIME type is 'text/html', which we can natively include

# in the HTML. But it might happen that it's too large (above example_size_threshold),

# in which case we move on to trying to write it as an image.

has_text_html = false

if haskey(d, MIME"text/html"())

if length(d[MIME"text/html"()]) < dctx.ctx.settings.example_size_threshold

# If the size threshold is met, we can just return right away

return rawhtml(d[MIME"text/html"()])

end

# We'll set has_text_html to distinguish it from the case where the 'text/html' MIME

# show() method was simply missing.

has_text_html = true

end

# If text/html failed, we try to write the output as an image, possibly to a file.

image = if haskey(d, MIME"image/svg+xml"())

@tags img

svg = d[MIME"image/svg+xml"()]

svg_tag_match = match(r"<svg[^>]*>", svg)

dom = if length(svg) >= dctx.ctx.settings.example_size_threshold

filename = write_data_file(dctx, svg; suffix = ".svg")

@assert !isnothing(filename)

img[:src => filename, :alt => "Example block output"]

elseif svg_tag_match === nothing

# There is no svg tag so we don't do any more advanced

# processing and just return the svg as HTML.

# The svg string should be invalid but that's not our concern here.

rawhtml(svg)

else

# The xmlns attribute has to be present for data:image/svg+xml

# to work (https://stackoverflow.com/questions/18467982).

# If it doesn't exist, we splice it into the first svg tag.

# This should never invalidate otherwise valid svg.

svg_tag = svg_tag_match.match

xmlns_present = occursin("xmlns", svg_tag)

if !xmlns_present

svg = replace(svg, "<svg" => "<svg xmlns=\"http://www.w3.org/2000/svg\"", count = 1)

end

# We can leave the svg as utf8, but the minimum safety precaution we need

# is to ensure the src string separator is not in the svg.

# That can be either " or ', and the svg will most likely use only one of them

# so we check which one occurs more often and use the other as the separator.

# This should leave most svg basically intact.

# Replace % with %25 and # with %23 https://github.com/jakubpawlowicz/clean-css/issues/763#issuecomment-215283553

svg = replace(svg, "%" => "%25")

svg = replace(svg, "#" => "%23")

singles = count(==('\''), svg)

doubles = count(==('"'), svg)

if singles > doubles

# Replace every " with %22 because it terminates the src=" string otherwise

svg = replace(svg, "\"" => "%22")

sep = "\""

else

# Replace every ' with %27 because it terminates the src=' string otherwise

svg = replace(svg, "\'" => "%27")

sep = "'"

end

rawhtml(string("<img src=", sep, "data:image/svg+xml;utf-8,", svg, sep, "/>"))

end

(; dom = dom, mime = "image/svg+xml")

elseif haskey(d, MIME"image/png"())

domify_show_image_binary(dctx, "png", d)

elseif haskey(d, MIME"image/webp"())

domify_show_image_binary(dctx, "webp", d)

elseif haskey(d, MIME"image/gif"())

domify_show_image_binary(dctx, "gif", d)

elseif haskey(d, MIME"image/jpeg"())

domify_show_image_binary(dctx, "jpeg", d)

end

# If image is nothing, then the object did not have any of the supported image

# representations. In that case, if the 'text/html' representation exists, we use that,

# but with a warning because it goes over the size limit. If 'text/html' was missing too,

# we carry on to additional MIME types.

if has_text_html && isnothing(image)

# The 'text/html' representation of an @example block is above the threshold, but no

# supported image representation is present as an alternative.

push!(

dctx.ctx.atexample_warnings,

AtExampleFallbackWarning(

page = dctx.navnode.page,

size_bytes = length(d[MIME"text/html"()]),

fallback = nothing,

)

)

return rawhtml(d[MIME"text/html"()])

elseif has_text_html && !isnothing(image)

# The 'text/html' representation of an @example block is above the threshold,

# falling back to '$(image.mime)' representation.

push!(

dctx.ctx.atexample_warnings,

AtExampleFallbackWarning(

page = dctx.navnode.page,

size_bytes = length(d[MIME"text/html"()]),

fallback = image.mime,

)

)

return image.dom

elseif !has_text_html && !isnothing(image)

return image.dom

end

# Check for some 'fallback' MIMEs, defaulting to 'text/plain' if we can't find any of them.

return if haskey(d, MIME"text/latex"())

latex = d[MIME"text/latex"()]

# If the show(io, ::MIME"text/latex", x) output is already wrapped in

# \[ ... \] or $$ ... $$, we unwrap it first, since when we output

# Markdown.LaTeX objects we put the correct delimiters around it anyway.

has_math, latex = _strip_latex_math_delimiters(latex)

out = if !has_math

Documenter.mdparse(latex; mode = :single)

else

[MarkdownAST.@ast MarkdownAST.DisplayMath(latex)]

end

domify(dctx, out)

elseif haskey(d, MIME"text/markdown"())

out = Markdown.parse(d[MIME"text/markdown"()])

out = convert(MarkdownAST.Node, out)

domify(dctx, out)

elseif haskey(d, MIME"text/plain"())

@tags pre

text = d[MIME"text/plain"()]

pre[".documenter-example-output"](domify_ansicoloredtext(text, "nohighlight hljs"))

else

error("this should never happen.")

end

end

The bonus is that the "rendered" output of a block is already present, so it would reduce duplication I think.

X-ref #2672 (comment) about how to remove the other entries.

mortenpi · 2025-04-01T02:53:41Z

Just one more note: as I was playing around with this PR, and to actually see what ends up in the search index, I put together a small toy build that also prettyfies the index, in case that might be useful: becf65c

In general (but probably not in this PR, to not scope creep this), I think we want some explicit tests for the search index generation, like with reference output etc.

Rahban1 · 2025-04-02T19:51:11Z

Just one more note: as I was playing around with this PR, and to actually see what ends up in the search index, I put together a small toy build that also prettyfies the index, in case that might be useful: becf65c

In general (but probably not in this PR, to not scope creep this), I think we want some explicit tests for the search index generation, like with reference output etc.

this is a great idea, I will make a new PR for tests after this PR's completion.

Rahban1 · 2025-04-02T19:56:53Z

this time I made a should_index_for_search function checking for at-blocks and only added the one which were not at blocks but still I don't think this issue is solved

mortenpi · 2025-04-02T20:42:29Z

src/html/HTMLWriter.jl

+"""
+Filter function to prevent duplicate search entries.
+"""
+function should_index_for_search(navnode, node, element)


I think it should be sufficient to just check the type, no need to introspect into the objects? So something as simple as this should be enough probably:

exclude_from_search_index(element) = typeof(element) in (Documenter.MultiOutputElement, Documenter.MultiCodeBlock, Documenter.EvalNode)

added SearchRecord custom functions for at-blocks

44a74f4

mortenpi reviewed Apr 1, 2025

View reviewed changes

tried fixing it

32e47b4

mortenpi reviewed Apr 2, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

added SearchRecord custom functions for at-blocks #2672

added SearchRecord custom functions for at-blocks #2672

Rahban1 commented Mar 31, 2025

Rahban1 commented Mar 31, 2025

mortenpi commented Apr 1, 2025

mortenpi Apr 1, 2025 •

edited

Loading

mortenpi commented Apr 1, 2025

Rahban1 commented Apr 2, 2025

Rahban1 commented Apr 2, 2025

mortenpi Apr 2, 2025

	# Select the "best" representation for HTML output.
	domify(dctx::DCtx, node::Node, ::Documenter.MultiOutput) = domify(dctx, node.children)
	domify(dctx::DCtx, node::Node, moe::Documenter.MultiOutputElement) = Base.invokelatest(domify, dctx, node, moe.element)

	function domify(dctx::DCtx, node::Node, d::Dict{MIME, Any})
	rawhtml(code) = DOM.Tag(Symbol("#RAW#"))(code)
	# Our first preference for the MIME type is 'text/html', which we can natively include
	# in the HTML. But it might happen that it's too large (above example_size_threshold),
	# in which case we move on to trying to write it as an image.
	has_text_html = false
	if haskey(d, MIME"text/html"())
	if length(d[MIME"text/html"()]) < dctx.ctx.settings.example_size_threshold
	# If the size threshold is met, we can just return right away
	return rawhtml(d[MIME"text/html"()])
	end
	# We'll set has_text_html to distinguish it from the case where the 'text/html' MIME
	# show() method was simply missing.
	has_text_html = true
	end
	# If text/html failed, we try to write the output as an image, possibly to a file.
	image = if haskey(d, MIME"image/svg+xml"())
	@tags img
	svg = d[MIME"image/svg+xml"()]
	svg_tag_match = match(r"<svg[^>]*>", svg)
	dom = if length(svg) >= dctx.ctx.settings.example_size_threshold
	filename = write_data_file(dctx, svg; suffix = ".svg")
	@assert !isnothing(filename)
	img[:src => filename, :alt => "Example block output"]
	elseif svg_tag_match === nothing
	# There is no svg tag so we don't do any more advanced
	# processing and just return the svg as HTML.
	# The svg string should be invalid but that's not our concern here.
	rawhtml(svg)
	else
	# The xmlns attribute has to be present for data:image/svg+xml
	# to work (https://stackoverflow.com/questions/18467982).
	# If it doesn't exist, we splice it into the first svg tag.
	# This should never invalidate otherwise valid svg.
	svg_tag = svg_tag_match.match
	xmlns_present = occursin("xmlns", svg_tag)
	if !xmlns_present
	svg = replace(svg, "<svg" => "<svg xmlns=\"http://www.w3.org/2000/svg\"", count = 1)
	end

	# We can leave the svg as utf8, but the minimum safety precaution we need
	# is to ensure the src string separator is not in the svg.
	# That can be either " or ', and the svg will most likely use only one of them
	# so we check which one occurs more often and use the other as the separator.
	# This should leave most svg basically intact.

	# Replace % with %25 and # with %23 https://github.com/jakubpawlowicz/clean-css/issues/763#issuecomment-215283553
	svg = replace(svg, "%" => "%25")
	svg = replace(svg, "#" => "%23")

	singles = count(==('\''), svg)
	doubles = count(==('"'), svg)
	if singles > doubles
	# Replace every " with %22 because it terminates the src=" string otherwise
	svg = replace(svg, "\"" => "%22")
	sep = "\""
	else
	# Replace every ' with %27 because it terminates the src=' string otherwise
	svg = replace(svg, "\'" => "%27")
	sep = "'"
	end

	rawhtml(string("<img src=", sep, "data:image/svg+xml;utf-8,", svg, sep, "/>"))
	end
	(; dom = dom, mime = "image/svg+xml")
	elseif haskey(d, MIME"image/png"())
	domify_show_image_binary(dctx, "png", d)
	elseif haskey(d, MIME"image/webp"())
	domify_show_image_binary(dctx, "webp", d)
	elseif haskey(d, MIME"image/gif"())
	domify_show_image_binary(dctx, "gif", d)
	elseif haskey(d, MIME"image/jpeg"())
	domify_show_image_binary(dctx, "jpeg", d)
	end
	# If image is nothing, then the object did not have any of the supported image
	# representations. In that case, if the 'text/html' representation exists, we use that,
	# but with a warning because it goes over the size limit. If 'text/html' was missing too,
	# we carry on to additional MIME types.
	if has_text_html && isnothing(image)
	# The 'text/html' representation of an @example block is above the threshold, but no
	# supported image representation is present as an alternative.
	push!(
	dctx.ctx.atexample_warnings,
	AtExampleFallbackWarning(
	page = dctx.navnode.page,
	size_bytes = length(d[MIME"text/html"()]),
	fallback = nothing,
	)
	)
	return rawhtml(d[MIME"text/html"()])
	elseif has_text_html && !isnothing(image)
	# The 'text/html' representation of an @example block is above the threshold,
	# falling back to '$(image.mime)' representation.
	push!(
	dctx.ctx.atexample_warnings,
	AtExampleFallbackWarning(
	page = dctx.navnode.page,
	size_bytes = length(d[MIME"text/html"()]),
	fallback = image.mime,
	)
	)
	return image.dom
	elseif !has_text_html && !isnothing(image)
	return image.dom
	end
	# Check for some 'fallback' MIMEs, defaulting to 'text/plain' if we can't find any of them.
	return if haskey(d, MIME"text/latex"())
	latex = d[MIME"text/latex"()]
	# If the show(io, ::MIME"text/latex", x) output is already wrapped in
	# \[ ... \] or $$ ... $$, we unwrap it first, since when we output
	# Markdown.LaTeX objects we put the correct delimiters around it anyway.
	has_math, latex = _strip_latex_math_delimiters(latex)
	out = if !has_math
	Documenter.mdparse(latex; mode = :single)
	else
	[MarkdownAST.@ast MarkdownAST.DisplayMath(latex)]
	end
	domify(dctx, out)
	elseif haskey(d, MIME"text/markdown"())
	out = Markdown.parse(d[MIME"text/markdown"()])
	out = convert(MarkdownAST.Node, out)
	domify(dctx, out)
	elseif haskey(d, MIME"text/plain"())
	@tags pre
	text = d[MIME"text/plain"()]
	pre[".documenter-example-output"](domify_ansicoloredtext(text, "nohighlight hljs"))
	else
	error("this should never happen.")
	end
	end

added SearchRecord custom functions for at-blocks #2672

Are you sure you want to change the base?

added SearchRecord custom functions for at-blocks #2672

Conversation

Rahban1 commented Mar 31, 2025

Rahban1 commented Mar 31, 2025

mortenpi commented Apr 1, 2025

mortenpi Apr 1, 2025 • edited Loading

Choose a reason for hiding this comment

mortenpi commented Apr 1, 2025

Rahban1 commented Apr 2, 2025

Rahban1 commented Apr 2, 2025

mortenpi Apr 2, 2025

Choose a reason for hiding this comment

mortenpi Apr 1, 2025 •

edited

Loading