Update html printing (#115)
svilupp authored Mar 27, 2024
1 parent 06b1abe commit 19bcaad
Showing 7 changed files with 180 additions and 7 deletions.
9 changes: 7 additions & 2 deletions CHANGELOG.md
@@ -6,12 +6,17 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

### Added

### Fixed

## [0.17.0]

### Added
- Added support for `aigenerate` with Anthropic API. Preset model aliases are `claudeo`, `claudes`, and `claudeh`, for Claude 3 Opus, Sonnet, and Haiku, respectively.
- Enabled the GoogleGenAI extension since `GoogleGenAI.jl` is now officially registered. You can use `aigenerate` by setting the model to `gemini` and providing the `GOOGLE_API_KEY` environment variable.
- Added utilities to make preparation of finetuning datasets easier. You can now export your conversations in JSONL format with ShareGPT formatting (eg, for Axolotl). See `?PT.save_conversations` for more information.

### Fixed
- Added `print_html` utility for RAGTools module to print HTML-styled RAG answer annotations for web applications (eg, Genie.jl). See `?PromptingTools.Experimental.RAGTools.print_html` for more information and examples.
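
As an illustration of the `aigenerate` additions listed above, here is a minimal sketch. The model aliases and the `GOOGLE_API_KEY` variable come from the entries; the prompt text, the `ANTHROPIC_API_KEY` variable name, and loading `GoogleGenAI.jl` first are assumptions.

```julia
using PromptingTools

# Claude 3 Haiku via the preset alias (assumes ANTHROPIC_API_KEY is set in ENV)
msg = aigenerate("Say hi in one short sentence."; model = "claudeh")

# Gemini via the GoogleGenAI extension (assumes GOOGLE_API_KEY is set in ENV;
# loading the extension first, eg, `using GoogleGenAI`, is assumed to be required)
msg = aigenerate("Say hi in one short sentence."; model = "gemini")
```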

## [0.16.1]

2 changes: 1 addition & 1 deletion README.md
@@ -820,7 +820,7 @@ Fine-tuning is a powerful technique to adapt a model to your specific use case (

2. Once the finetuning time comes, create a bundle of ShareGPT-formatted conversations (common finetuning format) in a single `.jsonl` file. Use `PT.save_conversations("dataset.jsonl", [conversation1, conversation2, ...])` (note the plural "conversationS" in the function name).

For an example of an end-to-end finetuning process, check out our sister project [JuliaLLMLeaderboard Finetuning experiment](https://github.com/svilupp/Julia-LLM-Leaderboard/blob/main/experiments/cheater-7b-finetune/README.md). It shows the process of finetuning for half a dollar with [Jarvislabs.ai](jarvislabs.ai) and [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl).
For an example of an end-to-end finetuning process, check out our sister project [JuliaLLMLeaderboard Finetuning experiment](https://github.com/svilupp/Julia-LLM-Leaderboard/blob/main/experiments/cheater-7b-finetune/README.md). It shows the process of finetuning for half a dollar with [Jarvislabs.ai](https://jarvislabs.ai/templates/axolotl) and [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl).
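
A hedged sketch of step 2 above: bundling a couple of conversations into a ShareGPT-formatted JSONL file. Only `PT.save_conversations` and the `.jsonl`/ShareGPT format are from the text; the `return_all = true` keyword (to get the full message vector back) and the prompts are assumptions.

```julia
using PromptingTools
const PT = PromptingTools

# Collect full conversations (vectors of messages), not just the last reply;
# `return_all = true` is assumed to return the whole message history
conversation1 = aigenerate("What is a DataFrame in Julia?"; return_all = true)
conversation2 = aigenerate("How do I read a CSV file in Julia?"; return_all = true)

# Bundle them into a single ShareGPT-formatted JSONL file (eg, for Axolotl)
PT.save_conversations("dataset.jsonl", [conversation1, conversation2])
```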

## Roadmap

2 changes: 2 additions & 0 deletions docs/src/extra_tools/rag_tools_intro.md
@@ -145,6 +145,8 @@ SOURCES
5. Doc9
```

See `?print_html` for the HTML version of the pretty-printing and styling system, eg, when you want to display the results in a web application based on Genie.jl/Stipple.jl.
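
A short sketch of what that looks like in practice — `result` is assumed to be an existing `RAGResult` (eg, returned by `airag`), and embedding the fragment into a Genie.jl/Stipple.jl view is left to the application:

```julia
using PromptingTools
const RT = PromptingTools.Experimental.RAGTools

# `result` is assumed to be a RAGResult produced earlier (eg, by `airag`)
html = RT.print_html(result)   # returns a String such as "<div>...</div>"
# embed `html` in your web page as a raw HTML fragment
```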

**How to read the output**
- Color legend:
- No color: High match with the context, can be trusted more
2 changes: 1 addition & 1 deletion docs/src/frequently_asked_questions.md
@@ -407,4 +407,4 @@ Fine-tuning is a powerful technique to adapt a model to your specific use case (

2. Once the finetuning time comes, create a bundle of ShareGPT-formatted conversations (common finetuning format) in a single `.jsonl` file. Use `PT.save_conversations("dataset.jsonl", [conversation1, conversation2, ...])` (note the plural "conversationS" in the function name).

For an example of an end-to-end finetuning process, check out our sister project [JuliaLLMLeaderboard Finetuning experiment](https://github.com/svilupp/Julia-LLM-Leaderboard/blob/main/experiments/cheater-7b-finetune/README.md). It shows the process of finetuning for half a dollar with [Jarvislabs.ai](jarvislabs.ai) and [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl).
For an example of an end-to-end finetuning process, check out our sister project [JuliaLLMLeaderboard Finetuning experiment](https://github.com/svilupp/Julia-LLM-Leaderboard/blob/main/experiments/cheater-7b-finetune/README.md). It shows the process of finetuning for half a dollar with [JarvisLabs.ai](https://jarvislabs.ai/templates/axolotl) and [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl).
2 changes: 1 addition & 1 deletion src/Experimental/RAGTools/RAGTools.jl
@@ -43,7 +43,7 @@ export airag, build_context!, generate!, refine!, answer!, postprocess!
export SimpleGenerator, AdvancedGenerator, RAGConfig
include("generation.jl")

export annotate_support, TrigramAnnotater
export annotate_support, TrigramAnnotater, print_html
include("annotation.jl")

export build_qa_evals, run_qa_evals
115 changes: 115 additions & 0 deletions src/Experimental/RAGTools/annotation.jl
@@ -532,4 +532,119 @@ function annotate_support(
    return annotate_support(
        annotater, final_answer, result.context; min_score, skip_trigrams,
        hashed, result.sources, min_source_score, add_sources, add_scores, kwargs...)
end

"""
    print_html([io::IO,] parent_node::AbstractAnnotatedNode)
    print_html([io::IO,] rag::AbstractRAGResult; add_sources::Bool = false,
        add_scores::Bool = false, default_styler = HTMLStyler(),
        low_styler = HTMLStyler(styles = "color:magenta", classes = ""),
        medium_styler = HTMLStyler(styles = "color:blue", classes = ""),
        high_styler = HTMLStyler(styles = "", classes = ""), styler_kwargs...)

Pretty-prints the annotated `parent_node` (or `RAGResult`) to the `io` stream, or returns the string, in HTML format. Assumes the node is styled with an `HTMLStyler`.

It wraps each "token" into a span with the requested styling (the `HTMLStyler` properties `classes` and `styles`).
It also replaces newlines with `<br>` for better HTML formatting.
For any non-HTML styler, it prints the content as plain text.

# Returns
- `nothing` if `io` is provided
- the string with HTML-formatted text if `io` is not provided

See also `HTMLStyler`, `annotate_support`, and `set_node_style!` for how the styling is applied and what the arguments mean.

# Examples
Note: `RT` is an alias for `PromptingTools.Experimental.RAGTools`

Simple start directly with the `RAGResult`:

```julia
# set up the text/RAGResult
context = [
    "This is a test context.", "Another context sentence.", "Final piece of context."]
answer = "This is a test answer. It has multiple sentences."
rag = RT.RAGResult(; context, final_answer=answer, question="")

# print the HTML
print_html(rag)
```

Low-level control by creating our `AnnotatedNode`:

```julia
# prepare your HTML styling
styler_kwargs = (;
    default_styler=RT.HTMLStyler(),
    low_styler=RT.HTMLStyler(styles="color:magenta", classes=""),
    medium_styler=RT.HTMLStyler(styles="color:blue", classes=""),
    high_styler=RT.HTMLStyler(styles="", classes=""))

# annotate the text
context = [
    "This is a test context.", "Another context sentence.", "Final piece of context."]
answer = "This is a test answer. It has multiple sentences."
parent_node = RT.annotate_support(
    RT.TrigramAnnotater(), answer, context; add_sources=false, add_scores=false, styler_kwargs...)

# print the HTML
print_html(parent_node)

# or to accumulate more nodes
io = IOBuffer()
print_html(io, parent_node)
```
"""
function print_html(io::IO, parent_node::AbstractAnnotatedNode)
    print(io, "<div>")
    for node in PreOrderDFS(parent_node)
        ## print out text only for leaf nodes (ie, with no children)
        if isempty(node.children)
            # create HTML style new lines
            content = replace(node.content, "\n" => "<br>")
            if node.style isa HTMLStyler
                # HTML styler -> wrap each token into a span with requested styling
                style_str = isempty(node.style.styles) ? "" :
                            " style=\"$(node.style.styles)\""
                class_str = isempty(node.style.classes) ? "" :
                            " class=\"$(node.style.classes)\""
                if isempty(class_str) && isempty(style_str)
                    print(io, content)
                else
                    print(io,
                        "<span", style_str, class_str, ">$(content)</span>")
                end
            else
                # print plain text
                print(io, content)
            end
        end
    end
    print(io, "</div>")
    return nothing
end

# utility for RAGResult
function print_html(io::IO, rag::AbstractRAGResult; add_sources::Bool = false,
        add_scores::Bool = false, default_styler = HTMLStyler(),
        low_styler = HTMLStyler(styles = "color:magenta", classes = ""),
        medium_styler = HTMLStyler(styles = "color:blue", classes = ""),
        high_styler = HTMLStyler(styles = "", classes = ""), styler_kwargs...)

    # Create the annotation
    parent_node = annotate_support(
        TrigramAnnotater(), rag; add_sources, add_scores, default_styler,
        low_styler, medium_styler, high_styler, styler_kwargs...)

    # Print the HTML
    print_html(io, parent_node)
end

# Non-io dispatch
function print_html(
        rag_or_parent_node::Union{AbstractAnnotatedNode, AbstractRAGResult}; kwargs...)
    io = IOBuffer()
    print_html(io, rag_or_parent_node; kwargs...)
    String(take!(io))
end
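
A small sketch contrasting the two dispatches defined above: the string-returning form for a single result versus streaming several annotated nodes into one buffer. The `rag_result`, `node_a`, and `node_b` variables are placeholders.

```julia
# String-returning form (no `io`): convenient for templating into a web page
html = print_html(rag_result)            # "<div>...</div>"

# IO form: accumulate several annotated nodes into a single HTML fragment
io = IOBuffer()
for node in (node_a, node_b)             # placeholder AnnotatedNodes
    print_html(io, node)
end
write("annotated_answers.html", String(take!(io)))
```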
55 changes: 53 additions & 2 deletions test/Experimental/RAGTools/annotation.jl
Expand Up @@ -3,7 +3,7 @@ using PromptingTools.Experimental.RAGTools: AnnotatedNode, AbstractAnnotater,
set_node_style!,
align_node_styles!, TrigramAnnotater, Styler,
HTMLStyler,
pprint
pprint, print_html
using PromptingTools.Experimental.RAGTools: trigram_support!, add_node_metadata!,
annotate_support, RAGResult, text_to_trigrams

@@ -350,4 +350,55 @@ end
    # Invalid types
    struct Random123Annotater <: AbstractAnnotater end
    @test_throws ArgumentError annotate_support(Random123Annotater(), "test", context)
end
end

@testset "print_html" begin
    # Test for plain text without any HTML styler
    node = AnnotatedNode(content = "text\nNew line", score = 0.5)
    str = print_html(node)
    @test str == "<div>text<br>New line</div>"

    # Test for single HTMLStyler with no new lines
    styler = HTMLStyler(styles = "font-weight:bold", classes = "highlight")
    node = AnnotatedNode(content = "text\nNew line", score = 0.5, style = styler)
    str = print_html(node)
    @test str ==
          "<div><span style=\"font-weight:bold\" class=\"highlight\">text<br>New line</span></div>"

    # Test for HTMLStyler without styling
    styler = HTMLStyler()
    node = AnnotatedNode(content = "text\nNew line", score = 0.5, style = styler)
    str = print_html(node)
    @test str == "<div>text<br>New line</div>"

    styler = HTMLStyler(styles = "color:red", classes = "error")
    node = AnnotatedNode(
        content = "Error message\nSecond line", score = 0.5, style = styler)
    str = print_html(node)
    @test str ==
          "<div><span style=\"color:red\" class=\"error\">Error message<br>Second line</span></div>"

    ## Test with proper highlighting of context and answer
    styler_kwargs = (;
        default_styler = HTMLStyler(),
        low_styler = HTMLStyler(styles = "color:magenta", classes = ""),
        medium_styler = HTMLStyler(styles = "color:blue", classes = ""),
        high_styler = HTMLStyler(styles = "", classes = ""))

    # annotate the text
    context = [
        "This is a test context.", "Another context sentence.", "Final piece of context."]
    answer = "This is a test answer. It has multiple sentences."

    parent_node = annotate_support(
        TrigramAnnotater(), answer, context; add_sources = false, add_scores = false, styler_kwargs...)

    # print the HTML
    str = print_html(parent_node)
    expected_output = "<div>This is a test <span style=\"color:magenta\">answer</span>. <span style=\"color:magenta\">It</span> has <span style=\"color:magenta\">multiple</span> <span style=\"color:blue\">sentences</span>.</div>"
    @test str == expected_output
    # Test RAGResult overload
    rag = RAGResult(; context, final_answer = answer, question = "")
    str = print_html(rag)
    @test str == expected_output
end

2 comments on commit 19bcaad

@svilupp (Owner, Author) commented:

@JuliaRegistrator register

Release notes:

Added

  • Added support for aigenerate with Anthropic API. Preset model aliases are claudeo, claudes, and claudeh, for Claude 3 Opus, Sonnet, and Haiku, respectively.
  • Enabled the GoogleGenAI extension since GoogleGenAI.jl is now officially registered. You can use aigenerate by setting the model to gemini and providing the GOOGLE_API_KEY environment variable.
  • Added utilities to make preparation of finetuning datasets easier. You can now export your conversations in JSONL format with ShareGPT formatting (eg, for Axolotl). See ?PT.save_conversations for more information.
  • Added print_html utility for RAGTools module to print HTML-styled RAG answer annotations for web applications (eg, Genie.jl). See ?PromptingTools.Experimental.RAGTools.print_html for more information and examples.

Commits

@JuliaRegistrator commented:

Registration pull request created: JuliaRegistries/General/103708

Tagging

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the GitHub interface, or via:

git tag -a v0.17.0 -m "<description of version>" 19bcaad82704993de35a923aa6ca297f958a3240
git push origin v0.17.0
