Update html printing (#115)
svilupp authored Mar 27, 2024
1 parent 06b1abe commit 19bcaad
Showing 7 changed files with 180 additions and 7 deletions.
9 changes: 7 additions & 2 deletions CHANGELOG.md
@@ -6,12 +6,17 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

### Added

### Fixed

## [0.17.0]

### Added
- Added support for `aigenerate` with Anthropic API. Preset model aliases are `claudeo`, `claudes`, and `claudeh`, for Claude 3 Opus, Sonnet, and Haiku, respectively.
- Enabled the GoogleGenAI extension since `GoogleGenAI.jl` is now officially registered. You can use `aigenerate` by setting the model to `gemini` and providing the `GOOGLE_API_KEY` environment variable.
- Added utilities to make preparation of finetuning datasets easier. You can now export your conversations in JSONL format with ShareGPT formatting (eg, for Axolotl). See `?PT.save_conversations` for more information.

### Fixed
- Added `print_html` utility for RAGTools module to print HTML-styled RAG answer annotations for web applications (eg, Genie.jl). See `?PromptingTools.Experimental.RAGTools.print_html` for more information and examples.
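
As an illustration of the `aigenerate` additions listed above, here is a minimal sketch. The model aliases and the `GOOGLE_API_KEY` variable come from the entries; the prompt text, the `ANTHROPIC_API_KEY` variable name, and loading `GoogleGenAI.jl` first are assumptions.

```julia
using PromptingTools

# Claude 3 Haiku via the preset alias (assumes ANTHROPIC_API_KEY is set in ENV)
msg = aigenerate("Say hi in one short sentence."; model = "claudeh")

# Gemini via the GoogleGenAI extension (assumes GOOGLE_API_KEY is set in ENV;
# loading the extension first, eg, `using GoogleGenAI`, is assumed to be required)
msg = aigenerate("Say hi in one short sentence."; model = "gemini")
```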

## [0.16.1]

2 changes: 1 addition & 1 deletion README.md
@@ -820,7 +820,7 @@ Fine-tuning is a powerful technique to adapt a model to your specific use case (

2. Once the finetuning time comes, create a bundle of ShareGPT-formatted conversations (common finetuning format) in a single `.jsonl` file. Use `PT.save_conversations("dataset.jsonl", [conversation1, conversation2, ...])` (note the plural "conversationS" in the function name).

For an example of an end-to-end finetuning process, check out our sister project [JuliaLLMLeaderboard Finetuning experiment](https://github.com/svilupp/Julia-LLM-Leaderboard/blob/main/experiments/cheater-7b-finetune/README.md). It shows the process of finetuning for half a dollar with [Jarvislabs.ai](jarvislabs.ai) and [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl).
For an example of an end-to-end finetuning process, check out our sister project [JuliaLLMLeaderboard Finetuning experiment](https://github.com/svilupp/Julia-LLM-Leaderboard/blob/main/experiments/cheater-7b-finetune/README.md). It shows the process of finetuning for half a dollar with [Jarvislabs.ai](https://jarvislabs.ai/templates/axolotl) and [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl).
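
A hedged sketch of step 2 above: bundling a couple of conversations into a ShareGPT-formatted JSONL file. Only `PT.save_conversations` and the `.jsonl`/ShareGPT format are from the text; the `return_all = true` keyword (to get the full message vector back) and the prompts are assumptions.

```julia
using PromptingTools
const PT = PromptingTools

# Collect full conversations (vectors of messages), not just the last reply;
# `return_all = true` is assumed to return the whole message history
conversation1 = aigenerate("What is a DataFrame in Julia?"; return_all = true)
conversation2 = aigenerate("How do I read a CSV file in Julia?"; return_all = true)

# Bundle them into a single ShareGPT-formatted JSONL file (eg, for Axolotl)
PT.save_conversations("dataset.jsonl", [conversation1, conversation2])
```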

## Roadmap

2 changes: 2 additions & 0 deletions docs/src/extra_tools/rag_tools_intro.md
@@ -145,6 +145,8 @@ SOURCES
5. Doc9
```

See `?print_html` for the HTML version of the pretty-printing and styling system, eg, when you want to display the results in a web application based on Genie.jl/Stipple.jl.
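
A short sketch of what that looks like in practice — `result` is assumed to be an existing `RAGResult` (eg, returned by `airag`), and embedding the fragment into a Genie.jl/Stipple.jl view is left to the application:

```julia
using PromptingTools
const RT = PromptingTools.Experimental.RAGTools

# `result` is assumed to be a RAGResult produced earlier (eg, by `airag`)
html = RT.print_html(result)   # returns a String such as "<div>...</div>"
# embed `html` in your web page as a raw HTML fragment
```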

**How to read the output**
- Color legend:
- No color: High match with the context, can be trusted more
2 changes: 1 addition & 1 deletion docs/src/frequently_asked_questions.md
@@ -407,4 +407,4 @@ Fine-tuning is a powerful technique to adapt a model to your specific use case (

2. Once the finetuning time comes, create a bundle of ShareGPT-formatted conversations (common finetuning format) in a single `.jsonl` file. Use `PT.save_conversations("dataset.jsonl", [conversation1, conversation2, ...])` (note the plural "conversationS" in the function name).

For an example of an end-to-end finetuning process, check out our sister project [JuliaLLMLeaderboard Finetuning experiment](https://github.com/svilupp/Julia-LLM-Leaderboard/blob/main/experiments/cheater-7b-finetune/README.md). It shows the process of finetuning for half a dollar with [Jarvislabs.ai](jarvislabs.ai) and [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl).
For an example of an end-to-end finetuning process, check out our sister project [JuliaLLMLeaderboard Finetuning experiment](https://github.com/svilupp/Julia-LLM-Leaderboard/blob/main/experiments/cheater-7b-finetune/README.md). It shows the process of finetuning for half a dollar with [JarvisLabs.ai](https://jarvislabs.ai/templates/axolotl) and [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl).
2 changes: 1 addition & 1 deletion src/Experimental/RAGTools/RAGTools.jl
@@ -43,7 +43,7 @@ export airag, build_context!, generate!, refine!, answer!, postprocess!
export SimpleGenerator, AdvancedGenerator, RAGConfig
include("generation.jl")

export annotate_support, TrigramAnnotater
export annotate_support, TrigramAnnotater, print_html
include("annotation.jl")

export build_qa_evals, run_qa_evals
115 changes: 115 additions & 0 deletions src/Experimental/RAGTools/annotation.jl
@@ -532,4 +532,119 @@ function annotate_support(
    return annotate_support(
        annotater, final_answer, result.context; min_score, skip_trigrams,
        hashed, result.sources, min_source_score, add_sources, add_scores, kwargs...)
end

"""
    print_html([io::IO,] parent_node::AbstractAnnotatedNode)
    print_html([io::IO,] rag::AbstractRAGResult; add_sources::Bool = false,
        add_scores::Bool = false, default_styler = HTMLStyler(),
        low_styler = HTMLStyler(styles = "color:magenta", classes = ""),
        medium_styler = HTMLStyler(styles = "color:blue", classes = ""),
        high_styler = HTMLStyler(styles = "", classes = ""), styler_kwargs...)

Pretty-prints the annotated `parent_node` (or `RAGResult`) to the `io` stream, or returns the string, in HTML format. Assumes the node is styled with an `HTMLStyler`.

It wraps each "token" into a span with the requested styling (the `HTMLStyler` properties `classes` and `styles`).
It also replaces newlines with `<br>` for better HTML formatting.
For any non-HTML styler, it prints the content as plain text.

# Returns
- `nothing` if `io` is provided
- the string with HTML-formatted text if `io` is not provided

See also `HTMLStyler`, `annotate_support`, and `set_node_style!` for how the styling is applied and what the arguments mean.

# Examples
Note: `RT` is an alias for `PromptingTools.Experimental.RAGTools`

Simple start directly with the `RAGResult`:

```julia
# set up the text/RAGResult
context = [
    "This is a test context.", "Another context sentence.", "Final piece of context."]
answer = "This is a test answer. It has multiple sentences."
rag = RT.RAGResult(; context, final_answer=answer, question="")

# print the HTML
print_html(rag)
```

Low-level control by creating our `AnnotatedNode`:

```julia
# prepare your HTML styling
styler_kwargs = (;
    default_styler=RT.HTMLStyler(),
    low_styler=RT.HTMLStyler(styles="color:magenta", classes=""),
    medium_styler=RT.HTMLStyler(styles="color:blue", classes=""),
    high_styler=RT.HTMLStyler(styles="", classes=""))

# annotate the text
context = [
    "This is a test context.", "Another context sentence.", "Final piece of context."]
answer = "This is a test answer. It has multiple sentences."
parent_node = RT.annotate_support(
    RT.TrigramAnnotater(), answer, context; add_sources=false, add_scores=false, styler_kwargs...)

# print the HTML
print_html(parent_node)

# or to accumulate more nodes
io = IOBuffer()
print_html(io, parent_node)
```
"""
function print_html(io::IO, parent_node::AbstractAnnotatedNode)
    print(io, "<div>")
    for node in PreOrderDFS(parent_node)
        ## print out text only for leaf nodes (ie, with no children)
        if isempty(node.children)
            # create HTML style new lines
            content = replace(node.content, "\n" => "<br>")
            if node.style isa HTMLStyler
                # HTML styler -> wrap each token into a span with requested styling
                style_str = isempty(node.style.styles) ? "" :
                            " style=\"$(node.style.styles)\""
                class_str = isempty(node.style.classes) ? "" :
                            " class=\"$(node.style.classes)\""
                if isempty(class_str) && isempty(style_str)
                    print(io, content)
                else
                    print(io,
                        "<span", style_str, class_str, ">$(content)</span>")
                end
            else
                # print plain text
                print(io, content)
            end
        end
    end
    print(io, "</div>")
    return nothing
end

# utility for RAGResult
function print_html(io::IO, rag::AbstractRAGResult; add_sources::Bool = false,
        add_scores::Bool = false, default_styler = HTMLStyler(),
        low_styler = HTMLStyler(styles = "color:magenta", classes = ""),
        medium_styler = HTMLStyler(styles = "color:blue", classes = ""),
        high_styler = HTMLStyler(styles = "", classes = ""), styler_kwargs...)

    # Create the annotation
    parent_node = annotate_support(
        TrigramAnnotater(), rag; add_sources, add_scores, default_styler,
        low_styler, medium_styler, high_styler, styler_kwargs...)

    # Print the HTML
    print_html(io, parent_node)
end

# Non-io dispatch
function print_html(
        rag_or_parent_node::Union{AbstractAnnotatedNode, AbstractRAGResult}; kwargs...)
    io = IOBuffer()
    print_html(io, rag_or_parent_node; kwargs...)
    String(take!(io))
end
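
A small sketch contrasting the two dispatches defined above: the string-returning form for a single result versus streaming several annotated nodes into one buffer. The `rag_result`, `node_a`, and `node_b` variables are placeholders.

```julia
# String-returning form (no `io`): convenient for templating into a web page
html = print_html(rag_result)            # "<div>...</div>"

# IO form: accumulate several annotated nodes into a single HTML fragment
io = IOBuffer()
for node in (node_a, node_b)             # placeholder AnnotatedNodes
    print_html(io, node)
end
write("annotated_answers.html", String(take!(io)))
```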
55 changes: 53 additions & 2 deletions test/Experimental/RAGTools/annotation.jl
Expand Up @@ -3,7 +3,7 @@ using PromptingTools.Experimental.RAGTools: AnnotatedNode, AbstractAnnotater,
set_node_style!,
align_node_styles!, TrigramAnnotater, Styler,
HTMLStyler,
pprint
pprint, print_html
using PromptingTools.Experimental.RAGTools: trigram_support!, add_node_metadata!,
annotate_support, RAGResult, text_to_trigrams

@@ -350,4 +350,55 @@ end
    # Invalid types
    struct Random123Annotater <: AbstractAnnotater end
    @test_throws ArgumentError annotate_support(Random123Annotater(), "test", context)
end
end

@testset "print_html" begin
    # Test for plain text without any HTML styler
    node = AnnotatedNode(content = "text\nNew line", score = 0.5)
    str = print_html(node)
    @test str == "<div>text<br>New line</div>"

    # Test for single HTMLStyler with no new lines
    styler = HTMLStyler(styles = "font-weight:bold", classes = "highlight")
    node = AnnotatedNode(content = "text\nNew line", score = 0.5, style = styler)
    str = print_html(node)
    @test str ==
          "<div><span style=\"font-weight:bold\" class=\"highlight\">text<br>New line</span></div>"

    # Test for HTMLStyler without styling
    styler = HTMLStyler()
    node = AnnotatedNode(content = "text\nNew line", score = 0.5, style = styler)
    str = print_html(node)
    @test str == "<div>text<br>New line</div>"

    styler = HTMLStyler(styles = "color:red", classes = "error")
    node = AnnotatedNode(
        content = "Error message\nSecond line", score = 0.5, style = styler)
    str = print_html(node)
    @test str ==
          "<div><span style=\"color:red\" class=\"error\">Error message<br>Second line</span></div>"

    ## Test with proper highlighting of context and answer
    styler_kwargs = (;
        default_styler = HTMLStyler(),
        low_styler = HTMLStyler(styles = "color:magenta", classes = ""),
        medium_styler = HTMLStyler(styles = "color:blue", classes = ""),
        high_styler = HTMLStyler(styles = "", classes = ""))

    # annotate the text
    context = [
        "This is a test context.", "Another context sentence.", "Final piece of context."]
    answer = "This is a test answer. It has multiple sentences."

    parent_node = annotate_support(
        TrigramAnnotater(), answer, context; add_sources = false, add_scores = false, styler_kwargs...)

    # print the HTML
    str = print_html(parent_node)
    expected_output = "<div>This is a test <span style=\"color:magenta\">answer</span>. <span style=\"color:magenta\">It</span> has <span style=\"color:magenta\">multiple</span> <span style=\"color:blue\">sentences</span>.</div>"
    @test str == expected_output
    # Test RAGResult overload
    rag = RAGResult(; context, final_answer = answer, question = "")
    str = print_html(rag)
    @test str == expected_output
end

2 comments on commit 19bcaad

@svilupp (Owner, Author) commented:

@JuliaRegistrator register

Release notes:

Added

  • Added support for aigenerate with Anthropic API. Preset model aliases are claudeo, claudes, and claudeh, for Claude 3 Opus, Sonnet, and Haiku, respectively.
  • Enabled the GoogleGenAI extension since GoogleGenAI.jl is now officially registered. You can use aigenerate by setting the model to gemini and providing the GOOGLE_API_KEY environment variable.
  • Added utilities to make preparation of finetuning datasets easier. You can now export your conversations in JSONL format with ShareGPT formatting (eg, for Axolotl). See ?PT.save_conversations for more information.
  • Added print_html utility for RAGTools module to print HTML-styled RAG answer annotations for web applications (eg, Genie.jl). See ?PromptingTools.Experimental.RAGTools.print_html for more information and examples.

Commits

@JuliaRegistrator commented:

Registration pull request created: JuliaRegistries/General/103708

Tagging

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the GitHub interface, or via:

git tag -a v0.17.0 -m "<description of version>" 19bcaad82704993de35a923aa6ca297f958a3240
git push origin v0.17.0
