Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unify BBCode tokenizing/parsing within Godot #10339

Open
RedMser opened this issue Aug 1, 2024 · 4 comments
Open

Unify BBCode tokenizing/parsing within Godot #10339

RedMser opened this issue Aug 1, 2024 · 4 comments

Comments

@RedMser
Copy link

RedMser commented Aug 1, 2024

Describe the project you are working on

Godot Editor

Describe the problem or limitation you are having in your project

While thinking about solutions for #10318, I've noticed that Godot has many ways a BBCode string is parsed or tokenized:

Each implementation has its own flavor of BBCode and supported tags and syntaxes. It makes changing the BBCode syntax or parsing it for a new purpose (e.g. conversion to different formats like markdown or HTML) very difficult.

Describe the feature / enhancement and how it helps to overcome the problem or limitation

Create a configurable BBCodeParser class which will be used to replace the individual parser implementations. Try to unify the different BBCode syntaxes as much as possible, still allowing configuration while being type-safe. From what I can tell, the intended syntax flavors are:

  1. RichTextLabel
  2. XML documentation (special tags like [Node2D] or [method blah])

I am not sure if XML documentation is supposed to be a superset of RichTextLabel.
As it stands, many tags are not supported, and the new documentation syntax is only very loosely interpretable as BBCode (it should be [method=blah] or [method name=blah] to be correct - but we could generally allow things like [color green] in the new parser too).

I would expose the BBCodeParser to scripting as well, unless there is opposition due to the number of classes it would introduce (see code snippets below).

Describe how your proposal will work, with code, pseudo-code, mock-ups, and/or diagrams

Class definitions:

## Base class for BBCode parsing, this implements the basic syntax rules but knows no BBCode tags by default.
class BaseBBCodeParser extends Resource:
    var resource_local_to_scene: bool = true

    ## Tags that are known to the parser. Starts empty, can be manually extended.
    ## If a tag with an empty name is registered, its definition will be used as a fallback for any unknown tags.
    ## This can be used to implement documentation tags like [String] which interprets all unknown tags as self-closing.
    var registered_tags: Array[BBCodeTagDefinition]

    ## Can be called multiple times, parser keeps tag stack between calls.
    func push_bbcode(bbcode: String) -> BBCodeParserResult
    ## Automatically escapes BBCode tags, used for parsing plaintext.
    func push_text(text: String) -> BBCodeParserResult


## Registers BBCode tags known by the RichTextLabel.
class RichTextLabelBBCodeParser extends BaseBBCodeParser

## Registers cross-reference tags like [String] or [member abc], as well as any supported [b] or whatever tags.
class GodotDocBBCodeParser extends BaseBBCodeParser


class BBCodeTagDefinition extends Resource:
    var name: String
    ## Dictionary of dictionaries, structured like so:
    ## {
    ##    "parameter_name": {
    ##        "type": VariantType, # defaults to Variant
    ##        "required": bool, # defaults to false
    ##    }
    ## }
    ## Parameter name can be an empty string to allow an anonymous parameter like [color red] or [color=red]
    var parameters: Dictionary
    ## Enable this to allow tags like `[code][b][/code]` without escaping.
    var escape_contents: bool
    ## For tags like `[br]`.
    var is_self_closing: bool


class BBCodeTag extends RefCounted:
    var name: String
    ## Dictionary structured like so:
    ## {
    ##    "parameter_name": value,
    ## }
    ## Parameter name can be an empty string in the case of an anonymous parameter like [color red] or [color=red]
    var parameters: Dictionary


class BBCodeParserResult extends RefCounted:
    func get_error() -> BBCodeParserError

    ## Flat list of parsed items, each Dictionary is formatted like so:
    ## {
    ##    "text": String, # can be empty for self-closing tags
    ##    "styles": Array[BBCodeTag], # empty array in case of unstyled plain-text
    ## }
    func get_items() -> Array[Dictionary]

    ## Tree structure of the parsed BBCode. Returns all root-level elements.
    ## Each tree item is either a String or a BBCodeTreeItem.
    func get_tree() -> Array[Variant]


class BBCodeTreeItem extends BBCodeTag:
    ## Returns all items within the BBCode tag.
    ## Each tree item is either a String or a BBCodeTreeItem.
    func get_contents() -> Array[Variant]


enum BBCodeParserError {
    ## No errors while parsing.
    OK,

    ## General parse errors like missing tag name, missing closing bracket, etc.
    SYNTAX,

    TAG_UNKNOWN,
    TAG_UNCLOSED,

    PARAMETER_UNKNOWN,
    PARAMETER_TYPE_MISMATCH,
    REQUIRED_PARAMETER_MISSING,
}

Example usage:

# Initialization.
var parser = RichTextLabelBBCodeParser.new()
var result = parser.push_bbcode("hello [color=red]world[/color]")
assert(result.get_error() == BBCodeParserError.OK)

# 1. Flat list API:
for item in result.get_items():
    prints(item.text, len(item.styles))
    for style in item.styles:
        prints("TAG", style.name, style.parameters)

# This would print:

# hello 0
# world 1
# TAG color { "": "red" }

# 2. Tree API:
var root = result.get_tree()
prints(root[0]) # hello
prints(root[1].name, root[1].parameters) # color { "": "red" }
prints(root[1].get_contents()) # ["world"]

RichTextLabel would for example instantiate a RichTextLabelBBCodeParser in its constructor and add any RichTextEffects to the tag definition list.

Stringifying the parser results back to BBCode could be part of the base parser API as well.
But other conversion targets like Markdown or RST should be implemented by the caller of the parser.

Open discussion points:

  • Does RichTextLabel still need its own Item structure, or can we completely move to the parser's result structure? Does it need any other API or information to make the Item structure obsolete? For example maybe it needs to store the start and end location of every item...?
  • Do we need push and pop methods like RichTextLabel has them? Not something like push_bold, but a generic push_tag(tag: BBCodeTag).
  • Should the definitions API be replaced with a virtual method? e.g. for docs this could allow checking if [Abc] is a valid type directly when parsing.
  • Which state should the parser keep between calls?
    • Auto-close unclosed tags at the end of a push call?
    • Should we add a reset() or flush() method? It basically asserts "this is the end of the String, anything that's wrong now is an error".
    • Not sure how to unify this with the fact that we return a result after each push.
    • If the parser were stateless, it would be a lot easier, and make more sense with it being a Resource as well. But not sure how this API will be called in the codebase.
  • Allow selecting the escaping method. Possibilities:
    • [lb] and [rb] this is how RichTextLabel does it
    • [[ and ]]
    • [[] and []]
    • \[ and \] this is how BBob does it
  • Error handling. Should we keep going on an error if possible, or cancel the parse? Accumulate errors and return an array instead?

If this enhancement will not be used often, can it be worked around with a few lines of script?

Yes but writing a new feature-complete BBCode parser is a lot of effort to maintain and prone to diverge from other implementations.

Is there a reason why this should be core and not an add-on in the asset library?

BBCode parsers are already in core. This is a refactor combined with exposing the new class to scripting for user's convenience.

@dalexeev
Copy link
Member

dalexeev commented Aug 2, 2024

I had a similar idea. We could add an abstract SyntaxParser and default implementations for BBCode and Godot Doc BBCode. Users could then implement their own parsers for RichTextLabel, for example to add Markdown support.

Notes
SyntaxParser (core) extends Resource, resource_local_to_scene = true
  BaseBBCodeParser (core)
    RichTextLabelBBCodeParser (scene)
    GodotDocBBCodeParser (editor)
      EditorHelpBBCodeParser (editor)
      and other parsers (C#, GDExtension bindings)

SyntaxParser
============
Abstract syntax parser.

virtual void _set_source_code(const String &p_source_code);
virtual void _parse();
void set_source_code(const String &p_source_code);
void parse();


BaseBBCodeParser
================
Abstract BBCode parser, like XMLParser. Doesn't know about specific RichTextLabel tags.

virtual void _set_source_code(const String &p_source_code) override;
virtual void _parse() override;
virtual void _parse_tag(const String &p_name, bool p_is_open, const Dictionary &p_params);
virtual void _parse_text(const String &p_text);


RichTextLabelBBCodeParser
=========================
Outputs BBCode to RichTextLabel with push_*(), pop(), and add_*() methods.

virtual void _parse_tag(const String &p_name, bool p_is_open, const Dictionary &p_params) override;
virtual void _parse_text(const String &p_text);
void set_rich_text_label(RichTextLabel *p_rich_text_label);


GodotDocBBCodeParser
====================
Abstract Godot Doc BBCode parser. Knows cross-reference tags, [param], etc.

virtual void _parse_tag(const String &p_name, bool p_is_open, const Dictionary &p_params) override;
virtual void _parse_text(const String &p_text);


EditorHelpBBCodeParser
======================
Outputs Godot Doc BBCode to Editor Help with push_*(), pop(), and add_*() methods.

virtual void _parse_tag(const String &p_name, bool p_is_open, const Dictionary &p_params) override;
virtual void _parse_text(const String &p_text);
void set_rich_text_label(RichTextLabel *p_rich_text_label);

If the RichTextLabel.syntax_parser property is not set, RichTextLabel uses the default RichTextLabelBBCodeParser.

@Calinou
Copy link
Member

Calinou commented Aug 2, 2024

This might be useful for proper BBCode-to-ANSI conversion as well (which is needed for print_rich() to fully support terminal output correctly).

@RedMser
Copy link
Author

RedMser commented Aug 2, 2024

@dalexeev I like your proposal to make this more focused on inheritence, although the API would have to be well thought out to make it worth expanding to other domains on in the future.
I'd personally remove the generic SyntaxParser base class. Too many domain specific demands, like parsing PackedByteArray or streaming data. Something like XML, JSON or GDScript is unlikely to be able to switch to a new base class like this one.

Also Resource makes sense, for re-using parsers and storing their configurations in tres files.

Not sure why you need to set the RTL instance, it should probably only be a one-directional relationship, and RTL updates properties on the parser instance to add custom richtexteffects.

I'll update my proposal soon accordingly. Then I'll ask for feedback from RTL maintainers, also since I saw a new PR for BBCode stringify appear.

@RedMser
Copy link
Author

RedMser commented Aug 5, 2024

@bruvzg Hi, I'd like your feedback on this proposal if you have the time, since you're very deep in the code of RichTextLabel and BBCode.

I'm mainly looking for answers of the open questions near the end of the issue. But a look over the code API might make sense as well, since it would be great to expose to scripting / GDExtensions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants