perf: pre-compile regex patterns in markdownToTelegramHTML #787

Open
chillum-codeX wants to merge 1 commit into sipeed:main from chillum-codeX:perf/precompile-regex

@chillum-codeX

perf: pre-compile regex patterns in markdownToTelegramHTML

Problem

The markdownToTelegramHTML(), extractCodeBlocks(), and extractInlineCodes() functions in pkg/channels/telegram.go compile 10 regex patterns from scratch on every call, because each regexp.MustCompile() sits inside a function body.

Each regexp.MustCompile() allocates roughly 2-4 KB on the heap for the compiled automaton. For a single outbound message, this adds up to ~20 KB of unnecessary heap allocation that immediately becomes garbage, increasing GC pressure.

Solution

Move all 10 regexp.MustCompile() calls to package-level var declarations. The patterns are then compiled once, during package initialization, and reused across all calls. This is safe because a compiled regexp.Regexp is documented as safe for concurrent use by multiple goroutines (the exception is configuration methods such as Longest, which this code does not call).
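To illustrate the goroutine-safety argument, here is a minimal sketch of one package-level pattern shared by several goroutines. The helper names boldToHTML and convertConcurrently are hypothetical, and the `<b>$1</b>` replacement is an assumption for illustration; neither is taken from this PR's diff.

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

// reBold is compiled once during package initialization and shared by all callers.
var reBold = regexp.MustCompile(`\*\*(.+?)\*\*`)

// boldToHTML is a hypothetical helper; the replacement string is assumed.
func boldToHTML(text string) string {
	return reBold.ReplaceAllString(text, "<b>$1</b>")
}

// convertConcurrently converts each input in its own goroutine.
// Every goroutine reads the same *regexp.Regexp without extra locking.
func convertConcurrently(inputs []string) []string {
	out := make([]string, len(inputs))
	var wg sync.WaitGroup
	for i, in := range inputs {
		wg.Add(1)
		go func(i int, in string) {
			defer wg.Done()
			out[i] = boldToHTML(in) // each goroutine writes a distinct index
		}(i, in)
	}
	wg.Wait()
	return out
}

func main() {
	for _, s := range convertConcurrently([]string{"**a**", "plain", "x **b** y"}) {
		fmt.Println(s)
	}
}
```

Running a sketch like this under `go run -race` is one way to check that sharing the compiled pattern introduces no data race.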

Before

func markdownToTelegramHTML(text string) string {
    text = regexp.MustCompile(`^#{1,6}\s+(.+)$`).ReplaceAllString(text, "$1")
    text = regexp.MustCompile(`^>\s*(.*)$`).ReplaceAllString(text, "$1")
    // ... 6 more inline compilations
}

func extractCodeBlocks(text string) codeBlockMatch {
    re := regexp.MustCompile("```[\\w]*\\n?([\\s\\S]*?)```")
    // ...
}

After

var (
    reHeader     = regexp.MustCompile(`^#{1,6}\s+(.+)$`)
    reBlockquote = regexp.MustCompile(`^>\s*(.*)$`)
    reLink       = regexp.MustCompile(`\[([^\]]+)\]\(([^)]+)\)`)
    reBold       = regexp.MustCompile(`\*\*(.+?)\*\*`)
    reBoldAlt    = regexp.MustCompile(`__(.+?)__`)
    reItalic     = regexp.MustCompile(`_([^_]+)_`)
    reStrike     = regexp.MustCompile(`~~(.+?)~~`)
    reListItem   = regexp.MustCompile(`^[-*]\s+`)
    reCodeBlock  = regexp.MustCompile("```[\\w]*\\n?([\\s\\S]*?)```")
    reInlineCode = regexp.MustCompile("`([^`]+)`")
)

func markdownToTelegramHTML(text string) string {
    text = reHeader.ReplaceAllString(text, "$1")
    text = reBlockquote.ReplaceAllString(text, "$1")
    // ...
}

Impact

  • Eliminates ~20 KB heap allocation per outbound message
  • Reduces GC pressure under high message throughput
  • Zero behavioral change — all regex patterns are identical
  • Measured context: Independent benchmarking (100-1000 msg bursts) showed the telegram-slim build achieves 1-3 MB RSS on Linux. This optimization reduces the per-message GC cost, further tightening the memory footprint.
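The allocation claim can be checked independently. The sketch below uses testing.AllocsPerRun to compare the number of heap allocations per call for the inline and pre-compiled variants of the header pattern. The function names stripHeaderInline, stripHeaderPrecompiled, and allocsPerCall are hypothetical. AllocsPerRun reports allocation counts, not bytes, so it does not by itself confirm the ~20 KB figure; a `go test -bench . -benchmem` benchmark would be needed for byte totals.

```go
package main

import (
	"fmt"
	"regexp"
	"testing"
)

const headerPattern = `^#{1,6}\s+(.+)$`

// reHeader mirrors the package-level declaration from the "After" snippet.
var reHeader = regexp.MustCompile(headerPattern)

// stripHeaderInline compiles the pattern on every call, as in "Before".
func stripHeaderInline(text string) string {
	return regexp.MustCompile(headerPattern).ReplaceAllString(text, "$1")
}

// stripHeaderPrecompiled reuses the package-level pattern, as in "After".
func stripHeaderPrecompiled(text string) string {
	return reHeader.ReplaceAllString(text, "$1")
}

// allocsPerCall returns the average number of heap allocations per call of f.
func allocsPerCall(f func(string) string, input string) float64 {
	return testing.AllocsPerRun(100, func() { _ = f(input) })
}

func main() {
	in := "## Title"
	fmt.Printf("inline:      %.0f allocs/call\n", allocsPerCall(stripHeaderInline, in))
	fmt.Printf("precompiled: %.0f allocs/call\n", allocsPerCall(stripHeaderPrecompiled, in))
}
```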

Testing

  • Build succeeds: go build -tags "telegram pprof smallbuf" ./cmd/picoclaw
  • No functional change — identical regex patterns, same replacement logic
  • regexp.Regexp is documented as goroutine-safe
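Since the checks above rely on the patterns being identical, a small equivalence check could back that up. The following is a sketch only: patternCheck, checks, and mismatches are hypothetical scaffolding, and only the two patterns whose "$1" replacement appears in the snippets above are covered.

```go
package main

import (
	"fmt"
	"regexp"
)

// Package-level patterns copied from the "After" snippet.
var (
	reHeader     = regexp.MustCompile(`^#{1,6}\s+(.+)$`)
	reBlockquote = regexp.MustCompile(`^>\s*(.*)$`)
)

// patternCheck pairs the pattern source used inline before the change
// with the package-level pattern that replaces it.
type patternCheck struct {
	name     string
	source   string
	compiled *regexp.Regexp
	repl     string
}

var checks = []patternCheck{
	{"header", `^#{1,6}\s+(.+)$`, reHeader, "$1"},
	{"blockquote", `^>\s*(.*)$`, reBlockquote, "$1"},
}

// mismatches describes every sample on which a freshly compiled pattern
// and its package-level counterpart produce different output.
func mismatches(samples []string) []string {
	var out []string
	for _, c := range checks {
		fresh := regexp.MustCompile(c.source)
		for _, s := range samples {
			want := fresh.ReplaceAllString(s, c.repl)
			got := c.compiled.ReplaceAllString(s, c.repl)
			if got != want {
				out = append(out, fmt.Sprintf("%s: %q -> %q, want %q", c.name, s, got, want))
			}
		}
	}
	return out
}

func main() {
	samples := []string{"# Title", "> quoted", "plain text", ""}
	fmt.Println("mismatches:", len(mismatches(samples)))
}
```

In the repository itself, the same comparison would fit more naturally in a table-driven test in a _test.go file next to telegram.go.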

@alexhoshina
Collaborator

Hey! We are currently refactoring the channel system, and we have opened a refactor branch. It might be better if you could target your pull request to the refactor branch.
