Merged
2 changes: 1 addition & 1 deletion .github/workflows/gofuzz.yml
@@ -11,7 +11,7 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
fuzzer: [FuzzBytesAndString, FuzzRune, FuzzTruncateStringAndBytes]
fuzzer: [FuzzBytesAndString, FuzzRune, FuzzTruncateStringAndBytes, FuzzControlSequences]
steps:
- name: Check out code
uses: actions/checkout@v4
12 changes: 7 additions & 5 deletions AGENTS.md
@@ -18,13 +18,15 @@ by running `go generate` from the top package directory.

## Pull Requests and branches

For PRs (pull requests), you can use the gh CLI tool to retrieve details,
or post comments. Then, compare the current branch with main. Reviewing a PR
and reviewing a branch are about the same, but the PR may add context.
For PRs (pull requests), you can use the gh CLI tool. Compare the current branch with main. Reviewing a PR and reviewing a branch are about the same, but the PR may add context.

Look for bugs. Think like GitHub Copilot or Cursor BugBot.
Understand the goals of the PR. Note any API changes, especially breaking changes.

Offer to post a brief summary of the review to the PR, via the gh CLI tool.
Look for thoroughness of tests, as well as GoDoc comments.

Retrieve and consider the comments on the PR, which may have come from GitHub Copilot or Cursor BugBot. Think like GitHub Copilot or Cursor BugBot.

Offer to optionally post a brief summary of the review to the PR, via the gh CLI tool.

## Comparisons to go-runewidth

46 changes: 25 additions & 21 deletions README.md
@@ -61,35 +61,37 @@ func main() {

### Options

There is one option, `displaywidth.Options.EastAsianWidth`, which defines
how [East Asian Ambiguous characters](https://www.unicode.org/reports/tr11/#Ambiguous)
Create the options you need, and then use methods on the options struct.

```go
var myOptions = displaywidth.Options{
EastAsianWidth: true,
IgnoreControlSequences: true,
}

width := myOptions.String("Hello, 世界!")
```

#### IgnoreControlSequences

`IgnoreControlSequences` specifies whether to ignore ECMA-48 escape sequences
when calculating the display width. When `false` (default), ANSI escape
sequences are treated as just a series of characters. When `true`, they are
treated as a single zero-width unit.

#### EastAsianWidth

`EastAsianWidth` defines how
[East Asian Ambiguous characters](https://www.unicode.org/reports/tr11/#Ambiguous)
are treated.

When `false` (default), East Asian Ambiguous characters are treated as width 1.
When `true`, they are treated as width 2.

You may wish to configure this based on environment variables or locale.
`go-runewidth`, for example, does so
[during package initialization](https://github.com/mattn/go-runewidth/blob/master/runewidth.go#L26C1-L45C2).
[during package initialization](https://github.com/mattn/go-runewidth/blob/master/runewidth.go#L26C1-L45C2). `displaywidth` does not do this automatically; we prefer to leave it to you.

`displaywidth` does not do this automatically, we prefer to leave it to you.
You might do something like:

```go
var width displaywidth.Options // zero value is default

func init() {
if os.Getenv("EAST_ASIAN_WIDTH") == "true" {
width = displaywidth.Options{EastAsianWidth: true}
}
// or check locale, or any other logic you want
}

// use it in your logic
func myApp() {
fmt.Println(width.String("Hello, 世界!"))
}
```

## Technical standards and compatibility

@@ -101,6 +103,8 @@ and [regional indicator pairs](https://en.wikipedia.org/wiki/Regional_indicator_
for emojis. We are keeping an eye on
[emerging standards](https://www.jeffquast.com/post/state-of-terminal-emulation-2025/).

For control sequences, we implement the [ECMA-48](https://ecma-international.org/publications-and-standards/standards/ecma-48/) standard for 7-bit ASCII control sequences.

`clipperhouse/displaywidth`, `mattn/go-runewidth`, and `rivo/uniseg` will
give the same outputs for most real-world text. Extensive details are in the
[compatibility analysis](comparison/COMPATIBILITY_ANALYSIS.md).
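
Review note: the `IgnoreControlSequences` behavior described in the README hunk above can be illustrated with a small, standard-library-only sketch. This is not the `displaywidth` or `uax29` implementation. `skipCSI` and `naiveWidth` are hypothetical names, the sketch recognizes only 7-bit CSI sequences (`ESC [` parameters, intermediates, final byte), and it assumes one column per printable rune and zero columns for control characters, so it ignores wide and combining characters entirely.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// skipCSI returns the byte length of a complete 7-bit ECMA-48 CSI sequence
// at the start of s, or 0 if s does not begin with one.
// Hypothetical helper for illustration; not part of displaywidth.
func skipCSI(s string) int {
	if len(s) < 2 || s[0] != 0x1b || s[1] != '[' {
		return 0
	}
	i := 2
	for i < len(s) && s[i] >= 0x30 && s[i] <= 0x3f { // parameter bytes
		i++
	}
	for i < len(s) && s[i] >= 0x20 && s[i] <= 0x2f { // intermediate bytes
		i++
	}
	if i < len(s) && s[i] >= 0x40 && s[i] <= 0x7e { // final byte
		return i + 1
	}
	return 0 // incomplete sequence: not treated as a unit
}

// naiveWidth is a toy width function (an assumption of this sketch):
// control characters count as 0 columns, every other rune as 1.
// When ignoreControlSequences is true, a complete CSI sequence is
// skipped as a single zero-width unit.
func naiveWidth(s string, ignoreControlSequences bool) int {
	width := 0
	for len(s) > 0 {
		if ignoreControlSequences {
			if n := skipCSI(s); n > 0 {
				s = s[n:]
				continue
			}
		}
		r, size := utf8.DecodeRuneInString(s)
		s = s[size:]
		if r < 0x20 || r == 0x7f {
			continue // control character, zero width
		}
		width++
	}
	return width
}

func main() {
	const red = "\x1b[31mhello\x1b[0m"
	fmt.Println(naiveWidth(red, false)) // 12: "[31m" + "hello" + "[0m" counted as text
	fmt.Println(naiveWidth(red, true))  // 5: only "hello" is counted
}
```

In this toy model, ignoring sequences can only remove columns, which mirrors the `IgnoreControlSequences width <= default width` invariant checked in `FuzzControlSequences`.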
2 changes: 1 addition & 1 deletion comparison/go.mod
@@ -10,7 +10,7 @@ require (

require (
github.com/clipperhouse/stringish v0.1.1 // indirect
github.com/clipperhouse/uax29/v2 v2.5.0 // indirect
github.com/clipperhouse/uax29/v2 v2.6.0 // indirect
)

replace github.com/clipperhouse/displaywidth => ../
4 changes: 2 additions & 2 deletions comparison/go.sum
@@ -1,7 +1,7 @@
github.com/clipperhouse/stringish v0.1.1 h1:+NSqMOr3GR6k1FdRhhnXrLfztGzuG+VuFDfatpWHKCs=
github.com/clipperhouse/stringish v0.1.1/go.mod h1:v/WhFtE1q0ovMta2+m+UbpZ+2/HEXNWYXQgCt4hdOzA=
github.com/clipperhouse/uax29/v2 v2.5.0 h1:x7T0T4eTHDONxFJsL94uKNKPHrclyFI0lm7+w94cO8U=
github.com/clipperhouse/uax29/v2 v2.5.0/go.mod h1:Wn1g7MK6OoeDT0vL+Q0SQLDz/KpfsVRgg6W7ihQeh4g=
github.com/clipperhouse/uax29/v2 v2.6.0 h1:z0cDbUV+aPASdFb2/ndFnS9ts/WNXgTNNGFoKXuhpos=
github.com/clipperhouse/uax29/v2 v2.6.0/go.mod h1:Wn1g7MK6OoeDT0vL+Q0SQLDz/KpfsVRgg6W7ihQeh4g=
github.com/mattn/go-runewidth v0.0.19 h1:v++JhqYnZuu5jSKrk9RbgF5v4CGUjqRfBm05byFGLdw=
github.com/mattn/go-runewidth v0.0.19/go.mod h1:XBkDxAl56ILZc9knddidhrOlY5R/pDhgLpndooCuJAs=
github.com/rivo/uniseg v0.4.7 h1:WUdvkW8uEhrYfLC4ZzdpI2ztxP1I582+49Oc5Mq64VQ=
146 changes: 142 additions & 4 deletions fuzz_test.go
@@ -99,8 +99,10 @@ func FuzzBytesAndString(f *testing.F) {

// Test with different options combinations
options := []Options{
{EastAsianWidth: false}, // default
{EastAsianWidth: false},
{EastAsianWidth: true},
{IgnoreControlSequences: true},
{EastAsianWidth: true, IgnoreControlSequences: true},
}

for _, option := range options {
@@ -188,10 +190,13 @@ func FuzzRune(f *testing.F) {
}
}

// Test with different options
// Test with different options (Rune is per-rune, IgnoreControlSequences
// doesn't affect single runes, but we include it for completeness)
options := []Options{
{EastAsianWidth: false}, // default
{EastAsianWidth: false},
{EastAsianWidth: true},
{IgnoreControlSequences: true},
{EastAsianWidth: true, IgnoreControlSequences: true},
}

for _, option := range options {
@@ -308,8 +313,10 @@ func FuzzTruncateStringAndBytes(f *testing.F) {

// Test with different options
options := []Options{
{EastAsianWidth: false}, // default
{EastAsianWidth: false},
{EastAsianWidth: true},
{IgnoreControlSequences: true},
{EastAsianWidth: true, IgnoreControlSequences: true},
}

for _, option := range options {
@@ -327,3 +334,134 @@ func FuzzTruncateStringAndBytes(f *testing.F) {
}
})
}

// FuzzControlSequences fuzzes strings containing ANSI/ECMA-48 escape sequences
// across all option combinations (EastAsianWidth x IgnoreControlSequences).
func FuzzControlSequences(f *testing.F) {
if testing.Short() {
f.Skip("skipping fuzz test in short mode")
}

// Seed with ANSI escape sequences
f.Add([]byte("\x1b[31m")) // SGR red
f.Add([]byte("\x1b[0m")) // SGR reset
f.Add([]byte("\x1b[1m")) // SGR bold
f.Add([]byte("\x1b[38;5;196m")) // SGR 256-color
f.Add([]byte("\x1b[38;2;255;0;0m")) // SGR truecolor
f.Add([]byte("\x1b[A")) // cursor up
f.Add([]byte("\x1b[10;20H")) // cursor position
f.Add([]byte("\x1b[2J")) // erase in display
f.Add([]byte("\x1b[31mhello\x1b[0m")) // red text
f.Add([]byte("\x1b[1m\x1b[31mhi\x1b[0m")) // nested SGR
f.Add([]byte("hello\x1b[31mworld\x1b[0m")) // ANSI mid-string
f.Add([]byte("\x1b[31m中文\x1b[0m")) // colored CJK
f.Add([]byte("\x1b[31m😀\x1b[0m")) // colored emoji
f.Add([]byte("\x1b[31m🇺🇸\x1b[0m")) // colored flag
f.Add([]byte("a\x1b[31mb\x1b[32mc\x1b[33md\x1b[0m")) // multiple colors
f.Add([]byte("\x1b[31m\x1b[42m\x1b[1mbold on red\x1b[0m")) // stacked SGR
f.Add([]byte("\r\n")) // CR+LF
f.Add([]byte("hello\r\nworld")) // text with CRLF
f.Add([]byte("\x1b")) // bare ESC
f.Add([]byte("\x1b[")) // incomplete sequence
f.Add([]byte("\x1b[31")) // incomplete SGR
f.Add([]byte("")) // empty
f.Add([]byte("hello")) // plain ASCII
f.Add([]byte("中文")) // plain CJK
f.Add([]byte("😀")) // plain emoji

// Seed with multi-lingual text
file, err := testdata.Sample()
if err != nil {
f.Fatal(err)
}
chunks := bytes.Split(file, []byte("\n"))
for _, chunk := range chunks {
f.Add(chunk)
}

allOptions := []Options{
{},
{EastAsianWidth: true},
{IgnoreControlSequences: true},
{EastAsianWidth: true, IgnoreControlSequences: true},
}

f.Fuzz(func(t *testing.T, text []byte) {
for _, opt := range allOptions {
wb := opt.Bytes(text)
ws := opt.String(string(text))

// Invariant: width is never negative
if wb < 0 {
t.Errorf("Bytes() with %+v returned negative width %d for %q", opt, wb, text)
}

// Invariant: String and Bytes agree
if wb != ws {
t.Errorf("Bytes()=%d != String()=%d with %+v for %q", wb, ws, opt, text)
}

// Invariant: empty input is always 0
if len(text) == 0 && wb != 0 {
t.Errorf("non-zero width %d for empty input with %+v", wb, opt)
}

// Invariant: sum of grapheme widths equals total width
gIter := opt.BytesGraphemes(text)
gSum := 0
for gIter.Next() {
gw := gIter.Width()
if gw < 0 {
t.Errorf("grapheme Width() < 0 with %+v for %q", opt, text)
}
gSum += gw
}
if gSum != wb {
t.Errorf("sum of grapheme widths %d != Bytes() %d with %+v for %q", gSum, wb, opt, text)
}

// Same for StringGraphemes
sgIter := opt.StringGraphemes(string(text))
sgSum := 0
for sgIter.Next() {
sgSum += sgIter.Width()
}
if sgSum != ws {
t.Errorf("sum of StringGraphemes widths %d != String() %d with %+v for %q", sgSum, ws, opt, text)
}

// Invariant: IgnoreControlSequences width <= default width
// (escape sequences become 0 instead of their visible char widths)
if opt.IgnoreControlSequences {
noIgnore := Options{EastAsianWidth: opt.EastAsianWidth}
wDefault := noIgnore.Bytes(text)
if wb > wDefault {
t.Errorf("IgnoreControlSequences width %d > default width %d with %+v for %q", wb, wDefault, opt, text)
}
}

// Invariant: truncation respects maxWidth (accounting for the tail,
// which is always appended and may itself exceed maxWidth)
tail := "..."
tailWidth := opt.String(tail)
for _, maxWidth := range []int{0, 1, 3, 5, 10, 20} {
ts := opt.TruncateString(string(text), maxWidth, tail)
tsWidth := opt.String(ts)
limit := maxWidth
if tailWidth > limit {
limit = tailWidth
}
if tsWidth > limit {
t.Errorf("TruncateString() width %d > max(maxWidth, tailWidth) %d with %+v for %q -> %q",
tsWidth, limit, opt, text, ts)
}

tb := opt.TruncateBytes(text, maxWidth, []byte(tail))
if !bytes.Equal(tb, []byte(ts)) {
t.Errorf("TruncateBytes() != TruncateString() with %+v for %q: %q != %q",
opt, text, tb, ts)
}
}
}
})
}
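
Review note: the truncation invariant in `FuzzControlSequences` bounds the result by `max(maxWidth, tailWidth)` rather than by `maxWidth` alone. The sketch below shows why, using a simplified stand-in. `truncate`, `runeWidth`, and `widthLimit` are hypothetical names, width is assumed to be one column per rune, and the real `TruncateString` works on grapheme clusters and may differ in detail.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeWidth is a stand-in width function for this sketch: one column per rune.
func runeWidth(s string) int { return utf8.RuneCountInString(s) }

// truncate is a simplified, hypothetical model of tail-appending truncation.
// If s fits in maxWidth it is returned unchanged; otherwise as many leading
// runes as fit alongside the tail are kept, and the tail is always appended.
func truncate(s string, maxWidth int, tail string) string {
	if runeWidth(s) <= maxWidth {
		return s
	}
	budget := maxWidth - runeWidth(tail) // columns left for text before the tail
	end := 0
	if budget > 0 {
		n := 0
		for i := range s { // i is the byte offset of each rune start
			if n == budget {
				end = i
				break
			}
			n++
		}
	}
	return s[:end] + tail
}

// widthLimit is the bound the fuzz test checks: the larger of maxWidth and
// the tail's own width.
func widthLimit(maxWidth, tailWidth int) int {
	if tailWidth > maxWidth {
		return tailWidth
	}
	return maxWidth
}

func main() {
	fmt.Println(truncate("hello world", 8, "...")) // hello...
	fmt.Println(truncate("hello", 1, "..."))       // ...
	// The second result is 3 columns wide, which exceeds maxWidth 1 but
	// equals widthLimit(1, 3).
	fmt.Println(runeWidth(truncate("hello", 1, "...")) <= widthLimit(1, 3)) // true
}
```

When the tail is wider than `maxWidth`, the result is the tail alone, so the only bound that always holds is the larger of the two widths.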
2 changes: 1 addition & 1 deletion go.mod
@@ -2,6 +2,6 @@ module github.com/clipperhouse/displaywidth

go 1.18

require github.com/clipperhouse/uax29/v2 v2.5.0
require github.com/clipperhouse/uax29/v2 v2.6.0

require github.com/clipperhouse/stringish v0.1.1
4 changes: 2 additions & 2 deletions go.sum
@@ -1,4 +1,4 @@
github.com/clipperhouse/stringish v0.1.1 h1:+NSqMOr3GR6k1FdRhhnXrLfztGzuG+VuFDfatpWHKCs=
github.com/clipperhouse/stringish v0.1.1/go.mod h1:v/WhFtE1q0ovMta2+m+UbpZ+2/HEXNWYXQgCt4hdOzA=
github.com/clipperhouse/uax29/v2 v2.5.0 h1:x7T0T4eTHDONxFJsL94uKNKPHrclyFI0lm7+w94cO8U=
github.com/clipperhouse/uax29/v2 v2.5.0/go.mod h1:Wn1g7MK6OoeDT0vL+Q0SQLDz/KpfsVRgg6W7ihQeh4g=
github.com/clipperhouse/uax29/v2 v2.6.0 h1:z0cDbUV+aPASdFb2/ndFnS9ts/WNXgTNNGFoKXuhpos=
github.com/clipperhouse/uax29/v2 v2.6.0/go.mod h1:Wn1g7MK6OoeDT0vL+Q0SQLDz/KpfsVRgg6W7ihQeh4g=
16 changes: 8 additions & 8 deletions graphemes.go
@@ -44,10 +44,10 @@ func StringGraphemes(s string) Graphemes[string] {
// Iterate using the Next method, and get the width of the current grapheme
// using the Width method.
func (options Options) StringGraphemes(s string) Graphemes[string] {
return Graphemes[string]{
iter: graphemes.FromString(s),
options: options,
}
g := graphemes.FromString(s)
g.AnsiEscapeSequences = options.IgnoreControlSequences

return Graphemes[string]{iter: g, options: options}
}

// BytesGraphemes returns an iterator over grapheme clusters for the given
@@ -65,8 +65,8 @@ func BytesGraphemes(s []byte) Graphemes[[]byte] {
// Iterate using the Next method, and get the width of the current grapheme
// using the Width method.
func (options Options) BytesGraphemes(s []byte) Graphemes[[]byte] {
return Graphemes[[]byte]{
iter: graphemes.FromBytes(s),
options: options,
}
g := graphemes.FromBytes(s)
g.AnsiEscapeSequences = options.IgnoreControlSequences

return Graphemes[[]byte]{iter: g, options: options}
}