Text slicing/indexing silently operates on a whitespace-stripped copy of the string, not the original #4864

@SupernovaIa

Description of bug / unexpected behavior

Text.__init__ internally strips spaces and newlines from self.text after building the mobject:

# manim/mobject/text/text_mobject.py, Text.__init__
self.text = text
if self.disable_ligatures:
    self.submobjects = [*self._gen_chars()]
self.chars = self.get_group_class()(*self.submobjects)
self.text = text_without_tabs.replace(" ", "").replace("\n", "")

By default (disable_ligatures=False), self.submobjects is whatever the SVG/Pango rendering produced: one path per visible glyph, with no entries for space characters (there's nothing to draw). Text doesn't override __getitem__, so slicing (text[a:b]) falls back to the generic Mobject.__getitem__, which indexes directly into self.submobjects.

The net effect: for any string containing spaces, text[a:b] does not correspond to original_string[a:b]. It corresponds to text.text[a:b] — the internal, space-stripped copy — which is not documented or discoverable without reading the source.
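To make the offset concrete, here is a minimal pure-Python sketch of the index translation a caller currently has to do by hand. The helper name glyph_slice is hypothetical and not part of Manim. It assumes one submobject per non-whitespace character, which may not hold when ligatures merge glyphs, and it skips tabs on the assumption that they are expanded to spaces before stripping (as the name text_without_tabs suggests). Only non-negative indices are handled.

```python
def glyph_slice(original: str, start: int, stop: int) -> tuple[int, int]:
    """Map original[start:stop] to indices into the whitespace-stripped text.

    Hypothetical helper, not a Manim API. Assumes one glyph per
    non-whitespace character and non-negative indices.
    """
    def visible_before(i: int) -> int:
        # Count characters before position i that would produce a glyph.
        return sum(1 for ch in original[:i] if ch not in " \n\t")

    return visible_before(start), visible_before(stop)


s = "Tokenization is unbelievable"
stripped = s.replace(" ", "").replace("\n", "")  # mirrors what ends up in Text.text

a, b = glyph_slice(s, 13, 15)
print((a, b), stripped[a:b])  # (12, 14) is
```

Under those assumptions, t[a:b] with the translated indices would select the glyphs for s[13:15].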

This is easy to hit whenever someone tries to color/transform a specific substring by computing indices against the string they passed in (a very natural thing to do, and the pattern shown in several tutorials for coloring/animating sub-ranges of a Text mobject).

t2c/t2s/t2w/t2f/t2g are unaffected since they match by substring search rather than raw index, so this only bites manual index-based slicing.

Expected behavior

Either:

  • text[a:b] should be documented clearly as operating on the whitespace-stripped internal string, with a pointer to t2c/similar substring-based APIs as the recommended way to style/animate a specific piece of text that may span or sit near whitespace, or
  • indexing should be made to match the original input string by default (e.g. always going through something like the existing _gen_chars() placeholder-insertion logic, which is currently gated behind disable_ligatures=True).
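As a rough illustration of the second option, the sketch below pads a glyph list with placeholders so that its indices line up with the original string. It is not Manim's _gen_chars() implementation: align_to_original is a hypothetical name, plain characters stand in for submobjects, and None stands in for whatever placeholder mobject a real fix would insert. It also assumes one glyph per non-whitespace character.

```python
from typing import Optional, Sequence, TypeVar

G = TypeVar("G")
WHITESPACE = " \n\t"


def align_to_original(original: str, glyphs: Sequence[G]) -> list[Optional[G]]:
    """Return a list as long as `original`, with None at whitespace positions.

    Hypothetical helper for illustration only, not a Manim API.
    """
    visible = sum(1 for ch in original if ch not in WHITESPACE)
    if visible != len(glyphs):
        # Ligatures or other shaping could break the one-glyph-per-character assumption.
        raise ValueError(f"expected {visible} glyphs, got {len(glyphs)}")

    remaining = iter(glyphs)
    return [None if ch in WHITESPACE else next(remaining) for ch in original]


s = "Tokenization is unbelievable"
glyphs = list(s.replace(" ", ""))  # stand-ins for t.submobjects
aligned = align_to_original(s, glyphs)

print(len(aligned), aligned[12], aligned[13:15])  # 28 None ['i', 's']
```

With a list aligned this way, aligned[a:b] corresponds position by position to s[a:b], at the cost of callers having to tolerate placeholder entries.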

I'd lean towards the documentation fix as the safer option given Text's indexing behavior is presumably relied upon elsewhere, but wanted to raise it either way — happy to send a docs PR if that is the preferred fix.

How to reproduce the issue

Code for reproducing the problem
from manim import Text

s = "Tokenization is unbelievable"
t = Text(s)

print(repr(t.original_text))   # 'Tokenization is unbelievable'
print(repr(t.text))            # 'Tokenizationisunbelievable'  <- spaces gone
print(len(s), len(t.submobjects))  # 28 26

print(s[13:15])        # 'is'  <- what a caller would expect from t[13:15]
print(t.text[13:15])   # 'su'  <- what t[13:15] actually corresponds to

System specifications

System Details
  • OS: macOS 26.5.1
  • Python version: 3.12.13
  • Manim version: 0.20.1

Additional comments

Found this while building an educational animation that colors/moves specific substrings of a sentence by index — worth a docs callout at minimum since the discrepancy is silent (no error, no warning, just a wrong-looking substring).
