The Rich Text Format (RTF) is a proprietary document file format developed by Microsoft from 1987 until 2008 for cross-platform document interchange with Microsoft products.
An RTF document starts and ends with curly braces {}, which encapsulate the entire document's content. Inside the braces, various control words define text formatting, styles, and other document features in a plain-text format that can be interpreted by various word processors.
An RTF file consists of unformatted text, control words, control symbols, and groups.
{\rtf1\ansi\deff0
{\fonttbl{\f0 Times New Roman;}}
{\colortbl;\red255\green0\blue0;}
\s0 This is a normal paragraph.\par
\s1\ul This is an underlined paragraph.\ul0\par
This is another paragraph.\par
\s2\cf1 This text is colored.\cf0\par
}
In this example:
- \rtf1\ansi\deff0: Specifies RTF version, character set, and default font.
- \fonttbl: Defines fonts used.
- \colortbl: Defines colors.
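Because every group must close, a quick sanity check on an RTF string is to count brace depth. The following is a minimal sketch, not a full parser: the function name `max_group_depth` is made up for illustration, and it assumes the input contains no binary data, which a real reader would have to skip.

```python
def max_group_depth(rtf: str) -> int:
    """Return the deepest group nesting; raise ValueError on unbalanced braces.

    Escaped braces and escaped backslashes are literal characters,
    not group delimiters, so they are skipped.
    """
    depth = 0
    deepest = 0
    i = 0
    while i < len(rtf):
        ch = rtf[i]
        if ch == "\\" and i + 1 < len(rtf) and rtf[i + 1] in "\\{}":
            i += 2  # control symbol for a literal character, not a delimiter
            continue
        if ch == "{":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == "}":
            depth -= 1
            if depth < 0:
                raise ValueError(f"unmatched closing brace at offset {i}")
        i += 1
    if depth != 0:
        raise ValueError(f"{depth} group(s) left open")
    return deepest


sample = (
    r"{\rtf1\ansi\deff0"
    r"{\fonttbl{\f0 Times New Roman;}}"
    r"{\colortbl;\red255\green0\blue0;}"
    r"This text is plain.\par}"
)
print(max_group_depth(sample))  # 3
```

The depth of 3 comes from the font entry, which sits inside the font table, which sits inside the document group.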
A control word is a specially formatted command that RTF uses to mark printer control codes and information that applications use to manage documents. A control word takes the following form:
\LetterSequence<Delimiter>
Note that a backslash begins each control word.
The LetterSequence is made up of lowercase alphabetic characters between "a" and "z" inclusive. RTF is case sensitive, and all RTF control words must be lowercase.
The delimiter marks the end of an RTF control word, and can be one of the following:
- A space. In this case, the space is part of the control word.
- A digit or a hyphen (-), which indicates that a numeric parameter follows. The subsequent digit sequence is then delimited by a space or any character other than a letter or a digit. If a numeric parameter immediately follows the control word, this parameter becomes part of the control word. The control word is then delimited by a space or a nonalphabetic or nonnumeric character in the same manner as any other control word.
- Any character other than a letter or a digit. In this case, the delimiting character terminates the control word but is not actually part of the control word.
If a space delimits the control word, the space does not appear in the document. Any characters following the delimiter, including spaces, will appear in the document. For this reason, you should use spaces only where necessary; do not use spaces merely to break up RTF code.
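The delimiter rules above map fairly directly onto a regular expression. Here is a minimal sketch in Python; `CONTROL_WORD` and `read_control_word` are illustrative names, and the pattern assumes lowercase-only control words as described here. It does not handle control symbols or groups.

```python
import re

# One control word: backslash, lowercase letters, optional signed integer
# parameter, then at most one space that is consumed as the delimiter.
CONTROL_WORD = re.compile(r"\\([a-z]+)(-?\d+)? ?")


def read_control_word(rtf: str, pos: int = 0):
    """Parse a control word at rtf[pos]; return (name, parameter, next_pos).

    parameter is an int, or None when the control word has no numeric part.
    """
    match = CONTROL_WORD.match(rtf, pos)
    if match is None:
        raise ValueError(f"no control word at offset {pos}")
    name, number = match.groups()
    return name, (int(number) if number is not None else None), match.end()


print(read_control_word(r"\cf1 This text is red."))  # ('cf', 1, 5)
print(read_control_word(r"\par\b"))                  # ('par', None, 4)
print(read_control_word(r"\fi-720 x"))               # ('fi', -720, 8)
```

Only one space is consumed after the control word. In `\b  two` (two spaces), the second space is document text, which is why stray spaces in RTF code end up in the output.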
- \par: Defines the end of a paragraph.
  This is a paragraph.\par This is another paragraph.
- \sN: Applies a specific paragraph style (defined earlier in the document). The number following \s corresponds to a style definition (usually set in a \stylesheet section).
  \s0 This paragraph uses style 0. \s1 This paragraph uses style 1.
- \ul: Underlines text.
  \ul This text is underlined.
- \cfN: Sets the color of text using a reference to the color table (defined by \colortbl).
  \cf1 This text is red.
- \b: Makes the text bold.
  \b This text is bold.
- \i: Italicizes the text.
  \i This text is italicized.
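Character formatting such as \b or \ul stays in effect until it is turned off or the enclosing group closes, so one common way to limit it to a run of text is to wrap that run in its own group. The sketch below shows the idea in Python. `escape_rtf` and `styled` are hypothetical helper names, and the code assumes plain ASCII text; non-ASCII characters need extra encoding that is not covered here.

```python
def escape_rtf(text: str) -> str:
    """Escape the three characters that RTF treats as syntax."""
    # The backslash must be handled first so later escapes are not doubled.
    return text.replace("\\", "\\\\").replace("{", "\\{").replace("}", "\\}")


def styled(text: str, *control_words: str) -> str:
    """Wrap text in a group so the formatting ends at the closing brace."""
    if not control_words:
        return escape_rtf(text)
    prefix = "".join("\\" + word for word in control_words)
    # The single space delimits the last control word and is not rendered.
    return "{" + prefix + " " + escape_rtf(text) + "}"


print(styled("This text is bold.", "b"))           # {\b This text is bold.}
print(styled("Red and underlined.", "cf1", "ul"))  # {\cf1\ul Red and underlined.}
```

Text that follows the closing brace goes back to whatever formatting was active before the group opened.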
- Document Header:
  - Starts with {\rtf1\ansi, indicating it is an RTF version 1 document with the ANSI character set.
  - Example: {\rtf1\ansi
- Font Table (\fonttbl):
  - Defines the fonts used in the document.
  - Example: {\fonttbl{\f0 Times New Roman;}}
- Color Table (\colortbl):
  - Defines the colors used in the document.
  - Example: {\colortbl;\red255\green0\blue0;}
- Sectioning (\sect / \sectd):
  - RTF also supports sections, and you can separate sections using the \sect control word.
  - Example: \sect This is the beginning of a new section.
- Style Sheet (\stylesheet):
  - Defines paragraph styles (\s) used throughout the document.
  - Example: {\stylesheet{\s0 Normal;}{\s1 Heading;}}
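To see how the header, font table, and color table fit together, here is a small Python sketch that assembles them into one string. The function `build_rtf` and its parameters are illustrative only; it assumes the body is already valid, escaped RTF, and it leaves out the style sheet and sections.

```python
def build_rtf(fonts, colors, body):
    """Assemble header, font table, color table, and body into one RTF string.

    fonts:  list of font names; position i becomes font number i
    colors: list of (red, green, blue) tuples; position i becomes color i + 1
    body:   RTF body text, already escaped
    """
    font_entries = "".join(
        "{\\f%d %s;}" % (index, name) for index, name in enumerate(fonts)
    )
    # The leading ";" is the empty first entry (index 0) of the color table,
    # which is why the first real color is selected with \cf1.
    color_entries = ";" + "".join(
        "\\red%d\\green%d\\blue%d;" % rgb for rgb in colors
    )
    return (
        "{\\rtf1\\ansi\\deff0\n"
        "{\\fonttbl" + font_entries + "}\n"
        "{\\colortbl" + color_entries + "}\n"
        + body + "\n}"
    )


document = build_rtf(
    fonts=["Times New Roman"],
    colors=[(255, 0, 0)],
    body="\\cf1 This text is red.\\par",
)
print(document)
```

Keeping the tables as plain Python lists means the index used in \fN or \cfN can be derived from list position rather than typed by hand.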
{\rtf1\ansi\deff0
{\fonttbl{\f0 Times New Roman;}}
{\colortbl;\red0\green0\blue255;}
\s0 This is a paragraph with style 0.\par
\s1\b This is a bold paragraph with style 1.\b0\par
\s1\i\cf1 This is an italic and colored paragraph with style 1.\i0\cf0\par
\s2\ul This is an underlined paragraph with style 2.\ul0\par
}
This example shows how fonts, colors, paragraph styles, and formatting (like bold, italic, underline) are applied in RTF. The curly braces delimit groups within the document, and the backslashes introduce the control words.
- \par: Marks a paragraph break.
- \sN: Applies paragraph style N.
- \b: Bold text.
- \i: Italic text.
- \ul: Underlined text.
- \cfN: Changes text color using a color table index.
- \fonttbl: Defines available fonts.
- \colortbl: Defines available colors.
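Since \cfN is only an index, a reader has to look the number up in the color table to get an actual color. A rough Python sketch of that lookup follows. `read_color_table` is a made-up name, and the regular expressions assume a simple color table with no nested groups.

```python
import re

# The color table group, capturing everything between \colortbl and "}".
COLOR_TABLE = re.compile(r"\{\\colortbl\s*([^{}]*)\}")
# One color component, for example \red255.
COMPONENT = re.compile(r"\\(red|green|blue)(\d+)")


def read_color_table(rtf: str):
    """Return the color table as a list; position N is the color for index N."""
    match = COLOR_TABLE.search(rtf)
    if match is None:
        return []
    entries = match.group(1).split(";")[:-1]  # drop the piece after the last ";"
    table = []
    for entry in entries:
        parts = {name: int(value) for name, value in COMPONENT.findall(entry)}
        if parts:
            table.append(
                (parts.get("red", 0), parts.get("green", 0), parts.get("blue", 0))
            )
        else:
            table.append(None)  # empty entry: the application's default color
    return table


colors = read_color_table(r"{\rtf1\ansi{\colortbl;\red0\green0\blue255;}\cf1 Hi}")
print(colors)     # [None, (0, 0, 255)]
print(colors[1])  # (0, 0, 255), the color selected by \cf1
```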
I suppose that, to retain the correct DOM hierarchy, ChatGPT output uses ### for what is actually h1 in a Word file. The h2 is "Chat History". I can't find the h1 element in their DOM. Let me know if you do. 🙃