Draft
78 commits
6ad6ee3
init segment
ultmaster Jul 14, 2025
c602a27
add tests
ultmaster Jul 14, 2025
6e855ba
.
ultmaster Jul 14, 2025
9d3484b
.
ultmaster Jul 14, 2025
4575d3a
.
ultmaster Jul 15, 2025
d201013
.
ultmaster Jul 15, 2025
66e7a2c
update to ast implementation
ultmaster Jul 15, 2025
dffbc80
.
ultmaster Jul 15, 2025
5fe7dd3
Update poml_extended.md
ultmaster Jul 15, 2025
4a970a5
Add cst
ultmaster Jul 15, 2025
f77b2e0
Add lexer
ultmaster Jul 16, 2025
106b1a5
.
ultmaster Jul 16, 2025
9409e75
pass all tests
ultmaster Jul 16, 2025
3c3b226
.
ultmaster Jul 16, 2025
321a342
.
ultmaster Jul 16, 2025
35e478f
add more tests
ultmaster Jul 17, 2025
c4a9466
update poml extended proposal
ultmaster Jul 21, 2025
756f9f8
unverified cst impl
ultmaster Jul 24, 2025
5eeb2a5
merge commit
ultmaster Aug 26, 2025
f57785d
move to next
ultmaster Aug 26, 2025
d8b272e
error and source
ultmaster Aug 26, 2025
3726698
.
ultmaster Aug 26, 2025
c092ed7
add ast
ultmaster Aug 26, 2025
82e028d
fix cst
ultmaster Aug 26, 2025
69ee9f4
Merge branch 'main' of https://github.com/microsoft/poml into reader/…
ultmaster Aug 29, 2025
297a50c
update nodes
ultmaster Aug 29, 2025
af9c69c
.
ultmaster Aug 30, 2025
97e8ba2
.
ultmaster Aug 30, 2025
ddf120c
.
ultmaster Aug 30, 2025
7e2d3f0
.
ultmaster Aug 30, 2025
b070c72
.
ultmaster Aug 30, 2025
f03315d
.
ultmaster Aug 30, 2025
675ae89
.
ultmaster Aug 30, 2025
f02c735
.
ultmaster Aug 30, 2025
9ff1f76
finish cst nodes
ultmaster Aug 31, 2025
563715b
update lexer
ultmaster Aug 31, 2025
bdc91b7
.
ultmaster Aug 31, 2025
41866ab
.
ultmaster Aug 31, 2025
e929b5f
update lexer tests
ultmaster Sep 1, 2025
b4fde29
.
ultmaster Sep 1, 2025
393ce43
.
ultmaster Sep 1, 2025
377d0c1
minor fix
ultmaster Sep 1, 2025
c938ed3
.
ultmaster Sep 1, 2025
300bffa
.
ultmaster Sep 1, 2025
4b54185
.
ultmaster Sep 1, 2025
038b683
.
ultmaster Sep 1, 2025
22827c9
.
ultmaster Sep 1, 2025
0beb690
cst update
ultmaster Sep 2, 2025
bb7ca37
update nodes
ultmaster Sep 2, 2025
5aa67f7
.
ultmaster Sep 2, 2025
0a7b888
review to element
ultmaster Sep 4, 2025
1ae4819
.
ultmaster Sep 4, 2025
182ad14
fix lint issues
ultmaster Sep 4, 2025
8013a52
.
ultmaster Sep 5, 2025
e6a8427
add test
ultmaster Sep 5, 2025
feeb689
token rules
ultmaster Sep 5, 2025
5fe27c0
update nodes
ultmaster Sep 5, 2025
11f58fd
pass self check
ultmaster Sep 5, 2025
d352886
fix
ultmaster Sep 5, 2025
292dfb5
fix peek
ultmaster Sep 5, 2025
82b0441
fix cst
ultmaster Sep 5, 2025
0bd904a
.
ultmaster Sep 5, 2025
410c769
.
ultmaster Sep 5, 2025
9de1b54
.
ultmaster Sep 5, 2025
80145d2
.
ultmaster Sep 5, 2025
586897f
.
ultmaster Sep 5, 2025
1f4acff
cst test names and locations
ultmaster Sep 8, 2025
d31297c
cst more tests
ultmaster Sep 8, 2025
0b8c3af
lookahead
ultmaster Sep 9, 2025
b675aca
ast instruction
ultmaster Sep 9, 2025
ae6c12a
ast init
ultmaster Sep 9, 2025
eca09b4
.
ultmaster Sep 9, 2025
c5ae41f
.
ultmaster Sep 9, 2025
03cca4a
ast review
ultmaster Sep 9, 2025
49d3542
ast tests
ultmaster Sep 9, 2025
3412504
bug fixes
ultmaster Sep 9, 2025
c500b0d
bug fixes
ultmaster Sep 9, 2025
6d8c8b2
.
ultmaster Sep 9, 2025
9 changes: 9 additions & 0 deletions .claude/settings.json
@@ -0,0 +1,9 @@
{
  "permissions": {
    "allow": [
      "Bash(npm run lint)",
      "Bash(npm run test*)",
      "Read(~/.zshrc)"
    ]
  }
}
132 changes: 84 additions & 48 deletions docs/deep-dive/proposals/poml_extended.md
@@ -19,7 +19,7 @@ The current POML implementation requires files to be fully enclosed within `<pom
1. **Backward Compatibility**: Most existing POML files should continue to work without changes
2. **Flexibility**: Support pure text files with embedded POML elements
3. **Seamless Integration**: Allow switching between text and POML modes within a single file
<!-- 4. **Component Discovery**: Automatically detect POML elements from `componentDocs.json` -->
4. **Controlled Evolution of Tags**: behaviour of new/experimental tags is opt‑in via `<meta enable="...">`, preventing accidental breakage when upgrading the tool‑chain.

## File Format Specification

@@ -38,6 +38,7 @@ The system will assume the whole file is a pure text file and detects certain pa
1. Loading component definitions from `componentDocs.json` and extracting valid POML component names and their aliases.
2. Scanning for opening tags that match these components, and scanning until the corresponding closing tag is found.
3. If a special tag `<text>...</text>` is found within a POML segment, it will be treated as pure text content and processed following the rules above (step 1 and 2).
4. Unknown or disabled tags are treated as literal text and, by default, raise a diagnostic warning.

An example is shown below:

@@ -104,81 +105,124 @@ There can be some intervening text here as well.

Metadata is information that is useful when parsing and rendering the file, such as context variables, stylesheets, version information, and file paths.
File-level metadata can be included anywhere in the file in a special `<meta>` tag. This metadata will be processed before any content parsing.
By default, a `<meta>` tag has no child content. When child content exists, the `<meta>` tag must have a `type` attribute that specifies what kind of content is provided.

**Example:**

```xml
<meta minimalPomlVersion="0.3" />
<meta stylesheet="/path/to/stylesheet.json" />
<meta components="+reference,-table" unknownComponents="warning" />
<meta context="/path/to/contextFile.json" />
<meta type="context">
{ "foo": "bar" }
</meta>
```
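
As an illustration of how the `components` attribute above might be interpreted, here is a minimal sketch. It assumes that a leading `+` enables a component and a leading `-` disables it; the proposal does not spell out these semantics, and `parseComponentsAttribute` and `ComponentToggles` are hypothetical names rather than part of the POML code base.

```typescript
// Hypothetical result shape: which components a <meta components="..."> tag toggles.
interface ComponentToggles {
  enabled: Set<string>;
  disabled: Set<string>;
}

// Assumption: entries are comma-separated, '+' enables and '-' disables.
function parseComponentsAttribute(value: string): ComponentToggles {
  const toggles: ComponentToggles = {
    enabled: new Set<string>(),
    disabled: new Set<string>(),
  };
  for (const raw of value.split(',')) {
    const entry = raw.trim();
    if (entry.length < 2) {
      continue; // nothing after the sign, or an empty entry
    }
    const name = entry.slice(1);
    if (entry[0] === '+') {
      toggles.enabled.add(name);
    } else if (entry[0] === '-') {
      toggles.disabled.add(name);
    }
    // Entries without a sign are ignored in this sketch.
  }
  return toggles;
}

const toggles = parseComponentsAttribute('+reference,-table');
```

A parser could consult such a structure when deciding whether a tag name counts as a known component.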

## Architecture Design

### High-level Processing Pipeline

The core of the new architecture is a three-pass process: Segmentation, Metadata Extraction, and Recursive Rendering.
The core of the new architecture is a two-pass process: Tokenization and AST Parsing, followed by Recursive Rendering.

#### I. Segmentation Pass
#### I. Tokenization and AST Parsing

This initial pass is a crucial preprocessing step that scans the raw file content and partitions it into a hierarchical tree of segments. It does **not** parse the full XML structure of POML blocks; it only identifies their boundaries.
This phase processes the raw file content into an Abstract Syntax Tree (AST). It leverages the provided ExtendedPomlLexer.

- **Objective**: To classify every part of the file as `META`, `POML`, or `TEXT` and build a nested structure.
- **Algorithm**:
1. Load all valid POML component tag names (including aliases) from `componentDocs.json`. This set of tags will be used for detection.
2. Initialize the root of the segment tree as a single, top-level `TEXT` segment spanning the entire file, unless the root segment is a single `<poml>...</poml>` block spanning the whole file (in which case it will be treated as a `POML` segment).
3. Use a stack-based algorithm to scan the text.
- When an opening tag (e.g., `<task>`) that matches a known POML component is found, push its name and start position onto the stack. This marks the beginning of a potential `POML` segment.
- When a closing tag (e.g., `</task>`) is found that matches the tag at the top of the stack, pop the stack. This marks a complete `POML` segment. This new segment is added as a child to the current parent segment in the tree.
- The special `<text>` tag is handled recursively. If a `<text>` tag is found _inside_ a `POML` segment, the scanner will treat its content as a nested `TEXT` segment. This `TEXT` segment can, in turn, contain more `POML` children.
- Any content not enclosed within identified `POML` tags remains part of its parent `TEXT` segment.
4. `<meta>` tags are treated specially. They are identified and parsed into `META` segments at any level but are logically hoisted and processed first. They should not have children.
- **Tokenization**: The ExtendedPomlLexer (using chevrotain) scans the entire input string and breaks it into a flat stream of tokens (TagOpen, Identifier, TextContent, TemplateOpen, etc.). This single lexing pass is sufficient for the entire mixed-content file. The distinction between "text" and "POML" is not made at this stage; it's simply a stream of tokens.
- **AST Parsing Algorithm**: A CST (Concrete Syntax Tree) or AST parser will consume the token stream from the lexer. The parser is stateful, using a `PomlContext` object to track parsing configurations.
1. The parser starts in "text mode". It consumes TextContent, TemplateOpen/TemplateClose, and other non-tag tokens, bundling them into TEXT or TEMPLATE nodes.
2. When a TagOpen (`<`) token is followed by the Identifier "meta", a META node is created. Its attributes are immediately parsed to populate the `PomlContext`. This allows metadata to control the parsing of the remainder of the file (e.g., by enabling new tags). The META node is added to the AST but will be ignored during rendering.
3. When a TagOpen (`<`) token is followed by an Identifier that matches a known POML component (from componentDocs.json and enabled via PomlContext), the parser switches to "POML mode" and creates a POML node.
4. In "POML mode," it parses attributes (Identifier, Equals, DoubleQuote/SingleQuote), nested tags, and content until it finds a matching TagClosingOpen (`</`) token. Template variables `{{}}` within attribute values or content are parsed into child TEMPLATE nodes.
5. If the tag is `<text>`, it creates a POML node for `<text>` itself, but its _children_ are parsed by recursively applying the "text mode" logic (step 1), allowing for nested POML within `<text>`.
6. If a TagOpen is followed by an Identifier that is _not_ a known POML component, the parser treats these tokens (`<`, tagname, `>`) as literal text and reverts to "text mode".
7. The parser closes the current POML node when the corresponding TagClosingOpen (`</`) and Identifier are found. After closing the top-level POML tag, it reverts to "text mode".

- **Output**: A `Segment` tree. For backward compatibility, if the root segment is a single `<poml>...</poml>` block spanning the whole file, the system can revert to the original, simpler parsing model.
- **Error Tolerance**: The parser will be designed to be error-tolerant. If a closing tag is missing, it can infer closure at the end of the file or when a new top-level tag begins, logging a diagnostic warning.

**`Segment` Interface**: The `children` property is key to representing the nested structure of mixed-content files.
- **Source Mapping**: The chevrotain tokens inherently contain offset, line, and column information. This data is directly transferred to the ASTNode during parsing, enabling robust code intelligence features.

- **Output**: An AST representing the hierarchical structure of the document, where each node contains source position information and type metadata.
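
The text-mode/POML-mode switch and the error-tolerance rule can be sketched on a much smaller scale. The example below is a simplification under stated assumptions: it scans the raw string with a regular expression instead of consuming chevrotain tokens, it only splits top-level content, and it does not handle nested tags of the same name or `>` inside attribute values. `SketchNode` and `splitMixedContent` are hypothetical names.

```typescript
// Simplified stand-in for the AST: only top-level TEXT and POML nodes.
type SketchNode =
  | { kind: 'TEXT'; content: string }
  | { kind: 'POML'; tagName: string; content: string };

function splitMixedContent(source: string, known: Set<string>): SketchNode[] {
  const nodes: SketchNode[] = [];
  const openTag = /<([A-Za-z][\w-]*)[^>]*>/g;
  let textStart = 0;
  let match: RegExpExecArray | null;
  while ((match = openTag.exec(source)) !== null) {
    const tagName = match[1];
    if (!known.has(tagName)) {
      continue; // unknown tag: leave it as literal text
    }
    let end = openTag.lastIndex; // a self-closing tag ends right here
    if (!match[0].endsWith('/>')) {
      const closeTag = `</${tagName}>`;
      const closeAt = source.indexOf(closeTag, end);
      // Error tolerance: a missing closing tag is closed at end of file.
      end = closeAt === -1 ? source.length : closeAt + closeTag.length;
    }
    if (match.index > textStart) {
      nodes.push({ kind: 'TEXT', content: source.slice(textStart, match.index) });
    }
    nodes.push({ kind: 'POML', tagName, content: source.slice(match.index, end) });
    textStart = end;
    openTag.lastIndex = end; // resume scanning in text mode after the POML block
  }
  if (textStart < source.length) {
    nodes.push({ kind: 'TEXT', content: source.slice(textStart) });
  }
  return nodes;
}

// 'task' stands in for a component name loaded from componentDocs.json; 'b' is unknown.
const nodes = splitMixedContent('Hello <b>x</b> <task>Do it</task> bye', new Set(['task']));
```

Here the unknown `<b>` tag stays inside the first TEXT node, while `<task>...</task>` becomes a POML node.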

**`ASTNode` Interface**: The AST nodes represent the parsed structure with source mapping.

```typescript
interface Segment {
id: string; // Unique ID for caching and React keys
kind: 'META' | 'TEXT' | 'POML';
interface SourceRange {
start: number;
end: number;
content: string; // The raw string content of the segment
parent?: Segment; // Reference to the parent segment
children: Segment[]; // Nested segments (e.g., a POML block within text)
tagName?: string; // For POML segments, the name of the root tag (e.g., 'task')
}
```

#### II. Metadata Processing

Once the segment tree is built, all `META` segments are processed.
interface AttributeInfo {
key: string;
value: (ASTNode & { kind: 'TEXT' | 'TEMPLATE' })[]; // Mixed content: array of text/template nodes
keyRange: SourceRange; // Position of attribute name
valueRange: SourceRange; // Position of attribute value (excluding quotes)
fullRange: SourceRange; // Full attribute including key="value"
}

- **Extraction**: Traverse the tree to find all `META` segments.
- **Population**: Parse the content of each `<meta>` tag and populate the global `PomlContext` object.
- **Removal**: After processing, `META` segments are removed from the tree to prevent them from being rendered.
interface ASTNode {
id: string; // Unique ID for caching and React keys
kind: 'META' | 'TEXT' | 'POML' | 'TEMPLATE';
start: number; // Source position start of entire node
end: number; // Source position end of entire node
content: string; // The raw string content
parent?: ASTNode; // Reference to the parent node
children: ASTNode[]; // Child nodes

// For POML and META nodes
tagName?: string; // Tag name (e.g., 'task', 'meta')
attributes?: AttributeInfo[]; // Detailed attribute information

// Detailed source positions
openingTag?: {
start: number; // Position of '<'
end: number; // Position after '>'
nameRange: SourceRange; // Position of tag name
};

closingTag?: {
start: number; // Position of '</'
end: number; // Position after '>'
nameRange: SourceRange; // Position of tag name in closing tag
};

contentRange?: SourceRange; // Position of content between tags (excluding nested tags)

// For TEXT nodes
textSegments?: SourceRange[]; // Multiple ranges for text content (excluding nested POML)

// For TEMPLATE nodes
expression?: string; // The full expression content between {{}}
}
```
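
To make the mixed TEXT/TEMPLATE shape of attribute values concrete, the following sketch splits a string into parts with source offsets. It is an illustration only: `ValuePart` and `splitTemplateParts` are hypothetical names, offsets are relative to the start of the value, `end` is treated as exclusive, and the expression is trimmed, none of which the proposal specifies.

```typescript
// Hypothetical, trimmed-down version of a TEXT/TEMPLATE node.
interface ValuePart {
  kind: 'TEXT' | 'TEMPLATE';
  start: number; // offset of the first character
  end: number; // offset just past the last character (exclusive)
  content: string; // raw source text of the part
  expression?: string; // for TEMPLATE parts: text between {{ and }}, trimmed here
}

function splitTemplateParts(value: string): ValuePart[] {
  const parts: ValuePart[] = [];
  const template = /\{\{(.*?)\}\}/g; // note: '.' does not match line breaks
  let cursor = 0;
  let match: RegExpExecArray | null;
  while ((match = template.exec(value)) !== null) {
    if (match.index > cursor) {
      parts.push({
        kind: 'TEXT',
        start: cursor,
        end: match.index,
        content: value.slice(cursor, match.index),
      });
    }
    parts.push({
      kind: 'TEMPLATE',
      start: match.index,
      end: template.lastIndex,
      content: match[0],
      expression: match[1].trim(),
    });
    cursor = template.lastIndex;
  }
  if (cursor < value.length) {
    parts.push({ kind: 'TEXT', start: cursor, end: value.length, content: value.slice(cursor) });
  }
  return parts;
}

const parts = splitTemplateParts('Hello {{ name }}!');
```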

**`PomlContext` Interface**: This context object is the single source of truth for the entire file, passed through all readers. It's mutable, allowing stateful operations like `<let>` to have a file-wide effect.

```typescript
interface PomlContext {
variables: { [key: string]: any }; // For {{ substitutions }} and <let> (Read/Write)
texts: { [key: string]: React.ReactElement }; // Maps TEXT_ID to content for <text> replacement (Read/Write)
stylesheet: { [key: string]: string }; // Merged styles from all <meta> tags (Read-Only during render)
minimalPomlVersion?: string; // From <meta> (Read-Only)
sourcePath: string; // File path for resolving includes (Read-Only)
}
```

#### III. Text/POML Dispatching (Recursive Rendering)
#### II. Text/POML Dispatching (Recursive Rendering)

Rendering starts at the root of the segment tree and proceeds recursively. A controller dispatches segments to the appropriate reader.
Rendering starts at the root of the AST and proceeds recursively. A controller dispatches AST nodes to the appropriate reader.

- **`PureTextReader`**: Handles `TEXT` segments.
- **`PureTextReader`**: Handles `TEXT` nodes.
- Currently, the pure-text content is rendered directly as a single React element. In the future, it could:
- Render the text content, potentially using a Markdown processor.
- Perform variable substitutions (`{{...}}`) using the `variables` from `PomlContext`. The logic from `handleText` in the original `PomlFile` should be extracted into a shared utility for this.
- Iterates through its `children` segments. For each child `POML` segment, it calls the `PomlReader`.
- Iterates through its `children` nodes. For each child `POML` node, it calls the `PomlReader`.

- **`PomlReader`**: Handles `POML` segments.
- **Pre-processing**: Before parsing, it replaces any direct child `<text>` regions with a self-closing placeholder tag containing a unique ID: `<text ref="TEXT_ID_123" />`. The original content of the `<text>` segment is stored in `context.texts`. This ensures the XML parser inside `PomlFile` doesn't fail on non-XML content (like Markdown).
- **Delegation**: Instantiates a modified `PomlFile` class with the processed segment content and the shared `PomlContext`.
- **Rendering**: Calls the `pomlFile.react(context)` method to render the segment.
- **`PomlReader`**: Handles `POML` nodes.
- **Delegation**: Instantiates a modified `PomlFile` class with the processed node content and the shared `PomlContext`.
- **Rendering**: Calls the `pomlFile.react(context)` method to render the node.

- **`IntelliSense Layer`**: The segment tree makes it easy to provide context-aware IntelliSense. By checking the `kind` of the segment at the cursor's offset, the request can be routed to the correct provider—either the `PomlReader`'s XML-aware completion logic or a simpler text/variable completion provider for `TEXT` segments.
- **`IntelliSense Layer`**: The AST makes it easy to provide context-aware IntelliSense. By checking the `kind` of the node at the cursor's offset, the request can be routed to the correct provider—either the `PomlReader`'s XML-aware completion logic or a simpler text/variable completion provider for `TEXT` nodes.
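
The lookup behind this routing could look roughly like the sketch below. It uses a reduced node shape (`RangeNode`, a hypothetical name) with only the fields needed for the search, and it assumes `end` is exclusive and that children lie within their parent's range.

```typescript
// Reduced node shape: just enough of ASTNode for an offset lookup.
interface RangeNode {
  kind: 'META' | 'TEXT' | 'POML' | 'TEMPLATE';
  start: number;
  end: number; // assumed exclusive
  children: RangeNode[];
}

// Returns the innermost node whose range contains the offset, if any.
function findDeepestNodeAt(node: RangeNode, offset: number): RangeNode | undefined {
  if (offset < node.start || offset >= node.end) {
    return undefined;
  }
  for (const child of node.children) {
    const hit = findDeepestNodeAt(child, offset);
    if (hit) {
      return hit;
    }
  }
  return node; // no child covers the offset, so this node is the deepest match
}

// Example tree: TEXT root containing a POML block that contains a template.
const root: RangeNode = {
  kind: 'TEXT',
  start: 0,
  end: 40,
  children: [
    {
      kind: 'POML',
      start: 10,
      end: 30,
      children: [{ kind: 'TEMPLATE', start: 15, end: 22, children: [] }],
    },
  ],
};

const kindAtCursor = findDeepestNodeAt(root, 18)?.kind;
```

A completion provider could then branch on `kindAtCursor` to choose between XML-aware and plain-text suggestions.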

**`Reader` Interface**: This interface defines the contract for both `PureTextReader` and `PomlReader`.

@@ -208,11 +252,3 @@ To achieve this design, the existing `PomlFile` class needs significant refactor
3. **Handling `<include>`**:

- The `handleInclude` method should be **removed** from `PomlFile`. Inclusion is now handled at a higher level by the main processing pipeline. When the `PomlReader` encounters an `<include>` tag, it will invoke the entire pipeline (Segmentation, Metadata, Rendering) on the included file and insert the resulting React elements.

4. **Parsing `TEXT` Placeholders**:

- The core `parseXmlElement` method needs a new branch to handle the `<text ref="..." />` placeholder.
- When it encounters this element:
1. It extracts the `ref` attribute (e.g., `"TEXT_ID_123"`).
2. It looks up the corresponding raw text from `context.texts`.
3. It fetches from the `context.texts` map and returns a React element containing the pure text content.
10 changes: 9 additions & 1 deletion package-lock.json

2 changes: 2 additions & 0 deletions package.json
@@ -408,6 +408,7 @@
"@rollup/plugin-json": "^6.1.0",
"@stylistic/eslint-plugin": "^5.2.3",
"@types/d3-dsv": "~2.0.0",
"@types/he": "^1.2.3",
"@types/jquery": "^3.5.32",
"@types/js-yaml": "^4.0.9",
"@types/lodash.throttle": "^4.1.9",
@@ -460,6 +461,7 @@
"cheerio": "^1.0.0",
"closest-match": "^1.3.3",
"d3-dsv": "~2.0.0",
"he": "^1.2.0",
"jquery": "^3.7.1",
"js-tiktoken": "^1.0.20",
"js-yaml": "^4.1.0",
11 changes: 10 additions & 1 deletion packages/poml/base.tsx
@@ -137,7 +137,10 @@ export interface PropsBase {

// Experimental
writerOptions?: object;
whiteSpace?: 'pre' | 'filter' | 'trim';
whiteSpace?: 'pre' | 'filter' | 'trim' | 'collapse';

// Enforce inline on every element.
inline?: boolean;

/** Soft character limit before truncation is applied. */
charLimit?: number;
@@ -996,3 +999,9 @@ export function findComponentByAliasOrUndefined(alias: string, disabled?: Set<st
export function listComponents() {
  return ComponentRegistry.instance.listComponents();
}

export function listComponentAliases() {
  return listComponents()
    .map((c) => c.getAliases())
    .flat();
}