Comparing changes

)

- Add --stdin flag to parse_args for reading from stdin
- Add convert_stdin() function to handle stdin parsing
- Update main() to call convert_stdin() when --stdin flag is used
- Add comprehensive tests for stdin functionality and CLI behavior

Signed-off-by: Matěj Cepl <mcepl@cepl.eu>
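The flow described in this commit can be sketched roughly as follows. This is a minimal illustration, not the project's actual CLI code: the names `parse_args` and `convert_stdin` come from the commit message, but their signatures, the `render` callback, and the `stream` parameter are assumptions made here so the example is self-contained.

```python
import argparse
import io
import sys
from typing import Callable, Optional, Sequence, TextIO


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse CLI arguments; --stdin switches the input source to stdin."""
    parser = argparse.ArgumentParser(description="Render Markdown to HTML")
    parser.add_argument("filenames", nargs="*", help="files to convert")
    parser.add_argument(
        "--stdin", action="store_true", help="read Markdown from standard input"
    )
    return parser.parse_args(argv)


def convert_stdin(
    render: Callable[[str], str], stream: Optional[TextIO] = None
) -> str:
    """Read all of the input stream and pass it through `render`.

    `render` and `stream` are hypothetical parameters used for testability;
    the real function may take no arguments at all.
    """
    source = sys.stdin if stream is None else stream
    return render(source.read())


def fake_render(text: str) -> str:
    """Stand-in renderer (assumption); the real CLI would call the Markdown parser."""
    return f"<p>{text}</p>"


args = parse_args(["--stdin"])
if args.stdin:
    html = convert_stdin(fake_render, io.StringIO("# hi"))
```

Passing the stream in explicitly is only a convenience for testing; it lets the stdin path be exercised without touching the real `sys.stdin`.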

* Initial plan
* 📚 DOCS: Add AGENTS.md and copilot-setup-steps.yml workflow

  Co-authored-by: chrisjsewell <2997570+chrisjsewell@users.noreply.github.com>
* 👌 IMPROVE: Add tox-uv note to AGENTS.md

  Co-authored-by: chrisjsewell <2997570+chrisjsewell@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: chrisjsewell <2997570+chrisjsewell@users.noreply.github.com>

Changes `state_inline.Scanner` into a `typing.NamedTuple` instead of a `collections.namedtuple`. They should be functionally equivalent except that the former has typing for its members while the latter does not.
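To illustrate the equivalence, here is a small sketch comparing the two forms. The field names (`can_open`, `can_close`, `length`) and the class names `ScannerOld`/`ScannerNew` are assumptions for illustration and may not match the real `Scanner` definition exactly.

```python
from collections import namedtuple
from typing import NamedTuple

# Untyped form: field names only, no type information.
ScannerOld = namedtuple("ScannerOld", ["can_open", "can_close", "length"])


# Typed form: same tuple behaviour, plus annotations visible to type checkers.
class ScannerNew(NamedTuple):
    can_open: bool
    can_close: bool
    length: int


old = ScannerOld(True, False, 2)
new = ScannerNew(True, False, 2)

# Both are plain tuples at runtime, so unpacking and comparison behave the same.
can_open, can_close, length = new
```

Because both are tuple subclasses, existing call sites that index, unpack, or access fields by name should keep working unchanged.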

Optimize adjacent-token joining in both inline cleanup stages by replacing repeated pairwise string concatenation with a single `"".join(...)` over each contiguous run.

## Details

- `fragments_join` merges adjacent `text` tokens left behind after emphasis/strikethrough post-processing and recalculates token levels
- `text_join` converts `text_special` tokens to `text` and performs the final adjacent-text merge in the inline token stream

Both rules previously rebuilt growing strings incrementally, which can become quadratic for long runs.

## Why

Tested on an adversarial ~190 KB document with ~30k intraword underscores on a single line. With `tracemalloc` running:

|        | render time | peak Python alloc |
|--------|-------------|-------------------|
| before | 2.2 s       | 4476 MB           |
| after  | 0.6 s       | 23 MB             |

It's not just a contrived attack input - this kind of thing also shows up naturally in Markdown produced by OCR pipelines, where tables of identifiers / references can easily contain very long runs of underscores or other delimiter characters.

## Tests

Added focused tests for both rules:

- `fragments_join`: verifies raw adjacent text fragments remain when both join stages are disabled, and that `fragments_join` alone collapses them when `text_join` is disabled
- `text_join`: verifies escaped characters remain as multiple `text_special` tokens when `text_join` is disabled, and are converted and merged into a single `text` token when enabled

## Result

No behavioral change in parser output, with less unnecessary work when joining long runs of adjacent tokens.

---------

Co-authored-by: Chris Sewell <chrisj_sewell@hotmail.com>
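The run-based joining idea can be sketched as below. This is a simplified illustration, not the actual `fragments_join` or `text_join` code: the `Tok` class and `join_adjacent_text` function are hypothetical, and level recalculation and `text_special` conversion are omitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Tok:
    """Hypothetical stand-in for an inline token (type and content only)."""
    type: str
    content: str


def join_adjacent_text(tokens: List[Tok]) -> List[Tok]:
    """Merge each contiguous run of `text` tokens with one "".join call."""
    out: List[Tok] = []
    i, n = 0, len(tokens)
    while i < n:
        tok = tokens[i]
        if tok.type != "text":
            out.append(tok)
            i += 1
            continue
        # Find the end of the contiguous run of text tokens starting at i.
        j = i + 1
        while j < n and tokens[j].type == "text":
            j += 1
        if j - i > 1:
            # A single join copies each character once, instead of re-copying
            # a growing string for every pairwise concatenation.
            tok = Tok("text", "".join(t.content for t in tokens[i:j]))
        out.append(tok)
        i = j
    return out


merged = join_adjacent_text(
    [Tok("text", "a"), Tok("text", "b"), Tok("em_open", ""),
     Tok("text", "c"), Tok("text", "d"), Tok("text", "e")]
)
```

With pairwise concatenation, a run of n fragments can copy on the order of n² characters in total; joining the whole run at once keeps the work linear in the run's total length.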

The inline `text` rule used a hardcoded, unexpandable set of terminator characters, forcing plugins that need to trigger on non-terminator characters (e.g. `w` for GFM `www.` autolinks) to resort to core-rule post-processing workarounds.

## Changes

- **`parser_inline.py`**: Moves the terminator set onto `ParserInline` as `_terminator_chars` (a `set[str]` seeded from `_DEFAULT_TERMINATORS`) with a pre-compiled `terminator_re: re.Pattern[str]` attribute. Exposes `add_terminator_char(ch)` to extend the set; the regex is rebuilt eagerly only when a genuinely new character is added, keeping zero per-call overhead in the hot path.
- **`rules_inline/text.py`**: Drops the module-level `_TerminatorChars` set and `@functools.cache`-decorated factory. The `text` rule now reads `state.md.inline.terminator_re` directly.
- **`docs/contributing.md`**: Updates the "Why is my inline rule not executed?" FAQ to document the new API.

## Usage

```python
from markdown_it import MarkdownIt


def gfm_autolink_plugin(md: MarkdownIt) -> None:
    # Make "w" a terminator so the inline parser stops there and tries the rule.
    md.inline.add_terminator_char("w")
    # `_www_rule` is the plugin's own inline rule function (not shown here).
    md.inline.ruler.push("gfm_autolink_www", _www_rule)
```

Fully backward-compatible: the default terminator set is unchanged.

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: chrisjsewell <2997570+chrisjsewell@users.noreply.github.com>
Co-authored-by: Chris Sewell <chrisj_sewell@hotmail.com>
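The "rebuild only on a genuinely new character" behaviour might look roughly like the standalone sketch below. It is not the `ParserInline` implementation: the `TerminatorSet` class, the `_SKETCH_DEFAULTS` subset, the length check, and the `rebuilds` counter are all assumptions added for illustration.

```python
import re

# Hypothetical subset of default terminators; the real default set may differ.
_SKETCH_DEFAULTS = frozenset("\n$*_`[]!\\")


class TerminatorSet:
    """Holds terminator characters plus a regex compiled ahead of time."""

    def __init__(self) -> None:
        self._chars = set(_SKETCH_DEFAULTS)
        self.rebuilds = 0  # illustration only: counts regex recompilations
        self.terminator_re = self._build()

    def _build(self) -> "re.Pattern[str]":
        # re.escape keeps characters such as "]" and "\\" literal in the class.
        return re.compile("[" + re.escape("".join(sorted(self._chars))) + "]")

    def add_terminator_char(self, ch: str) -> None:
        if len(ch) != 1:
            raise ValueError("expected a single character")
        if ch in self._chars:
            return  # already present: keep the existing compiled pattern
        self._chars.add(ch)
        self.terminator_re = self._build()
        self.rebuilds += 1


ts = TerminatorSet()
before = ts.terminator_re.search("hello www")
ts.add_terminator_char("w")
ts.add_terminator_char("w")  # no-op, no second rebuild
after = ts.terminator_re.search("hello www")
```

Compiling at registration time means the text rule only ever performs a single `search` per call, regardless of how many plugins extended the set.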

…rikethrough (#388)

### Summary

Adds a new `gfm-like2` preset that extends `gfm-like` with three GFM features:

- **Task lists**: `- [x] done` / `- [ ] todo` checkbox syntax in list items
- **Alerts**: `> [!NOTE]`, `> [!TIP]`, `> [!WARNING]`, etc. inside blockquotes
- **Single-tilde strikethrough**: `~text~` in addition to `~~text~~`

These are enabled via the `gfm-like2` preset or individually through the `tasklists`, `alerts`, and `strikethrough_single_tilde` options. The existing `gfm-like` preset is unchanged, so as to remain back-compatible.

### Why in markdown-it-py, not mdit-py-plugins?

Task lists and alerts are implemented by integrating detection directly into the existing block-level parsers (`list.py` and `blockquote.py`), rather than as post-processing rules:

- Checkbox detection happens during list item parsing, before the sub-parser runs on the item content
- Alert detection happens during blockquote parsing, before the inner content is tokenized

This design is not achievable from a plugin: plugins can only add new rules or post-process the token stream; they cannot modify the internals of `list_block()` or `blockquote()` to inject detection at the right point in the parsing pipeline. Implementing these as post-processing core rules would work functionally, but it means re-walking and mutating the token stream after the fact, which is less clean and less consistent with how the block parsers are designed to work.

Single-tilde strikethrough similarly extends the existing strikethrough rule's matching logic (opener/closer width matching), which is more naturally done inside the rule than bolted on externally.

### Changes

- **`rules_block/list.py`**: Detect `[ ]`/`[x]`/`[X]` at content start during list item parsing; set `token.meta["checked"]`; advance `bMarks` past the checkbox; add CSS classes (`task-list-item`, `contains-task-list`) after the list loop
- **`rules_block/blockquote.py`**: Detect `[!TYPE]` on the first content line; emit `alert_open`/`alert_close` + title tokens instead of `blockquote_open`/`blockquote_close`; skip the marker line during tokenization
- **`rules_inline/strikethrough.py`**: When `strikethrough_single_tilde` is enabled, accept 1 or 2 tildes (reject 3+); enforce opener/closer width matching in `_postProcess`
- **`renderer.py`**: Add `list_item_open` render method that injects checkbox HTML when `meta["checked"]` is present
- **`presets/__init__.py`**: Add `gfm_like2` preset class
- **`main.py`**: Register `gfm-like2` in `_PRESETS`
- **`utils.py`**: Add `tasklists`, `alerts`, `strikethrough_single_tilde` keys to `OptionsType`
- **`pyproject.toml`**: Add `pytest-timeout` to test deps; set 10s default timeout
- **Test fixtures**: 11 tasklist cases, 15 alert cases, 13 single-tilde strikethrough cases

### Usage

```python
from markdown_it import MarkdownIt

md = MarkdownIt("gfm-like2")
md.render("- [x] done\n- [ ] todo")
md.render("> [!NOTE]\n> This is a note.")
md.render("~strikethrough~")
```
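As a rough illustration of the two detection steps, the standalone sketch below checks a list item's content for a leading checkbox and a blockquote's first line for an alert marker. It is not the code in `list.py` or `blockquote.py`: the helper names, the regexes, the requirement that whitespace or end of line follows the checkbox, and the letters-only alert type are all assumptions.

```python
import re
from typing import Optional, Tuple

# Assumption: the checkbox must be followed by whitespace or end of line.
_CHECKBOX_RE = re.compile(r"\[([ xX])\](?=\s|$)")
# Assumption: the marker is alone on its line and the type is letters only.
_ALERT_RE = re.compile(r"\[!([A-Za-z]+)\]\s*$")


def detect_checkbox(content: str) -> Tuple[Optional[bool], str]:
    """Return (checked, remaining content); checked is None if no checkbox."""
    m = _CHECKBOX_RE.match(content)
    if m is None:
        return None, content
    # Skipping past the marker mirrors advancing the line start past it.
    return m.group(1) != " ", content[m.end():].lstrip()


def detect_alert(first_line: str) -> Optional[str]:
    """Return the alert type (upper-cased) if the line is an alert marker."""
    m = _ALERT_RE.match(first_line.strip())
    return m.group(1).upper() if m else None


checked, rest = detect_checkbox("[x] done")
alert_type = detect_alert("[!NOTE]")
```

Running this kind of check before the inner content is tokenized is what lets the marker be consumed up front, rather than stripped from the token stream afterwards.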

Adds a `make_fence_rule()` factory function that generates fence parsing rules with configurable options, enabling reuse of the fence logic for custom marker characters (e.g. `:` for colon fences) without code duplication.

## Motivation

The `colon_fence` plugin in `mdit-py-plugins` duplicates nearly all of the fence parsing logic, differing only in the marker character (`:` vs `` ` ``/`~`), token type, and info-string restrictions. This makes maintenance harder and prevents features like exact-match closing from being shared.

## Changes

- **New `make_fence_rule()` factory** in `fence.py` with parameters:
  - `markers`: tuple of valid fence marker characters (default ``("~", "`")``)
  - `token_type`: token type name to emit (default `"fence"`)
  - `exact_match`: when `True`, closing fence must have **exactly** the same marker count as the opener (default `False`). This enables nested fences (e.g. `::::` wrapping `:::`).
  - `disallow_marker_in_info`: marker characters that reject the fence if found in the info string (default ``("`",)`` per CommonMark)
  - `min_markers`: minimum marker count to form a fence (default `3`)
- **`fence` remains a module-level export**: `fence = make_fence_rule()` is fully backwards compatible, with identical behavior to the previous implementation.
- **Exported from `markdown_it.rules_block`**: added `make_fence_rule` to `__all__`.

## Usage

```python
from markdown_it import MarkdownIt
from markdown_it.rules_block.fence import make_fence_rule

# Colon fence (replaces mdit-py-plugins/colon_fence duplicated logic)
md = MarkdownIt()
md.block.ruler.before(
    "fence",
    "colon_fence",
    make_fence_rule(markers=(":",), token_type="colon_fence", disallow_marker_in_info=()),
)

# Override standard fence with exact-match closing (for MyST nested directives)
md.block.ruler.at("fence", make_fence_rule(exact_match=True))

# Combine: all markers with exact matching
md.block.ruler.at("fence", make_fence_rule(markers=("~", "`", ":"), exact_match=True))
```

## Performance

No regression by design: configuration is captured in the closure at rule-creation time (no runtime dict lookups or extra branches in the hot path). The `closing_matcher` callable is resolved once at factory time.

## Tests

21 new tests covering:

- Colon-fence-like marker behavior
- Exact-match closing (nesting patterns)
- `ruler.at()` override of standard fence
- `min_markers` customization
- `disallow_marker_in_info` variations

Full existing test suite (981 tests) passes unchanged.
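The difference between the default and exact-match closing rules, and the idea of resolving the matcher once at factory time, can be sketched in isolation. This is a simplified model, not the code in `fence.py`: `make_closing_matcher` and `find_close` are hypothetical helpers, and indentation and info-string handling are ignored.

```python
from typing import Callable, List, Optional


def make_closing_matcher(exact_match: bool) -> Callable[[int, int], bool]:
    """Pick the comparison once, so the scanning loop has no extra branch."""
    if exact_match:
        return lambda opener_len, closer_len: closer_len == opener_len
    # Default behaviour: a closer needs at least as many markers as the opener.
    return lambda opener_len, closer_len: closer_len >= opener_len


def find_close(
    lines: List[str],
    marker: str,
    opener_len: int,
    matcher: Callable[[int, int], bool],
) -> Optional[int]:
    """Return the index of the first line that closes the fence, if any."""
    for idx, line in enumerate(lines):
        stripped = line.strip()
        # A candidate closer consists solely of the marker character.
        if stripped and set(stripped) == {marker} and matcher(opener_len, len(stripped)):
            return idx
    return None


# Lines following a ":::" opener (length 3).
body = ["text", "::::", ":::"]
loose = find_close(body, ":", 3, make_closing_matcher(exact_match=False))
exact = find_close(body, ":", 3, make_closing_matcher(exact_match=True))
```

Under the default rule the longer `::::` line closes the three-colon fence, while exact matching skips it and closes on the `:::` line, which is what allows fences of different widths to nest.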


Commits on Dec 24, 2025

Commits on Feb 18, 2026

Commits on May 6, 2026

Commits on May 7, 2026
