# Markdown Parser: Core Implementation (#4)

## Description
Implement the Markdown parser (GFM subset) according to the specification in `04_markdown_parser.adoc`.

## Scope
The MarkdownParser is a **lightweight component** for parsing GitHub Flavored Markdown. It is NOT a full GFM parser.

### What it does:
- Extract document structure (headings → hierarchical sections)
- Identify addressable elements (code blocks, tables, images)
- Parse YAML frontmatter metadata
- Map folder hierarchy to document structure
- Track source file + line numbers for all elements
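
One way to turn a flat list of headings into hierarchical sections is to keep a stack of the currently open ancestors. The sketch below assumes headings have already been extracted as `(level, title, line)` tuples. `Node` and `build_tree` are hypothetical names used for illustration only; they are not the `Section` model from `models.py`.

```python
from dataclasses import dataclass, field


@dataclass
class Node:  # hypothetical stand-in; the real Section model lives in models.py
    level: int
    title: str
    line: int
    children: list["Node"] = field(default_factory=list)


def build_tree(headings: list[tuple[int, str, int]]) -> list[Node]:
    """Nest (level, title, line) tuples into a tree of sections."""
    roots: list[Node] = []
    stack: list[Node] = []  # chain of currently open ancestor sections
    for level, title, line in headings:
        node = Node(level, title, line)
        # Close every open section that sits at the same or a deeper level.
        while stack and stack[-1].level >= level:
            stack.pop()
        (stack[-1].children if stack else roots).append(node)
        stack.append(node)
    return roots


tree = build_tree([(1, "Guide", 1), (2, "Install", 5), (3, "Linux", 9), (2, "Usage", 20)])
```

Here `Install` and `Usage` become children of `Guide`, and `Linux` nests under `Install`. A heading that skips a level (for example `#` followed directly by `###`) simply attaches to the nearest shallower ancestor.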

### What it does NOT do:
- Render HTML
- Parse inline formatting (bold, italic, inline links)
- Analyze table contents
- Support Setext headings, footnotes, or math blocks

## Implementation Tasks

### Core Parsing
- [ ] **Heading Extraction** (AC-MD-01): `#` to `######` (ATX-style only)
- [ ] **YAML Frontmatter** (AC-MD-02): `---` block at file start
- [ ] **Code Block Extraction** (AC-MD-03): Fenced blocks with language
- [ ] **Table Recognition** (AC-MD-04): GFM pipe-tables (structure only)
- [ ] **Image Extraction**: `![alt](src "title")` pattern
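
The heading and code block tasks interact: a `#` line inside a fenced block must not be reported as a heading. The following is a minimal single-pass sketch of that idea, not the specified implementation. `Found` and `scan` are hypothetical names, and the closing-fence rule (same character, at least as long as the opening fence) is an assumption based on GFM.

```python
import re
from dataclasses import dataclass

HEADING = re.compile(r'^(#{1,6})\s+(.+?)(?:\s+#*)?$')
FENCE = re.compile(r'^(`{3,}|~{3,})(\w*)$')


@dataclass
class Found:  # hypothetical stand-in for the Section/Element models
    kind: str       # "heading" or "code"
    line: int       # 1-based line number where the item starts
    level: int = 0  # heading level; 0 for code blocks
    text: str = ""  # heading title or code block language


def scan(markdown: str) -> list[Found]:
    found: list[Found] = []
    fence = None  # opening fence string of the current code block, if any
    for number, line in enumerate(markdown.splitlines(), start=1):
        stripped = line.rstrip()
        if fence is not None:
            # Inside a code block: only a matching closing fence ends it.
            if stripped.startswith(fence) and set(stripped) == {fence[0]}:
                fence = None
            continue
        opening = FENCE.match(stripped)
        if opening:
            fence = opening.group(1)
            found.append(Found("code", number, text=opening.group(2)))
            continue
        heading = HEADING.match(stripped)
        if heading:
            found.append(Found("heading", number, len(heading.group(1)), heading.group(2)))
    return found


sample = "# Title\n\n~~~python\n# not a heading\n~~~\n\n## Setup ##\n"
items = scan(sample)
```

For `sample`, the scan reports a level-1 heading on line 1, a `python` code block starting on line 3, and a level-2 heading `Setup` on line 7; the comment on line 4 is skipped.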

### Folder-as-Document
- [ ] **Folder Scanning** (AC-MD-05): Recursive directory traversal
- [ ] **Sorting** (AC-MD-06):
  - `index.md` / `README.md` always first
  - Numeric prefixes: `01_`, `02_`, ... `10_`, `11_` (natural sort)
  - Alphabetic fallback
- [ ] **Hierarchy Mapping**: Folder depth → section level offset
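
The sorting rules can be expressed as a single sort key. This is a sketch under a few assumptions that the rules above do not settle: index file names are matched case-insensitively, numbered files come before un-numbered ones, and ties fall back to the lowercased name. `sort_key` is a hypothetical helper name.

```python
import re

INDEX_NAMES = {"index.md", "readme.md"}
NUMERIC_PREFIX = re.compile(r'^(\d+)_')


def sort_key(name: str) -> tuple[int, int, str]:
    lowered = name.lower()
    if lowered in INDEX_NAMES:
        return (0, 0, lowered)  # index.md / README.md always first
    prefix = NUMERIC_PREFIX.match(name)
    if prefix:
        # Compare the prefix as an integer so that 2 sorts before 10.
        return (1, int(prefix.group(1)), lowered)
    return (2, 0, lowered)  # alphabetic fallback


names = ["10_api.md", "appendix.md", "02_setup.md", "README.md", "1_intro.md"]
ordered = sorted(names, key=sort_key)
# ordered == ["README.md", "1_intro.md", "02_setup.md", "10_api.md", "appendix.md"]
```

Comparing the prefix as an integer rather than as text is what produces the natural order required by AC-MD-06.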

### Data Models
```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any

@dataclass
class MarkdownDocument:
    file_path: Path
    frontmatter: dict[str, Any]
    title: str
    sections: list[Section]  # Reuse from models.py
    elements: list[Element]  # Reuse from models.py

@dataclass
class FolderDocument:
    root_path: Path
    documents: list[MarkdownDocument]
    structure: list[Section]  # Combined hierarchy
```

### Interface Methods
```python
class MarkdownParser:
    def parse_file(self, file_path: Path) -> MarkdownDocument: ...
    def parse_folder(self, folder_path: Path) -> FolderDocument: ...
    def get_section(self, doc: MarkdownDocument, path: str) -> Section | None: ...
    def get_elements(self, doc: MarkdownDocument, element_type: str | None = None) -> list[Element]: ...
```

## Acceptance Criteria
- [ ] **AC-MD-01**: Heading extraction with correct hierarchy
- [ ] **AC-MD-02**: YAML frontmatter parsing (strings, numbers, lists, nested objects)
- [ ] **AC-MD-03**: Fenced code blocks with language detection
- [ ] **AC-MD-04**: Table detection with column/row count
- [ ] **AC-MD-05**: Folder structure correctly mapped
- [ ] **AC-MD-06**: Numeric prefix sorting (1, 2, 10 not 1, 10, 2)

## Regex Patterns (from spec)
```python
HEADING_PATTERN = r'^(#{1,6})\s+(.+?)(?:\s+#*)?$'
CODE_FENCE_OPEN = r'^(`{3,}|~{3,})(\w*)?$'
# Matched against the whole file; needs re.DOTALL so `.` can span newlines.
FRONTMATTER_PATTERN = r'^---\s*\n(.*?)\n---\s*\n'
IMAGE_PATTERN = r'!\[([^\]]*)\]\(([^)\s]+)(?:\s+"([^"]*)")?\)'
TABLE_ROW_PATTERN = r'^\|(.+)\|$'
```

## Dependencies
- **PyYAML** or **ruamel.yaml** for frontmatter parsing
- Reuse `Section`, `Element`, `SourceLocation` from `models.py`

## References
- `src/docs/spec/04_markdown_parser.adoc`
- `src/mcp_server/models.py` (shared data models)
- `src/mcp_server/asciidoc_parser.py` (reference implementation)
