PR 2: BashLexer + opaque-region scanner #5

Merged
Aaronontheweb merged 1 commit into dev from pr2-lexer on May 10, 2026
@Aaronontheweb

Owner

Summary

PR 2 of the v0.1.0-alpha shipping plan. Implements the bash lexer per SPEC §5 plus the shared opaque-region scanner. Locked interpretation #2 (command substitution → DynamicSkip; arithmetic + complex param expansion → IsUnparseable) is materialized at the lex level, so the parser in PR 3 just consumes these tokens.

What landed

  • Internal/Lexing/OpaqueRegionScanner.cs: shared, grammar-agnostic boundary scanner. Scan(input, startIdx, openChar, closeChar) handles (…)-style regions with distinct open and close delimiters; ScanSymmetric(input, startIdx, delim) handles backtick-style regions with a single delimiter. Honors \X escapes, single-quote literal preservation, and the double-quote escape table.
  • Internal/Bash/Lexing/BashLexer.cs + BashToken (record struct) + BashTokenKind (enum). Single entry point Tokenize(string) -> IReadOnlyList<BashToken>.
  • Token kinds: Word, QuotedString, Operator (incl. <<-), Whitespace, Continuation, OpaqueSubstitution (for $() / backticks), UnparseableSentinel (for $((expr)) / ${var//…}).
  • 78 new unit tests (16 OpaqueRegionScanner + 62 BashLexer). Combined with PR 1's 18: 96/96 passing.
  • InternalsVisibleTo added on the library project so tests can reach internal types.
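
For readers who want a feel for what the boundary scanner does, here is a minimal sketch in Python. It is not the project's C# code. The function names mirror Scan and ScanSymmetric from the bullet above, but the return convention (index just past the closing delimiter, or -1 when unbalanced) and the simplified double-quote handling are assumptions made for illustration.

```python
def scan(text: str, start: int, open_ch: str, close_ch: str) -> int:
    """Return the index just past the delimiter that closes text[start].

    Hypothetical convention: returns -1 if the region never closes.
    """
    if start >= len(text) or text[start] != open_ch:
        raise ValueError("start must point at the opening delimiter")
    depth = 0
    i = start
    while i < len(text):
        ch = text[i]
        if ch == "\\":                  # \X escape: skip the escaped character
            i += 2
            continue
        if ch == "'":                   # single quotes: literal up to the next '
            end = text.find("'", i + 1)
            if end == -1:
                return -1
            i = end + 1
            continue
        if ch == '"':                   # double quotes: only backslash escapes tracked here
            i += 1
            while i < len(text) and text[i] != '"':
                i += 2 if text[i] == "\\" else 1
            if i >= len(text):
                return -1
            i += 1
            continue
        if ch == open_ch:
            depth += 1
        elif ch == close_ch:
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    return -1


def scan_symmetric(text: str, start: int, delim: str) -> int:
    """Same idea for backtick-style regions where open and close are one character."""
    if start >= len(text) or text[start] != delim:
        raise ValueError("start must point at the opening delimiter")
    i = start + 1
    while i < len(text):
        if text[i] == "\\":             # an escaped delimiter does not close the region
            i += 2
            continue
        if text[i] == delim:
            return i + 1
        i += 1
    return -1
```

Because the scanner only tracks delimiters, escapes, and quotes, it needs no knowledge of bash grammar, which is what would let another shell front end reuse it.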

SPEC.md updates

  • §1 non-goals: command substitution is explicitly marked DynamicSkip (not executed).
  • §5 token kinds: adds OPAQUE_SUBSTITUTION, UNPARSEABLE_SENTINEL, <<- operator; documents simple ${VAR} absorption into Word; clarifies newlines outside heredoc body as Whitespace.
  • §11 IsUnparseable triggers: adds arithmetic expansion $((…)) and complex parameter expansion ${var//…} (parser materializes the flag in PR 3).

Verification

  • dotnet build -c Release — clean (0 warnings, 0 errors with TreatWarningsAsErrors=true)
  • dotnet test -c Release — 96/96 passing
  • pwsh ./scripts/Add-FileHeaders.ps1 -Verify — all files have headers
  • openspec validate v0.1-locked-interpretations --strict — passes
  • ✅ Public API surface unchanged (no new public types; PublicApiSnapshotTests still green)

Test plan

  • CI passes on Test-ubuntu-latest
  • CI passes on Test-windows-latest

Next

PR 3 implements verb tables (BashArity, CwdVerbs, FileVerbs, FlagsWithValue) and the BashCommandParser core — wiring the lexer's tokens into clauses, surfacing OpaqueSubstitution as Arg{Kind=DynamicSkip} and UnparseableSentinel as ParsedCommand.IsUnparseable=true.

Implements the bash lexer per SPEC §5 plus the shared opaque-region
scanner that bash uses for $(...) and backtick boundary tracking
(grammar-agnostic so v0.2 PowerShell can reuse it). Locked interpretation
#2 from openspec/changes/v0.1-locked-interpretations applied:

- $(cmd) and `cmd` collapse into one OpaqueSubstitution token; the
  parser will consume these as Arg{Kind=DynamicSkip, IsPath=false} so
  surrounding clauses stay parseable.
- $((expr)) and ${var//pat/repl} emit UnparseableSentinel tokens; the
  parser will set ParsedCommand.IsUnparseable=true.
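
The two rules above boil down to a prefix check at each `$` or backtick. The Python sketch below is illustrative only and is not the BashLexer implementation: `classify_expansion` is a hypothetical helper, and treating every non-simple `${…}` form as unparseable is an assumption (the text above names only `${var//pat/repl}`).

```python
import re

# Simple ${VAR}: a plain identifier between the braces, absorbed into a Word.
SIMPLE_PARAM = re.compile(r"\$\{[A-Za-z_][A-Za-z0-9_]*\}")


def classify_expansion(text: str, i: int) -> str:
    """Name the token kind for the construct starting at text[i]."""
    if text[i] == "`":
        return "OpaqueSubstitution"          # `cmd`
    if text.startswith("$((", i):
        return "UnparseableSentinel"         # arithmetic expansion; must be tested before "$("
    if text.startswith("$(", i):
        return "OpaqueSubstitution"          # $(cmd)
    if text.startswith("${", i):
        # Assumption: anything beyond a bare ${VAR} counts as complex.
        return "Word" if SIMPLE_PARAM.match(text, i) else "UnparseableSentinel"
    return "Word"                            # $VAR and everything else


for sample in ["$(ls)", "`ls`", "$((1+2))", "${HOME}", "${var//a/b}", "$HOME"]:
    print(sample, "->", classify_expansion(sample, 0))
```

The ordering matters: `$((` has to be checked before `$(`, otherwise arithmetic expansion would be misread as command substitution.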

All new types under src/ShellSyntaxTree/Internal/{Lexing,Bash/Lexing}/
are internal. InternalsVisibleTo("ShellSyntaxTree.Tests") added so the
test project can exercise them directly.

78 new unit tests (16 OpaqueRegionScanner + 62 BashLexer) covering every
SPEC §5 token kind, all operators, quote handling, escape handling,
opaque-region collapse, unbalanced quotes, heredoc skip, and the four
unparseable-sentinel triggers. Combined with PR 1's 18 tests: 96/96
passing on net10.0.

SPEC.md updated:
- §1 non-goals now explicitly addresses command substitution behavior
- §5 adds OPAQUE_SUBSTITUTION + UNPARSEABLE_SENTINEL token kinds, the
  <<- heredoc operator, simple ${VAR} absorption into Word
- §11 adds arithmetic + complex-param-expansion to IsUnparseable
  triggers (parser will materialize this in PR 3)

OpenSpec tasks.md updated to mark Phase 1 fully complete and Phase 2
implementation done; Phase 2 commit/push happens via this PR.
Aaronontheweb enabled auto-merge (squash) May 10, 2026 18:09
Aaronontheweb merged commit a58e5db into dev May 10, 2026
2 checks passed
Aaronontheweb deleted the pr2-lexer branch May 10, 2026 18:11
Aaronontheweb added a commit that referenced this pull request May 10, 2026
Completes the SPEC §9 + §10 behaviors deferred from earlier PRs.
Locked interpretations #4 (bash -c overflow → outer
ParsedCommand.IsUnparseable), #5 (only cd/chdir propagate; pushd/popd
parse but don't), and #6 (cd $VAR → synthetic DynamicSkip attribution
arg) now fully materialized.

What's wired:

- Internal/Bash/Parsing/CdAttributionContext.cs (parser-internal,
  mutable):
  * SubshellStack of monotonic IDs handles sibling subshells cleanly
    (depth alone doesn't distinguish (a) && (b))
  * SetLiteralAttribution / SetDynamicAttribution
  * HasAttribution flag

- BashCommandParser updates:
  * Only cd/chdir update attribution context (interp #5)
  * Subsequent clauses get synthetic IsCwdAttribution arg:
    - Literal cd target → Kind=Literal, IsPath=true, Resolved=<cwd>
    - DynamicSkip cd target → Kind=DynamicSkip, IsPath=false,
      Raw="<dynamic-cwd>" (interp #6)
  * Subshell ( ... ) entry pushes attribution stack, exit pops; clauses
    inside get IsSubshell=true; subshell mutations don't leak out
  * bash -c / sh -c real recursion (was single-clause framework in PR 3):
    inner ParsedCommand surfaced with IsBashCWrapped=true on each clause;
    outer bash -c clause consumed (not emitted)
  * Recursion depth cap at 5 → outer ParsedCommand.IsUnparseable=true
    with reason "bash -c recursion depth exceeded (>5)" (interp #4)
  * Outer cd attribution does NOT leak into bash -c inner clauses (v0.1
    decision; tracked for v0.1.x revisit)

- BashResolver internal overload Resolve(raw, treatAsPath, options,
  workingDirectoryUnknown): lets the propagator force DynamicSkip on
  relative-path args under dynamic cd (interp #6) without polluting
  the public BashParserOptions surface.
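
To illustrate why a stack of monotonic IDs is used rather than a depth counter, here is a small Python sketch of the attribution context. It is not the C# CdAttributionContext: the method names are snake_case analogues of those listed above, and having a subshell inherit its parent's attribution on entry is an assumption that the text above does not confirm.

```python
import itertools
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Frame:
    subshell_id: int
    attribution: Optional[Tuple[str, str]] = None   # (kind, raw value)


class CdAttributionContext:
    def __init__(self) -> None:
        self._ids = itertools.count(1)               # monotonic: IDs are never reused
        self._frames: List[Frame] = [Frame(0)]       # frame 0 is the top-level shell

    @property
    def current_subshell_id(self) -> int:
        return self._frames[-1].subshell_id

    @property
    def has_attribution(self) -> bool:
        return self._frames[-1].attribution is not None

    @property
    def attribution(self) -> Optional[Tuple[str, str]]:
        return self._frames[-1].attribution

    def set_literal_attribution(self, cwd: str) -> None:
        self._frames[-1].attribution = ("Literal", cwd)

    def set_dynamic_attribution(self) -> None:
        self._frames[-1].attribution = ("DynamicSkip", "<dynamic-cwd>")

    def enter_subshell(self) -> None:
        # Assumption: the subshell starts with a copy of the parent's attribution.
        parent = self._frames[-1]
        self._frames.append(Frame(next(self._ids), parent.attribution))

    def exit_subshell(self) -> None:
        if len(self._frames) == 1:
            raise ValueError("no subshell to exit")
        self._frames.pop()                           # drops any cd made inside


# (cd /tmp) && (ls): both subshells sit at depth 1 but receive different IDs.
ctx = CdAttributionContext()
ctx.enter_subshell()
first = ctx.current_subshell_id
ctx.set_literal_attribution("/tmp")
ctx.exit_subshell()
ctx.enter_subshell()
second = ctx.current_subshell_id
print(first, second, ctx.has_attribution)
```

Popping the frame on exit is what keeps a `cd` inside `( ... )` from affecting later clauses, and the fresh ID on each entry is what distinguishes sibling subshells that share the same depth.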

Tests:
- 8 new parser tests: attribution propagation, subshell isolation,
  sequential cd, pushd non-propagation, dynamic cd, sibling subshells,
  bash -c depth-2 success, bash -c depth-6 overflow.
- CorpusRunnerTests: added IsCwdAttribution field on ExpectedArg.
- 30 new corpus entries (71-100): 10 cd-in-compound + 10 subshell +
  10 bash -c. 5 PR 3-4 corpus entries refreshed (25, 28, 34, 35, 52).
- Total: 337/337 passing (was 296 at PR 4 baseline).

SPEC.md updated:
- §9 rule 3 explicit on which CwdVerbs propagate (only cd/chdir);
  added "Dynamic-cd attribution" subsection per interp #6
- §10 subshell flattening rephrased; bash -c recursion-limit replaced
  with "set ParsedCommand.IsUnparseable=true" per interp #4
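
The recursion cap from interp #4 can be pictured with the Python sketch below. It is a simplified stand-in, not BashCommandParser: `unwrap_bash_c` and `wrap` are hypothetical helpers, only the exact three-word form `bash -c '<cmd>'` is recognized, and the standard library's `shlex` approximates bash word splitting rather than matching it exactly.

```python
import shlex
from typing import Optional, Tuple

MAX_DEPTH = 5
OVERFLOW_REASON = "bash -c recursion depth exceeded (>5)"


def unwrap_bash_c(command: str, depth: int = 0) -> Tuple[str, bool, Optional[str]]:
    """Peel nested bash -c / sh -c wrappers.

    Returns (command reached, is_unparseable, reason).
    """
    if depth > MAX_DEPTH:
        # Too many wrappers: flag the outer command instead of recursing further.
        return command, True, OVERFLOW_REASON
    words = shlex.split(command)
    if len(words) == 3 and words[0] in ("bash", "sh") and words[1] == "-c":
        return unwrap_bash_c(words[2], depth + 1)
    return command, False, None


def wrap(command: str, layers: int) -> str:
    """Test helper: nest a command inside `layers` bash -c wrappers."""
    for _ in range(layers):
        command = "bash -c " + shlex.quote(command)
    return command


print(unwrap_bash_c(wrap("ls -la", 2)))
print(unwrap_bash_c(wrap("ls -la", 6))[1:])
```

With this counting, two layers unwrap cleanly and six layers trip the cap, which lines up with the depth-2 and depth-6 parser tests listed above.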

Public API surface unchanged. PublicApiSnapshotTests still 18/18 green.