PR 2: BashLexer + opaque-region scanner #5
Merged
Implements the bash lexer per SPEC §5 plus the shared opaque-region scanner that bash uses for $(...) and backtick boundary tracking (grammar-agnostic so v0.2 PowerShell can reuse it).

Locked interpretation #2 from openspec/changes/v0.1-locked-interpretations applied:
- $(cmd) and `cmd` collapse into one OpaqueSubstitution token; the parser will consume these as Arg{Kind=DynamicSkip, IsPath=false} so surrounding clauses stay parseable.
- $((expr)) and ${var//pat/repl} emit UnparseableSentinel tokens; the parser will set ParsedCommand.IsUnparseable=true.

All new types under src/ShellSyntaxTree/Internal/{Lexing,Bash/Lexing}/ are internal. InternalsVisibleTo("ShellSyntaxTree.Tests") added so the test project can exercise them directly.

78 new unit tests (16 OpaqueRegionScanner + 62 BashLexer) covering every SPEC §5 token kind, all operators, quote handling, escape handling, opaque-region collapse, unbalanced quotes, heredoc skip, and the four unparseable-sentinel triggers. Combined with PR 1's 18 tests: 96/96 passing on net10.0.

SPEC.md updated:
- §1 non-goals now explicitly addresses command substitution behavior
- §5 adds OPAQUE_SUBSTITUTION + UNPARSEABLE_SENTINEL token kinds, the <<- heredoc operator, simple ${VAR} absorption into Word
- §11 adds arithmetic + complex-param-expansion to IsUnparseable triggers (parser will materialize this in PR 3)

OpenSpec tasks.md updated to mark Phase 1 fully complete and Phase 2 implementation done; Phase 2 commit/push happens via this PR.
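The two rules of locked interpretation #2 can be illustrated with a small classifier. This is a minimal sketch in Python rather than the project's C#, and it is not the actual BashLexer code: `TokenKind`, `classify_expansion`, and the rule "anything inside `${...}` other than a plain identifier counts as complex" are assumptions made for illustration. The commit message names only `${var//pat/repl}` as a complex form and does not enumerate all four sentinel triggers.

```python
import re
from enum import Enum, auto


class TokenKind(Enum):
    WORD = auto()
    OPAQUE_SUBSTITUTION = auto()
    UNPARSEABLE_SENTINEL = auto()


# Assumption: a "simple" ${VAR} holds nothing but a shell identifier.
_SIMPLE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def classify_expansion(text: str, i: int) -> TokenKind:
    """Classify the construct that starts at text[i] (hypothetical helper)."""
    # Check "$((" before "$(" so arithmetic is not mistaken for substitution.
    if text.startswith("$((", i):
        return TokenKind.UNPARSEABLE_SENTINEL
    if text.startswith("$(", i) or text.startswith("`", i):
        return TokenKind.OPAQUE_SUBSTITUTION
    if text.startswith("${", i):
        end = text.find("}", i)
        body = text[i + 2:end] if end != -1 else ""
        if _SIMPLE_NAME.fullmatch(body):
            return TokenKind.WORD  # simple ${VAR} is absorbed into a Word
        return TokenKind.UNPARSEABLE_SENTINEL
    return TokenKind.WORD
```

Checking the longer prefix `$((` first is the one ordering detail that matters here. Note that real bash also allows `$( (subshell) )`, which this sketch would misclassify if written without the space.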
Aaronontheweb added a commit that referenced this pull request on May 10, 2026
Completes the SPEC §9 + §10 behaviors deferred from earlier PRs. Locked interpretations #4 (bash -c overflow → outer ParsedCommand.IsUnparseable), #5 (only cd/chdir propagate; pushd/popd parse but don't), and #6 (cd $VAR → synthetic DynamicSkip attribution arg) now fully materialized.

What's wired:
- Internal/Bash/Parsing/CdAttributionContext.cs (parser-internal, mutable):
  * SubshellStack of monotonic IDs handles sibling subshells cleanly (depth alone doesn't distinguish (a) && (b))
  * SetLiteralAttribution / SetDynamicAttribution
  * HasAttribution flag
- BashCommandParser updates:
  * Only cd/chdir update attribution context (interp #5)
  * Subsequent clauses get synthetic IsCwdAttribution arg:
    - Literal cd target → Kind=Literal, IsPath=true, Resolved=<cwd>
    - DynamicSkip cd target → Kind=DynamicSkip, IsPath=false, Raw="<dynamic-cwd>" (interp #6)
  * Subshell ( ... ) entry pushes attribution stack, exit pops; clauses inside get IsSubshell=true; subshell mutations don't leak out
  * bash -c / sh -c real recursion (was single-clause framework in PR 3): inner ParsedCommand surfaced with IsBashCWrapped=true on each clause; outer bash -c clause consumed (not emitted)
  * Recursion depth cap at 5 → outer ParsedCommand.IsUnparseable=true with reason "bash -c recursion depth exceeded (>5)" (interp #4)
  * Outer cd attribution does NOT leak into bash -c inner clauses (v0.1 decision; tracked for v0.1.x revisit)
- BashResolver internal overload Resolve(raw, treatAsPath, options, workingDirectoryUnknown): lets the propagator force DynamicSkip on relative-path args under dynamic cd (interp #6) without polluting the public BashParserOptions surface.

Tests:
- 8 new parser tests: attribution propagation, subshell isolation, sequential cd, pushd non-propagation, dynamic cd, sibling subshells, bash -c depth-2 success, bash -c depth-6 overflow.
- CorpusRunnerTests: added IsCwdAttribution field on ExpectedArg.
- 30 new corpus entries (71-100): 10 cd-in-compound + 10 subshell + 10 bash -c. 5 PR 3-4 corpus entries refreshed (25, 28, 34, 35, 52).
- Total: 337/337 passing (was 296 at PR 4 baseline).

SPEC.md updated:
- §9 rule 3 explicit on which CwdVerbs propagate (only cd/chdir); added "Dynamic-cd attribution" subsection per interp #6
- §10 subshell flattening rephrased; bash -c recursion-limit replaced with "set ParsedCommand.IsUnparseable=true" per interp #4

Public API surface unchanged. PublicApiSnapshotTests still 18/18 green.
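The subshell isolation described in this commit can be sketched as a stack of frames, each tagged with a monotonic ID. This is an illustrative Python sketch, not the contents of CdAttributionContext.cs: the frame layout, the `Attribution` record, and the choice to let a new subshell inherit its parent's attribution (as a real subshell inherits its working directory) are assumptions. Only the names SetLiteralAttribution, SetDynamicAttribution, HasAttribution, and the monotonic-ID idea come from the commit message.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Attribution:
    is_dynamic: bool      # True after a dynamic target such as `cd $VAR`
    cwd: Optional[str]    # resolved directory for a literal cd, else None


class CdAttributionContext:
    """Hypothetical stand-in for the parser-internal attribution context."""

    def __init__(self) -> None:
        self._next_id = 0
        # Each frame is (subshell_id, attribution); frame 0 is the top-level shell.
        self._frames: List[Tuple[int, Optional[Attribution]]] = [(0, None)]

    @property
    def subshell_id(self) -> int:
        return self._frames[-1][0]

    @property
    def in_subshell(self) -> bool:
        return len(self._frames) > 1

    @property
    def current(self) -> Optional[Attribution]:
        return self._frames[-1][1]

    @property
    def has_attribution(self) -> bool:
        return self.current is not None

    def enter_subshell(self) -> None:
        # IDs only ever increase, so sibling subshells never share an ID.
        self._next_id += 1
        self._frames.append((self._next_id, self.current))

    def exit_subshell(self) -> None:
        if not self.in_subshell:
            raise ValueError("unbalanced subshell exit")
        self._frames.pop()  # discards any cd performed inside the subshell

    def set_literal_attribution(self, cwd: str) -> None:
        self._frames[-1] = (self.subshell_id, Attribution(False, cwd))

    def set_dynamic_attribution(self) -> None:
        self._frames[-1] = (self.subshell_id, Attribution(True, None))
```

For `cd /a && (cd /b && ls) && (pwd)`, the first subshell gets ID 1 and sees `/b` after its own cd, popping the frame restores `/a`, and the second subshell gets ID 2. Nesting depth is 1 in both subshells, which is why depth alone cannot tell them apart.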
Summary
PR 2 of the v0.1.0-alpha shipping plan. Implements the bash lexer per SPEC §5 plus the shared opaque-region scanner. Locked interpretation #2 (command substitution → DynamicSkip; arithmetic + complex param expansion → IsUnparseable) materialized at the lex level — the parser in PR 3 just consumes these tokens.
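To make the opaque-region idea concrete, here is a minimal boundary-scanner sketch in Python (the real scanner is C# and is not reproduced on this page). The function names mirror the Scan / ScanSymmetric pair mentioned in this PR, but the return convention (index of the closing delimiter, or -1 when unbalanced) is an assumption, and the double-quote handling is simplified to "a backslash skips the next character" instead of a full escape table.

```python
def scan(text: str, start: int, open_ch: str, close_ch: str) -> int:
    """Return the index of the close_ch balancing text[start], or -1."""
    if start >= len(text) or text[start] != open_ch:
        return -1
    depth = 0
    i = start
    n = len(text)
    while i < n:
        c = text[i]
        if c == "\\":                 # \X escape: skip the escaped character
            i += 2
            continue
        if c == "'":                  # single quotes: literal up to the next '
            end = text.find("'", i + 1)
            if end == -1:
                return -1
            i = end + 1
            continue
        if c == '"':                  # double quotes: only \ escapes matter here
            i += 1
            while i < n and text[i] != '"':
                i += 2 if text[i] == "\\" else 1
            if i >= n:
                return -1
            i += 1
            continue
        if c == open_ch:
            depth += 1
        elif c == close_ch:
            depth -= 1
            if depth == 0:
                return i
        i += 1
    return -1


def scan_symmetric(text: str, start: int, delim: str) -> int:
    """Return the index of the next unescaped delim after text[start], or -1."""
    if start >= len(text) or text[start] != delim:
        return -1
    i = start + 1
    while i < len(text):
        if text[i] == "\\":
            i += 2
            continue
        if text[i] == delim:
            return i
        i += 1
    return -1
```

Because the scanner only tracks delimiters, escapes, and quotes, it needs no knowledge of bash grammar, which is what would allow another shell's lexer to reuse it.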
What landed
- `Internal/Lexing/OpaqueRegionScanner.cs`: shared, grammar-agnostic boundary scanner. `Scan(input, startIdx, openChar, closeChar)` for `(…)` style + `ScanSymmetric(input, startIdx, delim)` for backtick-style. Honors `\X` escapes, single-quote literal preservation, double-quote escape table.
- `Internal/Bash/Lexing/BashLexer.cs` + `BashToken` (record struct) + `BashTokenKind` (enum). Single entry point `Tokenize(string) -> IReadOnlyList<BashToken>`.
- Token kinds: `Word`, `QuotedString`, `Operator` (incl. `<<-`), `Whitespace`, `Continuation`, `OpaqueSubstitution` (for `$()` / backticks), `UnparseableSentinel` (for `$((expr))` / `${var//…}`).
- `InternalsVisibleTo` added on the library project so tests can reach internal types.
SPEC.md updates
- §5: adds `OPAQUE_SUBSTITUTION`, `UNPARSEABLE_SENTINEL`, and the `<<-` operator; documents simple `${VAR}` absorption into `Word`; clarifies newlines outside heredoc body as `Whitespace`.
- §11: adds `$((…))` and complex parameter expansion `${var//…}` to the `IsUnparseable` triggers (parser materializes the flag in PR 3).
Verification
- `dotnet build -c Release`: clean (0 warnings, 0 errors with `TreatWarningsAsErrors=true`)
- `dotnet test -c Release`: 96/96 passing
- `pwsh ./scripts/Add-FileHeaders.ps1 -Verify`: all files have headers
- `openspec validate v0.1-locked-interpretations --strict`: passes
Test plan
- Test-ubuntu-latest
- Test-windows-latest
Next
PR 3 implements verb tables (`BashArity`, `CwdVerbs`, `FileVerbs`, `FlagsWithValue`) and the `BashCommandParser` core, wiring the lexer's tokens into clauses, surfacing `OpaqueSubstitution` as `Arg{Kind=DynamicSkip}` and `UnparseableSentinel` as `ParsedCommand.IsUnparseable=true`.
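How a parser might consume the two special token kinds can be sketched as follows. This is a hypothetical Python illustration, not the PR 3 implementation: the `(kind, text)` tuple shape, the `consume` function, and the decision to keep collecting arguments after a sentinel are assumptions, and path detection via the verb tables is omitted entirely.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple


@dataclass
class Arg:
    raw: str
    kind: str        # "Literal" or "DynamicSkip"
    is_path: bool


@dataclass
class ParsedCommand:
    args: List[Arg] = field(default_factory=list)
    is_unparseable: bool = False


def consume(tokens: Iterable[Tuple[str, str]]) -> ParsedCommand:
    """Fold (kind, text) tokens into a ParsedCommand (hypothetical shape)."""
    cmd = ParsedCommand()
    for kind, text in tokens:
        if kind == "UnparseableSentinel":
            cmd.is_unparseable = True          # flag the whole command
        elif kind == "OpaqueSubstitution":
            # The substitution's value is unknown statically, so it is skipped
            # as an argument but the surrounding clause still parses.
            cmd.args.append(Arg(text, "DynamicSkip", False))
        elif kind == "Word":
            # Assumption: is_path would come from the verb tables; not modeled.
            cmd.args.append(Arg(text, "Literal", False))
    return cmd
```

For example, the tokens for `cat $(find . -name x)` would yield a `Literal` arg for `cat` and a `DynamicSkip` arg for the substitution, while any command containing `$((1+2))` would come back with `is_unparseable` set.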