Skip to content

fix: redirect should consume exactly one destination#331

Open
fprochazka wants to merge 1 commit into
tree-sitter:masterfrom
fprochazka:fix/redirect-single-destination
Open

fix: redirect should consume exactly one destination#331
fprochazka wants to merge 1 commit into
tree-sitter:masterfrom
fprochazka:fix/redirect-single-destination

Conversation

@fprochazka
Copy link
Copy Markdown

Summary

Fixes #233.

In bash, each redirect operator (>, <, >>, etc.) takes exactly one destination. The file_redirect rule previously used repeat1($._literal) which greedily consumed all subsequent tokens as redirect destinations.

This caused commands like grep 2>/dev/null -q "pattern" /etc/shells to incorrectly parse -q, "pattern", and /etc/shells as redirect destinations instead of command arguments.

Changes

  • file_redirect: change repeat1($._literal) to $._literal (single destination)
  • redirected_statement: allow argument fields after redirects so that tokens following a mid-command redirect are correctly preserved as arguments

Breaking changes

  • file_redirect.destination is no longer multiple: true in node-types.json — it is now always a single node
  • redirected_statement gains a new optional argument field for tokens that follow a mid-command redirect

Examples

Before (broken):

$ echo 'grep 2>/dev/null -q "pattern" /etc/shells' | tree-sitter parse
(redirected_statement
  (command (command_name (word)))       # grep
  (file_redirect (file_descriptor)
    (word) (word) (string) (word)))     # /dev/null -q "pattern" /etc/shells all as destinations

After (fixed):

(redirected_statement
  body: (command (command_name (word))) # grep
  redirect: (file_redirect
    (file_descriptor)
    destination: (word))               # only /dev/null
  argument: (word)                     # -q
  argument: (string)                   # "pattern"
  argument: (word))                    # /etc/shells

Test plan

  • All 100 existing corpus tests pass (1 test expectation updated)
  • Added regression test for find /path -name "*.sql" 2>/dev/null -exec grep -l "pattern" {} \;
  • Verified edge cases: end-of-command redirects, heredocs, herestrings, backtick substitutions, test commands ([...]), pipes, compound statements, function definitions

In bash, each redirect operator (>, <, >>, etc.) takes exactly one
destination. The file_redirect rule previously used repeat1($._literal)
which greedily consumed all subsequent tokens as redirect destinations.

This caused commands like `grep 2>/dev/null -q "pattern" /etc/shells`
to incorrectly parse -q, "pattern", and /etc/shells as redirect
destinations instead of command arguments.

Changes:
- file_redirect: change repeat1($._literal) to $._literal
- redirected_statement: allow argument fields after redirects so that
  tokens following a mid-command redirect are correctly preserved
AMZN-hgoffin added a commit to AMZN-hgoffin/tree-sitter-bash that referenced this pull request Apr 7, 2026
Redirects between arguments (e.g. grep 2>/dev/null -q pattern file)
previously caused the parser to misattach subsequent arguments as
redirect destinations, because file_redirect uses repeat1 for its
destination field.

This patch uses the external scanner to peek ahead past redirect
operators and their destinations. If more non-redirect words follow,
the scanner emits a mid-command token that keeps the redirect inside
command's repeat. If no words follow, the redirect is trailing and
uses redirected_statement as before.

Covers all redirect operators: > >> < 2> 2>&1 >& <& >| &> &>> >&- <&-.
Handles chained redirects, process substitution destinations, herestrings,
and close-fd operators. Also adds redirect support inside [ ] test
brackets (e.g. [ -f file 2>/dev/null ]).

The mid-command redirect rules are aliased to file_redirect, so
node-types.json is unchanged. Consumers that already handle
redirect fields on command nodes (which is required for pre-command
redirects like '2>&1 cmd') need no changes.

Related: tree-sitter#233, tree-sitter#331

PR tree-sitter#331 takes a different approach: changing file_redirect to accept
a single destination and adding argument fields to redirected_statement.
That is a simpler grammar change but a breaking change to node-types.json
that requires consumers to handle the new argument field. This patch
preserves the existing tree structure for trailing redirects and only
changes behavior for inputs that previously produced incorrect trees.
AMZN-hgoffin added a commit to AMZN-hgoffin/tree-sitter-bash that referenced this pull request Apr 7, 2026
Redirects between arguments (e.g. grep 2>/dev/null -q pattern file)
previously caused the parser to misattach subsequent arguments as
redirect destinations, because file_redirect uses repeat1 for its
destination field.

This patch uses the external scanner to peek ahead past redirect
operators and their destinations. If more non-redirect words follow,
the scanner emits a mid-command token that keeps the redirect inside
command's repeat. If no words follow, the redirect is trailing and
uses redirected_statement as before.

Covers all redirect operators: > >> < 2> 2>&1 >& <& >| &> &>> >&- <&-.
Handles chained redirects, process substitution destinations, herestrings,
and close-fd operators. Also adds redirect support inside [ ] test
brackets (e.g. [ -f file 2>/dev/null ]).

The mid-command redirect rules are aliased to file_redirect, so
node-types.json is unchanged. Trailing redirects still produce
redirected_statement exactly as before.

The one known tree shape change is for trailing redirects inside
backtick command substitutions (e.g. echo `cmd >file` arg), where
the redirect moves from redirected_statement to a direct child of
command. Backticks use the same symbol to open and close, making it
impractical for the scanner's lookahead to determine whether a
backtick terminates the current context. This is safe for consumers
because command already accepts redirect children for the pre-name
case (e.g. 2>&1 cmd), so all consumers must already handle redirects
on command nodes.

Related: tree-sitter#233, tree-sitter#331

PR tree-sitter#331 takes a different approach: changing file_redirect to accept
a single destination and adding argument fields to redirected_statement.
That is a simpler grammar change but a breaking change to node-types.json
that requires consumers to handle the new argument field. This patch
preserves the existing tree structure for trailing redirects and only
changes behavior for inputs that previously produced incorrect trees.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Incorrect parse of arguments after a redirect

1 participant