
[SPARK-50416][CORE] A more portable terminal / pipe test needed for bin/load-spark-env.sh #48937


@philwalk (Contributor) commented Nov 22, 2024

What changes were proposed in this pull request?

The last action in bin/load-spark-env.sh performs a test to determine whether the script is running in a terminal and whether stdin is reading from a pipe. A more portable test is needed.

Why are the changes needed?

The current approach relies on ps with options that vary significantly between Unix-like systems. Specifically, it prints an error message in both cygwin and msys2 (and, by extension, in all variations of git-for-windows). On Linux and macOS (Darwin/Homebrew) it prints no error message, but it fails to detect a terminal session (it always concludes that STDIN is a pipe).
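
For reference, the condition at the end of load-spark-env.sh looks roughly like the sketch below. This is a reconstruction from the set -x traces that follow, not the literal script text, so the exact quoting and layout may differ:

```bash
# Approximate reconstruction of the existing check (not the literal script):
# switch Beeline to jline.UnsupportedTerminal when the shell does not look
# like a foreground terminal process (no "+" in the ps stat column) and
# stdin is not a pipe.
if [[ ! $(ps -o stat= -p $$) =~ \+ ]] && ! [[ -p /dev/stdin ]]; then
  export SPARK_BEELINE_OPTS="$SPARK_BEELINE_OPTS -Djline.terminal=jline.UnsupportedTerminal"
fi
```

It is the `ps -o stat= -p <pid>` invocation above that cygwin's and msys2's ps does not support.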

Here's what the problem looks like in a cygwin64 session (with set -x just ahead of the section of interest):

If called directly:

$ bin/load-spark-env.sh
++ ps -o stat= -p 1947
ps: unknown option -- o
Try `ps --help' for more information.
+ [[ ! '' =~ \+ ]]
+ [[ -p /dev/stdin ]]
+ export 'SPARK_BEELINE_OPTS= -Djline.terminal=jline.UnsupportedTerminal'
+ SPARK_BEELINE_OPTS=' -Djline.terminal=jline.UnsupportedTerminal'

Interestingly, due to the two-part test, it still does the right thing with respect to the terminal test; the main problem is the error message.
If called downstream from a pipe:

$ echo "yo" | bin/load-spark-env.sh
++ ps -o stat= -p 1955
ps: unknown option -- o
Try `ps --help' for more information.
+ [[ ! '' =~ \+ ]]
+ [[ -p /dev/stdin ]]

Again, it correctly detects the pipe environment, but with an error message.

In WSL2 Ubuntu, the test doesn't correctly detect a non-pipe terminal session:

# /opt/spark$ bin/load-spark-env.sh
++ ps -o stat= -p 1423
+ [[ ! S+ =~ \+ ]]
# echo "yo!" | bin/load-spark-env.sh
++ ps -o stat= -p 1416
+ [[ ! S+ =~ \+ ]]

In #134-Ubuntu SMP Fri Sep 27 20:20:17 UTC 2024, the same failure occurs (it doesn't recognize terminal environments).

Does this PR introduce any user-facing change?

This is a proposed bug fix, and, other than fixing the bug, should be invisible to users.

How was this patch tested?

The patch was verified to behave as intended in terminal sessions, both interactive and piped, in the following 5 environments.


- Linux quadd 5.15.0-124-generic #134-Ubuntu SMP Fri Sep 27 20:20:17 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
- Linux d5 5.15.153.1-microsoft-standard-WSL2 #1 SMP Fri Mar 29 23:14:13 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
- MINGW64_NT-10.0-22631 d5 3.5.4-0bc1222b.x86_64 2024-09-04 18:28 UTC x86_64 Msys
- CYGWIN_NT-10.0-22631 d5 3.5.3-1.x86_64 2024-04-03 17:25 UTC x86_64 Cygwin
- Darwin suemac.local 23.6.0 Darwin Kernel Version 23.6.0: Mon Jul 29 21:14:21 PDT 2024; root:xnu-10063.141.2~1/RELEASE_ARM64_T8103 arm64

The test was to manually run the following script, verifying the expected response to both pipe and terminal sessions.

#!/bin/bash
if [ -e /usr/bin/tty -a "`tty`" != "not a tty" -a ! -p /dev/stdin ]; then
  echo "not a pipe"
else
  echo "is a pipe"
fi

The output of the manual test in all 5 tested environments:

philwalk@quadd:/opt/spark
$ isPipe
not a pipe
#
$ echo "yo" | isPipe
is a pipe
#
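
For readability, the same condition can also be wrapped in a small helper. The sketch below is only an illustration of the check itself (the helper name is made up for this example); it is not the literal one-line change made to load-spark-env.sh in this PR:

```bash
#!/bin/bash
# Hypothetical helper wrapping the portable check above: succeeds (returns 0)
# when a usable tty is attached and stdin is not reading from a pipe.
is_interactive_terminal() {
  [ -e /usr/bin/tty ] && [ "$(tty)" != "not a tty" ] && [ ! -p /dev/stdin ]
}

if is_interactive_terminal; then
  echo "not a pipe"
else
  echo "is a pipe"
fi
```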

Was this patch authored or co-authored using generative AI tooling?

No

@philwalk (Contributor Author):

Not sure what "workflow run detection" is, or how to correct it.
The output of git remote -vv is:

# git remote -vv
origin  https://github.com/philwalk/spark.git (fetch)
origin  https://github.com/philwalk/spark.git (push)

@dongjoon-hyun (Member) left a comment:

Thank you for making a PR, @philwalk.

Have you enabled GitHub Actions in your repository?

There is community documentation about pull requests; please check the Pull Request section.

@philwalk (Contributor Author):

I enabled Build and Test on my forked repo.

@HyukjinKwon (Member):

Let's also file a JIRA and add it to the PR description; see also https://spark.apache.org/contributing.html

@pan3793 (Member) commented Nov 25, 2024

The touched code seems to be borrowed from Apache Hive; would you like to also report the issue to the Hive community?

@philwalk (Contributor Author):

> Let's also file a JIRA and add it to the PR description; see also https://spark.apache.org/contributing.html

I requested a JIRA account; it hasn't been created yet.

@dongjoon-hyun
Copy link
Member

Got it. I approved it a few minutes ago, @philwalk.

[Screenshot 2024-11-25 at 09:39:33]

@philwalk (Contributor Author):

JIRA bug report filed: SPARK-50416

@philwalk (Contributor Author) commented Nov 25, 2024

> The touched code seems to be borrowed from Apache Hive; would you like to also report the issue to the Hive community?

Not at this time.
It seems that Hive needs work to be compatible with the cygwin or msys2 bash shell environments. There are a number of problems, and I'm not able to work on them at this time.

By contrast, spark-shell starts right up (after applying this PR):

$ bin/spark-shell
WARNING: Using incubator modules: jdk.incubator.vector
Using Spark's default log4j profile: org/apache/spark/log4j2-pattern-layout-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 4.0.0-preview2
      /_/

Using Scala version 2.13.14 (OpenJDK 64-Bit Server VM, Java 21.0.4)
Type in expressions to have them evaluated.
Type :help for more information.
Spark context Web UI available at http://d5:4040
Spark context available as 'sc' (master = local[*], app id = local-1732575351925).
Spark session available as 'spark'.

scala>

@HyukjinKwon changed the title from "A more portable terminal / pipe test needed for bin/load-spark-env.sh" to "[SPARK-50416][CORE] A more portable terminal / pipe test needed for bin/load-spark-env.sh" on Nov 26, 2024
@philwalk (Contributor Author) commented Dec 5, 2024

Is this waiting on me to do something?

@github-actions (bot):

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions (bot) added the Stale label on Mar 16, 2025
@github-actions (bot) closed this on Mar 17, 2025
@LuciferYang reopened this on Mar 17, 2025
@LuciferYang removed the Stale label on Mar 17, 2025
@philwalk (Contributor Author) left a comment:

LGTM

@LuciferYang (Contributor) left a comment:

+1, LGTM

@LuciferYang (Contributor):

@dongjoon-hyun @HyukjinKwon @pan3793 Do you need to take another look?

@philwalk (Contributor Author) commented Mar 18, 2025

> @dongjoon-hyun @HyukjinKwon @pan3793 Do you need to take another look?

I re-tested the modified line in MINGW64_NT-10.0-22631 d5 3.5.4-0bc1222b.x86_64 2024-09-04 18:28 UTC x86_64 Msys; the new version appears to be functionally identical, and the changes follow the recommended style.

@LuciferYang (Contributor) commented Mar 18, 2025

> I re-tested the modified line in MINGW64_NT-10.0-22631 d5 3.5.4-0bc1222b.x86_64 2024-09-04 18:28 UTC x86_64 Msys; the new version appears to be functionally identical, and the changes follow the recommended style.

Thank you @philwalk, let's wait one more day. I will merge this one if there are no other new comments.

LuciferYang pushed a commit that referenced this pull request Mar 20, 2025
[SPARK-50416][CORE] A more portable terminal / pipe test needed for bin/load-spark-env.sh

Closes #48937 from philwalk/portability-fix-for-load-spark-env.sh.

Authored-by: philwalk <philwalk9@gmail.com>
Signed-off-by: yangjie01 <yangjie01@baidu.com>
(cherry picked from commit 8d26008)
Signed-off-by: yangjie01 <yangjie01@baidu.com>
LuciferYang pushed a commit that referenced this pull request Mar 20, 2025
[SPARK-50416][CORE] A more portable terminal / pipe test needed for bin/load-spark-env.sh

Closes #48937 from philwalk/portability-fix-for-load-spark-env.sh.

Authored-by: philwalk <philwalk9@gmail.com>
Signed-off-by: yangjie01 <yangjie01@baidu.com>
(cherry picked from commit 8d26008)
Signed-off-by: yangjie01 <yangjie01@baidu.com>
@LuciferYang (Contributor):

Merged into master/branch-4.0/branch-3.5. Thanks @philwalk, @dongjoon-hyun, @HyukjinKwon, @pan3793.
