[SPARK-50416][CORE] A more portable terminal / pipe test needed for bin/load-spark-env.sh #48937
Conversation
Not sure what "workflow run detection" is, or how to correct it.
Thank you for making a PR, @philwalk. Did you enable GitHub Actions in your repository? There is community documentation about contributing; please check the Pull Request section:
- https://spark.apache.org/contributing.html

Go to the “Actions” tab on your forked repository and enable the “Build and test” and “Report test results” workflows.
I enabled Build and Test on my forked repo.

Let's also file a JIRA and add it to the PR description; see also https://spark.apache.org/contributing.html

The touched code seems to be borrowed from Apache Hive; would you like to also report the issue to the Hive community?
Got it. I approved it a few minutes ago, @philwalk.
By contrast:

```
$ bin/spark-shell
WARNING: Using incubator modules: jdk.incubator.vector
Using Spark's default log4j profile: org/apache/spark/log4j2-pattern-layout-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 4.0.0-preview2
      /_/

Using Scala version 2.13.14 (OpenJDK 64-Bit Server VM, Java 21.0.4)
Type in expressions to have them evaluated.
Type :help for more information.
Spark context Web UI available at http://d5:4040
Spark context available as 'sc' (master = local[*], app id = local-1732575351925).
Spark session available as 'spark'.

scala>
```
Is this waiting on me to do something?

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
LGTM
+1, LGTM
@dongjoon-hyun @HyukjinKwon @pan3793 Do you need to take another look?

I re-tested the modified line in

Thank you @philwalk, let's wait one more day. I will merge this one if there are no other new comments.
[SPARK-50416][CORE] A more portable terminal / pipe test needed for bin/load-spark-env.sh

### What changes were proposed in this pull request?

The last action in [bin/load-spark-env.sh](https://github.com/apache/spark/blob/d5da49d56d7dec5f8a96c5252384d865f7efd4d9/bin/load-spark-env.sh#L68) performs a test to determine whether it is running in a terminal or not, and whether `stdin` is reading from a pipe. A more portable test is needed.

### Why are the changes needed?

The current approach relies on `ps` with options that vary significantly between different Unix-like systems. Specifically, it prints an error message in both `cygwin` and `msys2` (and by extension, in all of the variations of `git-for-windows`). On `Linux` and `Osx/Darwin homebrew` it prints no error message, but fails to detect a terminal session (it always thinks STDIN is a pipe).

Here's what the problem looks like in a `cygwin64` session (with `set -x` just ahead of the section of interest).

If called directly:

```bash
$ bin/load-spark-env.sh
++ ps -o stat= -p 1947
ps: unknown option -- o
Try `ps --help' for more information.
+ [[ ! '' =~ \+ ]]
+ [[ -p /dev/stdin ]]
+ export 'SPARK_BEELINE_OPTS= -Djline.terminal=jline.UnsupportedTerminal'
+ SPARK_BEELINE_OPTS=' -Djline.terminal=jline.UnsupportedTerminal'
```

Interestingly, thanks to the two-part test, it does the right thing with respect to the terminal test; the main problem is the error message.

If called downstream from a pipe:

```bash
$ echo "yo" | bin/load-spark-env.sh
++ ps -o stat= -p 1955
ps: unknown option -- o
Try `ps --help' for more information.
+ [[ ! '' =~ \+ ]]
+ [[ -p /dev/stdin ]]
```

Again, it correctly detects the pipe environment, but with an error message.

In WSL2 Ubuntu, the test doesn't correctly detect a non-pipe terminal session:

```bash
# /opt/spark$ bin/load-spark-env.sh
++ ps -o stat= -p 1423
+ [[ ! S+ =~ \+ ]]
# echo "yo!" | bin/load-spark-env.sh
++ ps -o stat= -p 1416
+ [[ ! S+ =~ \+ ]]
```

In `#134-Ubuntu SMP Fri Sep 27 20:20:17 UTC 2024`, the same failure occurs (it doesn't recognize terminal environments).

### Does this PR introduce _any_ user-facing change?

This is a proposed bug fix and, other than fixing the bug, should be invisible to users.

### How was this patch tested?

The patch was verified to behave as intended in terminal sessions, both interactive and piped, in the following 5 environments:

```
- Linux quadd 5.15.0-124-generic #134-Ubuntu SMP Fri Sep 27 20:20:17 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
- Linux d5 5.15.153.1-microsoft-standard-WSL2 #1 SMP Fri Mar 29 23:14:13 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
- MINGW64_NT-10.0-22631 d5 3.5.4-0bc1222b.x86_64 2024-09-04 18:28 UTC x86_64 Msys
- CYGWIN_NT-10.0-22631 d5 3.5.3-1.x86_64 2024-04-03 17:25 UTC x86_64 Cygwin
- Darwin suemac.local 23.6.0 Darwin Kernel Version 23.6.0: Mon Jul 29 21:14:21 PDT 2024; root:xnu-10063.141.2~1/RELEASE_ARM64_T8103 arm64
```

The test was to manually run the following script, verifying the expected response in both pipe and terminal sessions:

```bash
#!/bin/bash
if [ -e /usr/bin/tty -a "`tty`" != "not a tty" -a ! -p /dev/stdin ]; then
  echo "not a pipe"
else
  echo "is a pipe"
fi
```

The output of the manual test in all 5 tested environments:

```
philwalkquadd:/opt/spark $ isPipe
not a pipe
# $ echo "yo" | isPipe
is a pipe
#
```

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #48937 from philwalk/portability-fix-for-load-spark-env.sh.

Authored-by: philwalk <philwalk9@gmail.com>
Signed-off-by: yangjie01 <yangjie01@baidu.com>
(cherry picked from commit 8d26008)
Signed-off-by: yangjie01 <yangjie01@baidu.com>
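As a point of comparison (not part of this PR): POSIX shells also provide the builtin `-t FD` test, which asks whether a file descriptor is attached to a terminal without invoking any external command such as `ps` or `tty`. A minimal sketch of that alternative check, using the same output strings as the manual test script above:

```shell
#!/bin/bash
# Hypothetical alternative, for illustration only: the shell builtin
# test `-t 0` is true when stdin (fd 0) is attached to a terminal,
# and false when stdin comes from a pipe or a redirect.
if [ -t 0 ]; then
  echo "not a pipe"
else
  echo "is a pipe"
fi
```

Note that `-t 0` answers a slightly different question than `-p /dev/stdin` (it also reports "is a pipe" for plain file redirects), which may or may not matter depending on what the caller needs.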
Merged into master/branch-4.0/branch-3.5. Thanks @philwalk @dongjoon-hyun @HyukjinKwon @pan3793