
[SPARK-51430][PYTHON] Stop PySpark context logger from propagating logs to stdout #50198

Open: wants to merge 1 commit into master

allisonwang-db
Contributor

What changes were proposed in this pull request?

This PR stops the PySpark context logger from propagating logs to stdout.
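
As a rough illustration of the mechanism (not this PR's actual diff), Python's standard `logging` module lets a logger opt out of forwarding records to its ancestors' handlers by setting `propagate = False`. The sketch below assumes a console handler on the root logger. The logger name is taken from the log line in this description, while `console` and `sink` are hypothetical in-memory stand-ins.

```python
import io
import logging

# Hypothetical stand-ins: `console` plays the shell's stdout,
# `sink` plays the logger's own destination.
console = io.StringIO()
sink = io.StringIO()

root = logging.getLogger()
console_handler = logging.StreamHandler(console)
root.addHandler(console_handler)

# Logger name as it appears in the log output; this setup is
# illustrative only and is not PySpark's code.
ctx_logger = logging.getLogger("DataFrameQueryContextLogger")
ctx_logger.setLevel(logging.ERROR)
ctx_logger.addHandler(logging.StreamHandler(sink))

ctx_logger.error("first")    # propagates: written to both sink and console
ctx_logger.propagate = False
ctx_logger.error("second")   # no longer forwarded to the root logger's handlers

root.removeHandler(console_handler)  # leave global logging state as we found it
```

With propagation turned off, only `first` reaches `console`, while `sink` still receives both records. The logger keeps its own handler here because a logger with no handlers at all would fall back to the standard library's last-resort stderr handler.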

Why are the changes needed?

To improve the user experience. Currently, the following log message is printed in the PySpark shell:

In [1]: from pyspark.sql.functions import col

In [2]: spark.range(10).select(col("id2")).show()
{"ts": "2025-03-06 16:55:36.678", "level": "ERROR", "logger": "DataFrameQueryContextLogger", "msg": "[UNRESOLVED_COLUMN.WITH_SUGGESTION] A column, variable, or function parameter with name `id2` cannot be resolved. Did you mean one of the following? [`id`]. SQLSTATE: 42703", "context": {"file": "<ipython-input-2-bac6211f25a7>", "line": "1", "fragment": "col", "errorClass": "UNRESOLVED_COLUMN.WITH_SUGGESTION"}, "exception": {"class": "Py4JJavaError", "msg": "An error occurred while calling o34.select...

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing tests

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the PYTHON label Mar 7, 2025
@allisonwang-db
Contributor Author

cc @HyukjinKwon @itholic

Member

@dongjoon-hyun dongjoon-hyun left a comment

After this PR, where can I see the error, @allisonwang-db?

@allisonwang-db
Contributor Author

@dongjoon-hyun you still see the analysis exception:

AnalysisException: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column, variable, or function parameter with name `id2` cannot be resolved. Did you mean one of the following? [`id`]. SQLSTATE: 42703;
'Project ['id2]
+- Range (0, 10, step=1, splits=Some(16))

This PR just removes the huge chunk of stack traces from the output:
[Screenshot, 2025-03-18 5:22:30 PM]
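
To illustrate the distinction in plain Python (a sketch with hypothetical names, not PySpark's implementation): a non-propagating logger can still record a failure, while the exception itself reaches the caller unchanged and nothing is written to the console handler.

```python
import io
import logging

console = io.StringIO()  # hypothetical stand-in for the shell's console
root = logging.getLogger()
console_handler = logging.StreamHandler(console)
root.addHandler(console_handler)

# Hypothetical logger; the NullHandler keeps logging's last-resort
# stderr handler from firing once propagation is disabled.
quiet_logger = logging.getLogger("example.query_context")
quiet_logger.addHandler(logging.NullHandler())
quiet_logger.propagate = False


def run_query():
    """Hypothetical operation that logs its failure, then re-raises it."""
    try:
        raise ValueError("column `id2` cannot be resolved")
    except ValueError:
        quiet_logger.exception("query failed")  # record includes the traceback
        raise


try:
    run_query()
    seen = None
except ValueError as exc:
    seen = str(exc)  # the caller still observes the error

root.removeHandler(console_handler)  # restore global logging state
```

Here `seen` holds the error message and `console` stays empty, which mirrors the behavior described above: the exception is still shown, and only the logged stack trace disappears from the console.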

@allisonwang-db
Contributor Author

cc @HyukjinKwon we should backport this to branch-4.0, since it significantly impacts the user experience in the PySpark shell.

@xinrong-meng
Member

LGTM thank you!
