Commit 88cc153

itholic authored and HyukjinKwon committed
[SPARK-48650][PYTHON] Display correct call site from IPython Notebook
### What changes were proposed in this pull request?

This PR proposes to display the correct call site information when code runs in an IPython Notebook.

### Why are the changes needed?

We added `DataFrameQueryContext` for PySpark error messages in #45377, but it does not work well in IPython Notebook.

### Does this PR introduce _any_ user-facing change?

No API changes, but the user-facing error message shown in IPython Notebook will be improved:

**Before**

<img width="1124" alt="Screenshot 2024-06-18 at 5 15 56 PM" src="https://github.com/apache/spark/assets/44108233/3e3aee2c-5bb0-4858-b392-e845b7280d31">

**After**

<img width="1163" alt="Screenshot 2024-06-19 at 8 45 05 AM" src="https://github.com/apache/spark/assets/44108233/81741d15-cac9-41e7-815a-5d84f1176c73">

**NOTE:** This also works when a command is executed across multiple cells:

<img width="1175" alt="Screenshot 2024-06-19 at 8 42 29 AM" src="https://github.com/apache/spark/assets/44108233/d65fbf79-d621-4ae0-b220-2f7923cc3666">

### How was this patch tested?

Manually tested with IPython Notebook.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #47009 from itholic/error_context_on_notebook.

Authored-by: Haejoon Lee <haejoon.lee@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
1 parent e972dae commit 88cc153

1 file changed, +23 -2 lines changed

python/pyspark/errors/utils.py

Lines changed: 23 additions & 2 deletions
@@ -21,6 +21,7 @@
 import os
 import threading
 from typing import Any, Callable, Dict, Match, TypeVar, Type, Optional, TYPE_CHECKING
+import pyspark
 from pyspark.errors.error_classes import ERROR_CLASSES_MAP
 
 if TYPE_CHECKING:
@@ -164,9 +165,29 @@ def _capture_call_site(spark_session: "SparkSession", depth: int) -> str:
     The call site information is used to enhance error messages with the exact location
     in the user code that led to the error.
     """
-    stack = list(reversed(inspect.stack()))
+    # Filtering out PySpark code and keeping user code only
+    pyspark_root = os.path.dirname(pyspark.__file__)
+    stack = [
+        frame_info for frame_info in inspect.stack() if pyspark_root not in frame_info.filename
+    ]
+
     selected_frames = stack[:depth]
-    call_sites = [f"{frame.filename}:{frame.lineno}" for frame in selected_frames]
+
+    # We try import here since IPython is not a required dependency
+    try:
+        from IPython import get_ipython
+
+        ipython = get_ipython()
+    except ImportError:
+        ipython = None
+
+    # Identifying the cell is useful when the error is generated from IPython Notebook
+    if ipython:
+        call_sites = [
+            f"line {frame.lineno} in cell [{ipython.execution_count}]" for frame in selected_frames
+        ]
+    else:
+        call_sites = [f"{frame.filename}:{frame.lineno}" for frame in selected_frames]
     call_sites_str = "\n".join(call_sites)
 
     return call_sites_str
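
The filtering and formatting logic in this change can be tried without a Spark installation. The following is a minimal standalone sketch, not the PySpark implementation: `Frame`, `format_call_sites`, and `current_execution_count` are hypothetical names introduced here for illustration, and the paths in the usage note are made up. It assumes that each frame exposes `filename` and `lineno`, as `inspect.FrameInfo` does.

```python
from typing import NamedTuple, Optional, Sequence


class Frame(NamedTuple):
    """Hypothetical stand-in for inspect.FrameInfo (only the fields used here)."""

    filename: str
    lineno: int


def current_execution_count() -> Optional[int]:
    """Return the IPython execution count, or None outside IPython."""
    try:
        # IPython is optional, so the import is attempted lazily.
        from IPython import get_ipython
    except ImportError:
        return None
    shell = get_ipython()  # None when not running inside an IPython shell
    return shell.execution_count if shell is not None else None


def format_call_sites(
    frames: Sequence[Frame],
    library_root: str,
    depth: int,
    execution_count: Optional[int] = None,
) -> str:
    """Drop library frames, keep the first `depth` user frames, and format them."""
    # Substring match on the path, mirroring the check used in the diff.
    user_frames = [f for f in frames if library_root not in f.filename]
    selected = user_frames[:depth]
    if execution_count is not None:
        # Notebook case: report the cell number instead of a file name.
        sites = [f"line {f.lineno} in cell [{execution_count}]" for f in selected]
    else:
        sites = [f"{f.filename}:{f.lineno}" for f in selected]
    return "\n".join(sites)
```

For example, given the frames `Frame("/opt/pyspark/sql/dataframe.py", 10)` and `Frame("/home/user/app.py", 7)` with `library_root="/opt/pyspark"` and `depth=1`, the sketch returns `/home/user/app.py:7`. Passing `execution_count=5` yields `line 7 in cell [5]` instead. As in the diff, a single execution count labels every selected frame, so the cell number presumably refers to the cell being executed when the call site is captured.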
