AttributeError: 'ScalaMonitor' object has no attribute 'comm' #7

@nlspoon

Description

Summary of Issue

Note - a related error also appears in the container log: ERROR JupyterSparkMonitorListener: Exception sending socket message:

Hello! We are running a JupyterHub service that spawns Kubernetes pods for single-user servers, and we run the JupyterHub proxy as a separate service. We've been experiencing a persistent but flaky issue with the Sparkmonitor extension. When creating a Spark session, I see the following error: AttributeError: 'ScalaMonitor' object has no attribute 'comm'. However, this does not happen consistently; often I can create a session where the monitor works, but if I then restart the kernel, the monitor is broken with the above error (or vice versa).

For example, on my first attempt at creating a Spark session, the extension works fine. Container log:

INFO:SparkMonitorKernel:Starting Kernel Extension
INFO:SparkMonitorKernel:Socket Listening on port 59845
INFO:SparkMonitorKernel:Starting socket thread, going to accept
INFO:SparkMonitorKernel:59845
INFO:SparkMonitorKernel:Adding jar from /work/venv_core_ml_kernel/lib/python3.7/site-packages/sparkmonitor/listener_2.12.jar 
[I 2021-12-21 23:45:52.892 SingleUserNotebookApp log:189] 101 GET /user/username/api/kernels/e5800417-4831-4bb4-9eec-85d7fb16b388/channels?session_id=aca66ed3-e383-4dfd-bd21-eae11e96ccb2 (username@36.72.215.148) 1877.03ms
[I 2021-12-21 23:45:53.002 SingleUserNotebookApp log:189] 101 GET /user/username/api/kernels/e5800417-4831-4bb4-9eec-85d7fb16b388/channels?session_id=26868351-57cd-43c6-b14b-c85bd1842316 (username@36.72.212.123) 2.18ms
[I 2021-12-21 23:45:53.147 SingleUserNotebookApp log:189] 101 GET /user/username/api/kernels/e5800417-4831-4bb4-9eec-85d7fb16b388/channels?session_id=0ccb7d08-e61b-44e6-a08c-57a36d2797ed (username@36.72.215.148) 2.12ms
[I 2021-12-21 23:45:55.711 SingleUserNotebookApp log:189] 200 GET /user/username/api/terminals?1640130355597 (username@36.72.212.123) 1.13ms
[I 2021-12-21 23:45:56.253 SingleUserNotebookApp log:189] 200 GET /user/username/api/sessions?1640130356141 (username@36.72.212.123) 1.09ms
[I 2021-12-21 23:45:56.254 SingleUserNotebookApp log:189] 200 GET /user/username/api/kernels?1640130356142 (username@36.72.215.148) 1.01ms
INFO:SparkMonitorKernel:SparkMonitor comm opened from frontend.
INFO:SparkMonitorKernel:SparkMonitor comm opened from frontend.
[I 2021-12-21 23:46:03.074 SingleUserNotebookApp log:189] 200 GET /user/username/api/contents?content=1&1640130360080 (username@36.72.215.148) 2874.67ms
[I 2021-12-21 23:46:05.820 SingleUserNotebookApp log:189] 200 GET /user/username/api/terminals?1640130365706 (username@36.72.215.148) 1.36ms
[I 2021-12-21 23:46:05.981 SingleUserNotebookApp log:189] 200 GET /user/username/api/kernelspecs?1640130365868 (username@36.72.212.123) 2.78ms
[I 2021-12-21 23:46:07.068 SingleUserNotebookApp log:189] 200 GET /user/username/api/sessions?1640130366954 (username@36.72.212.123) 1.18ms
[I 2021-12-21 23:46:07.070 SingleUserNotebookApp log:189] 200 GET /user/username/api/kernels?1640130366956 (username@36.72.215.148) 1.06ms
[I 2021-12-21 23:46:16.011 SingleUserNotebookApp log:189] 200 GET /user/username/api/contents?content=1&1640130373104 (username@36.72.212.123) 2795.89ms
[I 2021-12-21 23:46:16.014 SingleUserNotebookApp log:189] 200 GET /user/username/api/terminals?1640130375820 (username@36.72.212.123) 0.98ms
[I 2021-12-21 23:46:17.181 SingleUserNotebookApp log:189] 200 GET /user/username/api/sessions?1640130377069 (username@36.72.215.148) 1.22ms
[I 2021-12-21 23:46:17.182 SingleUserNotebookApp log:189] 200 GET /user/username/api/kernels?1640130377071 (username@36.72.212.123) 0.90ms
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
I1221 23:46:20.384598   242 sched.cpp:232] Version: 1.7.2
I1221 23:46:20.385978   225 sched.cpp:336] New master detected at master@36.72.23.53:5050
I1221 23:46:20.386255   225 sched.cpp:401] Authenticating with master master@36.72.23.53:5050
I1221 23:46:20.386270   225 sched.cpp:408] Using default CRAM-MD5 authenticatee
I1221 23:46:20.386425   234 authenticatee.cpp:97] Initializing client SASL
I1221 23:46:20.387233   234 authenticatee.cpp:121] Creating new client SASL connection
W1221 23:46:20.397209   241 process.cpp:838] Failed to recv on socket 437 to peer '36.72.23.53:34312': Decoder error
I1221 23:46:20.398273   229 authenticatee.cpp:213] Received SASL authentication mechanisms: CRAM-MD5
I1221 23:46:20.398306   229 authenticatee.cpp:239] Attempting to authenticate with mechanism 'CRAM-MD5'
I1221 23:46:20.398959   238 authenticatee.cpp:259] Received SASL authentication step
I1221 23:46:20.399524   239 authenticatee.cpp:299] Authentication success
I1221 23:46:20.399653   236 sched.cpp:513] Successfully authenticated with master master@36.72.23.53:5050
I1221 23:46:20.401082   230 sched.cpp:744] Framework registered with 195f7369-64e5-4bc5-a8ed-8b576994450a-11035
21/12/21 23:46:20 WARN MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
INFO:SparkMonitorKernel:Client Connected ('127.0.0.1', 35478)
[I 2021-12-21 23:46:26.141 SingleUserNotebookApp log:189] 200 GET /user/username/api/terminals?1640130386026 (username@36.72.215.148) 1.00ms
[I 2021-12-21 23:46:28.410 SingleUserNotebookApp log:189] 200 GET /user/username/api/contents?content=1&1640130386068 (username@36.72.212.123) 2230.40ms
[I 2021-12-21 23:46:28.416 SingleUserNotebookApp log:189] 200 GET /user/username/api/sessions?1640130387180 (username@36.72.212.123) 3.12ms
[I 2021-12-21 23:46:28.417 SingleUserNotebookApp log:189] 200 GET /user/username/api/kernels?1640130387182 (username@36.72.215.148) 3.31ms
[I 2021-12-21 23:46:39.097 SingleUserNotebookApp log:189] 200 GET /user/username/api/contents?content=1&1640130396071 (username@36.72.215.148) 2914.93ms
[I 2021-12-21 23:46:39.100 SingleUserNotebookApp log:189] 200 GET /user/username/api/terminals?1640130396136 (username@36.72.212.123) 1.93ms
[I 2021-12-21 23:46:39.101 SingleUserNotebookApp log:189] 200 GET /user/username/api/sessions?1640130398473 (username@36.72.212.123) 2.26ms
[I 2021-12-21 23:46:39.101 SingleUserNotebookApp log:189] 200 GET /user/username/api/kernels?1640130398474 (username@36.72.215.148) 2.46ms
/source/virtualenv_run_jupyter/lib/python3.7/site-packages/jupyter_client/manager.py:358: FutureWarning: Method cleanup(connection_file=True) is deprecated, use cleanup_resources(restart=False).
  FutureWarning)
[I 2021-12-21 23:46:39.217 SingleUserNotebookApp multikernelmanager:534] Kernel restarted: e5800417-4831-4bb4-9eec-85d7fb16b388

But if I restart the kernel and try again, the extension fails with AttributeError: 'ScalaMonitor' object has no attribute 'comm'. (Note: it does not consistently follow this pattern of working the first time and failing the second. Often the first session fails and then it works after a kernel restart, or it fails three times in a row and then works.)

INFO:SparkMonitorKernel:Client Connected ('127.0.0.1', 35472)
Exception in thread Thread-8:
Traceback (most recent call last):
  File "/usr/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/work/venv_core_ml_kernel/lib/python3.7/site-packages/sparkmonitor/kernelextension.py", line 126, in run
    self.onrecv(msg)
  File "/work/venv_core_ml_kernel/lib/python3.7/site-packages/sparkmonitor/kernelextension.py", line 145, in onrecv
    'msg': msg
  File "/work/venv_core_ml_kernel/lib/python3.7/site-packages/sparkmonitor/kernelextension.py", line 223, in sendToFrontEnd
    monitor.send(msg)
  File "/work/venv_core_ml_kernel/lib/python3.7/site-packages/sparkmonitor/kernelextension.py", line 57, in send
    self.comm.send(msg)
AttributeError: 'ScalaMonitor' object has no attribute 'comm'
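
From the traceback, my reading is that this is a race: the kernel extension's socket thread forwards listener messages via ScalaMonitor.send, which calls self.comm.send(msg), but the comm attribute presumably only exists once the frontend has opened its comm (the "SparkMonitor comm opened from frontend." line in the log above). If the Scala listener's socket messages arrive before that point, for example right after a kernel restart, send hits the missing attribute. Below is a minimal sketch of that reading; apart from send, the names are hypothetical and only illustrate the ordering, not the actual sparkmonitor implementation.

# Minimal sketch of the suspected race (hypothetical structure, inferred
# only from the traceback and log lines above).
class ScalaMonitor:
    def comm_opened(self, comm):
        # Called when the frontend opens its comm; only now does
        # self.comm exist.
        self.comm = comm

    def send(self, msg):
        # Raises AttributeError if a listener message arrives before
        # the frontend comm has been opened (e.g. after a restart).
        self.comm.send(msg)

    def send_guarded(self, msg):
        # Hypothetical defensive variant: skip (or queue) messages that
        # arrive before the comm is attached.
        if hasattr(self, 'comm'):
            self.comm.send(msg)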

Container log on failure:

[I 2021-12-21 23:46:39.219 SingleUserNotebookApp log:189] 200 POST /user/username/api/kernels/e5800417-4831-4bb4-9eec-85d7fb16b388/restart?1640130391290 (username@36.72.215.148) 7817.75ms
[I 2021-12-21 23:46:39.388 SingleUserNotebookApp log:189] 101 GET /user/username/api/kernels/e5800417-4831-4bb4-9eec-85d7fb16b388/channels?session_id=26868351-57cd-43c6-b14b-c85bd1842316 (username@36.72.215.148) 2.70ms
INFO:SparkMonitorKernel:Starting Kernel Extension
INFO:SparkMonitorKernel:Socket Listening on port 53939
INFO:SparkMonitorKernel:Starting socket thread, going to accept
INFO:SparkMonitorKernel:53939
INFO:SparkMonitorKernel:Adding jar from /work/venv_core_ml_kernel/lib/python3.7/site-packages/sparkmonitor/listener_2.12.jar 
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
I1221 23:46:45.923640   389 sched.cpp:232] Version: 1.7.2
I1221 23:46:45.925024   378 sched.cpp:336] New master detected at master@36.72.23.53:5050
I1221 23:46:45.925308   378 sched.cpp:401] Authenticating with master master@36.72.23.53:5050
I1221 23:46:45.925323   378 sched.cpp:408] Using default CRAM-MD5 authenticatee
I1221 23:46:45.925534   382 authenticatee.cpp:97] Initializing client SASL
I1221 23:46:45.926290   382 authenticatee.cpp:121] Creating new client SASL connection
W1221 23:46:45.928249   388 process.cpp:838] Failed to recv on socket 437 to peer '36.72.23.53:37450': Decoder error
I1221 23:46:45.929303   375 authenticatee.cpp:213] Received SASL authentication mechanisms: CRAM-MD5
I1221 23:46:45.929327   375 authenticatee.cpp:239] Attempting to authenticate with mechanism 'CRAM-MD5'
I1221 23:46:45.929982   383 authenticatee.cpp:259] Received SASL authentication step
I1221 23:46:45.930640   386 authenticatee.cpp:299] Authentication success
I1221 23:46:45.930742   385 sched.cpp:513] Successfully authenticated with master master@36.72.23.53:5050
I1221 23:46:45.933359   377 sched.cpp:744] Framework registered with 195f7369-64e5-4bc5-a8ed-8b576994450a-11038
21/12/21 23:46:46 WARN MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
INFO:SparkMonitorKernel:Client Connected ('127.0.0.1', 35472)
INFO:SparkMonitorKernel:SparkMonitor comm opened from frontend.
[I 2021-12-21 23:46:49.219 SingleUserNotebookApp log:189] 200 GET /user/username/api/terminals?1640130409107 (username@36.72.215.148) 1.17ms
[I 2021-12-21 23:46:49.220 SingleUserNotebookApp log:189] 200 GET /user/username/api/sessions?1640130409109 (username@36.72.215.148) 1.04ms
[I 2021-12-21 23:46:49.222 SingleUserNotebookApp log:189] 200 GET /user/username/api/kernels?1640130409111 (username@36.72.212.123) 0.99ms
21/12/21 23:46:51 ERROR JupyterSparkMonitorListener: Exception sending socket message: 
java.net.SocketException: Broken pipe (Write failed)
        at java.net.SocketOutputStream.socketWrite0(Native Method)
        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
        at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
        at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
        at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291)
        at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:295)
        at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:141)
        at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
        at sparkmonitor.listener.JupyterSparkMonitorListener.send(CustomListener.scala:54)
        at sparkmonitor.listener.JupyterSparkMonitorListener.onExecutorAdded(CustomListener.scala:652)
        at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:63)
        at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
        at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
        at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
        at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:115)
        at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:99)
        at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)
        at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)
        at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
        at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)
        at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)
        at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1319)
        at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)

I'm experiencing this issue even with a minimal configuration such as this:

from pyspark.sql import SparkSession
spark = SparkSession.builder\
        .config('spark.extraListeners', 'sparkmonitor.listener.JupyterSparkMonitorListener')\
        .config('spark.driver.extraClassPath', '/source/virtualenv_run_jupyter/lib/python3.7/site-packages/sparkmonitor/listener_2.12.jar:/opt/spark/extra_jars/guava-27.0.1-jre.jar')\
        .getOrCreate()
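
For completeness, a possible notebook-side workaround sketch (untested, and it may not address the root cause): wait for the frontend comm to be attached before building the session. This assumes sparkmonitor.kernelextension exposes a module-level monitor object, as the sendToFrontEnd frame in the traceback suggests, and that its comm attribute only appears once the frontend comm opens.

# Hypothetical workaround sketch, not a confirmed fix: poll the kernel
# extension's module-level monitor (name inferred from the traceback)
# until the frontend comm is attached, then create the SparkSession.
import time
import sparkmonitor.kernelextension as ke
from pyspark.sql import SparkSession

for _ in range(30):  # wait up to ~15 seconds
    if hasattr(getattr(ke, 'monitor', None), 'comm'):
        break
    time.sleep(0.5)

spark = SparkSession.builder\
        .config('spark.extraListeners', 'sparkmonitor.listener.JupyterSparkMonitorListener')\
        .config('spark.driver.extraClassPath', '/source/virtualenv_run_jupyter/lib/python3.7/site-packages/sparkmonitor/listener_2.12.jar:/opt/spark/extra_jars/guava-27.0.1-jre.jar')\
        .getOrCreate()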

Additional Details

We are using the following Jupyter-related packages/dependencies in our containers:

ipykernel==5.3.4
ipython==7.20.0
jupyter-client==6.1.5
jupyter-core==4.7.1
jupyter-packaging==0.10.2
jupyter-server==1.7.0
jupyter-server-proxy==1.6.0
jupyter-telemetry==0.1.0
jupyterhub==1.4.2
jupyterlab==3.0.16
jupyterlab-pygments==0.1.2
jupyterlab-server==2.5.2
jupyterlab-widgets==1.0.0
nbconvert==5.5.0
nbformat==5.1.3
traitlets==4.3.3

Any assistance with troubleshooting this issue would be greatly appreciated; a lot of our users really like Sparkmonitor! Please let me know if I can provide any other details about our setup that would be helpful. Thank you!
