I am writing a map function in Spark to parse XML embedded in log lines, but I get a NotSerializableException and cannot figure out the reason. The stack trace is below. How can I work around this? Does anyone have a suggestion?
org.apache.spark.SparkException: Task not serializable
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:345)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:335)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:159)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2292)
at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:371)
at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:370)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
at org.apache.spark.rdd.RDD.map(RDD.scala:370)
at parse(<console>:33)
... 49 elided
Caused by: java.io.NotSerializableException: scala.xml.NodeSeq$$anon$1
Serialization stack:
- object not serializable (class: scala.xml.NodeSeq$$anon$1, value: <ns18:userID>4536000170315902</ns18:userID>)
The code I am using is:
rows.mapPartitions(rows => {
  val XMLParser = scala.xml.XML
  rows.map(row => {
    val xmlContent = sliceLogHeader(row)
    val xmlDom = XMLParser.loadString(xmlContent)
    val headerDOM = xmlDom \ "header"
    val userID = (headerDOM \ "userID").text
    val clientSessionID = (headerDOM \ "clientSessionID").text
    Account(userID, clientSessionID)
  })
})
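
For context, since the trace points at parse(<console>:33), the closure is apparently defined in the spark-shell, where it can capture non-serializable REPL state (such as a scala.xml.NodeSeq held by an enclosing line object) even when the mapped values themselves are plain Strings. Below is a minimal sketch of one common workaround: keep the parsing entirely local to the closure and run it from a compiled top-level object instead of the shell. The object name ParseXmlLogs, the sliceLogHeader stand-in, the input path, and the Account definition here are all assumptions for illustration, not the original code.

import org.apache.spark.sql.SparkSession
import scala.xml.XML

// Hypothetical case class mirroring the one in the question;
// holding only Strings keeps it trivially serializable.
case class Account(userID: String, clientSessionID: String)

object ParseXmlLogs {
  // Hypothetical stand-in for the sliceLogHeader helper from the question;
  // here it simply treats the whole line as the XML payload.
  def sliceLogHeader(row: String): String = row

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("ParseXmlLogs").getOrCreate()
    val rows = spark.sparkContext.textFile("hdfs:///logs/input") // hypothetical path

    val accounts = rows.mapPartitions { rows =>
      rows.map { row =>
        val xmlDom = XML.loadString(sliceLogHeader(row))
        val headerDOM = xmlDom \ "header"
        // Extract plain Strings inside the closure so no scala.xml.NodeSeq
        // instance escapes into, or is captured by, the task closure.
        val userID: String = (headerDOM \ "userID").text
        val clientSessionID: String = (headerDOM \ "clientSessionID").text
        Account(userID, clientSessionID)
      }
    }

    accounts.take(10).foreach(println)
    spark.stop()
  }
}

If staying in the shell, pasting the case class and helper with :paste -raw (which compiles them into a proper package rather than a REPL line object) may likewise keep non-serializable REPL state out of the closure.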