Error in training LSTM model #9097

Open
gdg1212 opened this issue Oct 7, 2023 · 3 comments
@gdg1212

gdg1212 commented Oct 7, 2023

val model = Sequential[Float]()
  .add(LSTM(inputSize = 3, hiddenSize = 50))
  .add(Linear(inputSize = 50, outputSize = 10))

// .add(LogSoftMax())

val optimizer = Optimizer(model = model,
  sampleRDD = data,
  criterion = MSECriterion[Float](),
  batchSize = 10)
optimizer
  .setOptimMethod(new Adam(0.01))
  .setEndWhen(Trigger.maxEpoch(10))
  .optimize()

The type of data is data: RDD[Sample[Float]].

Training the model fails with the following error:
java.lang.ClassCastException: com.intel.analytics.bigdl.tensor.DenseTensor cannot be cast to com.intel.analytics.bigdl.utils.Table
at com.intel.analytics.bigdl.nn.Cell.updateOutput(Cell.scala:48)
at com.intel.analytics.bigdl.nn.abstractnn.AbstractModule.forward(AbstractModule.scala:282)
at com.intel.analytics.bigdl.nn.Sequential.updateOutput(Sequential.scala:39)
at com.intel.analytics.bigdl.nn.abstractnn.AbstractModule.forward(AbstractModule.scala:282)

23/10/07 18:00:14 ERROR [Executor task launch worker for task 4.0 in stage 14.0 (TID 26)] Executor: Exception in task 4.0 in stage 14.0 (TID 26)
com.intel.analytics.bigdl.utils.LayerException: null
at com.intel.analytics.bigdl.nn.abstractnn.AbstractModule.forward(AbstractModule.scala:288) ~[bigdl-SPARK_3.1-0.13.0.jar:?]
at com.intel.analytics.bigdl.nn.Sequential.updateOutput(Sequential.scala:39) ~[bigdl-SPARK_3.1-0.13.0.jar:?]
at com.intel.analytics.bigdl.nn.abstractnn.AbstractModule.forward(AbstractModule.scala:282) ~[bigdl-SPARK_3.1-0.13.0.jar:?]

@gdg1212
Author

gdg1212 commented Oct 8, 2023

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 14.0 failed 1 times, most recent failure: Lost task 3.0 in stage 14.0 (TID 25) (master-1-1.c-52c86fc1cf6fe4b8.ap-southeast-5.emr.aliyuncs.com executor driver): Layer info: Sequential[929196ee]{
[input -> (1) -> (2) -> output]
(1): LSTM(3, 50, 0.0)
(2): Linear[ed0e8842](50 -> 10)
}/LSTM(3, 50, 0.0)
java.lang.ClassCastException: com.intel.analytics.bigdl.tensor.DenseTensor cannot be cast to com.intel.analytics.bigdl.utils.Table
at com.intel.analytics.bigdl.nn.Cell.updateOutput(Cell.scala:48)
at com.intel.analytics.bigdl.nn.abstractnn.AbstractModule.forward(AbstractModule.scala:282)
at com.intel.analytics.bigdl.nn.Sequential.updateOutput(Sequential.scala:39)
at com.intel.analytics.bigdl.nn.abstractnn.AbstractModule.forward(AbstractModule.scala:282)
at com.intel.analytics.bigdl.optim.DistriOptimizer$.$anonfun$optimize$8(DistriOptimizer.scala:269)
at scala.runtime.java8.JFunction0$mcI$sp.apply(JFunction0$mcI$sp.java:23)
at com.intel.analytics.bigdl.utils.ThreadPool$$anon$4.call(ThreadPool.scala:160)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)

    at com.intel.analytics.bigdl.nn.abstractnn.AbstractModule.forward(AbstractModule.scala:288)
    at com.intel.analytics.bigdl.nn.Sequential.updateOutput(Sequential.scala:39)
    at com.intel.analytics.bigdl.nn.abstractnn.AbstractModule.forward(AbstractModule.scala:282)
    at com.intel.analytics.bigdl.optim.DistriOptimizer$.$anonfun$optimize$8(DistriOptimizer.scala:269)
    at scala.runtime.java8.JFunction0$mcI$sp.apply(JFunction0$mcI$sp.java:23)
    at com.intel.analytics.bigdl.utils.ThreadPool$$anon$4.call(ThreadPool.scala:160)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)

Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2712)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2648)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2647)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2647)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1189)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1189)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1189)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2900)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2842)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2831)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:959)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2228)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2323)
at org.apache.spark.rdd.RDD.$anonfun$reduce$1(RDD.scala:1111)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:406)
at org.apache.spark.rdd.RDD.reduce(RDD.scala:1093)
at com.intel.analytics.bigdl.optim.DistriOptimizer$.optimize(DistriOptimizer.scala:353)
at com.intel.analytics.bigdl.optim.DistriOptimizer.optimize(DistriOptimizer.scala:908)
at LSTMDemo2$.main(LSTMDemo2.scala:112)
at LSTMDemo2.main(LSTMDemo2.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: Layer info: Sequential[929196ee]{
[input -> (1) -> (2) -> output]
(1): LSTM(3, 50, 0.0)
(2): Linear[ed0e8842](50 -> 10)
}/LSTM(3, 50, 0.0)
java.lang.ClassCastException: com.intel.analytics.bigdl.tensor.DenseTensor cannot be cast to com.intel.analytics.bigdl.utils.Table
at com.intel.analytics.bigdl.nn.Cell.updateOutput(Cell.scala:48)
at com.intel.analytics.bigdl.nn.abstractnn.AbstractModule.forward(AbstractModule.scala:282)
at com.intel.analytics.bigdl.nn.Sequential.updateOutput(Sequential.scala:39)
at com.intel.analytics.bigdl.nn.abstractnn.AbstractModule.forward(AbstractModule.scala:282)
at com.intel.analytics.bigdl.optim.DistriOptimizer$.$anonfun$optimize$8(DistriOptimizer.scala:269)
at scala.runtime.java8.JFunction0$mcI$sp.apply(JFunction0$mcI$sp.java:23)
at com.intel.analytics.bigdl.utils.ThreadPool$$anon$4.call(ThreadPool.scala:160)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)

    at com.intel.analytics.bigdl.nn.abstractnn.AbstractModule.forward(AbstractModule.scala:288)
    at com.intel.analytics.bigdl.nn.Sequential.updateOutput(Sequential.scala:39)
    at com.intel.analytics.bigdl.nn.abstractnn.AbstractModule.forward(AbstractModule.scala:282)
    at com.intel.analytics.bigdl.optim.DistriOptimizer$.$anonfun$optimize$8(DistriOptimizer.scala:269)
    at scala.runtime.java8.JFunction0$mcI$sp.apply(JFunction0$mcI$sp.java:23)
    at com.intel.analytics.bigdl.utils.ThreadPool$$anon$4.call(ThreadPool.scala:160)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)

@gdg1212
Author

gdg1212 commented Oct 8, 2023

optimizer
  .setOptimMethod(new Adam(0.01))
  .setEndWhen(Trigger.maxEpoch(10))
  .optimize()

The error is reported at the line setEndWhen(Trigger.maxEpoch(10)).

@qiuxin2012
Contributor

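Your model definition is wrong: an LSTM cell has to be added to a Recurrent container, not directly to the Sequential. The ClassCastException is thrown in Cell.updateOutput because the cell expects a Table as input and receives the raw DenseTensor instead.
For a working model definition, see this example: https://github.com/intel-analytics/BigDL/tree/main/scala/dllib/src/main/scala/com/intel/analytics/bigdl/dllib/example/languagemodel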
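
As a rough illustration of that change applied to the model in this issue (a minimal sketch, not code taken from the linked example), the LSTM cell can be wrapped in a Recurrent before it goes into the Sequential. The object name LstmModelSketch and the build helper are hypothetical. The sketch assumes each Sample's feature is a 2D tensor of shape (timeSteps, 3), that the label is a length-10 vector, and that only the last time step should feed the Linear layer, which is what Select(2, -1) does here. If a prediction per time step is wanted instead, TimeDistributed(Linear(...)) would be the usual alternative.

```scala
import com.intel.analytics.bigdl.nn.{Linear, LSTM, Recurrent, Select, Sequential}
import com.intel.analytics.bigdl.tensor.Tensor

// Hypothetical wrapper object; not part of BigDL.
object LstmModelSketch {

  // Builds the model from the issue with the LSTM cell wrapped in Recurrent.
  def build(inputSize: Int = 3, hiddenSize: Int = 50, outputSize: Int = 10): Sequential[Float] =
    Sequential[Float]()
      // Recurrent iterates the cell over the time dimension (dim 2 of a
      // batch x time x feature input) and hands the cell the Table it expects.
      .add(Recurrent[Float]().add(LSTM[Float](inputSize, hiddenSize)))
      // Assumption: keep only the last time step, giving batch x hiddenSize.
      .add(Select[Float](2, -1))
      .add(Linear[Float](hiddenSize, outputSize))

  def main(args: Array[String]): Unit = {
    // Dummy batch: 2 sequences, 5 time steps, 3 features each (made-up sizes).
    val input = Tensor[Float](2, 5, 3).rand()
    val output = build().forward(input).toTensor[Float]
    println(output.size().mkString(" x "))
  }
}
```

With this definition the rest of the training code (Optimizer, MSECriterion, Adam) can stay as posted, provided the samples in data carry the shapes assumed above.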
