Why does optimize throw java.lang.ArithmeticException (divide by zero)? #2278

Open · Justontheway opened this issue Feb 6, 2018 · 12 comments
Justontheway commented Feb 6, 2018

Env Info

BigDL - v0.4.0
Spark - 1.6.3

Error Info

18/02/02 16:16:21 WARN TaskSetManager: Lost task 37.0 in stage 128.0 (TID 4699, node17.bigdata): java.lang.ArithmeticException: / by zero
	at com.intel.analytics.bigdl.dataset.CachedDistriDataSet$$anonfun$data$2$$anon$2.next(DataSet.scala:277)
	at com.intel.analytics.bigdl.dataset.SampleToMiniBatch$$anon$2.next(Transformer.scala:331)
	at com.intel.analytics.bigdl.dataset.SampleToMiniBatch$$anon$2.next(Transformer.scala:323)
	at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$9.apply(DistriOptimizer.scala:195)
	at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$9.apply(DistriOptimizer.scala:186)
	at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
qiuxin2012 (Contributor) commented

The error is thrown by CachedDistriDataSet when SampleToMiniBatch calls prev.next():

        override def next(): T = {
          val i = _offset.getAndIncrement()
          if (_train) {
            // training path: wraps around the cached partition-local data;
            // if localData.length == 0, the modulo below throws ArithmeticException
            localData(indexes(i % localData.length))
          } else {
            if (i < localData.length) {
              localData(indexes(i))
            } else {
              null.asInstanceOf[T]
            }
          }
        }

It seems localData.length is zero, so i % localData.length throws the ArithmeticException: / by zero seen above.
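A minimal sketch of the failure mode (hypothetical values, not the BigDL code itself):

```scala
// An empty cached partition means localData has length 0, and integer
// modulo by zero throws, matching the stack trace above.
val localData = Array.empty[Int]
val i = 0
// localData(i % localData.length)  // java.lang.ArithmeticException: / by zero
```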

@Justontheway May I ask how you got this error message? Which example are you running?

Justontheway (Author) commented

The original data format is like this:

day x y value
1 2 3 1.4
1 3 2 2.1
2 1 2 1.1

qiuxin2012 (Contributor) commented

@Justontheway
Could you show me your code for creating the RDD[Sample]?
It seems your data distribution is uneven and some of your partitions are empty.
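A quick way to check for empty partitions (a hedged sketch; `sampleRDD` stands in for your actual RDD[Sample]):

```scala
import org.apache.spark.rdd.RDD

// Count the records in each partition; a count of 0 marks a partition
// that would hit the "/ by zero" in CachedDistriDataSet.
def partitionSizes[T](rdd: RDD[T]): Array[(Int, Int)] =
  rdd.mapPartitionsWithIndex { (idx, it) =>
    Iterator((idx, it.size))
  }.collect()

// partitionSizes(sampleRDD).foreach { case (idx, n) =>
//   println(s"partition $idx: $n records")
// }
```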

Justontheway (Author) commented Feb 7, 2018

And then I do the following transformations:

  1. read this into a Spark DataFrame.
  2. groupBy day, and collect_list("x"), collect_list("y"), and collect_list("value") in agg (see the sketch after this list).
  3. map the DataFrame to form (day, Tensor.sparse(Array(xArray, yArray), tensorShape)).
  4. use several rows to form (day, Array(SparseTensor1, SparseTensor2, SparseTensor3)).
  5. in a map, do the following:

// create a new 3D tensor
val input = Tensor(tensorSize, tensorHeight, tensorWidth)
// then copy each densified 2D tensor into the corresponding slice
Range(1, sparseTensorArray.size + 1).foreach { d =>
  input(d) = Tensor.dense(sparseTensorArray(d - 1))
}

  6. build the model and optimize; the model looks like this (model diagram omitted).
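As a rough sketch of steps 1-3 (hedged; `df` and the column names are assumptions standing in for the real code):

```scala
import org.apache.spark.sql.functions.collect_list

// Steps 2-3 sketched: group the (day, x, y, value) records by day
// and collect the coordinates and values per day.
val grouped = df.groupBy("day").agg(
  collect_list("x").as("xs"),
  collect_list("y").as("ys"),
  collect_list("value").as("values")
)
// Each grouped row is then mapped to
// (day, Tensor.sparse(Array(xArray, yArray), tensorShape)) as in step 3.
```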

I just want a Spark version of something like [STRes](https://github.com/lucktroy/DeepST).

Clear?

qiuxin2012 (Contributor) commented

@Justontheway
I just went through DeepST. The dataset looks relatively small, so I think a DenseTensor is enough for you. Why not just use a DenseTensor to store the data?

Justontheway (Author) commented

I just use a similar model, but the dataset is different. The real dataset is about N x 600 x 800, with roughly 100 non-zero elements in each 2D matrix.

qiuxin2012 (Contributor) commented Feb 8, 2018

@Justontheway The number of non-zero elements can grow by up to a factor of kW * kH after each convolution (for example, ~100 non-zeros through a 3 x 3 kernel can touch up to ~900 output positions), so the intermediate data becomes denser and denser.
So would it be OK to provide a new SpatialConvolution that convolves a SparseTensor into a DenseTensor, like SparseLinear?

Justontheway (Author) commented Feb 9, 2018

@qiuxin2012 SpatialConvolution needs a 3D tensor, while SparseTensor only supports 2D. Also, I have already transformed the SparseTensors to DenseTensors before training in order to get a 3D tensor.

Maybe it would be great to provide something like np.stack, tf.stack, or scipy.sparse.coo_matrix.toarray, as sketched below.
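For illustration, a stack-like helper over BigDL tensors might look roughly like this (a hedged sketch; `stack2D` is a hypothetical name, not an existing BigDL API, and `Tensor.dense` is used as in the snippet above):

```scala
import com.intel.analytics.bigdl.numeric.NumericFloat
import com.intel.analytics.bigdl.tensor.Tensor

// Hypothetical helper: stack N sparse 2D tensors into one dense 3D tensor,
// analogous to np.stack over scipy.sparse.coo_matrix.toarray results.
def stack2D(sparse: Array[Tensor[Float]], h: Int, w: Int): Tensor[Float] = {
  val out = Tensor[Float](sparse.length, h, w)
  sparse.zipWithIndex.foreach { case (t, i) =>
    // fill the (i + 1)-th 2D slice (BigDL tensors are 1-based)
    out.select(1, i + 1).copy(Tensor.dense(t))
  }
  out
}
```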

qiuxin2012 (Contributor) commented

@Justontheway I'm so sorry, I forgot about this issue after the Spring Festival. How is everything going?

Justontheway (Author) commented

@qiuxin2012 I went back to TensorFlow ~ ~ ~

Adamage commented May 13, 2021

@Justontheway I had a similar error; it was solved by increasing the dataset size and the batch size. No idea why.

qiuxin2012 (Contributor) commented

> @Justontheway I had a similar error; it was solved by increasing the dataset size and the batch size. No idea why.

I think the dataset is too small: BigDL splits the dataset into num-executor partitions (a random split that does not guarantee perfectly uniform sizes), so one partition may end up empty. After you increase the dataset size, the formerly empty partition gets data.
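For illustration, the effect is easy to reproduce in plain Spark (a hedged sketch; `sc` is an existing SparkContext):

```scala
// A tiny dataset spread over many partitions leaves most of them empty.
val tiny = sc.parallelize(1 to 3, numSlices = 8) // 3 records, 8 partitions
val sizes = tiny.mapPartitionsWithIndex { (idx, it) =>
  Iterator((idx, it.size))
}.collect()
// Most of the 8 partitions hold 0 records here; with more data (or fewer
// partitions) every partition ends up non-empty.
```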

Le-Zheng added a commit to Le-Zheng/BigDL that referenced this issue Oct 20, 2021