Why does optimize throw java.lang.ArithmeticException (divide by zero)? #2278

Open · Justontheway opened this issue Feb 6, 2018 · 12 comments
Justontheway commented Feb 6, 2018

Env Info

BigDL - v0.4.0
Spark - 1.6.3

Error Info

18/02/02 16:16:21 WARN TaskSetManager: Lost task 37.0 in stage 128.0 (TID 4699, node17.bigdata): java.lang.ArithmeticException: / by zero
	at com.intel.analytics.bigdl.dataset.CachedDistriDataSet$$anonfun$data$2$$anon$2.next(DataSet.scala:277)
	at com.intel.analytics.bigdl.dataset.SampleToMiniBatch$$anon$2.next(Transformer.scala:331)
	at com.intel.analytics.bigdl.dataset.SampleToMiniBatch$$anon$2.next(Transformer.scala:323)
	at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$9.apply(DistriOptimizer.scala:195)
	at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$9.apply(DistriOptimizer.scala:186)
	at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
qiuxin2012 (Contributor) commented

The error is thrown by CachedDistriDataSet when SampleToMiniBatch calls prev.next():

        override def next(): T = {
          val i = _offset.getAndIncrement()
          if (_train) {
            // training path: wraps around the cached partition-local data;
            // if localData.length == 0, the modulo below throws ArithmeticException
            localData(indexes(i % localData.length))
          } else {
            if (i < localData.length) {
              localData(indexes(i))
            } else {
              null.asInstanceOf[T]
            }
          }
        }

It seems localData.length is zero, so i % localData.length throws the ArithmeticException: / by zero seen above.
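A minimal sketch of the failure mode (hypothetical values, not the BigDL code itself):

```scala
// An empty cached partition means localData has length 0, and integer
// modulo by zero throws, matching the stack trace above.
val localData = Array.empty[Int]
val i = 0
// localData(i % localData.length)  // java.lang.ArithmeticException: / by zero
```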

@Justontheway May I ask how you got this error message? Which example are you running?

Justontheway (Author) commented

The original data format is like this:

day x y value
1 2 3 1.4
1 3 2 2.1
2 1 2 1.1

qiuxin2012 (Contributor) commented

@Justontheway
Could you show me your code for creating the RDD[Sample]?
It seems your data distribution is uneven and some of your partitions are empty.
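A quick way to check for empty partitions (a hedged sketch; `sampleRDD` stands in for your actual RDD[Sample]):

```scala
import org.apache.spark.rdd.RDD

// Count the records in each partition; a count of 0 marks a partition
// that would hit the "/ by zero" in CachedDistriDataSet.
def partitionSizes[T](rdd: RDD[T]): Array[(Int, Int)] =
  rdd.mapPartitionsWithIndex { (idx, it) =>
    Iterator((idx, it.size))
  }.collect()

// partitionSizes(sampleRDD).foreach { case (idx, n) =>
//   println(s"partition $idx: $n records")
// }
```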

Justontheway (Author) commented Feb 7, 2018

And then I do the following transformations:

  1. read this into a Spark DataFrame.
  2. groupBy day, and collect_list("x"), collect_list("y"), and collect_list("value") in agg (see the sketch after this list).
  3. map the DataFrame to form (day, Tensor.sparse(Array(xArray, yArray), tensorShape)).
  4. use several rows to form (day, Array(SparseTensor1, SparseTensor2, SparseTensor3)).
  5. in a map, do the following:

// create a new 3D tensor
val input = Tensor(tensorSize, tensorHeight, tensorWidth)
// then copy each densified 2D tensor into the corresponding slice
Range(1, sparseTensorArray.size + 1).foreach { d =>
  input(d) = Tensor.dense(sparseTensorArray(d - 1))
}

  6. build the model and optimize; the model looks like this (model diagram omitted).
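As a rough sketch of steps 1-3 (hedged; `df` and the column names are assumptions standing in for the real code):

```scala
import org.apache.spark.sql.functions.collect_list

// Steps 2-3 sketched: group the (day, x, y, value) records by day
// and collect the coordinates and values per day.
val grouped = df.groupBy("day").agg(
  collect_list("x").as("xs"),
  collect_list("y").as("ys"),
  collect_list("value").as("values")
)
// Each grouped row is then mapped to
// (day, Tensor.sparse(Array(xArray, yArray), tensorShape)) as in step 3.
```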

I just want a Spark version of something like [STRes](https://github.com/lucktroy/DeepST).

Clear?

qiuxin2012 (Contributor) commented

@Justontheway
I just went through DeepST. The dataset looks relatively small, so I think a DenseTensor is enough for you. Why not just use a DenseTensor to store the data?

Justontheway (Author) commented

I just use a similar model, but the dataset is different. The real dataset is about N x 600 x 800, with roughly 100 non-zero elements in each 2D matrix.

qiuxin2012 (Contributor) commented Feb 8, 2018

@Justontheway The number of non-zero elements can grow by up to a factor of kW * kH after each convolution (for example, ~100 non-zeros through a 3 x 3 kernel can touch up to ~900 output positions), so the intermediate data becomes denser and denser.
So would it be OK to provide a new SpatialConvolution that convolves a SparseTensor into a DenseTensor, like SparseLinear?

Justontheway (Author) commented Feb 9, 2018

@qiuxin2012 SpatialConvolution needs a 3D tensor, while SparseTensor only supports 2D. Also, I have already transformed the SparseTensors to DenseTensors before training in order to get a 3D tensor.

Maybe it would be great to provide something like np.stack, tf.stack, or scipy.sparse.coo_matrix.toarray, as sketched below.
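For illustration, a stack-like helper over BigDL tensors might look roughly like this (a hedged sketch; `stack2D` is a hypothetical name, not an existing BigDL API, and `Tensor.dense` is used as in the snippet above):

```scala
import com.intel.analytics.bigdl.numeric.NumericFloat
import com.intel.analytics.bigdl.tensor.Tensor

// Hypothetical helper: stack N sparse 2D tensors into one dense 3D tensor,
// analogous to np.stack over scipy.sparse.coo_matrix.toarray results.
def stack2D(sparse: Array[Tensor[Float]], h: Int, w: Int): Tensor[Float] = {
  val out = Tensor[Float](sparse.length, h, w)
  sparse.zipWithIndex.foreach { case (t, i) =>
    // fill the (i + 1)-th 2D slice (BigDL tensors are 1-based)
    out.select(1, i + 1).copy(Tensor.dense(t))
  }
  out
}
```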

qiuxin2012 (Contributor) commented

@Justontheway I'm so sorry, I forgot about this issue after the Spring Festival. How is everything going?

Justontheway (Author) commented

@qiuxin2012 I went back to TensorFlow ~ ~ ~

Adamage commented May 13, 2021

@Justontheway I had a similar error; it was solved by increasing the dataset size and the batch size. No idea why.

qiuxin2012 (Contributor) commented

> @Justontheway I had a similar error; it was solved by increasing the dataset size and the batch size. No idea why.

I think the dataset is too small: BigDL splits the dataset into num-executor partitions (a random split that does not guarantee perfectly uniform sizes), so one partition may end up empty. After you increase the dataset size, the formerly empty partition gets data.
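For illustration, the effect is easy to reproduce in plain Spark (a hedged sketch; `sc` is an existing SparkContext):

```scala
// A tiny dataset spread over many partitions leaves most of them empty.
val tiny = sc.parallelize(1 to 3, numSlices = 8) // 3 records, 8 partitions
val sizes = tiny.mapPartitionsWithIndex { (idx, it) =>
  Iterator((idx, it.size))
}.collect()
// Most of the 8 partitions hold 0 records here; with more data (or fewer
// partitions) every partition ends up non-empty.
```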

Le-Zheng added a commit to Le-Zheng/BigDL that referenced this issue Oct 20, 2021