
java.lang.IllegalArgumentException: requirement failed: self element number... Error during saving checkpoint #2978

Closed
LoannGio opened this issue Dec 4, 2019 · 6 comments · Fixed by #2981

Comments


LoannGio commented Dec 4, 2019

Hello,
I'm trying to train/validate a neural network with Analytics Zoo (0.6) / BigDL (0.9) as follows:

symbol_model = build_symbol_model(params)
train_symbol_rdd = create_bigdl_samples(sc, train_path, ...) #returns RDD<Sample>
val_symbol_rdd = create_bigdl_samples(sc, val_path, ...) # returns RDD<Sample>

symbol_optimizer = Optimizer(model=symbol_model, 
                    training_rdd=train_symbol_rdd, 
                    optim_method=Adam(),
                    end_trigger=MaxEpoch(5),
                    criterion=MSECriterion(),
                    batch_size=batchSize,
                    bigdl_type="float")
symbol_optimizer.set_validation(
    val_rdd=val_symbol_rdd,
    batch_size=batchSize,
    trigger=EveryEpoch(),
    val_method=[Loss(MSECriterion())]
)
symbol_optimizer.set_checkpoint(EveryEpoch(), save_model_path, isOverWrite=True)
symbol_optimizer.optimize()

The optimization's training works fine, but when the (first) validation comes, the following error is raised:
java.lang.IllegalArgumentException: requirement failed: self element number(16) is not equal to source element number(32)

I've double-checked my creation of the RDD of samples; there seems to be no trouble there.
I've tried truncating my validation set so its size is a multiple of batchSize.
It doesn't seem to come from my model, since the training works fine.

I have no clue where to look anymore.
Any help would be appreciated.

Thanks.

Full logs :

2019-12-04 17:19:36 INFO  DistriOptimizer$:791 - caching training rdd ...
2019-12-04 17:19:52 INFO  DistriOptimizer$:629 - Cache thread models...
2019-12-04 17:19:52 INFO  ThreadPool$:95 - Set mkl threads to 1 on thread 652
2019-12-04 17:19:52 INFO  ThreadPool$:95 - Set mkl threads to 1 on thread 652
2019-12-04 17:19:52 INFO  ThreadPool$:95 - Set mkl threads to 1 on thread 652
2019-12-04 17:19:52 INFO  ThreadPool$:95 - Set mkl threads to 1 on thread 652
2019-12-04 17:19:52 INFO  ThreadPool$:95 - Set mkl threads to 1 on thread 652
2019-12-04 17:19:52 INFO  DistriOptimizer$:612 - model thread pool size is 1
2019-12-04 17:19:52 INFO  DistriOptimizer$:631 - Cache thread models... done
2019-12-04 17:19:52 INFO  DistriOptimizer$:148 - Count dataset
2019-12-04 17:19:52 INFO  DistriOptimizer$:152 - Count dataset complete. Time elapsed: 0.128190703s
2019-12-04 17:19:52 INFO  DistriOptimizer$:160 - config  {
	computeThresholdbatchSize: 100
	maxDropPercentage: 0.0
	warmupIterationNum: 200
	isLayerwiseScaled: false
	dropPercentage: 0.0
 }
2019-12-04 17:19:52 INFO  DistriOptimizer$:164 - Shuffle data
2019-12-04 17:19:52 INFO  DistriOptimizer$:167 - Shuffle data complete. Takes 0.012889339s
2019-12-04 17:19:53 INFO  DistriOptimizer$:406 - [Epoch 1 64/18880][Iteration 1][Wall Clock 0.501946693s] Trained 64 records in 0.501946693 seconds. Throughput is 127.50358 records/second. Loss is 0.24601407. 
2019-12-04 17:19:53 INFO  DistriOptimizer$:406 - [Epoch 1 128/18880][Iteration 2][Wall Clock 0.628867876s] Trained 64 records in 0.126921183 seconds. Throughput is 504.24997 records/second. Loss is 0.22389516. 
2019-12-04 17:19:53 INFO  DistriOptimizer$:406 - [Epoch 1 192/18880][Iteration 3][Wall Clock 0.744553932s] Trained 64 records in 0.115686056 seconds. Throughput is 553.2214 records/second. Loss is 0.21530338. 
2019-12-04 17:19:53 INFO  DistriOptimizer$:406 - [Epoch 1 256/18880][Iteration 4][Wall Clock 0.858995715s] Trained 64 records in 0.114441783 seconds. Throughput is 559.2363 records/second. Loss is 0.20601867. 
2019-12-04 17:19:53 INFO  DistriOptimizer$:406 - [Epoch 1 320/18880][Iteration 5][Wall Clock 0.962278502s] Trained 64 records in 0.103282787 seconds. Throughput is 619.65796 records/second. Loss is 0.19575086. 

[...]

2019-12-04 17:20:18 INFO  DistriOptimizer$:406 - [Epoch 1 18880/18880][Iteration 295][Wall Clock 25.264317586s] Trained 64 records in 0.072651999 seconds. Throughput is 880.9118 records/second. Loss is 0.016169826. 
2019-12-04 17:20:18 INFO  DistriOptimizer$:451 - [Epoch 1 18880/18880][Iteration 295][Wall Clock 25.264317586s] Epoch finished. Wall clock time is 25378.094366 ms
2019-12-04 17:20:18 INFO  DistriOptimizer$:111 - [Epoch 1 18880/18880][Iteration 295][Wall Clock 25.264317586s] Validate model...
2019-12-04 17:20:19 INFO  DistriOptimizer$:177 - [Epoch 1 18880/18880][Iteration 295][Wall Clock 25.264317586s] validate model throughput is 239.91006 records/second
2019-12-04 17:20:19 INFO  DistriOptimizer$:180 - [Epoch 1 18880/18880][Iteration 295][Wall Clock 25.264317586s] Loss is (Loss: 46.11404, count: 256, Average Loss: 0.18013297)
Traceback (most recent call last):
  File "/home/toto/Documents/giot/colinware1/app.py", line 67, in <module>
    symbol_optimizer.optimize()
  File "/home/toto/.local/lib/python3.6/site-packages/bigdl/share/lib/bigdl-0.9.0-python-api.zip/bigdl/optim/optimizer.py", line 764, in optimize
  File "/home/toto/.local/lib/python3.6/site-packages/bigdl/share/lib/bigdl-0.9.0-python-api.zip/bigdl/util/common.py", line 634, in callJavaFunc
  File "/home/toto/.local/lib/python3.6/site-packages/pyspark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/home/toto/.local/lib/python3.6/site-packages/pyspark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o380.optimize.
: java.lang.IllegalArgumentException: requirement failed: self element number(16) is not equal to source element number(32)
	at scala.Predef$.require(Predef.scala:224)
	at com.intel.analytics.bigdl.tensor.DenseTensor$.copy(DenseTensor.scala:2623)
	at com.intel.analytics.bigdl.tensor.DenseTensor.copy(DenseTensor.scala:435)
	at com.intel.analytics.bigdl.nn.abstractnn.AbstractModule.setExtraParameter(AbstractModule.scala:375)
	at com.intel.analytics.bigdl.optim.DistriOptimizer$.getModel(DistriOptimizer.scala:659)
	at com.intel.analytics.bigdl.optim.AbstractOptimizer$$anonfun$checkpoint$1$$anonfun$apply$13.apply(AbstractOptimizer.scala:218)
	at com.intel.analytics.bigdl.optim.AbstractOptimizer$$anonfun$checkpoint$1$$anonfun$apply$13.apply(AbstractOptimizer.scala:216)
	at scala.Option.foreach(Option.scala:257)
	at com.intel.analytics.bigdl.optim.AbstractOptimizer$$anonfun$checkpoint$1.apply(AbstractOptimizer.scala:216)
	at com.intel.analytics.bigdl.optim.AbstractOptimizer$$anonfun$checkpoint$1.apply(AbstractOptimizer.scala:215)
	at scala.Option.foreach(Option.scala:257)
	at com.intel.analytics.bigdl.optim.AbstractOptimizer.checkpoint(AbstractOptimizer.scala:215)
	at com.intel.analytics.bigdl.optim.DistriOptimizer$.optimize(DistriOptimizer.scala:491)
	at com.intel.analytics.bigdl.optim.DistriOptimizer.optimize(DistriOptimizer.scala:881)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)

@LoannGio LoannGio changed the title Error during validation java.lang.IllegalArgumentException: requirement failed: self element number... Error during validation Dec 4, 2019

i8run commented Dec 5, 2019

Hi @LoannGio , it doesn't seem to be a problem with the RDD[Sample] creation. Is there any BatchNormalization in your model?


LoannGio commented Dec 5, 2019

Hi @i8run , thanks for your reply.
Yes, there are some. But I don't understand why they would raise such an error.

My model's layers (a rather classic encoder):

Input(image32x32x3)
Conv2D
BatchNormalization
MaxPool2D
Conv2D
BatchNormalization
MaxPool2D
Conv2D
BatchNormalization
MaxPool2D
Conv2D
BatchNormalization
MaxPool2D
Reshape
Dense(16)

The encoder is merged with a decoder before training, but the latter doesn't have any BatchNormalization.

Edit: after some debugging, the error seems to be raised by the presence of my first 2 BatchNormalizations (not the following ones).

encoder_input (Input)                   (None, 32, 32, 3)         0                                                   
________________________________________________________________________________________________________________________
Convolution2D38f90e9b (Convolution2D)   (None, 32, 32, 16)        448           encoder_input                         
________________________________________________________________________________________________________________________
BatchNormalization24e9e3b5 (BatchNormal (None, 32, 32, 16)        32            Convolution2D38f90e9b                 
________________________________________________________________________________________________________________________
MaxPooling2D25fbedb6 (MaxPooling2D)     (None, 16, 16, 16)        0             BatchNormalization24e9e3b5            
________________________________________________________________________________________________________________________
Convolution2D4addcc55 (Convolution2D)   (None, 16, 16, 8)         1160          MaxPooling2D25fbedb6                  
________________________________________________________________________________________________________________________
BatchNormalization2de838a3 (BatchNormal (None, 16, 16, 8)         16            Convolution2D4addcc55                 
________________________________________________________________________________________________________________________
MaxPooling2D541e609a (MaxPooling2D)     (None, 8, 8, 8)           0             BatchNormalization2de838a3            
________________________________________________________________________________________________________________________


i8run commented Dec 9, 2019

It's a very strange issue. Have you set the dim_ordering to tf? Your model summary suggests the channel is in the last dimension.

The thrown exception says the tensor shapes are not the same, where the tensor in question holds the extra parameters of BatchNormalization/SpatialBatchNormalization. On the master, the first BN in the model has a runningMean with 16 output channels, but the trained model on the clients has a runningMean with 32 output channels.

A BN layer has 4 parameters (weight, bias, runningMean, runningVariance), whose size should equal the number of output channels.

  • If the dim_ordering is th, the default input format is NCHW, and the output channel is dimension 1 (0-based).
  • If the dim_ordering is tf, the default input format is NHWC, so the output channel is dimension 3 (0-based).

If possible, please take a look at the channel dimension.
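A plain NumPy sketch (not BigDL code; the shapes are hypothetical, taken from the model summary above) of how reading dimension 1 of an NHWC tensor yields the 32 and 16 seen in the error message:

```python
import numpy as np

# First Conv2D output from the summary: 16 channels on a 32x32 feature map.
feat_nchw = np.zeros((1, 16, 32, 32))  # th / NCHW: channels at dimension 1
feat_nhwc = np.zeros((1, 32, 32, 16))  # tf / NHWC: channels at dimension 3

# runningMean/runningVariance should be sized by the channel dimension.
# Reading dimension 1 of an NHWC tensor picks up the height (32)
# instead of the channel count (16):
expected = feat_nhwc.shape[3]  # 16, as in the model on the master
actual = feat_nhwc.shape[1]    # 32, as in the trained worker model

print(expected, actual)  # 16 32 -- the two numbers in the error message
```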

@i8run i8run changed the title java.lang.IllegalArgumentException: requirement failed: self element number... Error during validation java.lang.IllegalArgumentException: requirement failed: self element number... Error during saving checkpoint Dec 9, 2019

i8run commented Dec 9, 2019

Hi @LoannGio , I have found the root cause of your case. It's a BigDL bug.

Currently, for the versions you are using (Analytics Zoo 0.6, BigDL 0.9.0), you should set the model format to NCHW, i.e. th, not tf.

Thanks for your issue. :)


LoannGio commented Dec 9, 2019

Hi @i8run , thanks for your investigation :)
Indeed, I was using dim_ordering="tf". I'll try switching to th as you advised. I'm surprised this kind of bug only shows up during validation in my case.
Again, many thanks for your help.


i8run commented Dec 10, 2019

In fact, the validation has completed; the error occurs at the checkpoint-saving stage. Because of this bug, the runningMean and runningVariance are changed to the wrong shape during training, so the model saved on the driver is not the same as the models in the executors.
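A minimal sketch of why saving the checkpoint then fails (plain Python/NumPy standing in for BigDL's DenseTensor.copy, whose require check produces the message in the stack trace; the shapes are the ones from this issue):

```python
import numpy as np

def copy_(self_tensor, source):
    # Mimics the requirement check in DenseTensor.copy:
    # an in-place copy demands identical element counts.
    if self_tensor.size != source.size:
        raise ValueError(
            "requirement failed: self element number(%d) is not equal "
            "to source element number(%d)" % (self_tensor.size, source.size))
    self_tensor[:] = source

driver_running_mean = np.zeros(16)  # shape the driver-side model expects
worker_running_mean = np.ones(32)   # wrongly shaped extra parameter from a worker

try:
    copy_(driver_running_mean, worker_running_mean)
except ValueError as e:
    print(e)  # requirement failed: self element number(16) is not equal to source element number(32)
```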

Le-Zheng pushed a commit to Le-Zheng/BigDL that referenced this issue Oct 20, 2021
* Fix code blocks indents in .md files

Previously a lot of the code blocks in markdown files were horribly indented with bad white spaces in the beginning of lines. Users can't just select, copy, paste, and run (in the case of python). I have fixed all these, so there is no longer any code block with bad white space at the beginning of the lines.
It would be nice if you could try to make sure in future commits that all code blocks are properly indented inside and have the right amount of white space in the beginning!

* Fix small style issue

* Fix indents

* Fix indent and add \ for multiline commands

Change indent from 3 spaces to 4, and add "\" for multiline bash commands

Co-authored-by: Yifan Zhu <fanzhuyifan@gmail.com>
dding3 pushed a commit to dding3/BigDL that referenced this issue Nov 17, 2021
* add hyperzoo for k8s support (intel-analytics#2140)

* add hyperzoo for k8s support

* format

* format

* format

* format

* run examples on k8s readme (intel-analytics#2163)

* k8s  readme

* fix jdk download issue (intel-analytics#2219)

* add doc for submit jupyter notebook and cluster serving to k8s (intel-analytics#2221)

* add hyperzoo doc

* add hyperzoo doc

* add hyperzoo doc

* add hyperzoo doc

* fix jdk download issue (intel-analytics#2223)

* bump to 0.9s (intel-analytics#2227)

* update jdk download url (intel-analytics#2259)

* update some previous docs (intel-analytics#2284)

* K8docsupdate (intel-analytics#2306)

* Update README.md

* Update s3 related links in readme  and documents (intel-analytics#2489)

* Update s3 related links in readme  and documents

* Update s3 related links in readme and documents

* Update s3 related links in readme and documents

* Update s3 related links in readme and documents

* Update s3 related links in readme and documents

* Update s3 related links in readme and documents

* update

* update

* modify line length limit

* update

* Update mxnet-mkl version in hyper-zoo dockerfile (intel-analytics#2720)

Co-authored-by: gaoping <pingx.gao@intel.com>

* update bigdl version (intel-analytics#2743)

* update bigdl version

* hyperzoo dockerfile add cluster-serving (intel-analytics#2731)

* hyperzoo dockerfile add cluster-serving

* update

* update

* update

* update jdk url

* update jdk url

* update

Co-authored-by: gaoping <pingx.gao@intel.com>

* Support init_spark_on_k8s (intel-analytics#2813)

* initial

* fix

* code refactor

* bug fix

* update docker

* style

* add conda to docker image (intel-analytics#2894)

* add conda to docker image

* Update Dockerfile

* Update Dockerfile

Co-authored-by: glorysdj <glorysdj@gmail.com>

* Fix code blocks indents in .md files (intel-analytics#2978)

* Fix code blocks indents in .md files

Previously a lot of the code blocks in markdown files were horribly indented with bad white spaces in the beginning of lines. Users can't just select, copy, paste, and run (in the case of python). I have fixed all these, so there is no longer any code block with bad white space at the beginning of the lines.
It would be nice if you could try to make sure in future commits that all code blocks are properly indented inside and have the right amount of white space in the beginning!

* Fix small style issue

* Fix indents

* Fix indent and add \ for multiline commands

Change indent from 3 spaces to 4, and add "\" for multiline bash commands

Co-authored-by: Yifan Zhu <fanzhuyifan@gmail.com>

* enable bigdl 0.12 (intel-analytics#3101)

* switch to bigdl 0.12

* Hyperzoo example ref (intel-analytics#3143)

* specify pip version to fix oserror 0 of proxy (intel-analytics#3165)

* Bigdl0.12.1 (intel-analytics#3155)

* bigdl 0.12.1

* bump 0.10.0-Snapshot (intel-analytics#3237)

* update runtime image name (intel-analytics#3250)

* update jdk download url (intel-analytics#3316)

* update jdk8 url (intel-analytics#3411)

Co-authored-by: ardaci <dongjie.shi@intel.com>

* update hyperzoo docker image (intel-analytics#3429)

* update hyperzoo image (intel-analytics#3457)

* fix jdk in az docker (intel-analytics#3478)

* fix jdk in az docker

* fix jdk for hyperzoo

* fix jdk in jenkins docker

* fix jdk in cluster serving docker

* fix jdk

* fix readme

* update python dep to fit cnvrg (intel-analytics#3486)

* update ray version doc (intel-analytics#3568)

* fix deploy hyperzoo issue (intel-analytics#3574)

Co-authored-by: gaoping <pingx.gao@intel.com>

* add spark fix and net-tools and status check (intel-analytics#3742)

* intsall netstat and add check status

* add spark fix for graphene

* bigdl 0.12.2 (intel-analytics#3780)

* bump to 0.11-S and fix version issues except ipynb

* add multi-stage build Dockerfile (intel-analytics#3916)

* add multi-stage build Dockerfile

* multi-stage build dockerfile

* multi-stage build dockerfile

* Rename Dockerfile.multi to Dockerfile

* delete Dockerfile.multi

* remove comments, add TINI_VERSION to common arg, remove Dockerfile.multi

* multi-stage add tf_slim

Co-authored-by: shaojie <shaojiex.bai@intel.com>

* update hyperzoo doc and k8s doc (intel-analytics#3959)

* update userguide of k8s

* update k8s guide

* update hyperzoo doc

* Update k8s.md

add note

* Update k8s.md

add note

* Update k8s.md

update notes

* fix 4087 issue (intel-analytics#4097)

Co-authored-by: shaojie <shaojiex.bai@intel.com>

* fixed 4086 and 4083 issues (intel-analytics#4098)

Co-authored-by: shaojie <shaojiex.bai@intel.com>

* Reduce image size (intel-analytics#4132)

* Reduce Dockerfile size
1. del redis stage
2. del flink stage
3. del conda & exclude some python packages
4. add copies layer stage

* update numpy version to 1.18.1

Co-authored-by: zzti-bsj <shaojiex.bai@intel.com>

* update hyperzoo image (intel-analytics#4250)

Co-authored-by: Adria777 <Adria777@github.com>

* bigdl 0.13 (intel-analytics#4210)

* bigdl 0.13

* update

* print exception

* pyspark2.4.6

* update release PyPI script

* update

* flip snapshot-0.12.0 and spark2.4.6 (intel-analytics#4254)

* s-0.12.0 master

* Update __init__.py

* Update python.md

* fix docker issues due to version update (intel-analytics#4280)

* fix docker issues

* fix docker issues

* update Dockerfile to support spark 3.1.2 && 2.4.6 (intel-analytics#4436)

Co-authored-by: shaojie <otnw_bsj@163.com>

* update hyperzoo, add lib for tf2 (intel-analytics#4614)

* delete tf 1.15.0 (intel-analytics#4719)

Co-authored-by: Le-Zheng <30695225+Le-Zheng@users.noreply.github.com>
Co-authored-by: pinggao18 <44043817+pinggao18@users.noreply.github.com>
Co-authored-by: pinggao187 <44044110+pinggao187@users.noreply.github.com>
Co-authored-by: gaoping <pingx.gao@intel.com>
Co-authored-by: Kai Huang <huangkaivision@gmail.com>
Co-authored-by: GavinGu07 <55721214+GavinGu07@users.noreply.github.com>
Co-authored-by: Yifan Zhu <zhuyifan@stanford.edu>
Co-authored-by: Yifan Zhu <fanzhuyifan@gmail.com>
Co-authored-by: Song Jiaming <litchy233@gmail.com>
Co-authored-by: ardaci <dongjie.shi@intel.com>
Co-authored-by: Yang Wang <yang3.wang@intel.com>
Co-authored-by: zzti-bsj <2779090360@qq.com>
Co-authored-by: shaojie <shaojiex.bai@intel.com>
Co-authored-by: Lingqi Su <33695124+Adria777@users.noreply.github.com>
Co-authored-by: Adria777 <Adria777@github.com>
Co-authored-by: shaojie <otnw_bsj@163.com>