Participants: Chris, Bobber
Our target is a silver medal in this competition, which means finishing in the top 5% (see https://www.kaggle.com/progression)
Meeting time: 5pm Toronto / 11pm Munich
-
Self-introductions.
-
Chris reviewed all public notebooks and found the following:
- EDA: they all use the same feature EDA
- Model: they use BiLSTM
- Post-processing: some use rounding to improve the score
-
Problems:
- Chris does not have enough TPU time. Solution: Chris can use TPU through the Kaggle team while Bobber uses TPU on Colab.
- The TPU stops after 12 hours of running https://github.com/bobbercheng/ventilator-pressure-prediction/blob/master/ventilator-bidirectional-lstm-modification.ipynb. It reaches Fold 5. Solution: download checkpoint_filepath from Colab and continue training.
-
Next plan
- EDA: Chris checks whether there are other useful features.
- Model: Bobber adds CNN.
- Stacking: Bobber adds stacking.
- Post-processing: Chris checks it.
Next meeting time: 5pm Toronto / 11pm Munich
What we did:
- EDA, Chris found https://www.kaggle.com/danofer/ts-windows-feature-engineering-ventilators.
- Model: Bobber added CNN and merged the above EDA, https://github.com/bobbercheng/ventilator-pressure-prediction/blob/master/ventilator_pressure_1d_cnn.ipynb. CV: 0.38475, PL: 0.345
In Progress:
- Run BiLSTM with Colab.
Next Plan:
- Model
- Chris will check https://www.kaggle.com/lucamassaron/rescaling-layer-for-discrete-output-in-tensorflow
- Bobber will try to add attention or combine CNN and BiLSTM
What we did:
- EDA: Bobber found that pressure takes only 950 distinct values. In the range 4-11, pressure takes only 100 values, and R/C has only 10 combinations.
- Model
- Chris checked https://www.kaggle.com/lucamassaron/rescaling-layer-for-discrete-output-in-tensorflow and median_round. It improved PL by 0.001
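Sketch of the rounding idea: because pressure only takes a fixed set of values, a prediction can be snapped to the nearest value seen in training. This is a minimal illustration, not necessarily what median_round or the rescaling-layer notebook does; the function name and the tiny grid are hypothetical, and in practice the grid would come from the sorted unique training pressures.

```python
import numpy as np

def snap_to_grid(pred, grid):
    """Snap each prediction to the nearest value of a sorted 1-D grid (len >= 2)."""
    grid = np.asarray(grid, dtype=float)
    pred = np.asarray(pred, dtype=float)
    idx = np.searchsorted(grid, pred)        # insertion index of each prediction
    idx = np.clip(idx, 1, len(grid) - 1)     # keep both idx - 1 and idx valid
    left, right = grid[idx - 1], grid[idx]   # neighbouring grid values
    return np.where(pred - left <= right - pred, left, right)

# Hypothetical stand-in for the sorted unique pressures of the training set.
grid = np.array([4.0, 4.5, 5.0, 5.5])
snapped = snap_to_grid([4.1, 4.3, 5.49, 9.0, 1.0], grid)
print(snapped)
```

Predictions outside the grid are clamped to its first or last value.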
Next Plan:
- Model
- Bobber adds simple attention. LB 0.158. https://github.com/bobbercheng/ventilator-pressure-prediction/blob/master/ventilator-bidirectional-lstm-modification_v3.ipynb
- Bobber will add a new categorical output with 101 classes corresponding to the 100 pressure values.
- Chris will try to combine CNN and BiLSTM.
What we did:
- EDA: Bobber confirmed that in the range 4-11, the predicted pressure only uses the same 100 values as the training data, see https://www.kaggle.com/bobber/cat-presure.
- Model: Bobber added attention and residual connections. 4 attention layers improve LB to 0.158. It takes 8 hours to get a result.
In progress:
- Model:
- Bobber added residual connections. Version 14 is bad, LB 0.17
- Chris is mixing CNN and BiLSTM
- Submit https://www.kaggle.com/cdeotte/ensemble-folds-with-median-0-153
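Sketch of fold ensembling with the median, as named in the notebook title above. The median is less sensitive than the mean to a single fold that predicts badly, and it is the natural aggregate for an MAE metric. The function name and the toy numbers are hypothetical; the real input would be one prediction vector per fold model.

```python
import numpy as np

def ensemble_folds(fold_preds, method="median"):
    """Combine per-fold predictions of shape (n_folds, n_rows) into one vector."""
    fold_preds = np.asarray(fold_preds, dtype=float)
    if method == "median":
        return np.median(fold_preds, axis=0)
    if method == "mean":
        return fold_preds.mean(axis=0)
    raise ValueError(f"unknown method: {method}")

# Toy example: the third fold is far off on the first row.
fold_preds = np.array([
    [5.0, 10.0, 20.0],
    [5.2, 10.1, 20.0],
    [9.0,  9.9, 20.0],
])
median_pred = ensemble_folds(fold_preds)
mean_pred = ensemble_folds(fold_preds, method="mean")
print(median_pred, mean_pred)
```

On the first row the median stays at 5.2 while the mean is pulled up to 6.4 by the outlier fold.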
Next plan:
- Fine-tune the best public model from https://www.kaggle.com/mistag/pred-ventilator-lstm-model/notebook. Don't use it, it only gets PL 0.157
- Fine-tune the second public model from https://www.kaggle.com/bobber/single-bi-lstm-model-pressure-predict-gpu-infer, PL 0.152
What we did:
- The organizer published https://arxiv.org/pdf/2102.06779.pdf. In the paper, they train a different model for each R/C. Their best MAE is 0.27, while the best on the PL is 0.119. It's amazing.
- Bobber increased the CNN to 2048 but CV doesn't improve over 1024. Refer to https://github.com/bobbercheng/ventilator-pressure-prediction/blob/master/ventilator_pressure_1d_cnn_2048_attention.ipynb
- Why do we only need an encoder instead of an encoder-decoder in this competition? Because we need to predict from the hidden state or high-dimensional features, but we don't need to generate different sentences as in NLP.
Next plan:
- Meet new member.
- Model:
- Fine-tune the second public model from https://www.kaggle.com/bobber/single-bi-lstm-model-pressure-predict-gpu-infer, PL 0.152
- Combine CNN and BiLSTM - Chris
- Try transformer - Bobber
- WaveNet
- EfficientNet
What we did:
- Model:
- Chris combined CNN and BiLSTM. CNN as input to BiLSTM is not good.
- Concatenate CNN and BiLSTM. It looks promising; waiting for the result.
- Bobber implemented Transformer.
- Without position encoding, PL: 0.26 CV: 0.26
- With position encoding. CV.9.23, PL:
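Sketch of a position encoding for reference. The notes do not say which variant was used; this is the standard fixed sinusoidal encoding, which is added to the embedded inputs so the encoder can tell time steps apart. The function name is hypothetical, seq_len = 80 and d_model = 64 are assumed example values, and d_model must be even.

```python
import numpy as np

def sinusoidal_position_encoding(seq_len, d_model):
    """Return a (seq_len, d_model) matrix of sin/cos position encodings."""
    if d_model % 2 != 0:
        raise ValueError("d_model must be even")
    positions = np.arange(seq_len)[:, None]            # (seq_len, 1)
    even_dims = np.arange(0, d_model, 2)[None, :]      # (1, d_model / 2)
    angles = positions / np.power(10000.0, even_dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                       # even columns
    pe[:, 1::2] = np.cos(angles)                       # odd columns
    return pe

pe = sinusoidal_position_encoding(seq_len=80, d_model=64)
print(pe.shape)
# Typical use: x = x + pe, where x has shape (batch, 80, 64).
```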
Next Plan:
- Pick the best public BiLSTM model.
- Concatenate Transformer and BiLSTM.
- Use embeddings for R and C.
- Random seed/state. Fine-tune. TF 1.x: tf.set_random_seed(seed_value). TF 2.0: import tensorflow as tf; tf.random.set_seed(221)
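Sketch of a seeding helper for the random seed item. seed_everything is a hypothetical name; it seeds Python and NumPy, and calls tf.random.set_seed only if TensorFlow is installed. Seeding alone may not make GPU/TPU training bit-for-bit reproducible.

```python
import random

import numpy as np

def seed_everything(seed):
    """Seed the Python, NumPy and (if available) TensorFlow random generators."""
    random.seed(seed)
    np.random.seed(seed)
    try:
        import tensorflow as tf  # optional dependency
        tf.random.set_seed(seed)
    except ImportError:
        pass  # TensorFlow not installed; Python and NumPy are still seeded

seed_everything(221)
first_draw = np.random.rand(3)
seed_everything(221)
second_draw = np.random.rand(3)
print(np.array_equal(first_draw, second_draw))
```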
Sorry for no updates for many days.
What we did:
- GB+Rescaling has 0.1427 in PL. Great job, Chris
- Implemented full Transformer https://github.com/bobbercheng/ventilator-pressure-prediction/blob/master/ventilator_pressure_transformer_V6.ipynb. However, V6 doesn't converge the way the Transformer encoder does.
- Analyzed predictions https://github.com/bobbercheng/ventilator-pressure-prediction/blob/master/Analyze_predict_data.ipynb.
- Mean of MAE has long tail. It's normal.
- No zero errors; this may mean we can use a categorical output to reduce loss.
- When pressure changes dynamically, the model cannot predict it well.
- Different R/C combinations have different error distributions:
- 20-10 - MAE 0.15342606138845297, count: 184106
- 20-20 - MAE 0.16133572280868774, count: 185841
- 20-50 - MAE 0.16003437620608627, count: 243184
- 5-10 - MAE 0.15912988705787456, count: 249386
- 5-20 - MAE 0.11116782628501401, count: 255571
- 5-50 - MAE 0.11376304575751983, count: 245700
- 50-10 - MAE 0.16271191963964562, count: 418114
- 50-20 - MAE 0.2426435067746749, count: 255953
- 50-50 - MAE 0.2504960925707726, count: 253113
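Sketch of how a per-R/C breakdown like the one above can be computed. mae_by_rc is a hypothetical helper and the toy arrays are made up; the labels are assumed to be R first, then C, matching the "20-10" style used in the list.

```python
import numpy as np

def mae_by_rc(r, c, y_true, y_pred):
    """Return {"R-C": (mae, count)} for every (R, C) pair present in the data."""
    r, c = np.asarray(r), np.asarray(c)
    abs_err = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    report = {}
    for r_val, c_val in sorted(set(zip(r.tolist(), c.tolist()))):
        mask = (r == r_val) & (c == c_val)   # rows belonging to this R/C pair
        report[f"{r_val}-{c_val}"] = (float(abs_err[mask].mean()), int(mask.sum()))
    return report

# Toy data: two rows with R=20, C=10 and two rows with R=50, C=50.
report = mae_by_rc(
    r=[20, 20, 50, 50],
    c=[10, 10, 50, 50],
    y_true=[10.0, 12.0, 20.0, 30.0],
    y_pred=[10.5, 11.5, 22.0, 27.0],
)
for key, (mae, count) in report.items():
    print(f"{key} - MAE {mae}, count: {count}")
```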
What we plan to do:
- Data: try to use 40 instead of 80 for training - Chris
- Add attention to learn pressure changes - Bobber
What we did:
- We need to run at least one fold to test the model.
- Bobber is adding self-attention
- Chris is doing ensembling
- We need to check tree-based models like LGB/XGBoost for big errors such as RC 50-50, 50-20
What we did:
- Fixed "Allocate more memory issue" for Kaggle GPU.
- Created a new DB to collect the best models.
- Self-attention is added, https://www.kaggle.com/bobber/ventilator-gb-rescaling-eda-v8-gpu, https://github.com/bobbercheng/ventilator-pressure-prediction/blob/master/ventilator_gb_rescaling_eda_V8_GPU.ipynb
- The previous transformer model is not good because it normalizes all data and loses scale. We can consider using the transformer to generate weights and multiply the inputs by them instead of adding weights.
What we plan to do:
- blending
- Complete https://github.com/bobbercheng/ventilator-pressure-prediction/blob/master/ventilator_gb_rescaling_eda_V8_GPU.ipynb
- Check Temporal Fusion Transformers (TFT) for Interpretable Multi-horizon Time Series Forecasting, https://arxiv.org/abs/1912.09363
What we did:
- Blend. PL 0.138. Good job, Chris.
- Finding tree-based model
- Transformer encoder didn't improve CV or PL score.
No meeting
What we did:
- Explanation of R/C, https://www.kaggle.com/c/ventilator-pressure-prediction/discussion/276599, https://www.kaggle.com/c/ventilator-pressure-prediction/discussion/276828
- Added autoencoder. https://github.com/bobbercheng/ventilator-pressure-prediction/blob/master/ventilator_gb_rescaling_V12_GPU.ipynb
- LightGBM model, https://github.com/bobbercheng/ventilator-pressure-prediction/blob/master/VPP_LGBM_GPU_0.4506.ipynb.
What we plan to do:
- Fix train issues of LightGBM model
- Change parameters for the autoencoder.
- Try to use WaveNet for 50__20, 50__50
What we did:
- WaveNet achieves very good CV for 50__20, 50__50 but doesn't improve the public score. WaveNet has bad CV for fold #1 if the whole training set is used. This may be because the LSTM part is too powerful and has fixed weights trained from fold #1.
What we plan to do:
- Run the LSTM model fully on TPU with 66 features. It will give us a base model with 7 folds
- Retrain LSTM and WaveNet together instead of fixing the LSTM weights.
- Retrain LSTM and WaveNet from the beginning.
What we did:
- Run the LSTM model fully on TPU with 66 features. It will give us a base model with 7 folds. In progress.
- Retrain LSTM and WaveNet together instead of fixing the LSTM weights. No improvement. CV 0.161
- Fixed the LSTM up to layer -4 instead of layer -3: no improvement. Fixed the LSTM up to layer -3: CV 1.59xxx, a slight improvement.
- Change loss function to MeanSquaredError: no improvement.
- Retrain LSTM and WaveNet from the beginning. No improvement. CV 0.163
- Increase the loss of RC 50__50, 50__20 for the LSTM/WaveNet model. No obvious change either when training from scratch or when fixing the LSTM part.
- Increase the loss of RC 50__50, 50__20 for LSTM/Transformer_encoder
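Sketch of one way to increase the loss for the hard R/C pairs: per-row sample weights fed into a weighted MAE. The helper names and the factor 2.0 are hypothetical (the notes do not record the factor used), and the pairs are read as R=50 with C=20 or C=50. With Keras, a weight array of this kind could be passed as sample_weight to model.fit instead of computing the loss by hand.

```python
import numpy as np

def rc_sample_weights(r, c, hard_pairs=((50, 20), (50, 50)), hard_weight=2.0):
    """Weight 1.0 for every row, hard_weight for rows in the hard R/C pairs."""
    r, c = np.asarray(r), np.asarray(c)
    weights = np.ones(r.shape, dtype=float)
    for r_val, c_val in hard_pairs:
        weights[(r == r_val) & (c == c_val)] = hard_weight
    return weights

def weighted_mae(y_true, y_pred, weights):
    """Mean absolute error where each row counts proportionally to its weight."""
    abs_err = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    return float(np.sum(weights * abs_err) / np.sum(weights))

weights = rc_sample_weights(r=[5, 50, 50], c=[20, 20, 50])
loss = weighted_mae([10.0, 10.0, 10.0], [11.0, 12.0, 10.0], weights)
print(weights, loss)
```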
Chris's transformer:
- Batch size 64, 11 folds -> 32 folds
- 3 outputs, pressure and pressure.diff()
- Connection weight for each transformer layer: 0.7 : 0.3. feat_dim = train.shape[-1] + 32, embed_dim = 64, ff_dim = 128
- Use cosine restart learning rate scheduler. No early stopping.
- Use the whole training data, without reserving a validation set, for the best result.
- Retrain again with 32 shuffled folds and a different random seed.
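Sketch of a cosine learning rate schedule with restarts, for the scheduler item above. This is a minimal fixed-cycle version; the function name, cycle length and learning rate bounds are assumed values, not Chris's actual settings. In Keras a function like this could be wrapped in a LearningRateScheduler callback, or tf.keras.optimizers.schedules.CosineDecayRestarts could be used instead.

```python
import math

def cosine_restart_lr(step, cycle_steps, lr_max=1e-3, lr_min=1e-5):
    """Decay from lr_max to lr_min along a cosine, restarting every cycle_steps."""
    progress = (step % cycle_steps) / cycle_steps   # position inside the cycle, in [0, 1)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))

# Two cycles of 10 steps each: the rate jumps back to lr_max at step 10.
schedule = [cosine_restart_lr(step, cycle_steps=10) for step in range(20)]
print(schedule[0], schedule[9], schedule[10])
```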