Reproducing results with multiple GPUs #14

Closed
RuijieZhu94 opened this issue Apr 3, 2024 · 5 comments
@RuijieZhu94

Hi Yuedong, thank you for open-sourcing your great work!

When I trained the model using 3 Nvidia RTX 3090s (batch size 4 per GPU), I got significantly worse results on re10k.

psnr 22.12379274863242
ssim 0.7298626045353773
lpips 0.22073094525619313

Will a smaller batch size or multi-GPU training significantly affect the performance of the model?
By the way, I use the official weights and can get results consistent with the paper.

psnr 26.386906073201686
ssim 0.8690403559103327
lpips 0.12837660807718004
@donydchen
Owner

donydchen commented Apr 3, 2024

Hi @RuijieZhu94, thanks for your interest in our work.

Yes, there was a small bug in the feature extraction introduced during code cleaning. It mainly concerns the (batch, view) dimension conversion; it does not affect testing, since testing keeps batch_size=1. We have already corrected it in our last commit (297338f). We have re-trained the model (after fixing the aforementioned bug) using both single-GPU and multi-GPU configurations, and both reproduced the results of the released model.
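
For reference, here is a minimal sketch (not our actual code; the backbone and shapes are placeholders) of the (batch, view) flattening pattern involved. The point is that folding views into the batch and unfolding them back must use matching axis orders, a mistake that goes unnoticed while batch_size=1:

```python
import torch
from einops import rearrange

def extract_features(images: torch.Tensor, backbone: torch.nn.Module) -> torch.Tensor:
    """images: (batch, view, channel, height, width)."""
    b, v = images.shape[:2]
    # Fold the view dimension into the batch dimension for the 2D backbone.
    flat = rearrange(images, "b v c h w -> (b v) c h w")
    feats = backbone(flat)
    # Unfold back. Writing "(v b)" here instead of "(b v)" yields the same
    # shape when batch_size == 1, but mixes samples across the batch otherwise.
    return rearrange(feats, "(b v) c h w -> b v c h w", b=b, v=v)
```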

Would you mind updating the code following our last commit (297338f) and re-training the model? Let us keep this issue open for you to update the results. For quicker debugging, your model should reach around PSNR=23 at step 10K with the updated code, whereas it stays around PSNR=20 at step 10K if the code still contains the aforementioned feature extraction bug.

By the way, we use batch_size=14 by default (a smaller batch_size might slightly harm performance, but not by much). The LPIPS weight is 0.05, and the lr scheduler is one-cycle with lr=2.e-4, as updated in commit 660f49c. Make sure you have also synchronised your code base (if you have made any changes) with the aforementioned commits.
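
For illustration only, a minimal PyTorch sketch of these settings (the model, optimizer choice, and total_steps below are placeholders, not our exact configuration):

```python
import torch

model = torch.nn.Linear(8, 8)  # stand-in for the actual network
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)
# One-cycle schedule peaking at lr=2e-4; total_steps is a placeholder value.
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=2e-4, total_steps=300_000
)

batch_size = 14      # default batch size; smaller values may slightly hurt quality
lpips_weight = 0.05  # weight of the LPIPS term in the training loss
```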

@donydchen donydchen self-assigned this Apr 3, 2024
@RuijieZhu94
Author

Hi Yuedong, thanks for your prompt reply, I will retrain this model in the next few days.

@RuijieZhu94
Author

Hi Yuedong, I retrained the model with bs=12 and got the following results:

psnr 26.31555430801481
ssim 0.8676635705885196
lpips 0.12932708359573464

Thank you for your help.

@boxuLibrary

@RuijieZhu94 Could you share the link to the training dataset? I reached out to the author of pixelSplat for the link, but I cannot open it.

@RuijieZhu94
Author

@boxuLibrary Please contact me by email: ruijiezhu@mail.ustc.edu.cn.
