
Reproducing the training results on a megadepth dataset #253

Open
FlyFish-space opened this issue Mar 20, 2023 · 23 comments
@FlyFish-space

Thank you very much for your excellent work.
I recently reproduced the training on 4 RTX 3090 GPUs for 30 epochs following the README, with a batch size of 2 per GPU. I trained and tested on the D2-Net-undistorted MegaDepth dataset, and the results are as follows:
auc@5: 44.1, auc@10: 60.28, auc@20: 72.93
I also saw that a previous issue recommended setting the image sizes of both val and test to 640, but the results did not improve.
What could be the reason for this drop in accuracy?
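
For anyone double-checking the same thing, here is a minimal sketch of how the val/test resize is usually overridden in the repo's yacs-style data configs. The key name MGDPT_IMG_RESIZE and the base-config import are assumptions taken from the megadepth_trainval_* config pattern, so verify them against src/config/default.py in your checkout.

```python
# Hypothetical override file in the style of configs/data/megadepth_trainval_640.py.
# Assumption: the base config exposes DATASET.MGDPT_IMG_RESIZE (resize of the longer
# image side), as the stock MegaDepth configs do.
from configs.data.base import cfg

cfg.DATASET.MGDPT_IMG_RESIZE = 640  # try 840 here to match the recommended evaluation setting
```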

@benjaminkelenyi

Hello, I'm facing the same issue.
[results screenshot]
The loss fluctuates a lot...
[loss curve screenshot]

@chicleee

chicleee commented Jun 1, 2023

Hi, have you made any progress on this issue?

@benjaminkelenyi

benjaminkelenyi commented Jun 5, 2023 via email

Hello, thanks for your reply. Yes, I fixed the issue! Thank you!

@chen9run

chen9run commented Jun 7, 2023

Hi, have you found the reason?

@Mysophobias

Hello, my results are similar to yours. Have you tried changing TRAIN_IMG_SIZE to 840?

@Master-cai

Master-cai commented Jul 28, 2023

I'm training outdoor_ds with the default settings (image size 640), also on 4 RTX 3090 GPUs. I use the original MegaDepth data for training, since the undistorted images are not accessible now.

After 11 epochs of training, I got the following validation results:
auc@5: 45.6, auc@10: 62.4, auc@20: 75.1

They do not seem to improve anymore. I will train for the full 30 epochs and test the model on the test set (it may take another two days).

Has anyone else already reproduced the results using a similar setting? Would setting TRAIN_IMG_SIZE to 840 help?

@Master-cai

After 30 epochs of training, I reproduced the test on MegaDepth and got:
'auc@5': 0.4983204021567033,
'auc@10': 0.6676607412455137,
'auc@20': 0.7952598445093988,
'prec@5e-04': 0.9549532078302655

This is about 3 points lower than the reported accuracy.
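
In case it is useful for comparing numbers: auc@5/10/20 is the area under the cumulative pose-error curve up to 5/10/20 degrees, normalized by the threshold. Below is a minimal sketch following the SuperGlue-style evaluation convention, not necessarily byte-for-byte what the repo's test script runs.

```python
import numpy as np


def pose_error_auc(errors, thresholds=(5, 10, 20)):
    """AUC of the cumulative pose-error curve, one value per threshold (degrees)."""
    errors = np.sort(np.asarray(errors, dtype=float))
    recall = (np.arange(len(errors)) + 1) / len(errors)
    errors = np.concatenate(([0.0], errors))
    recall = np.concatenate(([0.0], recall))
    aucs = {}
    for t in thresholds:
        idx = np.searchsorted(errors, t)                   # pairs with error below t
        e = np.concatenate((errors[:idx], [t]))            # clip the curve at the threshold
        r = np.concatenate((recall[:idx], [recall[idx - 1]]))
        area = np.sum(np.diff(e) * (r[:-1] + r[1:]) / 2)   # trapezoidal rule
        aucs[f"auc@{t}"] = area / t
    return aucs


# Toy example with random pose errors in [0, 30) degrees.
print(pose_error_auc(np.random.rand(1500) * 30))
```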

@Mysophobias

[validation results screenshot]
@Master-cai Hello, this is the training result I obtained when I set 'TRAIN_IMG_SIZE' to 640. Your training result is much better than mine. Have you tried setting 'TRAIN_IMG_SIZE' to 840?

@Master-cai

[validation results screenshot] @Master-cai Hello, this is the training result I obtained when I set 'TRAIN_IMG_SIZE' to 640. Your training result is much better than mine. Have you tried setting 'TRAIN_IMG_SIZE' to 840?

No, I use the default settings. Your results are very similar to mine after 11 epochs of training. What device do you use, and how long did you train?

@Mysophobias

[validation results screenshot] @Master-cai Hello, this is the training result I obtained when I set 'TRAIN_IMG_SIZE' to 640. Your training result is much better than mine. Have you tried setting 'TRAIN_IMG_SIZE' to 840?

No, I use the default settings. Your results are very similar to mine after 11 epochs of training. What device do you use, and how long did you train?

I also used 4 Nvidia RTX 3090 GPUs and trained for approximately 100 hours. I have tried using D2-Net to process the dataset, and these are the validation results I saved during training. I am really eager to know whether setting 'TRAIN_IMG_SIZE' to 840 would improve the accuracy after training.
[validation checkpoints screenshot]

@Master-cai

@Mysophobias I didn't process MegaDepth with D2-Net, and your checkpoints seem similar to mine, so I have no idea why your test results are worse. I just used the default reproduce_test/outdoor_ds.sh script to test.

As for image size 840, I think it might help, since it is the officially recommended setting after all. A 3090 has enough memory to train with 840, so you can try it.

@Mysophobias

@Mysophobias I didn't process MegaDepth with D2-Net, and your checkpoints seem similar to mine, so I have no idea why your test results are worse. I just used the default reproduce_test/outdoor_ds.sh script to test.

As for image size 840, I think it might help, since it is the officially recommended setting after all. A 3090 has enough memory to train with 840, so you can try it.

The code comments in configs/data/megadepth_trainval_840.py indicate that 32GB of GPU memory is required for training. I have also attempted training on four 24GB 3090 GPUs, but it was not successful. I will try again later. Anyway, thank you.

@Master-cai

@Mysophobias A 3090 can train it with a physical batch size of 1. I use gradient accumulation of 2, which gives an effective batch size of 1 × 2 × 4 = 8, as suggested by the author. I have trained it for one epoch, but I don't have GPUs available right now 💔. I hope my experience helps you, and it would be nice if you could share your final results. A sketch of the accumulation setup is below.
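
For anyone wiring this up, here is a minimal PyTorch Lightning sketch of the effective-batch-size arithmetic. TinyModule is a stand-in rather than the LoFTR module, and the Trainer flag names follow the older Lightning 1.x API; adapt them to the version you have installed.

```python
import torch
import pytorch_lightning as pl


class TinyModule(pl.LightningModule):
    """Stand-in for the repo's LightningModule, just to show the trainer flags."""

    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(8, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.mse_loss(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)


# Physical batch size 1 per GPU, as on a 24 GB 3090 at image size 840.
loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(torch.randn(64, 8), torch.randn(64, 1)),
    batch_size=1,
)

# Effective batch size = 1 (per GPU) x 2 (accumulated steps) x 4 (GPUs) = 8.
# On Lightning >= 1.5 use devices=4, accelerator="gpu", strategy="ddp" instead.
trainer = pl.Trainer(
    gpus=4,
    accelerator="ddp",
    accumulate_grad_batches=2,
    max_epochs=1,
)
trainer.fit(TinyModule(), loader)
```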

@xmlyqing00

May I ask how you trained the model on MegaDepth? I got stuck getting the training images from D2-Net. I noticed the LoFTR authors say the differences are subtle, but I don't know how to create the symbolic link. Do I need to download the MegaDepth SfM dataset?

Best,
yq
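
In case it helps later readers, here is a minimal sketch of creating the symlinks with Python's standard library. The source paths are placeholders, and the data/megadepth/{train,test,index} layout is an assumption based on the training docs, so check it against your configs before relying on it.

```python
import os

# Placeholder paths -- point these at your actual MegaDepth download.
MEGADEPTH_IMAGES = "/path/to/MegaDepth_v1"        # raw (or undistorted) scene images
MEGADEPTH_INDICES = "/path/to/megadepth_indices"  # scene-info .npz index files
REPO_DATA_DIR = "data/megadepth"                  # layout assumed from the training docs

os.makedirs(REPO_DATA_DIR, exist_ok=True)
for link_name, target in [
    ("train", MEGADEPTH_IMAGES),
    ("test", MEGADEPTH_IMAGES),
    ("index", MEGADEPTH_INDICES),
]:
    link = os.path.join(REPO_DATA_DIR, link_name)
    if not os.path.lexists(link):  # skip if the symlink already exists
        os.symlink(target, link)
```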

@Master-cai

@xmlyqing00 I think this issue can help.

@xmlyqing00

@xmlyqing00 I think this issue can help.

Thanks, I just fixed the training on MegaDepth.

@RunyuZhu

RunyuZhu commented Jan 15, 2024

I'm training outdoor_ds with the default settings (image size 640), also on 4 RTX 3090 GPUs. I use the original MegaDepth data for training, since the undistorted images are not accessible now.

After 11 epochs of training, I got the following validation results: auc@5: 45.6, auc@10: 62.4, auc@20: 75.1

They do not seem to improve anymore. I will train for the full 30 epochs and test the model on the test set (it may take another two days).

Has anyone else already reproduced the results using a similar setting? Would setting TRAIN_IMG_SIZE to 840 help?

Can I ask about your machine's memory capacity?
I train LoFTR on a single 3090 Ti (24 GB) with a 13th-gen i7 and 128 GB of RAM, with batch size 1, n_gpus_per_node=1, and num_workers=0, but the process got killed while training at epoch 2. I found that LoFTR nearly ran out of host memory (swap full and 125/126 GB of main memory used).
So may I ask for your hardware details, and have you ever run into this issue? It would be very kind of you to give me some tips.
Thanks.
zhu

@Master-cai

@RunyuZhu That's weird. I use 4 3090 Ti GPUs and 128 GB of memory (8 GB swap) to get those results, with num_workers set to 4. Memory consumption does indeed increase over time, but I never hit this bug, so I'm sorry I can't help you directly. I suggest you check the system log to confirm that the process was killed due to OOM, and whether some other processes are occupying a large amount of memory.
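
If it helps with debugging, a small helper along these lines (my own sketch; psutil is an extra dependency, not something the repo requires) can be called at the end of every epoch to confirm whether host RAM and swap really grow over time:

```python
import os

import psutil  # extra dependency: pip install psutil


def log_host_memory(tag=""):
    """Print this process's resident memory plus overall system RAM and swap usage."""
    rss_gb = psutil.Process(os.getpid()).memory_info().rss / 1024**3
    vm = psutil.virtual_memory()
    swap = psutil.swap_memory()
    print(
        f"[{tag}] RSS={rss_gb:.1f} GiB | "
        f"RAM {vm.used / 1024**3:.0f}/{vm.total / 1024**3:.0f} GiB | "
        f"swap {swap.used / 1024**3:.0f}/{swap.total / 1024**3:.0f} GiB"
    )


# For example, call log_host_memory(f"epoch {epoch}") from an epoch-end hook.
log_host_memory("startup")
```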

@RunyuZhu

@RunyuZhu That's weird. I use 4 3090 Ti GPUs and 128 GB of memory (8 GB swap) to get those results, with num_workers set to 4. Memory consumption does indeed increase over time, but I never hit this bug, so I'm sorry I can't help you directly. I suggest you check the system log to confirm that the process was killed due to OOM, and whether some other processes are occupying a large amount of memory.

Thanks for your reply and valuable suggestions!
I will run it again with a larger num_workers or batch size, and log the info to locate the issue.
Thanks again!
zhu

@WJJLBJ

WJJLBJ commented Mar 16, 2024

@xmlyqing00 I think this issue can help.

Hello, how did you fix the problem at line 47 of LoFTR/src/datasets/megadepth.py? Line 47 in the official code is self.scene_info = np.load(npz_path, allow_pickle=True), which is different from what that issue shows.
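
For context, loading one of those scene-info files looks roughly like the snippet below; the path is a placeholder, and wrapping the result in dict() is only to show what the .npz contains, not the fix described in that issue.

```python
import numpy as np

# Placeholder path -- use one of your own MegaDepth index .npz files.
npz_path = "data/megadepth/index/scene_info/your_scene.npz"

# np.load on an .npz archive returns a lazy NpzFile; dict() materializes the arrays.
scene_info = dict(np.load(npz_path, allow_pickle=True))
print(list(scene_info.keys()))
```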

@WJJLBJ

WJJLBJ commented Mar 16, 2024

@xmlyqing00 I think this issue can help.

May I ask how you solved the problem of the D2-Net-preprocessed data for LoFTR no longer being downloadable? Did the approach in that issue help? I see that line 47 of LoFTR/src/datasets/megadepth.py is not the line given there, but rather self.scene_info = np.load(npz_path, allow_pickle=True). Could you tell me how you modified this file? Many thanks!

@Master-cai

@WJJLBJ Just use the original images and process them following the approach given in that issue.

@WJJLBJ

WJJLBJ commented Mar 17, 2024

@WJJLBJ Just use the original images and process them following the approach given in that issue.

Many thanks, the problem is solved.
