
Custom dataset training #11

Closed
VillardX opened this issue Mar 29, 2024 · 1 comment

@VillardX

Hi, thanks for the great work. I have some questions about custom data training.

In the paper, RE10K training takes only 2 context-view RGB images, with their corresponding intrinsics and extrinsics, as input, and outputs a novel-view RGB image.

  1. About znear and zfar: in "dataset_re10k.py", they are set to 1 and 100. Should znear and zfar be modified if I train on my custom dataset? What do 1 and 100 mean? Meters?
  2. About the extrinsics and intrinsics: according to pixelSplat, "Our extrinsics are OpenCV-style camera-to-world matrices. This means that +Z is the camera look vector, +X is the camera right vector, and -Y is the camera up vector. Our intrinsics are normalized, meaning that the first row is divided by image width, and the second row is divided by image height."
    I don't know what the unit of the translation (T) vector of the extrinsic is; is it in meters? Also, according to your "dataset_re10k.py", the extrinsic in the raw data is "w2c", and you return "w2c.inverse()" as c2w in the function "convert_poses()". Is my understanding correct?
  3. The number of context views is 3 in my custom dataset, while in the paper the model is trained with 2 context views. Where can I modify this?
    By the way, the paper uses an MVS cost volume, but the model is mainly trained in a 2-input-view setting. Did you try training with a multiple-input-view setting?
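As a sanity check on the conventions quoted in question 2, here is a minimal NumPy sketch of the w2c-to-c2w inversion and intrinsics normalization. The function name `convert_pose` is hypothetical; the repo's own `convert_poses()` works on batched, flattened pose tensors, but the underlying math is the same:

```python
import numpy as np

def convert_pose(w2c: np.ndarray, k: np.ndarray,
                 width: int, height: int) -> tuple[np.ndarray, np.ndarray]:
    """Invert a world-to-camera extrinsic into an OpenCV-style
    camera-to-world matrix, and normalize pixel-space intrinsics.

    w2c: (4, 4) world-to-camera matrix (as stored in the raw RE10K data).
    k:   (3, 3) intrinsics in pixel units.
    """
    c2w = np.linalg.inv(w2c)   # mirrors the w2c.inverse() call in dataset_re10k.py
    k_norm = k.astype(float).copy()
    k_norm[0] /= width         # first row divided by image width
    k_norm[1] /= height        # second row divided by image height
    return c2w, k_norm
```

With an identity rotation and a translation of +5 along Z, the resulting c2w translation is -5, and a focal length of 500 px on a 640-px-wide image normalizes to 500/640.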
@donydchen donydchen self-assigned this Mar 29, 2024
@donydchen
Copy link
Owner

Hi @VillardX, thanks for your interest in our work.

We empirically set (near, far) to (1, 100), following our previous work MuRF (see the implementation HERE). If I remember correctly, these two values have no strict physical meaning; we just warped the images and found that they fit. Indeed, they need to be set to other values if you work on other datasets. For example, you can set them according to the COLMAP data if you have it; more references can be found HERE. Or, if you do not have COLMAP data, you can follow our approach of warping the input images to decide, see #4 (comment).
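If COLMAP data is available, one way to derive (near, far) is from the camera-space depths of the sparse points, as mentioned above. The helper below is a sketch under that assumption (the name `near_far_from_points` and the percentile/margin values are illustrative choices, not from the repo):

```python
import numpy as np

def near_far_from_points(points_world: np.ndarray, w2c: np.ndarray,
                         lo_pct: float = 0.1, hi_pct: float = 99.9,
                         margin: float = 1.2) -> tuple[float, float]:
    """Estimate (near, far) for one view from sparse 3D points.

    points_world: (N, 3) reconstructed points, e.g. from COLMAP.
    w2c:          (4, 4) world-to-camera matrix.
    Depth is taken along +Z (OpenCV convention); percentiles guard
    against outliers, and `margin` pads the range on both ends.
    """
    pts_h = np.concatenate([points_world,
                            np.ones((len(points_world), 1))], axis=1)
    z = (pts_h @ w2c.T)[:, 2]   # camera-space depth of each point
    z = z[z > 0]                # keep only points in front of the camera
    near = np.percentile(z, lo_pct) / margin
    far = np.percentile(z, hi_pct) * margin
    return float(near), float(far)
```

In practice you would take the min/max of these per-view ranges over all context views of a scene.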


I am not sure whether T is in meters (I guess it is a relative value, since the poses are reconstructed, not real ground truth). You may refer to the RE10K homepage for more details. Your understanding is correct: the raw data is 'w2c'.
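Since the translations are reconstructed up to an unknown global scale, a common workaround (not something the repo necessarily does) is to rescale each scene so that the baseline between the first two context cameras is 1; then a single (near, far) pair is at least comparable across scenes. A minimal sketch, assuming (V, 4, 4) camera-to-world matrices:

```python
import numpy as np

def normalize_baseline(c2ws: np.ndarray) -> np.ndarray:
    """Rescale camera-to-world poses so the first two cameras sit a
    unit distance apart. Rotations are unchanged; only the scene's
    global scale (translations) is affected.

    c2ws: (V, 4, 4) camera-to-world matrices. Returns a rescaled copy.
    """
    baseline = np.linalg.norm(c2ws[0, :3, 3] - c2ws[1, :3, 3])
    out = c2ws.copy()
    out[:, :3, 3] /= baseline   # divide all translations by the baseline
    return out
```

Any depths or 3D points used alongside these poses would need to be divided by the same baseline to stay consistent.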


For more information about how to train and test with more views, kindly refer to #4.
