Added an option to flatten_dims in RobustCostFunction. #503
Not sure if something like `(error_loss.reshape(-1, self.dim()).sum(dim=-1, keepdims=True) + RobustCostFunction._EPS).sqrt()` is better and more consistent.
Wouldn't this change the result? The test I added checks that the current behavior is the same as `flatten_dims=False` for separate cost functions, which is what we want.
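A minimal sketch of the difference being discussed, with made-up shapes (2 costs of dim=3); `error_loss` and `_EPS` here are stand-ins for the internal quantities and may not match the actual implementation:

```python
import torch

# Hypothetical per-dimension losses for 2 costs of dim=3, flattened to (2 * 3, 1).
error_loss = torch.rand(6, 1)
_EPS = 1e-10  # illustrative small constant

# Suggested form: sum the per-dimension losses back into one value per cost.
summed = (error_loss.reshape(-1, 3).sum(dim=-1, keepdim=True) + _EPS).sqrt()  # shape (2, 1)

# Behavior described in the reply: keep one value per dimension, matching
# dim(err) independent cost functions of dim=1.
per_dim = (error_loss + _EPS).sqrt()  # shape (6, 1)
```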
Following my previous comments, this will be something like `torch.einsum("ni, nij->nij", rescale.view(-1, self.dim()), jacobians)`.
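For concreteness, a small shape check of that einsum (names and shapes are illustrative):

```python
import torch

n, d = 4, 3                      # batch size and cost dimension (illustrative)
rescale = torch.rand(n * d)      # one rescaling factor per error entry
jacobians = torch.rand(n, d, d)  # per-sample Jacobian blocks

# Scale row i of every Jacobian by rescale[n, i]; no summation is performed.
scaled = torch.einsum("ni,nij->nij", rescale.view(-1, d), jacobians)
assert torch.allclose(scaled, rescale.view(-1, d, 1) * jacobians)
```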
Similar comment as above.
Using .diagonal presumes that the Jacobian matrix is square. For a Jacobian, which is dE/dX, the shape is (dim(E), dim(X)). My question is: will dim(E) and dim(X) necessarily be equal?
Good observation! I forgot to mention that this implementation assumes that we want to model N robust cost functions of dim=1, each with $k$ variables of dof=1, using a single robust cost function of dimension N and $k$ variables of dof=N. Basically, this assumes you want to use a single `RobustCost` object to model a problem of that form. Supporting cost functions of dim > 1 and other combinations of variable dof gets messier, and I'm not sure it's worth the effort.
Note that this feature is mostly for programming convenience. The alternative of creating individual cost function objects doesn't hurt performance, because `TheseusLayer` by default vectorizes cost functions with the same pattern, so they are evaluated in a single call.
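A rough sketch of that assumption in plain PyTorch; the loss below is a Huber-style stand-in, not the library's exact implementation:

```python
import torch

def huber(x, radius):
    # Huber-style loss on the squared residual: quadratic inside the radius, linear outside.
    return torch.where(x <= radius, x, 2 * (radius * x).sqrt() - radius)

err = torch.rand(4, 3)          # batch of 4 errors, each of dim=3
radius = torch.tensor(0.5)

# Single robust cost of dim=3: one loss value per cost, applied to the squared norm.
whole = huber((err ** 2).sum(dim=1, keepdim=True), radius)  # shape (4, 1)

# flatten_dims semantics: behave like 3 independent dim=1 costs, one loss value per entry.
per_dim = huber(err ** 2, radius)                           # shape (4, 3)
```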
But it seems this is not feasible for cases like the motion model in Tutorial 4 (first figure). In that case, the error is the combination of two vectors, which can be common and convenient in many cases. Therefore, in my original code I obtain the dimension of the input X through the Jacobian matrix, because the error itself only carries the batch size and the dimension of the error. And I believe that, by the chain rule, dRobustLoss/dOriLoss can be multiplied by the original Jacobian directly.
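As a one-line chain-rule check of that claim (writing $\rho$ for the robust loss and $e(x)$ for the original error):

$$
\frac{\partial\,\rho(e(x))}{\partial x} \;=\; \frac{\partial \rho}{\partial e}\,\frac{\partial e}{\partial x},
$$

so the rescaling factor $\partial\rho/\partial e$ depends only on the error dimension, and it can be applied to the original Jacobian $\partial e/\partial x$ of shape $(\dim(E), \dim(X))$ whether or not $\dim(E) = \dim(X)$.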
Yes, you are right. Currently I do not see an optimization problem that would lead to dim(E) != dim(X) and that could not be decoupled into multiple cost functions satisfying dim(E) = dim(X). We can discuss further if someone encounters such a problem. Thank you! I have learned a lot from your project!
No problem, and thank you for all your useful feedback!
@xphsean12 I have to apologize, because it turns out I was wrong in some of the above! We definitely want to support the case where dim(error) != dim(X). For example, in the regression problem, dim(error) = N_points, but dim(X) = 1 for both A and B. In the above I was mixing up optim vars with aux vars.
Luckily, the fix is still pretty simple, and I have rewritten the test to check for the case above, using the same regression example. You can look at the last few commits if you are curious.
The code still assumes that `flatten_dims` acts as if there were dim(err) independent costs of dim=1.
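A tiny version of that regression setup, just to show the shapes involved (data and names are made up):

```python
import torch

xs = torch.linspace(0.0, 1.0, 10)          # N_points = 10
ys = 2.0 * xs + 0.1 * torch.randn(10)      # noisy targets for a slope-only model

def error_fn(a):                            # a: slope, dof = 1
    return a * xs - ys                      # error has dim = N_points = 10

a = torch.tensor([1.5])
J = torch.autograd.functional.jacobian(error_fn, a)
print(J.shape)  # torch.Size([10, 1]) -> (dim(error), dim(X)), not square
```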
Thanks for your feedback! By the way, I may have found a new issue related to the radius. Since I do not have very concrete evidence, I am not going to open an issue for now. In my previous commit, I changed radius = radius.exp() into .square(); @fantaosha mentioned that exp() is better for differentiation. However, the simplest approach would be to just use radius = radius directly and require the user to input a radius >= 0.
The main differences are: (1) from my observation, .exp() is much slower than radius = radius when I try to train the Huber radius; (2) training a value that is then passed through exp() may be difficult, because changing the trained value from 0 to 5 changes the actual radius from 1 to about 148. This exponential growth may make it hard to train an accurate radius, since the radius is amplified by an exponential function. I am not sure whether my second concern is correct; it is just a thought.
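A small sketch of the three parameterizations being compared; variable names are illustrative, not the library's API:

```python
import torch

theta = torch.tensor(0.0, requires_grad=True)  # the value actually being trained

radius_exp = theta.exp()     # always positive; but moving theta from 0 to 5 moves the
                             # radius from 1 to ~148, so the scale grows very quickly
radius_sq = theta.square()   # non-negative with gentler growth; gradient vanishes at 0
radius_raw = theta           # identity: cheapest, but the user (or a clamp/projection)
                             # must keep the radius >= 0
```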
@fantaosha Any comments? I don't think using `exp()` should be particularly problematic. Maybe you need to lower your learning rate?
This line might need changes as well.