Added an option to flatten_dims in RobustCostFunction. #503
Conversation
LGTM. Some minor comments on how to deal with rescaling.
squared_norm = torch.sum(weighted_error**2, dim=1, keepdim=True)
error_loss = self.loss.evaluate(squared_norm, self.log_loss_radius.tensor)

if self.flatten_dims:
    return (error_loss.reshape(-1, self.dim()) + RobustCostFunction._EPS).sqrt()
Not sure if something like (error_loss.reshape(-1, self.dim()).sum(dim=-1, keepdims=True) + RobustCostFunction._EPS).sqrt() is better and more consistent.
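For reference, a minimal sketch (dummy shapes, not Theseus code) contrasting the variant in this PR with the suggestion above:

```python
# Hypothetical shapes: error_loss is (batch * dim, 1) after the flattened
# evaluation, and eps stands in for RobustCostFunction._EPS.
import torch

batch, dim, eps = 4, 3, 1e-10
error_loss = torch.rand(batch * dim, 1)

# Variant in this PR: one sqrt'ed entry per flattened dimension -> shape (batch, dim).
per_dim = (error_loss.reshape(-1, dim) + eps).sqrt()

# Suggested variant: sum over dimensions first, one entry per cost -> shape (batch, 1).
per_cost = (error_loss.reshape(-1, dim).sum(dim=-1, keepdim=True) + eps).sqrt()
```

The two differ in both shape and value (per-dimension square roots vs. the square root of the summed loss).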
Wouldn't this change the result? The test I added checks that the current behavior matches flatten_dims=False with separate cost functions, which is what we want.
squared_norm = torch.sum(weighted_error**2, dim=1, keepdim=True)
rescale = (
    self.loss.linearize(squared_norm, self.log_loss_radius.tensor)
    + RobustCostFunction._EPS
).sqrt()

rescaled_jacobians = [
    rescale.view(-1, 1, 1) * jacobian for jacobian in weighted_jacobians
Following my previous comments, this will be something like torch.einsum("ni, nij->nij", rescale.view(-1, self.dims()), jacobians)
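A rough sketch (assumed shapes) showing that the suggested einsum scales each row of the Jacobian by its per-dimension factor, which is equivalent to a broadcast multiply:

```python
import torch

batch, dim, dof = 4, 3, 2
rescale = torch.rand(batch, dim)        # one factor per flattened dimension
jacobian = torch.rand(batch, dim, dof)  # d error / d var for one optim variable

out_einsum = torch.einsum("ni,nij->nij", rescale, jacobian)
out_broadcast = rescale.unsqueeze(-1) * jacobian  # same result via broadcasting
assert torch.allclose(out_einsum, out_broadcast)
```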
Similar comment as above.
Using .diagonal presumes that the Jacobian matrix is square. A Jacobian is dE/dX, so its shape is (dim(E), dim(X)). My question is: will dim(E) and dim(X) necessarily be equal?
Good observation! I forgot to mention that this implementation assumes we want a single RobustCostFunction object to model N robust cost functions of dim=1, i.e., a problem that is a sum of N scalar robust costs.
Supporting cost functions of dim > 1 and other combinations of variable dofs gets messier, and I'm not sure it's worth the effort.
Note that this feature is mostly for programming convenience. The alternative of creating individual cost function objects doesn't hurt performance, because TheseusLayer by default vectorizes cost functions with the same pattern, so they are evaluated in a single call.
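A plain-PyTorch sketch of the equivalence described above (the Huber-style rho, shapes, and radius are assumptions, not Theseus code): applying the loss to each flattened dimension matches evaluating N separate dim=1 robust costs.

```python
import torch

def rho(s, radius):
    # Huber-style robust loss applied to the squared error s.
    return torch.where(s <= radius, s, 2 * (s * radius).sqrt() - radius)

batch, dim, radius = 4, 3, torch.tensor(1.0)
error = torch.rand(batch, dim)

# flatten_dims=True view: every dimension is treated as an independent scalar cost.
flat = error.reshape(-1, 1)                           # (batch * dim, 1)
loss_flat = rho(flat**2, radius).reshape(batch, dim)

# N separate dim=1 cost functions, evaluated one column at a time.
loss_separate = torch.cat(
    [rho(error[:, i : i + 1] ** 2, radius) for i in range(dim)], dim=1
)

assert torch.allclose(loss_flat, loss_separate)
```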
But it seems this is not feasible for cases like the motion model in tutorial 4 (first figure). There, the error is a combination of two vectors, which can be common and convenient in many cases. That is why, in my original code, I obtain the dimension of the input X from the Jacobian matrix: the error itself only carries the batch size and the error dimension. And I believe that, by the chain rule, dRobustLoss/dOriLoss can be multiplied by the original Jacobian directly.
Yes, you are right. Currently, I do not see an optimization problem where dim(E) != dim(X), or one that cannot be decoupled into multiple cost functions satisfying dim(E) = dim(X). We can discuss it further if someone encounters such a problem. Thank you! I have learned a lot from your project!
No problem, and thank you for all your useful feedback!
@xphsean12 I have to apologize, because it turns out I was wrong in some of the above! We definitely want to support the case where dim(error) != dim(X). For example, in the regression problem, dim(error) = N_points, but dim(X) = 1 for both A and B. Above, I was mixing up optim vars with aux vars.
Luckily, the fix is still pretty simple, and I have rewritten the test to check for the case above, using the same regression example. You can look at the last few commits if you are curious.
The code still assumes that flatten_dims acts as if there were dim(err) independent costs of dim=1.
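A rough sketch of the regression case described above (assumed setup and variable names, not the actual Theseus test): the residual has dim N_points, while each optim var A, B is a scalar, so the per-dimension rescale broadcasts over each Jacobian row.

```python
import torch

batch, n_points = 2, 5
x = torch.rand(batch, n_points)
y = torch.rand(batch, n_points)
A = torch.rand(batch, 1)
B = torch.rand(batch, 1)

error = y - (A * x + B)                    # (batch, n_points): dim(error) = N_points
jac_A = -x.unsqueeze(-1)                   # d error / d A, shape (batch, n_points, 1)
jac_B = -torch.ones_like(x).unsqueeze(-1)  # d error / d B, shape (batch, n_points, 1)

rescale = torch.rand(batch, n_points)      # one robust-loss factor per flattened dim
rescaled_jacs = [rescale.unsqueeze(-1) * j for j in (jac_A, jac_B)]
rescaled_error = rescale * error
```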
Thanks for your feedback! By the way, I (maybe) found a new issue with the radius. Since I don't have very concrete evidence, I'm not going to open an issue for now. In my previous commit, I changed radius = radius.exp() into .square(). @fantaosha mentioned that exp() is better for differentiation. However, the easiest approach would be to just use radius = radius and require the user to pass in a radius >= 0.
The main differences are that (1) based on my observation, .exp() is much slower than radius = radius when I try to train the Huber radius, and (2) training a value that is then passed through exp() may be very difficult: changing the trained value from 0 to 5 changes the actual radius from 1 to about 148, and this exponential growth may make training an accurate radius hard. I am not sure whether the second concern is correct; it's just a thought.
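For concreteness, a tiny sketch (names are illustrative only) of the three parameterizations being compared; theta is the trained quantity and radius is what the loss would actually use:

```python
import torch

theta = torch.tensor(2.0, requires_grad=True)

radius_exp = theta.exp()     # current approach: always positive, but grows exponentially in theta
radius_sq = theta.square()   # earlier commit: non-negative, polynomial growth
radius_raw = theta           # simplest, but the user must keep theta >= 0 themselves
```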
@fantaosha Any comments? I don't think using exp() should be particularly problematic. Maybe you need to lower your learning rate?
    rescale.view(-1, 1, 1) * jacobian for jacobian in weighted_jacobians
]
rescaled_error = rescale * weighted_error
This line might need changes as well.
LGTM!
Closes #497