-
Notifications
You must be signed in to change notification settings - Fork 69
Add ST_Azimuth()
#596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ST_Azimuth()
#596
Conversation
Maxxen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Thanks a lot for picking this up!
I've left some small comments, when you address them you might want to switch the target branch to v1.3.0 too!
| auto left_x = FlatVector::GetData<double>(*left_entries[0]); | ||
| auto left_y = FlatVector::GetData<double>(*left_entries[1]); | ||
| auto right_x = FlatVector::GetData<double>(*right_entries[0]); | ||
| auto right_y = FlatVector::GetData<double>(*right_entries[1]); | ||
|
|
||
| auto &result_mask = FlatVector::Validity(result); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can't assume that the input data is a FlatVector (unless you call .Flatten() on the DataChunk first). You can either do the vector access manually by setting up UnifiedVectorFormats but it's much easier to use the Binary/Unary/GenericExector helper classes in simple cases. For STRUCT inputs, you can use the GenericExectuor. As an example, have a look at e.g. ST_Distance_Spheroid:
//------------------------------------------------------------------------------------------------------------------
// POINT_2D
//------------------------------------------------------------------------------------------------------------------
static void ExecutePoint(DataChunk &args, ExpressionState &state, Vector &result) {
D_ASSERT(args.data.size() == 2);
auto &left = args.data[0];
auto &right = args.data[1];
auto count = args.size();
using POINT_TYPE = StructTypeBinary<double, double>;
using DISTANCE_TYPE = PrimitiveType<double>;
GenericExecutor::ExecuteBinary<POINT_TYPE, POINT_TYPE, DISTANCE_TYPE>(
left, right, result, count, [&](POINT_TYPE left, POINT_TYPE right) {
return sgl::math::haversine_distance(left.a_val, left.b_val, right.a_val, right.b_val);
});
}| } | ||
| } | ||
|
|
||
| static double calcAngle(double x1, double y1, double x2, double y2) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we use PascalCase for functions - but its not super important.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, thanks for catching!
|
Thanks for the review! I'm still not very familiar with DuckDB internals, so that information is really helpful. |
| // If the points are the same, return NULL | ||
| if (left.a_val == right.a_val && left.b_val == right.b_val) { | ||
| // TODO | ||
| // result_mask.SetInvalid(i); | ||
| return 0.0; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unlike BinaryExecutor, it seems there's no variant of *WithNulls, so, if I understand correctly, it's not possible to return NULL.
Personally, I feel this should be NaN for this case instead of NULL, so using NaN might be an option. However, since PostGIS returns NULL, it might be confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, maybe it's easiest to just call .Flatten() on args, then you can assume that all input vectors are FlatVector.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to illustrate, this is how it would look if you wanted to handle it the most general way possible, the following will handle all types of input vector layouts:
static void ExecutePoint(DataChunk &args, ExpressionState &state, Vector &result) {
D_ASSERT(args.data.size() == 2);
const auto count = args.size();
auto &lhs = args.data[0];
auto &rhs = args.data[1];
UnifiedVectorFormat lhs_format;
UnifiedVectorFormat rhs_format;
lhs.ToUnifiedFormat(count, lhs_format);
rhs.ToUnifiedFormat(count, rhs_format);
auto &lhs_coords = StructVector::GetEntries(lhs);
auto &rhs_coords = StructVector::GetEntries(rhs);
UnifiedVectorFormat lhs_x_format;
UnifiedVectorFormat lhs_y_format;
lhs_coords[0]->ToUnifiedFormat(count, lhs_x_format);
lhs_coords[1]->ToUnifiedFormat(count, lhs_y_format);
UnifiedVectorFormat rhs_x_format;
UnifiedVectorFormat rhs_y_format;
rhs_coords[0]->ToUnifiedFormat(count, rhs_x_format);
rhs_coords[1]->ToUnifiedFormat(count, rhs_y_format);
auto &validity = FlatVector::Validity(result);
for (idx_t out_idx = 0; out_idx < args.size(); out_idx++) {
const auto lhs_idx = lhs_format.sel->get_index(out_idx);
const auto rhs_idx = rhs_format.sel->get_index(out_idx);
// We assume that the coordinate vectors dont contain any nulls, so we just check if the struct itself is
// null, dont bother checking nested validity.
if (!lhs_format.validity.RowIsValid(lhs_idx) || !rhs_format.validity.RowIsValid(rhs_idx)) {
validity.SetInvalid(out_idx);
continue;
}
const auto lhs_x_idx = lhs_x_format.sel->get_index(out_idx);
const auto lhs_y_idx = lhs_y_format.sel->get_index(out_idx);
const auto rhs_x_idx = rhs_x_format.sel->get_index(out_idx);
const auto rhs_y_idx = rhs_y_format.sel->get_index(out_idx);
const auto &lhs_x_val = UnifiedVectorFormat::GetData<double>(lhs_x_format)[lhs_x_idx];
const auto &lhs_y_val = UnifiedVectorFormat::GetData<double>(lhs_y_format)[lhs_y_idx];
const auto &rhs_x_val = UnifiedVectorFormat::GetData<double>(rhs_x_format)[rhs_x_idx];
const auto &rhs_y_val = UnifiedVectorFormat::GetData<double>(rhs_y_format)[rhs_y_idx];
auto res = /* operation, involving lhs_x_val, lhs_y_val, rhs_x_val, rhs_y_val */ 0.0;
FlatVector::GetData<double>(result)[out_idx] = res;
}
if (args.AllConstant()) {
result.SetVectorType(VectorType::CONSTANT_VECTOR);
}
}But if you really want to maximize performance you switch on the vector.GetVectorType() and specialize cases when all inputs are Flat or Constant - thats kind of what the *Executor helpers do under-the-hood. But again, in this case I would just call .Flatten() to turn everything into flat vectors and call it a day.
|
Thanks for explaining such details! I agree |
|
Thanks! |
|
Thanks for reviewing! |
This pull request implements
ST_Azimuth().I'm not sure if this is very common function considering there's only one feature request in Discussions, but I found I need this.
(Maybe I should retry this after #588 gets merged...?)