Skip to content

Conversation

@yutannihilation
Copy link
Contributor

This pull request implements ST_Azimuth().

I'm not sure if this is very common function considering there's only one feature request in Discussions, but I found I need this.

SELECT degrees(ST_Azimuth(ST_Point(0, 0), ST_Point(0, 1)));
----
90.0

(Maybe I should retry this after #588 gets merged...?)

Copy link
Member

@Maxxen Maxxen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thanks a lot for picking this up!
I've left some small comments, when you address them you might want to switch the target branch to v1.3.0 too!

Comment on lines +2203 to +2208
auto left_x = FlatVector::GetData<double>(*left_entries[0]);
auto left_y = FlatVector::GetData<double>(*left_entries[1]);
auto right_x = FlatVector::GetData<double>(*right_entries[0]);
auto right_y = FlatVector::GetData<double>(*right_entries[1]);

auto &result_mask = FlatVector::Validity(result);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't assume that the input data is a FlatVector (unless you call .Flatten() on the DataChunk first). You can either do the vector access manually by setting up UnifiedVectorFormats but it's much easier to use the Binary/Unary/GenericExector helper classes in simple cases. For STRUCT inputs, you can use the GenericExectuor. As an example, have a look at e.g. ST_Distance_Spheroid:

	//------------------------------------------------------------------------------------------------------------------
	// POINT_2D
	//------------------------------------------------------------------------------------------------------------------
	static void ExecutePoint(DataChunk &args, ExpressionState &state, Vector &result) {
		D_ASSERT(args.data.size() == 2);
		auto &left = args.data[0];
		auto &right = args.data[1];
		auto count = args.size();

		using POINT_TYPE = StructTypeBinary<double, double>;
		using DISTANCE_TYPE = PrimitiveType<double>;

		GenericExecutor::ExecuteBinary<POINT_TYPE, POINT_TYPE, DISTANCE_TYPE>(
		    left, right, result, count, [&](POINT_TYPE left, POINT_TYPE right) {
			    return sgl::math::haversine_distance(left.a_val, left.b_val, right.a_val, right.b_val);
		    });
	}

}
}

static double calcAngle(double x1, double y1, double x2, double y2) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we use PascalCase for functions - but its not super important.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, thanks for catching!

@yutannihilation
Copy link
Contributor Author

Thanks for the review! I'm still not very familiar with DuckDB internals, so that information is really helpful.

@yutannihilation yutannihilation changed the base branch from main to v1.3.0 June 1, 2025 12:38
Comment on lines 2215 to 2220
// If the points are the same, return NULL
if (left.a_val == right.a_val && left.b_val == right.b_val) {
// TODO
// result_mask.SetInvalid(i);
return 0.0;
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unlike BinaryExecutor, it seems there's no variant of *WithNulls, so, if I understand correctly, it's not possible to return NULL.

Personally, I feel this should be NaN for this case instead of NULL, so using NaN might be an option. However, since PostGIS returns NULL, it might be confusing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, maybe it's easiest to just call .Flatten() on args, then you can assume that all input vectors are FlatVector.

Copy link
Member

@Maxxen Maxxen Jun 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to illustrate, this is how it would look if you wanted to handle it the most general way possible, the following will handle all types of input vector layouts:

static void ExecutePoint(DataChunk &args, ExpressionState &state, Vector &result) {
	D_ASSERT(args.data.size() == 2);

	const auto count = args.size();

	auto &lhs =	args.data[0];
	auto &rhs = args.data[1];

	UnifiedVectorFormat lhs_format;
	UnifiedVectorFormat rhs_format;

	lhs.ToUnifiedFormat(count, lhs_format);
	rhs.ToUnifiedFormat(count, rhs_format);

	auto &lhs_coords = StructVector::GetEntries(lhs);
	auto &rhs_coords = StructVector::GetEntries(rhs);

	UnifiedVectorFormat lhs_x_format;
	UnifiedVectorFormat lhs_y_format;
	lhs_coords[0]->ToUnifiedFormat(count, lhs_x_format);
	lhs_coords[1]->ToUnifiedFormat(count, lhs_y_format);

	UnifiedVectorFormat rhs_x_format;
	UnifiedVectorFormat rhs_y_format;
	rhs_coords[0]->ToUnifiedFormat(count, rhs_x_format);
	rhs_coords[1]->ToUnifiedFormat(count, rhs_y_format);


	auto &validity = FlatVector::Validity(result);

	for (idx_t out_idx = 0; out_idx < args.size(); out_idx++) {
		const auto lhs_idx = lhs_format.sel->get_index(out_idx);
		const auto rhs_idx = rhs_format.sel->get_index(out_idx);

		// We assume that the coordinate vectors dont contain any nulls, so we just check if the struct itself is
		// null, dont bother checking nested validity.
		if (!lhs_format.validity.RowIsValid(lhs_idx) || !rhs_format.validity.RowIsValid(rhs_idx)) {
			validity.SetInvalid(out_idx);
			continue;
		}

		const auto lhs_x_idx = lhs_x_format.sel->get_index(out_idx);
		const auto lhs_y_idx = lhs_y_format.sel->get_index(out_idx);
		const auto rhs_x_idx = rhs_x_format.sel->get_index(out_idx);
		const auto rhs_y_idx = rhs_y_format.sel->get_index(out_idx);

		const auto &lhs_x_val = UnifiedVectorFormat::GetData<double>(lhs_x_format)[lhs_x_idx];
		const auto &lhs_y_val = UnifiedVectorFormat::GetData<double>(lhs_y_format)[lhs_y_idx];
		const auto &rhs_x_val = UnifiedVectorFormat::GetData<double>(rhs_x_format)[rhs_x_idx];
		const auto &rhs_y_val = UnifiedVectorFormat::GetData<double>(rhs_y_format)[rhs_y_idx];

		auto res = /* operation, involving lhs_x_val, lhs_y_val, rhs_x_val, rhs_y_val  */ 0.0;

		FlatVector::GetData<double>(result)[out_idx] = res;
	}

	if (args.AllConstant()) {
		result.SetVectorType(VectorType::CONSTANT_VECTOR);
	}
}

But if you really want to maximize performance you switch on the vector.GetVectorType() and specialize cases when all inputs are Flat or Constant - thats kind of what the *Executor helpers do under-the-hood. But again, in this case I would just call .Flatten() to turn everything into flat vectors and call it a day.

@yutannihilation yutannihilation marked this pull request as draft June 1, 2025 12:43
@yutannihilation
Copy link
Contributor Author

Thanks for explaining such details! I agree .Flatten() is handy here. Considering the POINT2D case would be probably less frequent than GEOMETRY, we don't need to worry much about the performance. Hopefully, DuckDB will eventually get GenericExecutor::ExecuteBinaryWithNulls so that I can switch the implementation.

@yutannihilation yutannihilation marked this pull request as ready for review June 3, 2025 00:16
@Maxxen
Copy link
Member

Maxxen commented Jun 3, 2025

Thanks!

@Maxxen Maxxen merged commit c7f5bd4 into duckdb:v1.3.0 Jun 3, 2025
22 checks passed
@yutannihilation yutannihilation deleted the feat/st_azimuth branch June 3, 2025 13:17
@yutannihilation
Copy link
Contributor Author

Thanks for reviewing!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants