Closed
Labels
FlowMachine: Issues related to FlowMachine
Description
We would like to be able to define a query that returns a specified reference location (we are particularly interested in MajorityLocation) for each subscriber, but only if they visited that location on more than 'threshold' days. This could be achieved with something like:
SELECT subscriber, location
FROM {reference_locations}
INNER JOIN {subscriber_locations_subset}
USING (subscriber, location)
where the parameters could be reference_locations=MajorityLocation(...) and subscriber_locations_subset=LocationVisits(...).numeric_subset(low=threshold, high=max_days). Alternatively we could use INTERSECT or INTERSECT ALL instead of an inner join - either would treat duplicates differently from the join, and the choice would presumably have performance implications.
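To make the duplicate-handling difference concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not FlowMachine code: the table names, the value column and the function name are assumptions made for illustration, and the bounds are treated as inclusive. SQLite is used only to keep the example self-contained, and it does not support INTERSECT ALL, so only the inner join and plain INTERSECT are compared.

```python
import sqlite3


def filter_reference_locations(reference, visits, low, high):
    """Keep reference locations visited on between `low` and `high` days.

    `reference` holds (subscriber, location) rows and `visits` holds
    (subscriber, location, value) rows; these shapes are assumptions.
    Returns the rows from the inner join and from INTERSECT.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE reference_locations (subscriber TEXT, location TEXT)")
    conn.execute(
        "CREATE TABLE location_visits (subscriber TEXT, location TEXT, value INTEGER)"
    )
    conn.executemany("INSERT INTO reference_locations VALUES (?, ?)", reference)
    conn.executemany("INSERT INTO location_visits VALUES (?, ?, ?)", visits)

    # Stand-in for the numeric subset of the visit counts (inclusive bounds).
    subset = (
        "SELECT subscriber, location FROM location_visits "
        "WHERE value BETWEEN ? AND ?"
    )

    # Inner join: one output row per matching pair, so duplicates survive.
    join_rows = conn.execute(
        "SELECT subscriber, location FROM reference_locations "
        f"INNER JOIN ({subset}) AS visits_subset USING (subscriber, location) "
        "ORDER BY subscriber, location",
        (low, high),
    ).fetchall()

    # INTERSECT: set semantics, so duplicate rows collapse to one.
    intersect_rows = conn.execute(
        "SELECT subscriber, location FROM reference_locations "
        f"INTERSECT {subset} "
        "ORDER BY subscriber, location",
        (low, high),
    ).fetchall()

    conn.close()
    return join_rows, intersect_rows
```

If the reference table contains the same (subscriber, location) row twice, the join returns it twice while INTERSECT returns it once. When both inputs have at most one row per subscriber and location, the two forms should return the same rows.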
This whole step could be achieved using existing methods, using:
reference_locations_query.join(
    location_visits_query.numeric_subset(low=threshold, high=max_days),
    how="inner",
    left_on=('subscriber', <location_id_columns>),
    right_on=('subscriber', <location_id_columns>),
)
but this is not ideal because:
- we'd like to use this query as an input to CoalescedLocation (and quite possibly other queries), which would expect to recognise it as a subscriber location query (i.e. having a spatial_unit attribute), which is not the case if the query is a generic Join query,
- exposing generic Join through the API wouldn't help, for much the same reason - only specific joins would be suitable parameters for the queries that could consume this one, so we'd need a specific marshmallow schema for those special-case joins, which would mirror the flowmachine query we're proposing to add here.
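A rough sketch of what such a dedicated query could look like is given below. Every name in it (StubLocationQuery, FilteredReferenceLocation, location_columns, is_subscriber_location_query) is hypothetical and none of it is FlowMachine's actual API; the point is only that a purpose-built class can carry spatial_unit through from its inputs, which a generic join does not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StubLocationQuery:
    """Hypothetical stand-in for a subscriber location query."""

    sql: str
    spatial_unit: str  # assumption: a plain label; real spatial units are richer
    location_columns: tuple = ("pcod",)  # assumption: illustrative column name


class FilteredReferenceLocation:
    """Hypothetical dedicated query: an inner join that keeps spatial_unit."""

    def __init__(self, reference_locations, location_visits_subset):
        if reference_locations.spatial_unit != location_visits_subset.spatial_unit:
            raise ValueError("Both inputs must use the same spatial unit.")
        self.reference_locations = reference_locations
        self.location_visits_subset = location_visits_subset
        # Carry the spatial unit through so that consumers can recognise
        # this object as a subscriber location query.
        self.spatial_unit = reference_locations.spatial_unit
        self.location_columns = reference_locations.location_columns

    def make_sql(self):
        columns = ", ".join(("subscriber", *self.location_columns))
        return (
            f"SELECT {columns} "
            f"FROM ({self.reference_locations.sql}) AS ref "
            f"INNER JOIN ({self.location_visits_subset.sql}) AS visits "
            f"USING ({columns})"
        )


def is_subscriber_location_query(query):
    """Mimic the kind of check a consuming query might make (an assumption)."""
    return hasattr(query, "spatial_unit")
```

Because the class fixes the shape of the join, an API schema for it would only need to describe the two input queries, rather than arbitrary join parameters.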