-
-
Notifications
You must be signed in to change notification settings - Fork 36
Description
How could the content be improved?
I was running into some problems using the JOIN queries in Episode 6. The query would take hours and simply time out. This could be because when the JOIN command is run, for example, between gaiadr2.source_id and panstarrs1_best_neighbor.source_id, it is perhaps evaluated for the entire database before filtering for ra and dec (Source: ChatGPT -- so take explanation with a grain of salt).
I eventually figured out how to create subqueries and how to force the order of the queries so that only the filtered Gaia results were considered during the join with panstarrs1_best_neighbor. However, the operation still seemed to take over an hour. So in addition to that, I found that filtering out just the columns needed from each table (using another subquery) and using the OFFSET 0 modifier in order to to force the order of operations (see Gaia tutorials on combining tables) reduced the query time from hours (or completely stalling) to 30 seconds or less.
Here is an example, which could replace the neighbours_query string in the tutorial:
"""SELECT
gaia.source_id, gaia.ra, gaia.dec, gaia.pmra, gaia.pmdec,
best.source_id, best.original_ext_source_id,
best.best_neighbour_multiplicity, best.number_of_mates
FROM (
SELECT source_id, ra, dec, pmra, pmdec
FROM gaiadr2.gaia_source
WHERE 1=CONTAINS(
POINT(ra, dec),
CIRCLE(88.8, 7.4, 0.08333333)
)
OFFSET 0) as gaia
JOIN (
SELECT source_id, original_ext_source_id, best_neighbour_multiplicity, number_of_mates
FROM gaiadr2.panstarrs1_best_neighbour
OFFSET 0) AS best
ON gaia.source_id = best.source_id
"""
This particular query ran in less than 10 seconds. It looks a lot more complicated, but it could make things easier for folks who are trying to execute this workshop in one day.
Which part of the content does your suggestion apply to?
Episode 6