Skip to content

JOIN commands can be improved to execute in less time #269

@eblur

Description

@eblur

How could the content be improved?

I was running into some problems using the JOIN queries in Episode 6. The query would take hours and simply time out. This could be because when the JOIN command is run, for example, between gaiadr2.source_id and panstarrs1_best_neighbor.source_id, it is perhaps evaluated for the entire database before filtering for ra and dec (Source: ChatGPT -- so take explanation with a grain of salt).

I eventually figured out how to create subqueries and how to force the order of the queries so that only the filtered Gaia results were considered during the join with panstarrs1_best_neighbor. However, the operation still seemed to take over an hour. So in addition to that, I found that filtering out just the columns needed from each table (using another subquery) and using the OFFSET 0 modifier in order to to force the order of operations (see Gaia tutorials on combining tables) reduced the query time from hours (or completely stalling) to 30 seconds or less.

Here is an example, which could replace the neighbours_query string in the tutorial:

"""SELECT 
gaia.source_id, gaia.ra, gaia.dec, gaia.pmra, gaia.pmdec, 
best.source_id, best.original_ext_source_id,
best.best_neighbour_multiplicity, best.number_of_mates
FROM (
  SELECT source_id, ra, dec, pmra, pmdec
  FROM gaiadr2.gaia_source
  WHERE 1=CONTAINS(
      POINT(ra, dec),
      CIRCLE(88.8, 7.4, 0.08333333)
      )
  OFFSET 0) as gaia
JOIN (
        SELECT source_id, original_ext_source_id, best_neighbour_multiplicity, number_of_mates
        FROM gaiadr2.panstarrs1_best_neighbour
    OFFSET 0) AS best
    ON gaia.source_id = best.source_id
"""

This particular query ran in less than 10 seconds. It looks a lot more complicated, but it could make things easier for folks who are trying to execute this workshop in one day.

Which part of the content does your suggestion apply to?

Episode 6

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions