Skip to content

Heracles reports do not insert final results when using Spark #2112

@TomWhite-MedStar

Description

@TomWhite-MedStar

Expected behavior

After defining and generating a cohort, it should be possible to use the Reporting tab to generate Quick or Full Analyses.

Actual behavior

This works fine on SQL Server, but not on Spark (e.g. DataBricks).

Steps to reproduce behavior

Need an instance of OHDSI that has an OMOP data source connected via SPARK

  1. Select any cohort
  2. Run Generate on the cohort
  3. Run "Quick Analysis" under the Reporting tab
    After several minutes, the Job will show a Failed status.

Root Cause

After all of the initial processing, the final query looks like this:

insert into result_schema.heracles_results (cohort_definition_id, analysis_id, stratum_1, stratum_2, stratum_3, stratum_4, count_value) select cohort_definition_id, analysis_id, cast(stratum_1 as STRING), cast(stratum_2 as STRING), cast(stratum_3 as STRING), cast(stratum_4 as STRING), count_value from tmp.xrfc9243results_1 UNION ALL ...

However, SPARK SQL does not support INSERT INTO with a list of fields that does not exactly match the full set of columns for the final table.

Modifying the query does work. Either this:

insert into omop_160101_to_220731_v813_results.heracles_results (cohort_definition_id, analysis_id, stratum_1, stratum_2, stratum_3, stratum_4, stratum_5, count_value, last_update_time) select cohort_definition_id, analysis_id, cast(stratum_1 as STRING), cast(stratum_2 as STRING), cast(stratum_3 as STRING), cast(stratum_4 as STRING), '' as stratum_5, count_value, now() from tmp.xrfc9243results_1

or this:

insert into omop_160101_to_220731_v813_results.heracles_results select cohort_definition_id, analysis_id, cast(stratum_1 as STRING), cast(stratum_2 as STRING), cast(stratum_3 as STRING), cast(stratum_4 as STRING), '' as stratum_5, count_value, now() from tmp.xrfc9243results_1

Can the relevant SQL fragments be modified so that this works for Spark?

One of the relevant files appears to be:
https://github.com/OHDSI/WebAPI/blob/master/src/main/resources/resources/cohortanalysis/sql/selectHeraclesResults.sql

But it looks as though a Java file is used to orchestrate the sub-queries:
https://github.com/OHDSI/WebAPI/blob/fd108571086fceea8513389fab426a6ea8101888/src/main/java/org/ohdsi/webapi/cohortanalysis/HeraclesQueryBuilder.java

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions