-
Notifications
You must be signed in to change notification settings - Fork 176
Description
Expected behavior
After defining and generating a cohort, it should be possible to use the Reporting tab to generate Quick or Full Analyses.
Actual behavior
This works fine on SQL Server, but not on Spark (e.g. DataBricks).
Steps to reproduce behavior
Need an instance of OHDSI that has an OMOP data source connected via SPARK
- Select any cohort
- Run Generate on the cohort
- Run "Quick Analysis" under the Reporting tab
After several minutes, the Job will show a Failed status.
Root Cause
After all of the initial processing, the final query looks like this:
insert into result_schema.heracles_results (cohort_definition_id, analysis_id, stratum_1, stratum_2, stratum_3, stratum_4, count_value) select cohort_definition_id, analysis_id, cast(stratum_1 as STRING), cast(stratum_2 as STRING), cast(stratum_3 as STRING), cast(stratum_4 as STRING), count_value from tmp.xrfc9243results_1 UNION ALL ...
However, SPARK SQL does not support INSERT INTO with a list of fields that does not exactly match the full set of columns for the final table.
Modifying the query does work. Either this:
insert into omop_160101_to_220731_v813_results.heracles_results (cohort_definition_id, analysis_id, stratum_1, stratum_2, stratum_3, stratum_4, stratum_5, count_value, last_update_time) select cohort_definition_id, analysis_id, cast(stratum_1 as STRING), cast(stratum_2 as STRING), cast(stratum_3 as STRING), cast(stratum_4 as STRING), '' as stratum_5, count_value, now() from tmp.xrfc9243results_1
or this:
insert into omop_160101_to_220731_v813_results.heracles_results select cohort_definition_id, analysis_id, cast(stratum_1 as STRING), cast(stratum_2 as STRING), cast(stratum_3 as STRING), cast(stratum_4 as STRING), '' as stratum_5, count_value, now() from tmp.xrfc9243results_1
Can the relevant SQL fragments be modified so that this works for Spark?
One of the relevant files appears to be:
https://github.com/OHDSI/WebAPI/blob/master/src/main/resources/resources/cohortanalysis/sql/selectHeraclesResults.sql
But it looks as though a Java file is used to orchestrate the sub-queries:
https://github.com/OHDSI/WebAPI/blob/fd108571086fceea8513389fab426a6ea8101888/src/main/java/org/ohdsi/webapi/cohortanalysis/HeraclesQueryBuilder.java