Bulk data access server #2482

Conversation
Demonstration: in the future, additional technologies such as Databricks may be used. Authentication and authorization may also be performed through the web client to the pathling-server. We use the auth in the createTag method, but I'm unsure what the effects are. Is there something "in" the auth object that stays the same across requests, so that caching still works? In any case, some auth information should perhaps be part of the tag (if parts of it stay the same across requests).
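To illustrate the question above, here is a minimal sketch of deriving a cache tag from the stable part of an auth context. The `AuthContext` record and `createTag` signature are hypothetical stand-ins, not the actual pathling-server API; the idea is that the tag is built from something like a subject claim, which survives token refreshes, rather than from the raw bearer token.

```java
import java.util.Objects;

public class CacheTagExample {

  // Hypothetical auth record; the real auth object in pathling-server may differ.
  record AuthContext(String subject, String token) {}

  // Build the cache tag from the part of the auth context that is stable
  // across requests (the subject), not from the token, which may be refreshed.
  static String createTag(final String resourceType, final AuthContext auth) {
    final String subject = auth == null
        ? "anonymous"
        : Objects.requireNonNullElse(auth.subject(), "anonymous");
    return resourceType + ":" + subject;
  }

  public static void main(final String[] args) {
    final AuthContext first = new AuthContext("user-123", "token-a");
    final AuthContext refreshed = new AuthContext("user-123", "token-b");
    // Same subject, different token: the tag is identical, so caching still works.
    System.out.println(createTag("Patient", first).equals(createTag("Patient", refreshed)));
  }
}
```

If nothing in the real auth object is stable across requests, this approach would not help and the tag would have to exclude auth entirely.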
@fhnaumann Could you please merge main into this branch?
Why have some files been deleted from the test data directory in the library API? There are other tests that rely upon this data.
johngrimes left a comment
This doesn't actually compile for me yet, and it's not passing the tests on CI.
I've added some preliminary comments anyway; I can take another look once we have a green build.
As a general comment, please also take a look at the CONTRIBUTING.md file and make sure that everything is ticked off there.
Resolved review threads:
- library-api/src/main/java/au/csiro/pathling/library/io/sink/NdjsonSink.java
- library-api/src/main/java/au/csiro/pathling/library/io/sink/DataSink.java
- library-api/src/main/java/au/csiro/pathling/library/io/source/QueryableDataSource.java
- library-api/src/main/java/au/csiro/pathling/library/io/source/TransformChain.java
@Component
@Profile("server")
@Slf4j
public class ConformanceProvider implements IServerConformanceProvider<CapabilityStatement>,
This will need to be updated to accurately reflect the capabilities of the server now.
Resolved review threads:
- pathling-server/src/main/java/au/csiro/pathling/FhirServer.java
Add comprehensive Javadoc comments to all fields in the Job class, explaining the purpose of each field. Also add missing @param tag for the id parameter in the constructor and add final modifiers to method parameters.
Add missing ExportConfiguration parameter to ExportProvider constructor call in SecurityTestForOperations test.
Change ExportProvider to inject ServerConfiguration instead of ExportConfiguration directly, since nested configurations are not automatically available as Spring beans. Access the export configuration via serverConfiguration.getExport().
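The injection pattern described above can be sketched as follows. The configuration records here are simplified, hypothetical stand-ins for the real classes: because a nested configuration section is not itself a Spring bean, the provider takes the top-level configuration and navigates to the nested export section.

```java
public class ConfigInjectionExample {

  // Hypothetical simplified stand-ins for the real configuration classes.
  record ExportConfiguration(boolean enabled) {}
  record ServerConfiguration(ExportConfiguration export) {}

  // The provider injects the parent configuration and reads the nested
  // section from it, mirroring serverConfiguration.getExport().
  static class ExportProvider {
    private final ExportConfiguration exportConfiguration;

    ExportProvider(final ServerConfiguration serverConfiguration) {
      this.exportConfiguration = serverConfiguration.export();
    }

    boolean exportEnabled() {
      return exportConfiguration.enabled();
    }
  }

  public static void main(final String[] args) {
    final ExportProvider provider =
        new ExportProvider(new ServerConfiguration(new ExportConfiguration(true)));
    System.out.println(provider.exportEnabled()); // prints true
  }
}
```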
Log all response headers in the assertCompleteResult method to aid in debugging and verification of the Expires header configuration.
The FHIR Bulk Data Export manifest now correctly sets requiresAccessToken to true when server authorisation is enabled. This ensures that bulk data clients include the access token when downloading exported files. Also fixes Dependencies.java to use PathlingContext.Builder pattern instead of the non-existent create(SparkSession, EncodingConfiguration, TerminologyConfiguration) method.
…ent scan

Prevents the deltaLake() bean from being created during test data import, which was failing because it tried to read Delta tables that hadn't been generated yet.
Implement patient-level and group-level bulk export per the FHIR Bulk Data Access specification. Key changes:
- Add PatientExportProvider for /Patient/$export and /Patient/[id]/$export
- Add GroupExportProvider for /Group/[id]/$export
- Add PatientCompartmentService to filter resources by patient compartment
- Add ExportOperationHelper to deduplicate export execution logic
- Extend ExportRequest with exportLevel and patientIds fields
- Update ExportOperationValidator with patient-level validation
- Register new providers in FhirServer and ConformanceProvider
- Fix code style issues and add missing Javadoc across modified files
The bulk export manifest was generating incorrect result URLs for patient-level exports (e.g. /Patient/$result instead of /$result). This was caused by parsing the request URL to derive the server base, which included the resource type path segment. Changes:
- Add serverBaseUrl field to ExportRequest record
- Pass requestDetails.getFhirServerBase() from validator to request
- Update ExportResponse to use serverBaseUrl directly
- Remove backwards-compatible constructors from ExportRequest
- Update test utilities to use canonical constructor
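The bug and fix described above can be sketched in plain Java. Both helper methods and the example URLs are hypothetical; the point is that trimming the request URL leaves the resource-type segment behind for patient-level exports, whereas using the server base (as returned by requestDetails.getFhirServerBase()) directly does not.

```java
public class ResultUrlExample {

  // Buggy approach: deriving the base by trimming the request URL leaves
  // the resource-type segment in place for patient-level exports.
  static String resultUrlFromRequest(final String requestUrl) {
    return requestUrl.replaceAll("\\$export$", "") + "$result";
  }

  // Fixed approach: build the result URL from the server base directly.
  static String resultUrlFromBase(final String serverBaseUrl) {
    return serverBaseUrl + "/$result";
  }

  public static void main(final String[] args) {
    System.out.println(resultUrlFromRequest("https://example.org/fhir/Patient/$export"));
    // → https://example.org/fhir/Patient/$result (wrong: resource type kept)
    System.out.println(resultUrlFromBase("https://example.org/fhir"));
    // → https://example.org/fhir/$result (correct)
  }
}
```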
Add Javadoc documentation, nullability annotations, and rename ND_JSON constant to NDJSON to follow Java naming conventions.
Add Javadoc documentation, nullability annotations, and final modifiers. Fix redundant registry lookup and improve code formatting.
Clarify that this provider handles system-level bulk exports, consistent with PatientExportProvider and GroupExportProvider naming.
Add Javadoc documentation, nullability annotations, and final modifiers. Remove commented-out code and use Lombok @Getter for requiresAccessToken.
Improve readability by extracting URL conversion logic into a dedicated private method.
Rename to OperationValidation, add Javadoc documentation, nullability annotations, and final modifiers. Update all references.
Implements the Argonaut $bulk-submit specification for receiving bulk data from external systems. The operation supports a multi-phase submission lifecycle (in-progress, complete, aborted) and delegates actual data processing to the existing ImportExecutor. Key components:
- BulkSubmitProvider: Main operation endpoint with async support
- BulkSubmitStatusProvider: Status checking endpoint
- BulkSubmitValidator: Request validation with submitter authorisation
- BulkSubmitExecutor: Manifest fetching and file download orchestration
- SubmissionRegistry: In-memory state with Hadoop FileSystem persistence
- BulkSubmitResultBuilder: Export-style manifest generation

Configuration via pathling.bulk-submit.* properties including enabled flag, allowed submitters list, staging location, and allowable source prefixes.
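The multi-phase lifecycle above can be sketched as a tiny state machine. The enum and transition rule here are an assumption for illustration (they are not the SubmissionRegistry API): in-progress submissions may be repeated or moved to a terminal state, while complete and aborted are terminal.

```java
public class SubmissionLifecycleExample {

  // Hypothetical states mirroring the lifecycle: in-progress, complete, aborted.
  enum Status { IN_PROGRESS, COMPLETE, ABORTED }

  // Assumption: only an in-progress submission may transition (including to
  // another in-progress update); terminal states accept no further changes.
  static boolean canTransition(final Status from, final Status to) {
    return from == Status.IN_PROGRESS;
  }

  public static void main(final String[] args) {
    System.out.println(canTransition(Status.IN_PROGRESS, Status.COMPLETE)); // true
    System.out.println(canTransition(Status.COMPLETE, Status.IN_PROGRESS)); // false
  }
}
```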
Conditionally include $bulk-submit and $bulk-submit-status operations in the CapabilityStatement when bulk-submit is enabled in configuration. Adds OperationDefinition resources for both operations.
Per the Argonaut spec, the URL parameters (manifestUrl, fhirBaseUrl, replacesManifestUrl) are of type string (url), not the FHIR url type. This allows clients to send valueString instead of valueUrl.
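The lenient extraction described above can be sketched as follows. The `Param` record is a hypothetical, simplified view of a Parameters parameter (not the HAPI type): a url value is preferred when present, and a plain string value is accepted as a fallback.

```java
import java.util.Optional;

public class UrlParameterExample {

  // Hypothetical simplified view of a Parameters parameter.
  record Param(String name, String valueUrl, String valueString) {}

  // Accept either a FHIR url value or a plain string value, so that clients
  // sending valueString for manifestUrl are not rejected.
  static Optional<String> extractUrl(final Param param) {
    if (param.valueUrl() != null) {
      return Optional.of(param.valueUrl());
    }
    return Optional.ofNullable(param.valueString());
  }

  public static void main(final String[] args) {
    final Param asString = new Param("manifestUrl", null, "https://example.org/manifest");
    System.out.println(extractUrl(asString).orElse("missing"));
    // → https://example.org/manifest
  }
}
```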
The Argonaut spec uses headerName/headerValue for the fileRequestHeader parts, not name/value. Also makes header extraction optional to gracefully skip empty or incomplete headers.
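A sketch of the optional header extraction described above, under the assumption of a simplified `Part` record (not the real parameter-part type): incomplete or blank headerName/headerValue pairs are skipped rather than causing a failure.

```java
import java.util.Map;
import java.util.Optional;

public class HeaderPartExample {

  // Hypothetical simplified part of a fileRequestHeader parameter.
  record Part(String headerName, String headerValue) {}

  // Extract a header pair, skipping empty or incomplete parts gracefully.
  static Optional<Map.Entry<String, String>> extractHeader(final Part part) {
    if (part.headerName() == null || part.headerName().isBlank()
        || part.headerValue() == null) {
      return Optional.empty();
    }
    return Optional.of(Map.entry(part.headerName(), part.headerValue()));
  }

  public static void main(final String[] args) {
    System.out.println(extractHeader(new Part("Authorization", "Bearer abc")).isPresent()); // true
    System.out.println(extractHeader(new Part(null, "x")).isPresent()); // false
  }
}
```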
Clients like bulk-submit-provider send Accept: application/json rather than application/fhir+json. Using lenient validation allows these requests to proceed by automatically adding the required header.
Add @AsyncSupported annotation so the operation returns 202 Accepted with Content-Location pointing to $job endpoint for polling. The method blocks until the submission completes, enabling clients to poll via GET.
Per the Argonaut spec, manifestUrl may be omitted when setting submissionStatus to complete. The manifest details can come from a previous in-progress request. This change:
- Removes the validator requirement for manifestUrl on complete status
- Stores manifest details in handleInProgressSubmission when provided
- Uses stored manifest details in handleCompleteSubmission if not provided in the request
Add submissionId and submissionStatus to request URL to ensure different bulk-submit requests get unique async job cache tags. Also fix state management so withManifestDetails() preserves current state rather than automatically transitioning to PROCESSING.
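A minimal sketch of the tag-uniqueness idea above. The helper and URL shape are hypothetical (the real implementation builds async job cache tags differently): appending submissionId and submissionStatus to the request URL ensures that distinct bulk-submit requests hash to distinct tags.

```java
public class SubmitTagExample {

  // Hypothetical: include submissionId and submissionStatus in the URL used
  // to derive the async job cache tag, so distinct requests get distinct tags.
  static String cacheTagUrl(final String baseUrl, final String submissionId,
      final String submissionStatus) {
    return baseUrl + "?submissionId=" + submissionId
        + "&submissionStatus=" + submissionStatus;
  }

  public static void main(final String[] args) {
    final String a = cacheTagUrl("https://example.org/fhir/$bulk-submit", "sub-1", "in-progress");
    final String b = cacheTagUrl("https://example.org/fhir/$bulk-submit", "sub-1", "complete");
    // Different statuses produce different tags, so the complete request is
    // not served from the in-progress request's cached job.
    System.out.println(a.equals(b)); // false
  }
}
```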
Add WireMock-based integration test that exercises the bulk-submit workflow with stubbed manifest and NDJSON data endpoints. Fix test helper in BulkSubmitResultBuilderTest to correctly set PROCESSING state. Update CONTRIBUTING.md with correct command for running specific integration tests without unit tests.
Renamed the branch to reflect the planned changes more accurately.
Relates to #2467, #1987 and #1986.
In the future, this PR may also include #2476 and #1988.