Bulk data access server #2482

Conversation
Demonstration: in the future, additional technologies such as Databricks may be used. Authentication and authorization may also be performed through the web client to the pathling-server. We use the auth in the createTag method, but I'm unsure what the effects are. Is there something "in" the auth object that stays the same across requests, so that caching still works? In any case, some auth information should perhaps be part of the tag (if parts of it stay the same across requests).
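To illustrate the question above, here is a minimal sketch of deriving a cache tag from the stable part of an auth context. The `AuthContext` record and `createTag` signature are hypothetical stand-ins, not the actual pathling-server API; the idea is that the tag is built from something like a subject claim, which survives token refreshes, rather than from the raw bearer token.

```java
import java.util.Objects;

public class CacheTagExample {

  // Hypothetical auth record; the real auth object in pathling-server may differ.
  record AuthContext(String subject, String token) {}

  // Build the cache tag from the part of the auth context that is stable
  // across requests (the subject), not from the token, which may be refreshed.
  static String createTag(final String resourceType, final AuthContext auth) {
    final String subject = auth == null
        ? "anonymous"
        : Objects.requireNonNullElse(auth.subject(), "anonymous");
    return resourceType + ":" + subject;
  }

  public static void main(final String[] args) {
    final AuthContext first = new AuthContext("user-123", "token-a");
    final AuthContext refreshed = new AuthContext("user-123", "token-b");
    // Same subject, different token: the tag is identical, so caching still works.
    System.out.println(createTag("Patient", first).equals(createTag("Patient", refreshed)));
  }
}
```

If nothing in the real auth object is stable across requests, this approach would not help and the tag would have to exclude auth entirely.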
@fhnaumann Could you please merge main into this branch?
Why have some files been deleted from the test data directory in the library API? There are other tests that rely upon this data.
johngrimes left a comment
This doesn't actually compile for me yet, and it's not passing the tests on CI.
I've added some preliminary comments anyway; I can take another look once we have a green build.
As a general comment, please also take a look at the CONTRIBUTING.md file and make sure that everything is ticked off there.
Resolved review threads:
- library-api/src/main/java/au/csiro/pathling/library/io/sink/NdjsonSink.java
- library-api/src/main/java/au/csiro/pathling/library/io/sink/DataSink.java
- library-api/src/main/java/au/csiro/pathling/library/io/source/QueryableDataSource.java
- library-api/src/main/java/au/csiro/pathling/library/io/source/TransformChain.java
@Component
@Profile("server")
@Slf4j
public class ConformanceProvider implements IServerConformanceProvider<CapabilityStatement>,
This will need to be updated to accurately reflect the capabilities of the server now.
Resolved review threads:
- pathling-server/src/main/java/au/csiro/pathling/FhirServer.java
Add comprehensive Javadoc comments to all fields in the Job class, explaining the purpose of each field. Also add missing @param tag for the id parameter in the constructor and add final modifiers to method parameters.
Add missing ExportConfiguration parameter to ExportProvider constructor call in SecurityTestForOperations test.
Change ExportProvider to inject ServerConfiguration instead of ExportConfiguration directly, since nested configurations are not automatically available as Spring beans. Access the export configuration via serverConfiguration.getExport().
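The injection pattern described above can be sketched as follows. The configuration records here are simplified, hypothetical stand-ins for the real classes: because a nested configuration section is not itself a Spring bean, the provider takes the top-level configuration and navigates to the nested export section.

```java
public class ConfigInjectionExample {

  // Hypothetical simplified stand-ins for the real configuration classes.
  record ExportConfiguration(boolean enabled) {}
  record ServerConfiguration(ExportConfiguration export) {}

  // The provider injects the parent configuration and reads the nested
  // section from it, mirroring serverConfiguration.getExport().
  static class ExportProvider {
    private final ExportConfiguration exportConfiguration;

    ExportProvider(final ServerConfiguration serverConfiguration) {
      this.exportConfiguration = serverConfiguration.export();
    }

    boolean exportEnabled() {
      return exportConfiguration.enabled();
    }
  }

  public static void main(final String[] args) {
    final ExportProvider provider =
        new ExportProvider(new ServerConfiguration(new ExportConfiguration(true)));
    System.out.println(provider.exportEnabled()); // prints true
  }
}
```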
Log all response headers in the assertCompleteResult method to aid in debugging and verification of the Expires header configuration.
The FHIR Bulk Data Export manifest now correctly sets requiresAccessToken to true when server authorisation is enabled. This ensures that bulk data clients include the access token when downloading exported files. Also fixes Dependencies.java to use PathlingContext.Builder pattern instead of the non-existent create(SparkSession, EncodingConfiguration, TerminologyConfiguration) method.
…ent scan

Prevents the deltaLake() bean from being created during test data import, which was failing because it tried to read Delta tables that hadn't been generated yet.
Implement patient-level and group-level bulk export per the FHIR Bulk Data Access specification. Key changes:
- Add PatientExportProvider for /Patient/$export and /Patient/[id]/$export
- Add GroupExportProvider for /Group/[id]/$export
- Add PatientCompartmentService to filter resources by patient compartment
- Add ExportOperationHelper to deduplicate export execution logic
- Extend ExportRequest with exportLevel and patientIds fields
- Update ExportOperationValidator with patient-level validation
- Register new providers in FhirServer and ConformanceProvider
- Fix code style issues and add missing Javadoc across modified files
The bulk export manifest was generating incorrect result URLs for patient-level exports (e.g. /Patient/$result instead of /$result). This was caused by parsing the request URL to derive the server base, which included the resource type path segment. Changes:
- Add serverBaseUrl field to ExportRequest record
- Pass requestDetails.getFhirServerBase() from validator to request
- Update ExportResponse to use serverBaseUrl directly
- Remove backwards-compatible constructors from ExportRequest
- Update test utilities to use canonical constructor
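The bug and fix described above can be sketched in plain Java. Both helper methods and the example URLs are hypothetical; the point is that trimming the request URL leaves the resource-type segment behind for patient-level exports, whereas using the server base (as returned by requestDetails.getFhirServerBase()) directly does not.

```java
public class ResultUrlExample {

  // Buggy approach: deriving the base by trimming the request URL leaves
  // the resource-type segment in place for patient-level exports.
  static String resultUrlFromRequest(final String requestUrl) {
    return requestUrl.replaceAll("\\$export$", "") + "$result";
  }

  // Fixed approach: build the result URL from the server base directly.
  static String resultUrlFromBase(final String serverBaseUrl) {
    return serverBaseUrl + "/$result";
  }

  public static void main(final String[] args) {
    System.out.println(resultUrlFromRequest("https://example.org/fhir/Patient/$export"));
    // → https://example.org/fhir/Patient/$result (wrong: resource type kept)
    System.out.println(resultUrlFromBase("https://example.org/fhir"));
    // → https://example.org/fhir/$result (correct)
  }
}
```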
Add Javadoc documentation, nullability annotations, and rename ND_JSON constant to NDJSON to follow Java naming conventions.
Add Javadoc documentation, nullability annotations, and final modifiers. Fix redundant registry lookup and improve code formatting.
Clarify that this provider handles system-level bulk exports, consistent with PatientExportProvider and GroupExportProvider naming.
Add Javadoc documentation, nullability annotations, and final modifiers. Remove commented-out code and use Lombok @Getter for requiresAccessToken.
Improve readability by extracting URL conversion logic into a dedicated private method.
Rename to OperationValidation, add Javadoc documentation, nullability annotations, and final modifiers. Update all references.
Implements the Argonaut $bulk-submit specification for receiving bulk data from external systems. The operation supports a multi-phase submission lifecycle (in-progress, complete, aborted) and delegates actual data processing to the existing ImportExecutor. Key components:
- BulkSubmitProvider: Main operation endpoint with async support
- BulkSubmitStatusProvider: Status checking endpoint
- BulkSubmitValidator: Request validation with submitter authorisation
- BulkSubmitExecutor: Manifest fetching and file download orchestration
- SubmissionRegistry: In-memory state with Hadoop FileSystem persistence
- BulkSubmitResultBuilder: Export-style manifest generation

Configuration via pathling.bulk-submit.* properties including enabled flag, allowed submitters list, staging location, and allowable source prefixes.
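The multi-phase lifecycle above can be sketched as a tiny state machine. The enum and transition rule here are an assumption for illustration (they are not the SubmissionRegistry API): in-progress submissions may be repeated or moved to a terminal state, while complete and aborted are terminal.

```java
public class SubmissionLifecycleExample {

  // Hypothetical states mirroring the lifecycle: in-progress, complete, aborted.
  enum Status { IN_PROGRESS, COMPLETE, ABORTED }

  // Assumption: only an in-progress submission may transition (including to
  // another in-progress update); terminal states accept no further changes.
  static boolean canTransition(final Status from, final Status to) {
    return from == Status.IN_PROGRESS;
  }

  public static void main(final String[] args) {
    System.out.println(canTransition(Status.IN_PROGRESS, Status.COMPLETE)); // true
    System.out.println(canTransition(Status.COMPLETE, Status.IN_PROGRESS)); // false
  }
}
```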
Conditionally include $bulk-submit and $bulk-submit-status operations in the CapabilityStatement when bulk-submit is enabled in configuration. Adds OperationDefinition resources for both operations.
Per the Argonaut spec, the URL parameters (manifestUrl, fhirBaseUrl, replacesManifestUrl) are of type string (url), not the FHIR url type. This allows clients to send valueString instead of valueUrl.
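The lenient extraction described above can be sketched as follows. The `Param` record is a hypothetical, simplified view of a Parameters parameter (not the HAPI type): a url value is preferred when present, and a plain string value is accepted as a fallback.

```java
import java.util.Optional;

public class UrlParameterExample {

  // Hypothetical simplified view of a Parameters parameter.
  record Param(String name, String valueUrl, String valueString) {}

  // Accept either a FHIR url value or a plain string value, so that clients
  // sending valueString for manifestUrl are not rejected.
  static Optional<String> extractUrl(final Param param) {
    if (param.valueUrl() != null) {
      return Optional.of(param.valueUrl());
    }
    return Optional.ofNullable(param.valueString());
  }

  public static void main(final String[] args) {
    final Param asString = new Param("manifestUrl", null, "https://example.org/manifest");
    System.out.println(extractUrl(asString).orElse("missing"));
    // → https://example.org/manifest
  }
}
```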
The Argonaut spec uses headerName/headerValue for the fileRequestHeader parts, not name/value. Also makes header extraction optional to gracefully skip empty or incomplete headers.
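A sketch of the optional header extraction described above, under the assumption of a simplified `Part` record (not the real parameter-part type): incomplete or blank headerName/headerValue pairs are skipped rather than causing a failure.

```java
import java.util.Map;
import java.util.Optional;

public class HeaderPartExample {

  // Hypothetical simplified part of a fileRequestHeader parameter.
  record Part(String headerName, String headerValue) {}

  // Extract a header pair, skipping empty or incomplete parts gracefully.
  static Optional<Map.Entry<String, String>> extractHeader(final Part part) {
    if (part.headerName() == null || part.headerName().isBlank()
        || part.headerValue() == null) {
      return Optional.empty();
    }
    return Optional.of(Map.entry(part.headerName(), part.headerValue()));
  }

  public static void main(final String[] args) {
    System.out.println(extractHeader(new Part("Authorization", "Bearer abc")).isPresent()); // true
    System.out.println(extractHeader(new Part(null, "x")).isPresent()); // false
  }
}
```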
Clients like bulk-submit-provider send Accept: application/json rather than application/fhir+json. Using lenient validation allows these requests to proceed by automatically adding the required header.
Add @AsyncSupported annotation so the operation returns 202 Accepted with Content-Location pointing to $job endpoint for polling. The method blocks until the submission completes, enabling clients to poll via GET.
Per the Argonaut spec, manifestUrl may be omitted when setting submissionStatus to complete. The manifest details can come from a previous in-progress request. This change:
- Removes the validator requirement for manifestUrl on complete status
- Stores manifest details in handleInProgressSubmission when provided
- Uses stored manifest details in handleCompleteSubmission if not provided in the request
Add submissionId and submissionStatus to request URL to ensure different bulk-submit requests get unique async job cache tags. Also fix state management so withManifestDetails() preserves current state rather than automatically transitioning to PROCESSING.
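A minimal sketch of the tag-uniqueness idea above. The helper and URL shape are hypothetical (the real implementation builds async job cache tags differently): appending submissionId and submissionStatus to the request URL ensures that distinct bulk-submit requests hash to distinct tags.

```java
public class SubmitTagExample {

  // Hypothetical: include submissionId and submissionStatus in the URL used
  // to derive the async job cache tag, so distinct requests get distinct tags.
  static String cacheTagUrl(final String baseUrl, final String submissionId,
      final String submissionStatus) {
    return baseUrl + "?submissionId=" + submissionId
        + "&submissionStatus=" + submissionStatus;
  }

  public static void main(final String[] args) {
    final String a = cacheTagUrl("https://example.org/fhir/$bulk-submit", "sub-1", "in-progress");
    final String b = cacheTagUrl("https://example.org/fhir/$bulk-submit", "sub-1", "complete");
    // Different statuses produce different tags, so the complete request is
    // not served from the in-progress request's cached job.
    System.out.println(a.equals(b)); // false
  }
}
```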
Add WireMock-based integration test that exercises the bulk-submit workflow with stubbed manifest and NDJSON data endpoints. Fix test helper in BulkSubmitResultBuilderTest to correctly set PROCESSING state. Update CONTRIBUTING.md with correct command for running specific integration tests without unit tests.
Renamed the branch to reflect the planned changes more accurately.
Relates to #2467, #1987 and #1986.
In the future, this PR may also include #2476 and #1988.