@aaronweeden aaronweeden commented Sep 5, 2025

Description

This PR changes the Data Export batch processor to use unbuffered queries, which reduces its memory usage on the web server.

Because the query is unbuffered, the number of rows can no longer be counted until the query has completed, so this PR removes the debugging statement that prints the row count.
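Here is a minimal Python sketch of the trade-off described above. It uses the standard-library `sqlite3` module as a stand-in for the warehouse database, and the `jobs` table, `make_db`, and the `export_*` functions are hypothetical. A buffered fetch makes the row count available before processing starts. A streamed cursor only knows the count after the last row. In PHP's PDO MySQL driver the corresponding switch is `PDO::MYSQL_ATTR_USE_BUFFERED_QUERY`, but this sketch does not claim that `BatchDataset` uses that exact mechanism.

```python
import sqlite3
from typing import Callable


def make_db(n_rows: int) -> sqlite3.Connection:
    """Build an in-memory table standing in for warehouse data (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO jobs (id, name) VALUES (?, ?)",
        ((i, f"job-{i}") for i in range(n_rows)),
    )
    conn.commit()
    return conn


def export_buffered(conn: sqlite3.Connection, sink: Callable[[tuple], None]) -> int:
    # Buffered: fetchall() materializes the whole result set first, so the
    # row count is known before any row is written, but memory grows with it.
    rows = conn.execute("SELECT id, name FROM jobs ORDER BY id").fetchall()
    total = len(rows)
    for row in rows:
        sink(row)
    return total


def export_streaming(conn: sqlite3.Connection, sink: Callable[[tuple], None]) -> int:
    # Streaming: iterating the cursor steps through the result one row at a
    # time, so only the current row is held in Python memory.
    cursor = conn.execute("SELECT id, name FROM jobs ORDER BY id")
    # cursor.rowcount is -1 for SELECT statements in sqlite3, so no count yet.
    total = 0
    for row in cursor:
        sink(row)
        total += 1
    # Only now, after the last row has been consumed, is the total known.
    return total
```

Because the streaming version counts rows as they pass through, any row-count debug output has to move to the end of the export or be dropped.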

This PR also removes the lines in the raw data REST endpoint in WarehouseControllerProvider that set unbuffered query mode, since this is now handled by the BatchDataset code, which is used by both the raw data endpoint and the Data Export batch processor.
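Moving the query-mode setting out of the endpoint and into shared code can be pictured as a single dataset class that owns how rows are fetched, with both callers simply iterating over it. This is a hypothetical Python sketch. `StreamingDataset`, `raw_data_endpoint`, and `batch_export` are illustrative names, not the actual `BatchDataset` or `WarehouseControllerProvider` code, and `sqlite3` stands in for the real database.

```python
import io
import json
import sqlite3
from typing import Iterator, TextIO


class StreamingDataset:
    """Hypothetical stand-in for a shared dataset class: the single place that
    decides how rows are fetched, so every caller gets streaming behavior."""

    def __init__(self, conn: sqlite3.Connection, sql: str) -> None:
        self._conn = conn
        self._sql = sql

    def __iter__(self) -> Iterator[tuple]:
        cursor = self._conn.execute(self._sql)
        try:
            # Yield rows one at a time instead of calling fetchall().
            yield from cursor
        finally:
            cursor.close()


def raw_data_endpoint(dataset: StreamingDataset) -> Iterator[str]:
    # Caller 1 (hypothetical REST handler): streams JSON lines and no longer
    # configures the query mode itself.
    for row in dataset:
        yield json.dumps(row) + "\n"


def batch_export(dataset: StreamingDataset, out: TextIO) -> int:
    # Caller 2 (hypothetical batch exporter): writes rows as they arrive and
    # learns the row count only at the end.
    count = 0
    for row in dataset:
        out.write(json.dumps(row) + "\n")
        count += 1
    return count


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO jobs VALUES (?, ?)", [(1, "a"), (2, "b")])
    ds = StreamingDataset(conn, "SELECT id, name FROM jobs ORDER BY id")
    buf = io.StringIO()
    print(batch_export(ds, buf), "rows exported")
```

Keeping the fetch strategy in one class means a new caller cannot accidentally reintroduce a buffered query.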

Motivation and Context

This fixes a bug in which, if a large amount of raw data is requested for export, the batch processor can run out of memory on the web server.

Tests performed

On my developer port of xdmod-dev, I added a debugging message to the end of batch_export_manager.php that prints the value of memory_get_peak_usage(), then made Data Export requests of various sizes and ran the script. With the old buffered query, peak memory usage grew as the number of days to export increased. With the new unbuffered query, peak memory usage stayed at around 6–7 MB regardless of the number of days to export.
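The measurement above used PHP's `memory_get_peak_usage()`. Below is a rough Python analogue of the same experiment. It assumes an in-memory `sqlite3` table as a stand-in for warehouse data and uses `tracemalloc` to compare peak allocation for a buffered fetch against a streamed one as the row count grows. The table and helper names are hypothetical. `tracemalloc` sees only Python-level allocations, not SQLite's internal memory.

```python
import sqlite3
import tracemalloc
from typing import Callable


def make_db(n_rows: int) -> sqlite3.Connection:
    """Build an in-memory table standing in for warehouse data (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO jobs (id, name) VALUES (?, ?)",
        ((i, f"job-{i}") for i in range(n_rows)),
    )
    conn.commit()
    return conn


def peak_bytes(fn: Callable[[sqlite3.Connection], int], conn: sqlite3.Connection) -> int:
    """Peak traced Python allocation while running fn (in spirit like memory_get_peak_usage)."""
    tracemalloc.start()
    try:
        fn(conn)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def buffered(conn: sqlite3.Connection) -> int:
    # Holds every row in a list at once.
    return len(conn.execute("SELECT id, name FROM jobs").fetchall())


def streaming(conn: sqlite3.Connection) -> int:
    # Holds one row at a time while counting.
    return sum(1 for _ in conn.execute("SELECT id, name FROM jobs"))


if __name__ == "__main__":
    for n in (10_000, 100_000):
        conn = make_db(n)
        print(n, "rows:", peak_bytes(buffered, conn), "vs", peak_bytes(streaming, conn), "bytes")
```

The buffered peak should grow roughly in proportion to the row count, while the streamed peak stays roughly flat, mirroring the behavior reported above.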

A Data Export request on prod had previously caused it to run out of memory and crash; I ran the same export on my dev port with this change and confirmed that it completed. The parameters were:

  • Realm: SUPREMM
  • Start Date: 2025-01-01
  • End Date: 2025-08-21
  • Format: JSON

I also tested to make sure the /rest/warehouse/raw-data endpoint still works.

Checklist:

  • The pull request description is suitable for a Changelog entry
  • The milestone is set correctly on the pull request
  • The appropriate labels have been added to the pull request

@aaronweeden aaronweeden added this to the ACCESS 11.0.2 p3 milestone Sep 5, 2025
@aaronweeden aaronweeden added the bug Bugfixes label Sep 5, 2025
@aaronweeden aaronweeden marked this pull request as draft September 8, 2025 14:39
@aaronweeden aaronweeden marked this pull request as ready for review September 8, 2025 15:58
@aaronweeden aaronweeden requested a review from jpwhite4 September 8, 2025 15:59