


@dsolistorres commented Dec 8, 2025

Closes #33661

This PR addresses performance issues and pagination errors in the site copy job by using the Elasticsearch Scroll API to iterate over large result sets.
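To make the pagination failure concrete, here is a minimal sketch of the limit Elasticsearch applies to from/size requests: a page is rejected once from + size exceeds index.max_result_window. The 100,000 window is the value quoted in this PR; the page size of 500 and the OffsetWindow class are hypothetical and not part of dotCMS.

```java
/** Hypothetical illustration of the from + size limit that breaks deep offset pagination. */
public final class OffsetWindow {

    /** Returns true if a from/size request stays within the result window. */
    static boolean isPageAllowed(long from, long size, long maxResultWindow) {
        // Elasticsearch rejects a search when from + size exceeds index.max_result_window.
        return from + size <= maxResultWindow;
    }

    /** Walks page by page and returns the offset of the first page that would be rejected. */
    static long firstRejectedOffset(long pageSize, long maxResultWindow) {
        long from = 0;
        while (isPageAllowed(from, pageSize, maxResultWindow)) {
            from += pageSize;
        }
        return from;
    }

    public static void main(String[] args) {
        long maxResultWindow = 100_000; // value quoted in this PR
        long pageSize = 500;            // hypothetical page size
        System.out.println(firstRejectedOffset(pageSize, maxResultWindow));
    }
}
```

With these numbers the first rejected offset is 100,000, so every hit past that point is unreachable with offset paging, and each deep page that is still allowed forces the shards to collect and discard all preceding hits. The Scroll API avoids both problems by keeping a cursor on the server instead of recomputing an offset.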

Proposed Changes

  • When copying sites with large numbers of contentlets, the copy host job hit deep pagination errors once the offset plus page size exceeded Elasticsearch's max_result_window (100,000), and offset-based pagination became progressively slower for large result sets, because each page forces Elasticsearch to collect and discard every preceding hit.
  • The indexSearchScroll method in ESContentFactoryImpl was refactored to expose the ES Scroll API through a new wrapper interface, ESContentletScroll. The PaginatedContentlets class now uses this interface to iterate over results with the Scroll API.
  • SQL queries in HostFactoryImpl were optimized to filter hosts on the structure_inode column of the contentlet table and to use ILIKE for case-insensitive matching.
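The scroll wrapper and batch-by-batch iteration described in these changes can be sketched as follows. This is a minimal, self-contained sketch of the general pattern, not the code in this PR: ContentletScroll, InMemoryScroll and ScrollingIterator are hypothetical names, and an in-memory stand-in replaces the Elasticsearch client so the example runs without a cluster.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical sketch of a scroll wrapper plus a lazy iterator; not the actual ESContentletScroll code. */
public final class ScrollSketch {

    /** Scroll cursor: each call returns the next batch; an empty batch means the scroll is exhausted. */
    interface ContentletScroll extends AutoCloseable {
        List<String> nextBatch();

        @Override
        void close(); // a real implementation would clear the server-side scroll context here
    }

    /** In-memory stand-in for an Elasticsearch scroll, for demonstration only. */
    static final class InMemoryScroll implements ContentletScroll {
        private final List<String> hits;
        private final int batchSize;
        private int position = 0;
        boolean closed = false;

        InMemoryScroll(List<String> hits, int batchSize) {
            this.hits = hits;
            this.batchSize = batchSize;
        }

        @Override
        public List<String> nextBatch() {
            int end = Math.min(position + batchSize, hits.size());
            List<String> batch = new ArrayList<>(hits.subList(position, end));
            position = end;
            return batch;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Iterates over all hits, fetching one batch at a time and closing the scroll once drained. */
    static final class ScrollingIterator implements Iterator<String> {
        private final ContentletScroll scroll;
        private Iterator<String> current = List.<String>of().iterator();
        private boolean exhausted = false;

        ScrollingIterator(ContentletScroll scroll) {
            this.scroll = scroll;
        }

        @Override
        public boolean hasNext() {
            // Pull the next batch only when the current one is used up.
            while (!current.hasNext() && !exhausted) {
                List<String> batch = scroll.nextBatch();
                if (batch.isEmpty()) {
                    exhausted = true;
                    scroll.close();
                } else {
                    current = batch.iterator();
                }
            }
            return current.hasNext();
        }

        @Override
        public String next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return current.next();
        }
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 7; i++) {
            ids.add("contentlet-" + i);
        }
        InMemoryScroll scroll = new InMemoryScroll(ids, 3);
        Iterator<String> it = new ScrollingIterator(scroll);
        int count = 0;
        while (it.hasNext()) {
            it.next();
            count++;
        }
        System.out.println(count + " hits, scroll closed: " + scroll.closed);
    }
}
```

Closing the scroll as soon as it is drained matters because Elasticsearch keeps a search context alive for each open scroll until its keep-alive expires or it is cleared. A production version would also close the scroll in a finally block (or with try-with-resources) in case the caller stops iterating early.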

Checklist

  • Tests

@dsolistorres force-pushed the issue-33661-optimize-copy-host-job branch 2 times, most recently from 9ef0091 to 163e018 on December 12, 2025 23:34
@dsolistorres force-pushed the issue-33661-optimize-copy-host-job branch from 39318d2 to fec8286 on December 29, 2025 18:54


Development

Successfully merging this pull request may close these issues.

[DEFECT] Copy Host operation suffers severe performance degradation with large content volume
