
ghalliday
Member

ghalliday commented Sep 24, 2025

Type of change:

  • This change is a bug fix (non-breaking change which fixes an issue).
  • This change is a new feature (non-breaking change which adds functionality).
  • This change improves the code (refactor or other change that does not change the functionality)
  • This change fixes warnings (the fix does not alter the functionality or the generated code)
  • This change is a breaking change (fix or feature that will cause existing behavior to change).
  • This change alters the query API (existing queries will have to be recompiled)

Checklist:

  • My code follows the code style of this project.
    • My code does not create any new warnings from compiler, build system, or lint.
  • The commit message is properly formatted and free of typos.
    • The commit message title makes sense in a changelog, by itself.
    • The commit is signed.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly, or...
    • I have created a JIRA ticket to update the documentation.
    • Any new interfaces or exported functions are appropriately commented.
  • I have read the CONTRIBUTORS document.
  • The change has been fully tested:
    • I have added tests to cover my changes.
    • All new and existing tests passed.
    • I have checked that this change does not introduce memory leaks.
    • I have used Valgrind or similar tools to check for potential issues.
  • I have given due consideration to all of the following potential concerns:
    • Scalability
    • Performance
    • Security
    • Thread-safety
    • Cloud-compatibility
    • Premature optimization
    • Existing deployed queries will not be broken
    • This change fixes the problem, not just the symptom
    • The target branch of this pull request is appropriate for such a change.
  • There are no similar instances of the same problem that should be addressed
    • I have addressed them here
    • I have raised JIRA issues to address them separately
  • This is a user interface / front-end modification
    • I have tested my changes in multiple modern browsers
    • The component(s) render as expected

Smoketest:

  • Send notifications about my Pull Request position in Smoketest queue.
  • Test my draft Pull Request.

Testing:

Contributor

Copilot AI left a comment

Pull Request Overview

This PR refactors the CRoxieFileCache::lookupFile method to allow concurrent lookups. It changes the synchronization approach and separates file initialization from lookup. This addresses performance bottlenecks in file access: multiple threads can look up files simultaneously, while initialization of each file remains thread-safe.

Key changes:

  • Replaced global critical section with per-file initialization synchronization
  • Added lazy initialization pattern for file objects
  • Restructured lookupFile method to handle concurrent access scenarios
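The key changes above follow a common pattern. A short lock is held only while finding or inserting the map entry. The expensive initialization then runs under a per-entry guard, so a slow initialization of one file does not block lookups of other files.

The sketch below is a hypothetical, simplified C++ model of that pattern, using std::mutex and std::call_once. CachedFile, FileCache, concurrentInitCount and the logical file name "thor::people" are illustrative names only. This is not the actual CRoxieFileCache code, which uses jlib primitives, an `initialised` flag, and also handles mismatched entries.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a cached file entry (not the real Roxie class).
struct CachedFile
{
    std::once_flag initOnce;   // per-entry initialization guard
    std::string name;
    int initCount = 0;         // how many times initialise() actually ran

    void initialise(const std::string & lfn)
    {
        name = lfn;            // expensive work (resolving parts, opening files) would go here
        ++initCount;
    }
};

class FileCache
{
public:
    std::shared_ptr<CachedFile> lookupFile(const std::string & lfn)
    {
        std::shared_ptr<CachedFile> entry;
        {
            // Short critical section that only protects the map itself.
            std::lock_guard<std::mutex> guard(mapLock);
            std::shared_ptr<CachedFile> & slot = files[lfn];
            if (!slot)
                slot = std::make_shared<CachedFile>();
            entry = slot;
        }
        // Initialization runs outside the map lock, exactly once per entry.
        // Concurrent callers for the same file wait here; other files are unaffected.
        std::call_once(entry->initOnce, [&] { entry->initialise(lfn); });
        return entry;
    }

private:
    std::mutex mapLock;
    std::map<std::string, std::shared_ptr<CachedFile>> files;
};

// Usage: several threads look up the same (hypothetical) file at once and
// the function reports how many times its initialization ran.
inline int concurrentInitCount(unsigned numThreads)
{
    assert(numThreads > 0);
    FileCache cache;
    std::vector<std::thread> threads;
    for (unsigned i = 0; i < numThreads; i++)
        threads.emplace_back([&cache] { cache.lookupFile("thor::people"); });
    for (std::thread & t : threads)
        t.join();
    return cache.lookupFile("thor::people")->initCount;
}
```

std::call_once is only one way to get lazy per-entry initialization. The PR excerpt quoted later in this thread instead checks `resolved->initialised` inside lookupFile, which also lets it detect and retry stale entries.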

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File descriptions:

  • system/jlib/jhash.hpp: Added a static mapToValue helper method for type-safe hash map value extraction
  • roxie/ccd/ccdfile.cpp: Refactored CRoxieFileCache with concurrent lookup support, lazy initialization, and improved synchronization

  IArrayOf<IFile> sources;
  Owned<IFile> logical;
- Owned<IFileIO> current;
+ Linked<IFileIO> current;

Copilot AI Sep 24, 2025

Changing from Owned<IFileIO> to Linked<IFileIO> affects ownership semantics. This change should be carefully reviewed to ensure proper resource management, as Linked<> doesn't automatically manage object lifetime like Owned<> does.

Suggested change
- Linked<IFileIO> current;
+ Owned<IFileIO> current;
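Our understanding of jlib's smart pointers (jscm.hpp) is that both Owned<> and Linked<> release their reference when they are destroyed. The difference lies in how the reference is acquired:

  • Owned<> adopts a reference the caller already holds.
  • Linked<> calls Link() to take an additional reference.

On that reading, the question raised here is less about lifetime management and more about whether the pointer assigned to `current` was already linked by the caller.

The sketch below models the two behaviours with hypothetical RefCounted, OwnedPtr and LinkedPtr classes. It is not jlib code.

```cpp
#include <cassert>

// Hypothetical minimal model of a reference-counted object (not jlib's IInterface).
struct RefCounted
{
    int refs = 1;                  // the creator holds the first reference
    void Link() { ++refs; }
    bool Release()
    {
        assert(refs > 0);          // releasing a reference nobody holds is a bug
        return --refs == 0;        // a real implementation would delete itself at zero
    }
};

// Models Owned<>: adopts a reference the caller already holds (no Link()),
// and releases it on destruction.
template <class T>
class OwnedPtr
{
public:
    explicit OwnedPtr(T * p) : ptr(p) {}
    ~OwnedPtr() { if (ptr) ptr->Release(); }
    OwnedPtr(const OwnedPtr &) = delete;
    OwnedPtr & operator=(const OwnedPtr &) = delete;
    T * get() const { return ptr; }
private:
    T * ptr;
};

// Models Linked<>: takes its own reference with Link() on construction,
// and also releases it on destruction.
template <class T>
class LinkedPtr
{
public:
    explicit LinkedPtr(T * p) : ptr(p) { if (ptr) ptr->Link(); }
    ~LinkedPtr() { if (ptr) ptr->Release(); }
    LinkedPtr(const LinkedPtr &) = delete;
    LinkedPtr & operator=(const LinkedPtr &) = delete;
    T * get() const { return ptr; }
private:
    T * ptr;
};
```

As a rule of thumb under this model, a pointer the caller already owns belongs in an Owned<> member, and a borrowed pointer belongs in a Linked<> member. Mixing them up either leaks a reference or releases one too many.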

Comment on lines +206 to +208
CopyMappingStringToIInterface * map = (CopyMappingStringToIInterface *)_map;
IInterface ** x = &map->getValue();
return x ? (C *)(BASE *)*x : NULL;

Copilot AI Sep 24, 2025

The C-style casts should be replaced with static_cast for better type safety. Additionally, taking the address of a temporary return value from getValue() is undefined behavior if getValue() returns by value.
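The merged code compiles, so getValue() presumably returns a reference to the stored pointer. In that case `&map->getValue()` is well-formed. If getValue() returned a pointer by value, the expression would be ill-formed (you cannot take the address of a prvalue), not undefined behaviour. It also follows that the address of a reference is never null, so the `x ?` test in the excerpt always succeeds. Even so, the result is the same, because casting a null stored pointer yields null.

The sketch below shows the same extraction with static_cast, testing the stored pointer rather than its address. MappingLike, IInterfaceLike, IFileLike and FileImpl are hypothetical stand-ins for the jlib types, not the real jhash.hpp code.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for jlib's IInterface, a derived interface, and an implementation.
struct IInterfaceLike { virtual ~IInterfaceLike() = default; };
struct IFileLike : IInterfaceLike { virtual const char * name() const = 0; };
struct FileImpl : IFileLike { const char * name() const override { return "part1"; } };

// Hypothetical mapping entry: getValue() returns a reference to the stored pointer.
struct MappingLike
{
    IInterfaceLike * value = nullptr;
    IInterfaceLike * & getValue() { return value; }
};

// Extracts the stored value as a C *, casting back through BASE, the type
// the pointer had when it was stored.
template <class C, class BASE>
C * mapToValue(void * mapping)
{
    assert(mapping);
    MappingLike * map = static_cast<MappingLike *>(mapping);
    IInterfaceLike * stored = map->getValue();   // copy the pointer itself
    if (!stored)                                 // test the stored pointer, not its address
        return nullptr;
    return static_cast<C *>(static_cast<BASE *>(stored));
}
```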

if (!resolved->initialised)
{
initializeNewFileInstance(resolved, lfn, fileType, pdesc, remotePDesc, numParts, channel, startFileCopy, partNo, localLocation, dfsSize, dfsDate);
//No possibility of a mistmatch if the entry has just been initialised => return immediately

Copilot AI Sep 24, 2025

There is a typo in the comment: 'mistmatch' should be 'mismatch'.

Suggested change
- //No possibility of a mistmatch if the entry has just been initialised => return immediately
+ //No possibility of a mismatch if the entry has just been initialised => return immediately

Comment on lines 1873 to 1874
if (!resolved) // May have been cleared above...
continue; // try again

Copilot AI Sep 24, 2025

The infinite loop with for (;;) and continue statements could potentially create an endless loop if the file consistently fails to resolve. Consider adding a retry limit or timeout mechanism to prevent infinite loops.
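The loop Copilot describes is an unbounded for (;;). Whether a cap is actually needed depends on whether an entry can be cleared concurrently on every retry, and this thread does not establish that.

The sketch below is a minimal version of the bounded-retry shape the comment suggests. retryLookup and its attempt callback are hypothetical names, not part of the PR.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <stdexcept>

// Hypothetical helper: attempt() yields a value, or std::nullopt when the cache
// entry it found was cleared concurrently and the lookup must be repeated.
template <class T>
T retryLookup(const std::function<std::optional<T>()> & attempt, unsigned maxAttempts)
{
    assert(maxAttempts > 0);
    for (unsigned i = 0; i < maxAttempts; i++)
    {
        std::optional<T> result = attempt();
        if (result)
            return *result;
        // Entry was cleared between lookup and use: look it up again.
    }
    throw std::runtime_error("file lookup did not stabilise");
}
```

Throwing after maxAttempts turns a potential livelock into a visible error. The cost is that a legitimately contended lookup could fail, so any cap would need to be generous.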

Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-35002

Jirabot Action Result:
Changing assignee from: tim.klemm@lexisnexisrisk.com to: gavin.halliday@lexisnexisrisk.com
Workflow Transition To: Merge Pending
Updated PR

@timothyklemm
Contributor

@ghalliday there's a typo in the Jira number. I was wondering why you reassigned my "not equal" filter terms issue, then saw a completely unrelated change.

ghalliday changed the title HPCC-35002 CRoxieFileCache::lookupFile should allow concurrent lookups HPCC-35022 CRoxieFileCache::lookupFile should allow concurrent lookups Sep 25, 2025
ghalliday closed this Sep 25, 2025
ghalliday reopened this Sep 25, 2025

Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-35022

Jirabot Action Result:
Assigning user: gavin.halliday@lexisnexisrisk.com
Workflow Transition To: Merge Pending
Updated PR

Contributor

mckellyln left a comment

I am ok with all the changes.
Approved.

@mckellyln
Contributor

But add checkboxes

ghalliday requested a review from mckellyln October 1, 2025 14:09
Contributor

mckellyln left a comment

Second commit looks good.
Approved.

…s when adding files

Signed-off-by: Gavin Halliday <gavin.halliday@lexisnexis.com>
ghalliday merged commit 83e6803 into hpcc-systems:master Oct 2, 2025
53 checks passed

github-actions bot commented Oct 2, 2025

Jirabot Action Result:
Added fix version: 9.16.0
Workflow Transition: 'Resolve issue'
