@ephraimbuddy
Contributor

When file_parsing_sort_mode is set to "modified_time", the DAG processor previously re-sorted the entire file queue on every bundle refresh, even when no file modification times had changed.

This change caches the last seen modification time for each file in DagFileStat and skips the sort entirely when no mtimes have changed since the last check.
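The idea described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `FileStat`, its `last_mtime` field, and `resort_if_changed` are hypothetical names, and the choice to skip files whose mtime cannot be read is an assumption.

```python
from __future__ import annotations

import os
from collections import defaultdict
from dataclasses import dataclass
from operator import itemgetter


@dataclass
class FileStat:
    """Hypothetical stand-in for DagFileStat; only the cached mtime is modelled."""

    last_mtime: float | None = None


def resort_if_changed(queue, stats, get_mtime=os.path.getmtime):
    """Return (queue, did_sort); sort only when some cached mtime is stale."""
    mtimes = {}
    changed = False
    for path in queue:
        try:
            mtime = get_mtime(path)
        except OSError:
            continue  # assumption: unreadable or vanished files are skipped
        mtimes[path] = mtime
        stat = stats[path]  # defaultdict: creates a FileStat on first access
        if stat.last_mtime != mtime:
            stat.last_mtime = mtime  # refresh the cache
            changed = True
    if not changed:
        return list(queue), False  # no changes, skip sorting
    # Sort by mtime descending, most recently modified first
    ordered = [p for p, _ in sorted(mtimes.items(), key=itemgetter(1), reverse=True)]
    return ordered, True
```

Calling it twice in a row with unchanged files sorts on the first call (every cached mtime starts empty) and returns the queue untouched on the second.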

A follow-up to #60003

Member

@dheerajturaga dheerajturaga left a comment

Cool! LGTM

try:
    mtime = os.path.getmtime(file.absolute_path)
    files_with_mtime[file] = mtime
    stat = self._file_stats[file]
Member

Suggested change:
- stat = self._file_stats[file]
+ stat = self._file_stats.get(file)

It may not be in _file_stats yet if it's a new file, but that's okay. Not sure what it would do to the sorting below, though.

Contributor Author

Looks like the tests are failing because of that. I will check it properly tomorrow.

Contributor Author

OK. Using get caused problems, and I verified that indexing with self._file_stats[file] creates a default entry.

Member

Not sure we want to create it, though, before we've parsed the file?

Contributor Author

This defaultdict creates an entry if the key is missing:

_file_stats: dict[DagFileInfo, DagFileStat] = attrs.field(
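The difference between the two lookups discussed in this thread can be shown with a small standalone example. It assumes, as the comment above states, that the mapping is a `defaultdict`; `Stat` is a hypothetical stand-in for `DagFileStat`, and the key is an arbitrary string rather than a `DagFileInfo`.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Stat:
    """Hypothetical stand-in for DagFileStat."""

    last_mtime: float = 0.0


file_stats = defaultdict(Stat)

# .get() never calls the default factory: it returns None and adds nothing
missing = file_stats.get("new_dag.py")
len_after_get = len(file_stats)

# Indexing calls the factory, stores the new Stat, and returns it
created = file_stats["new_dag.py"]
len_after_index = len(file_stats)
```

So with `.get(file)` the caller has to handle `None` for a file that has not been seen yet, while `[file]` silently adds a default entry for it, which is the side effect questioned in the review.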

ephraimbuddy and others added 4 commits January 23, 2026 10:46
When file_parsing_sort_mode is set to "modified_time", the DAG processor
previously re-sorted the entire file queue on every bundle refresh,
even when no file modification times had changed.

This change caches the last seen modification time for each file in
DagFileStat and skips the sort entirely when no mtimes have changed
since the last check.
Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>
@ephraimbuddy ephraimbuddy force-pushed the improve-dag-processor-mtime-sorting branch from 56edf98 to a96229f Compare January 23, 2026 09:48
return  # No changes, skip sorting

# Sort by mtime descending and rebuild queue
sorted_files = [f for f, _ in sorted(files_with_mtime.items(), key=itemgetter(1), reverse=True)]
Member

Wouldn't this put new files at the end of the list?

Contributor Author

Nope. This replicates what _resort_by_mtime does, but avoids unnecessary re-sorting.

New files have the most recent mtimes, which are the highest values, so they are processed first because the sort is in descending order. Older files are processed last.
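A small worked example of the ordering, using the same sort expression as the snippet under review. The file names and timestamps are made up for illustration.

```python
from operator import itemgetter

# Hypothetical files mapped to their modification times (seconds since epoch)
files_with_mtime = {
    "old_dag.py": 1_700_000_000.0,
    "edited_dag.py": 1_700_000_500.0,
    "new_dag.py": 1_700_000_900.0,  # just created, so the highest mtime
}

# Sort by mtime descending: the most recently modified file comes first
sorted_files = [f for f, _ in sorted(files_with_mtime.items(), key=itemgetter(1), reverse=True)]
```

Here `new_dag.py` lands at the front of the queue. One caveat, stated as a general property of file systems rather than anything from this PR: a file copied in with its original timestamp preserved can be new to the bundle yet carry an old mtime, in which case it would sort by that older value.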

@ephraimbuddy ephraimbuddy self-assigned this Jan 28, 2026