-
Notifications
You must be signed in to change notification settings - Fork 16.4k
Improve DAG processor performance when sorting by mtime #60864
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Improve DAG processor performance when sorting by mtime #60864
Conversation
dheerajturaga
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool! LGTM
| try: | ||
| mtime = os.path.getmtime(file.absolute_path) | ||
| files_with_mtime[file] = mtime | ||
| stat = self._file_stats[file] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| stat = self._file_stats[file] | |
| stat = self._file_stats.get(file) |
It may not be in _file_stats yet if its a new file, but that's okay. Not sure what it'd do with the sorting below though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like tests are failing due to that. I will check it properly tomorrow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. So using get resulted in problems and I verified that using self._file_stats[file] creates defaults
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure we want to create it though, until we've parsed it yet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This default dict creates entry if missing:
| _file_stats: dict[DagFileInfo, DagFileStat] = attrs.field( |
When file_parsing_sort_mode is set to "modified_time", the DAG processor previously re-sorted the entire file queue on every bundle refresh, even when no file modification times had changed. This change caches the last seen modification time for each file in DagFileStat and skips the sort entirely when no mtimes have changed since the last check.
Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>
56edf98 to
a96229f
Compare
| return # No changes, skip sorting | ||
|
|
||
| # Sort by mtime descending and rebuild queue | ||
| sorted_files = [f for f, _ in sorted(files_with_mtime.items(), key=itemgetter(1), reverse=True)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't this put new files at the end of the list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope. This is replicating what _resort_by_mtime does but optimizing by avoiding unnecessory resorting.
New files would have most recent mtimes which is higher thus processed first since it's by descending order. Older ones will be done last
When file_parsing_sort_mode is set to "modified_time", the DAG processor previously re-sorted the entire file queue on every bundle refresh, even when no file modification times had changed.
This change caches the last seen modification time for each file in DagFileStat and skips the sort entirely when no mtimes have changed since the last check.
A follow up on #60003