-
Notifications
You must be signed in to change notification settings - Fork 138
lib: replace xbps_file_hash_check_dictionary with transaction_file lookup #359
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
…okup
xbps_file_hash_check_dictionary was called for every file that is
getting unpacked, because the files list is an array it iterated over
the whole files array to find the matching file.
With a lot of files this is really slow and a lot of time was spend
in locking the proplib array and iterating over it.
Time from my Ryzen 3700X system with nvme disk updating
texlive-fontsextra with files 86375:
time xbps-install -y texlive-fontsextra
6m34.61s real 6m27.71s user 0m03.95s system
And with this patch:
time xbps-install -y texlive-fontsextra
0m08.40s real 0m07.34s user 0m00.98s system
| xbps_dbg_printf(xhp, "%s: %s\n", entry_pname, file->sha256); | ||
| assert(file->sha256); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be nice to make it clear that a (null) there is an error? Not sure.
file->sha256 ? file->sha256 : "error..."
|
For the record, a good test case here seems to be updating papirus-icon-theme. It took 2m37s going from I will build this locally and see how better it is. EDIT: I was testing on a loaded machine, so it isn't a clean test, but this PR still took 52s. Without load, 31s. |
|
This should greatly reduce the time needed, not sure why this example with more files is faster, maybe its just a lot more symlinks or smaller files that would need to be checksumed. For the implementation, I think I would rename a few things, |
| TYPE_DIR, | ||
| TYPE_FILE, | ||
| TYPE_CONFFILE, | ||
| } transaction_file_type_t; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically speaking *_t is reserved for libc (https://www.gnu.org/software/libc/manual/html_node/Reserved-Names.html), but everyone uses it anyway.
| const char *pkgname; | ||
| const char *pkgver; | ||
| char *sha256; | ||
| const char *target; | ||
| uint64_t size; | ||
| transaction_file_type_t type; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would these first 6 be the xbps_file struct? If so, I guess transaction_file_type would be switched over to file_type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we need two struct a xbps_transaction_file and a xbps_file which is part of the former.
The idea would be to start using the xbps_file struct in places where we currently use the xbps_dictionary references to files.plist so that we gradually switch from dynamic plist types to actual structs which should help to avoid some issues at compile time through the type system.
I think xbps_file would be a simple way to start, doing this for packages would be the end goal, there are so many places that use xbps_dictionarys for packages and there are a lot of places that can be improved on the way to switching to c structs.
| if (sha256) | ||
| item->new.sha256 = strdup(sha256); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is item->new always guaranteed to be zero initialized? Might be reasonable to add an else item->new.sha256 = NULL;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally everything is zero initialized, IMHO not doing that by default when initializing the memory should only be done if you know you overwrite it immediately.
| void xbps_transaction_files_free(void); | ||
| int HIDDEN xbps_transaction_fetch(struct xbps_handle *, | ||
| xbps_object_iterator_t); | ||
| int HIDDEN xbps_transaction_pkg_deps(struct xbps_handle *, xbps_array_t, xbps_dictionary_t); | ||
|
|
||
| struct transaction_file *xbps_transaction_file_new(struct xbps_handle *, const char *); | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The naming is a bit confusing, in that xbps_transaction_file_new was added, but no xbps_transaction_file_free, and xbps_transaction_files_free was added, without a corresponding _new, and doesn't seem to be called at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes the _new does not actually create anything, it gives you a reference to a file struct that already exists from collecting all files in the transaction, "_new" because it returns the new file as opposed to the old file.
Not sure how this should be done exactly, something like xbps_transaction_get_file with either _new or _old or a parameter to choose which one you want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
xbps_transaction_files_free deletes the whole hashtable/tree, this should be called from somewhere after the transaction is done, maybe from xbps_transaction_commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how this should be done exactly, something like xbps_transaction_get_file with either _new or _old or a parameter to choose which one you want.
I think the parameter helps avoid duplicated logic, so I'm in favor of that.
xbps_file_hash_check_dictionarywas called for every file that isgetting unpacked, because the files list is an array it iterated over
the whole files array to find the matching file.
With a lot of files this is really slow and a lot of time was spend
in locking the proplib array and iterating over it.
Time from my Ryzen 3700X system with nvme disk updating
texlive-fontsextra with files 86375:
And with this patch: