Optimize bookmark import process #287
---

Did you try the import via the command line? It was created exactly for this case.

---
No, I didn't, although I would expect this to be handled from the UI side as well. As a user, I am not really interested in how bookmarks are processed, as long as the upload fits within the maximum execution time. But if not, a warning to try the CLI would help: "Over X bookmarks, please use the CLI to import. For usage please visit the docs." (No harm done, just sharing my user experience.) I started with the setup docs and rushed to import as soon as the site was ready.

The CLI seems to have a problem resolving absolute paths; it treats every path as relative to the storage directory. I think that should be fixed. Please see the output below:

$ sudo php artisan links:import ~/pinboard_export.2021.06.11_09.25.html
You will be asked to select a user who will be the owner of the imported bookmarks now.
Please enter the user email address:
> ${MY_EMAIL}
Reading file "/home/pi/pinboard_export.2021.06.11_09.25.html"...
In Filesystem.php line 57:
File does not exist at path /home/pi/${PATH_TO_APP}/storage//home/pi/pinboard_export.2021.06.11_09.25.html.
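For illustration, one way the command could accept both absolute and storage-relative paths (a sketch only; resolveImportPath is a hypothetical helper, not part of LinkAce):

// Hypothetical helper for the links:import command (not LinkAce code).
// Assumption: the command currently prefixes every argument with
// storage_path(), which produces the doubled path in the error above.
function resolveImportPath(string $argument): string
{
    // Absolute paths (including shell-expanded ~ paths) are used as-is;
    // anything else stays relative to the storage directory, as before.
    if (str_starts_with($argument, DIRECTORY_SEPARATOR)) {
        return $argument;
    }

    return storage_path($argument);
}

---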
I started from a clean slate and tried to import my bookmarks via the CLI.

$ sudo php artisan links:import --skip-check pinboard_export.2021.06.11_09.25.html
You will be asked to select a user who will be the owner of the imported bookmarks now.
Please enter the user email address:
> ${MY_EMAIL}
Reading file "pinboard_export.2021.06.11_09.25.html"...
Skipping link check.
2577 links imported successfully, 1000 skipped.

What are those 1000 skipped bookmarks, and why were they skipped? I would like to import all of them. I can also see 404 warnings in the logs, although I explicitly told the program to skip checking.

---
Bookmarks are only skipped if the URL already exists in the database. Were there links in the database from previous imports? If you disable the check, all links are imported without being checked for whether they are online or not.
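For illustration, the duplicate check described here might look roughly like this in a Laravel app (a sketch under the assumption of a Link Eloquent model with a url column; not necessarily LinkAce's actual code):

$skipped = 0;

foreach ($bookmarks as $bookmark) {
    // Skip any bookmark whose URL is already stored; this is what
    // produces the "skipped" count in the import summary.
    if (\App\Models\Link::where('url', $bookmark['url'])->exists()) {
        $skipped++;
        continue;
    }

    // ... otherwise create the link as usual
}

---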
I dropped all tables from the database, re-installed the application, and imported the bookmark export again. If you say there are 1000 duplicates, fine, but it's also weird that there are exactly 1000. I don't say there can't be, but such a round number makes me suspicious. Unfortunately, LinkAce doesn't log skipped links. (That would be nice, though. By the way, I have debug mode set to true.)

$ cat storage/logs/laravel-2021-06-14.log | grep -c skip
0
$ cat storage/logs/laravel-2021-06-14.log | grep -c 404
108

---
As I wrote, LinkAce skips all URLs that are already in the database. That check runs for all links during the import. I do not know what the Pinboard export looks like, but there must be duplicates in there. I will see if I can add more details to the import logs.

---
I loaded the export into the browser and checked for duplicates with the code below:

// Collect the href of every bookmark (the export lists them as <dt><a>).
var links = [...document.querySelectorAll('dt a')].map(e => e.href);
console.log(`No. links: ${links.length}`);

// A Set keeps only unique values, so its size is the number of distinct URLs.
var unique = new Set(links);
console.log(`No. unique links: ${unique.size}`);

// Count occurrences per URL.
var counter = {};
links.forEach(link => {
  counter[link] = (counter[link] || 0) + 1;
});
console.table(counter);

// Keep only the URLs that appear more than once.
var nonUniques = Object.entries(counter)
  .filter(([_, count]) => count > 1)
  .map(([link]) => link);
console.log(nonUniques);

To my biggest surprise, the numbers match up. LinkAce's behavior is OK. One thing that popped into my mind: how is it handled when there is a duplicate with different tags or a different description? Maybe there should be a temporary storage from which you can resolve merges one by one, or apply some kind of rules (e.g. choose by creation date; merge fields; choose by number of tags, etc.); see the sketch below.
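As a purely hypothetical illustration of that merge-rule idea (mergeDuplicate and the tags relation are assumptions, not LinkAce code), a duplicate handler could merge fields instead of skipping:

// Hypothetical sketch only: merge an imported duplicate into the stored link.
// Assumes an Eloquent Link model with a `tags` relation.
function mergeDuplicate(\App\Models\Link $existing, array $imported): void
{
    // Rule: keep the longer description.
    if (strlen($imported['description'] ?? '') > strlen($existing->description ?? '')) {
        $existing->description = $imported['description'];
    }

    // Rule: union of tags from both sources.
    $tags = array_unique(array_merge(
        $existing->tags->pluck('name')->all(),
        $imported['tags'] ?? []
    ));
    // ... re-attach $tags to the link here, then persist:

    $existing->save();
}

---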
Awesome! Feel free to share any further feedback. It's very valuable to me. 👍

---
Great, keep up the good work!

---
I started an import using the web UI: 874K (4,185 bookmarks). Perhaps a better import experience could be:

What do you think?

---
I have not worked on the import process because I decided to give it a complete overhaul in the next big version. I know the current process is not good. If you are using the web UI, the page should not be closed, but there is a CLI command to use instead. I noted the idea of something like a queue that works off all links after they were stored; I am just not sure what this will look like yet.

---
Hmm. I closed the page and the import seems to be continuing in the background? I have reopened LinkAce and it's currently at ~1,380 out of 4,185.

---
My web import stopped at ~1,980 of 4,185. It seems odd it would run that long after I closed the page; maybe there was some disruption to my server. I started the CLI import a short while ago and it's already at ~3,060 of 4,185. Almost there!

---
When importing bookmarks, most of them already have a description, title, and tags (the assumption is that you mainly use import for migrating from a different bookmark manager or the browser). https://github.com/Kovah/LinkAce/blob/main/app/Actions/ImportHtmlBookmarks.php#L49-L56

A low-effort, high-return improvement would be to only generate metadata if one of its values will actually be used, i.e. the bookmark is missing a title or description; see the sketch below. This should make the import process a lot faster for the common case.
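A rough sketch of that guard (generateMetadata is a hypothetical stand-in for whatever the linked lines actually call; this is a suggestion, not the current implementation):

// Only fetch/generate metadata when the imported bookmark is missing
// a value that the metadata would fill in.
$needsMetadata = empty($bookmark['title']) || empty($bookmark['description']);

// generateMetadata() is a hypothetical helper standing in for the
// metadata generation done in ImportHtmlBookmarks.php#L49-L56.
$metadata = $needsMetadata ? generateMetadata($bookmark['url']) : null;

$title = $bookmark['title'] ?: ($metadata['title'] ?? null);
$description = $bookmark['description'] ?: ($metadata['description'] ?? null);

---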
I have several thousand bookmarks (783 KB of HTML). Importing them results in an error:
[2021-06-13 07:10:11] production.ERROR: Maximum execution time of 30 seconds exceeded {"userId":1,"exception":"[object] (Symfony\\Component\\ErrorHandler\\Error\\FatalError(code: 0): Maximum execution time of 30 seconds exceeded at /home/pi/${PATH_TO_APP}/vendor/nesbot/carbon/src/Carbon/Traits/Creator.php:563) [stacktrace] #0 {main} "}
To avoid the maximum execution time error, please delegate the processing of bookmarks to another (long-running) process that can work in batches and/or restart after hitting the limit, and whose progress can be queried; a sketch of one possible approach follows below.
If I want to import bookmarks exported directly from pinboard.in, I have no other acceptable alternative, and I wouldn't want to increase the maximum execution time setting.
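For illustration only: in a Laravel app, that kind of long-running batch processing could be done with a queued job along these lines (a sketch under assumptions; ImportBookmarkBatch is hypothetical and not how LinkAce implements or plans to implement this):

<?php

namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;

// Hypothetical queued job: each job imports one small batch of bookmarks,
// so no single request or worker run hits the PHP execution time limit.
class ImportBookmarkBatch implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public function __construct(
        public array $bookmarks, // one batch, e.g. 50 parsed bookmarks
        public int $userId
    ) {}

    public function handle(): void
    {
        foreach ($this->bookmarks as $bookmark) {
            // ... store the bookmark for $this->userId, skipping duplicates
        }
    }
}

// In the upload controller: parse the file, enqueue the batches, and return
// immediately. Progress could be tracked by counting finished jobs.
// foreach (array_chunk($parsedBookmarks, 50) as $batch) {
//     ImportBookmarkBatch::dispatch($batch, $userId);
// }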