Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[reference] Generic thumbnailer #153

Closed
wants to merge 7 commits into from
Closed

[reference] Generic thumbnailer #153

wants to merge 7 commits into from

Conversation

nodiscc
Copy link
Member

@nodiscc nodiscc commented Mar 7, 2015

Adapted from @mro's work in #128 (3 first commits. Manually reviewed, the code now merges properly and doesn't cause Shaarli to crash. There was a missing { in the other pull request. I left this commit https://github.com/mro/Shaarli/commit/3d338d499eb36780b9795f6360d00c1acd2d0bab out of the pull request for now.

The og:image thumbnailing code doesn't seem to do anything; I've tried to understand regexps from the previous PR, and tried to match them against page that have the og:image property (test page: https://soundcloud.com/crocriddim/test-007)

  • The regexp '/<meta'.'[^>]*?'.'(?:'.$xogimage.'[^>]*?'.$xcontent.'|'.$xcontent.'[^>]*?'.$xogimage.')[^>]*>/mi' compiles to /]*?(?:\s+(?:property|name)\s*=\s*["']og:image(?::secure_url|:url)?["'][^>]*?\s+content\s*=\s*["']([^"']+)["']|\s+content\s*=\s*["']([^"']+)["'][^>]*?\s+(?:property|name)\s*=\s*["']og:image(?::secure_url|:url)?["'])[^>]*>/mi.
  • It seems to match property="og:image" content="https://i1.sndcdn.com/artworks-000107375178-9c6yju-t500x500.jpg"> in the test page https://soundcloud.com/crocriddim/test-007 - tested in http://www.regexr.com/
  • However I've tried adding error_log("HHAHAHHAHHA MATCH"); just above // download the image into $dat and I never get this message in my logs. So I guess we never enter the if( preg_match('/<meta'...... loop.

@mro can you have a look? You can get my code with git clone -b generic-thumbnailer https://github.com/nodiscc/Shaarli

@nodiscc nodiscc mentioned this pull request Mar 7, 2015
@nodiscc nodiscc self-assigned this Mar 9, 2015
@nodiscc
Copy link
Member Author

nodiscc commented Mar 9, 2015

Fixed! Check the list of commits above.

  • I re-changed the commits author to @mro because credit where it's due.
  • The last commit enables thumbnailing for ALL domains when the link has an http(s); URI. Here is how it looks and I think it's great: https://imgur.com/YtiJhbW Load times are great once the thumbnailing is done on first display.

A few remarks:

  • Some thumbnails can't be fetched and the placeholder text remains. We should use the $blankpath blank gif for these. see below
  • We might want to crop thumbs to a defined aspect ratio. This is easily done with imagemagick, so it should be easy with gd.
  • We need to decide whether we enable thumbnailing for all domains, or just a few ones. I'd like it for all domains, or we should let users choose. Please discuss this.
  • I'd like to remove site-specific thumbnailing code and only keep this one. Please discuss.

@nodiscc nodiscc mentioned this pull request Mar 11, 2015
@nodiscc
Copy link
Member Author

nodiscc commented Mar 12, 2015

I have more info on thumbnails failing. I've turned on error reporting, and tried generating thumbnails using a dummy link list. Thumbnailing this link always fails: https://en.wikipedia.org/wiki/Cymatics. When accessing the corresponding permalink, this error shows up in my apache log:

PHP Warning: preg_replace(): Compilation failed: missing ) at offset 68 in /var/www/links/index.php on line 2021, referer: https://my.server.com/links/

@mro Do you get the same results? Any idea why this happens?

@mro
Copy link

mro commented Mar 13, 2015

very odd. I'm not familiar with how precise php error message line numbers are, but the zip tarball as well as the branch generic-thumbnailer have the same line 2021 - which is a mere assignment. So no possibility for a regexp compile error.

Update: Uh - got it: it's line 2001 that's supposed to be

'!^https?://(?:www\.)?YouTube\.com(?::\d+)?/.*?[\?&]v=([^\?&#]+).*$!i'=>array(

@nodiscc nodiscc modified the milestones: 1.0, 1.1 Mar 13, 2015
@nodiscc nodiscc added easy good place to start contributing and removed help wanted labels Mar 23, 2015
@Draky50110
Copy link

So will it be merged or do it need extra tester/testing ?
I can help testing as it is a wanted feature for me :)
Help for generating thumbnails for already existing links and new ones ;)

@nodiscc
Copy link
Member Author

nodiscc commented Mar 24, 2015

Hi @Draky50110 nice to see you're interested :) I still have to apply the change suggested by mro (right now it's partially broken) and rebase changes on top of master. I'll do it today/tomorrow and yes, testers are welcome.

One thing that was not decided: do we enable thumbnails by default for all domains, or just a few ones (those with predictable thumbnail size)? Or do we keep thumbnailing olny the old ones by default, and let users specify a whitelist for other domains?

@Draky50110
Copy link

My own Shaarli displays thumbs for ALL links, based on thumbshots.com : http://gilles.wittezaele.fr/links/
All are cached.

@nodiscc
Copy link
Member Author

nodiscc commented Mar 24, 2015

wouldn't want a 3rd party involved

We agree on this, your thumbnailer does the job locally and is more privacy respecting than the old one (does not leak visitor's IPs to the thumbnailed sites, it leaks the server IP instead)

@mro would you like thumbnailing enabled for all sites by default? Or something else (configure $url_patterns_genthumbnail in the options file?) Might as well ask @ArthurHoaro @virtualtam @kalvn @alexisju @AkibaTech @dhoko @Salan54 and everyone interested.

  • Thumbnailing for only a bunch of sites? (which ones?)
  • Thumbnailing for all sites?
  • Do we make it configurable? The option entry would look like
$GLOBALS['config']['THUMBNAIL_DOMAINS'] = array('xkcd\.com', 'vimeo\.com', 'ted\.com', '(flickr)\.com', (itunes\.apple|imdb)\.com') //enable thumbnailing for these domains. Set to '.*' to enable thumbnails for all domains.

and populate the $url_patterns_genthumbnail array like !^https?://(?:www\.)?$ITEM(?::\d+)?/.*?/\d+!i'

Edit: @Draky50110 this thumbnailer generates thumbs for past and future links. You can test it by cloning/downloading my generic-thumbnailer branch

@mro
Copy link

mro commented Mar 24, 2015

would you like thumbnailing enabled for all sites by default?

frankly I'm undecided. I worry a bit about thousands of files in the image cache never being cleaned up. And thumbnails are not the main topic of shaarli. They are nice but optional. But if it causes no pain: yes. Otherwise configure URL regexps in options file.

Also thought about an additional (optional) form field - pre-filled with og:image & friends. But this makes sense for the bookmarklet only. And requires changes to the persistent data model.

@alexisju
Copy link

Maybe it's probably better to consider this as a specific plugin or a specific template.

When the plugin system is in place (#164), it will be (more) easy to develop this kind of functionality.

@nodiscc
Copy link
Member Author

nodiscc commented Apr 1, 2015

Remaining todo with this, branch generic-thumbnailer

  • Hide alt text on links like https://en.wikipedia.org/wiki/Cymatics when no thumb is displayed, use the blank gif instead
  • Only thumbnail youtube.com youtu.be pix.toile-libre.org imgur.com i.imgur.com dailymotion.com .imageshack.us flickr.com vimeo.com xkcd.com ted.com by default (current supported sites) with the new thumbnailer
  • Remove old thumbnailing code and use new one everywhere (consistent server-side caching)

Later: Add an option to enable thumbnails for all domains ('.*'), add an option to specify a whitelist

@mro
Copy link

mro commented May 22, 2015

may seem a bit OT, but I hit a strange HTTP 406 Not Acceptable error on one certain domain: https://github.com/mro/Shaarli/issues/1 which in turn prevents the og:image url detection.
Feel free to cherry-pick the fix (user agent, sigh).

@yorikvanhavre
Copy link

This works very well already :) Thanks for that, @nodiscc

@mrjovanovic
Copy link

I like the idea of configurable domains as commented by @nodiscc above, with leaving conservative defaults.

@nodiscc
Copy link
Member Author

nodiscc commented Jun 25, 2015

This needs to be rebased and these points should be fixed. Help on the rebase is welcome.

@nodiscc
Copy link
Member Author

nodiscc commented Jun 12, 2016

I have unassigned myself since this probably needs to be rewritten from scratch, and thumbnailing moved to a plugin. I will leave this Pull Request open for future reference.

@nodiscc nodiscc removed their assignment Jun 12, 2016
@nodiscc nodiscc changed the title Generic thumbnailer [reference] Generic thumbnailer Jun 12, 2016
@nodiscc nodiscc self-assigned this Feb 15, 2017
@virtualtam virtualtam modified the milestones: 1.1.0, 0.10.0 Aug 3, 2017
@nodiscc
Copy link
Member Author

nodiscc commented Nov 21, 2017

Made obsolete by #687 (Use web-thumbnailer to retrieve thumbnails)

@nodiscc nodiscc closed this Nov 21, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cleanup code cleanup and refactoring easy good place to start contributing enhancement in progress thumbnail media thumbnails
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants