Migration to Pillow and huge performance improvements #137
base: master
Commits on Oct 10, 2021
move remove_superficial_options into utils
This move makes sense if one wants to reuse remove_superficial_options, since it is not necessarily specific to cache.py. This prepares prosopopee for Pillow support. Signed-off-by: Quentin Schulz <foss@0leil.net>
Commit 2c2895c
prosopopee: do not dump cache when doing a dry run
Dry runs (`prosopopee test`) shouldn't dump the cache: nothing is done except creating the HTML files, so the cache is more or less meaningless in that case. Let's dump the cache only during a normal build run. Signed-off-by: Quentin Schulz <foss@0leil.net>
Commit 330fcd4
themes: remove unnecessary calls to copy()
For images, calls to copy() are only needed when {{ image }} is later used in the template. Remove those copy() calls, since they trigger the creation of thumbnails that will never be used. Signed-off-by: Quentin Schulz <foss@0leil.net>
Commit ef3a5ee
themes: exposure: index: fix incorrect logic for no_big_gallery_cover
Big gallery covers should be used for lines where only one gallery cover appears. With the current logic, if there is a prime number of galleries (except 2 and 3), the first gallery and all galleries whose index is prime (except the 2nd and 3rd) get a big cover. In the end, all that matters is that if a galleries_line contains only one gallery, that gallery gets a big cover. Signed-off-by: Quentin Schulz <foss@0leil.net>
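The corrected rule can be sketched in Python as follows. The exposure theme implements it in a Jinja template, so the function name big_cover_flags and the list-of-lists representation of galleries lines are assumptions made for illustration only.

```python
from typing import List, Sequence


def big_cover_flags(galleries_lines: Sequence[Sequence[str]]) -> List[List[bool]]:
    """Hypothetical helper: one flag per gallery, True if it gets a big cover.

    Rule from the commit message: a gallery gets a big cover only when it is
    the sole gallery on its line; the total number of galleries is irrelevant.
    """
    return [[len(line) == 1] * len(line) for line in galleries_lines]


# Two galleries on the first line, a single one on the second.
print(big_cover_flags([["paris", "rome"], ["berlin"]]))
```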
Commit 4edfbfb
prosopopee: do not get logs from third party modules
Loggers work by hierarchy: unless a child logger sets its own level, its effective loglevel is inherited from its parent. In prosopopee, the loglevel is changed according to the --log-level argument. Since the root logger (obtained with logger = logging.getLogger()) is the ancestor of ALL loggers that could be declared in any third-party module, prosopopee's loglevel also applies to those modules. This is usually not wanted, especially since prosopopee's default loglevel is the highest available, and it is very annoying with Pillow, which is pretty verbose when saving files. Instead, let's declare a logger for prosopopee only. Unfortunately, since the package layout is unconventional (all *.py files in the same directory instead of subdirectories), the recommended logger = logging.getLogger(__name__) cannot be used: __name__ is __main__ in prosopopee.py and is the module's file name elsewhere (e.g. cache in cache.py). These names are not related in the eyes of the logging module, so prosopopee.py's loglevel would not apply to the other *.py files in the project. Instead, the value __name__ would have in a more conventional package layout is simulated by prepending prosopopee. to __name__, except in prosopopee.py, which holds the parent logger and is thus simply named prosopopee. Since prosopopee's logger is no longer the root logger, the NOTSET loglevel cannot be used anymore: it basically means "defer to the parent logger", and the root logger's default loglevel is WARNING, so prosopopee would not print anything labelled INFO or DEBUG. c.f. https://stackoverflow.com/a/50755200 Signed-off-by: Quentin Schulz <foss@0leil.net>
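The hierarchy described above can be checked with the standard logging module. This sketch assumes the root logger has not been reconfigured (so it keeps its default WARNING level); "thirdparty.module" is a hypothetical stand-in for a library logger such as Pillow's.

```python
import logging


def effective_levels():
    # Parent logger for the whole project, as named in prosopopee.py.
    parent = logging.getLogger("prosopopee")
    # Module logger simulating a conventional __name__, e.g. for cache.py.
    child = logging.getLogger("prosopopee." + "cache")
    # Hypothetical third-party logger: a sibling of "prosopopee" under root.
    other = logging.getLogger("thirdparty.module")

    # NOTSET defers to the root logger, whose default level is WARNING,
    # so INFO and DEBUG messages would be dropped.
    parent.setLevel(logging.NOTSET)
    notset_level = parent.getEffectiveLevel()

    # Equivalent of --log-level debug applied to prosopopee only.
    parent.setLevel(logging.DEBUG)
    return notset_level, child.getEffectiveLevel(), other.getEffectiveLevel()


print([logging.getLevelName(level) for level in effective_levels()])
```

The child inherits DEBUG from "prosopopee", while the third-party logger still falls back to the root logger's level and stays quiet.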
Commit 37195a5
prosopopee: cache: migrate Cache.cache to a Manager().dict
In order to prepare for multiprocess support, migrate Cache.cache from a simple dict to a Manager().dict, one of the data types that can be safely modified from other processes. Signed-off-by: Quentin Schulz <foss@0leil.net>
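A minimal sketch of why a Manager().dict is needed: writes made by child processes go through the proxy to the manager's server process and are visible to the parent, which would not be the case with a plain dict. It assumes a POSIX system where the "fork" start method is available; fill_cache and store_entry are hypothetical names, not prosopopee's Cache API.

```python
import multiprocessing as mp


def store_entry(cache, key, options):
    # Runs in a child process; the assignment is forwarded to the manager.
    cache[key] = options


def fill_cache(entries):
    # Assumption: POSIX "fork" start method, so no __main__ guard is needed.
    ctx = mp.get_context("fork")
    with ctx.Manager() as manager:
        cache = manager.dict()
        workers = [
            ctx.Process(target=store_entry, args=(cache, key, opts))
            for key, opts in entries.items()
        ]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        # Copy into a plain dict before the manager process shuts down.
        return cache.copy()


print(fill_cache({"a.jpg": [1, 2], "b.jpg": [3]}))
```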
Commit 466d55f
prosopopee: cache: compare cached options and current options after curation
json.dumps(), which is used to write the cache dict to a file, transforms tuples into lists. With the current implementation, if a tuple is supposed to be cached, the needs_to_be_generated method always returns True, even when the thumbnail is up to date. In order to support tuples in cache entries, let's pass the options given as parameter to the method through json.loads(json.dumps()) so that cached options and to-be-compared options have the same format. This will be used in a later commit which adds a tuple (width, height) to the cache. Signed-off-by: Quentin Schulz <foss@0leil.net>
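The tuple-versus-list mismatch and its fix can be sketched as follows. Only json.loads(json.dumps()) comes from the commit message; the function signature and the cache layout here are simplified assumptions, not the real Cache method.

```python
import json


def normalize(options):
    # Give options the same shape they will have once read back from the cache file.
    return json.loads(json.dumps(options))


def needs_to_be_generated(cache, key, options):
    # Hypothetical stand-in for the Cache method named in the commit message.
    return cache.get(key) != normalize(options)


# A cache entry written to and read back from JSON: the tuple became a list.
cache = json.loads(json.dumps({"thumb.jpg": {"size": (800, 600)}}))
print(cache["thumb.jpg"])
print(needs_to_be_generated(cache, "thumb.jpg", {"size": (800, 600)}))
```

Without normalize(), the comparison {"size": [800, 600]} != {"size": (800, 600)} is always true, which is the bug the commit describes.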
Commit efb210d
migrate from graphicsmagick to pillow for thumbnail generation
Currently, thumbnail generation is done in a single thread while parsing the galleries, by calling graphicsmagick for every thumbnail to be generated. This is suboptimal even though graphicsmagick spreads its payload over all available CPU cores. A quick and dirty benchmark showed that generating thumbnails with multiprocessed Pillow is much more efficient than with graphicsmagick. This patch adds support for generating thumbnails with multiprocessed Pillow. Multiple processes have to be used instead of multiple threads because Python still has a Global Interpreter Lock (GIL), meaning threads cannot execute Python code concurrently, which is what CPU-intensive tasks such as thumbnail generation need. Multiprocessing brings its own set of challenges because most data structures, such as the cache, cannot be shared between processes. All data modified by any of the processes should be of a type handled by multiprocessing.Manager. For best performance, all thumbnails of an image should be generated at once, so that the original image is opened only once. This requires keeping track of images and attaching the thumbnails to be created to their original image. This can be done via a factory which is passed to the Jinja templates so that they can request thumbnails for given images without knowing more than the original path, the name of the original image and the parameters of the thumbnails to create. The ImageFactory keeps all of those original images in a dictionary keyed by a virtual path made from the original image name and a CRC32 of all the options that apply to its thumbnails. This gives prosopopee the ability to group thumbnails per options (e.g. if options are passed in a gallery's settings.yaml). The original image (or BaseImage) is returned by the ImageFactory and the templates can then request .copy() or .thumbnail() for it.
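The factory and keying scheme described above can be sketched without Pillow. ImageFactory and BaseImage are named in the commit message, but everything else here is an assumption: the get() and thumbnail() signatures, serializing options as sorted JSON before hashing, and computing the thumbnail CRC32 over the original path plus its options.

```python
import json
import os
import zlib
from typing import Dict, Optional


def crc_of(obj) -> int:
    # Assumption: hash a canonical JSON serialization with CRC32.
    return zlib.crc32(json.dumps(obj, sort_keys=True).encode("utf-8"))


class BaseImage:
    """Hypothetical original image collecting the thumbnails requested for it."""

    def __init__(self, path: str, options: dict):
        self.path = path
        self.options = options
        self.crc = crc_of([path, options])
        self.thumbnails: Dict[str, int] = {}  # thumbnail name -> target width

    def thumbnail(self, width: int) -> str:
        stem, ext = os.path.splitext(os.path.basename(self.path))
        name = f"{stem}-{width}-{self.crc:08x}{ext}"
        # Requesting the same thumbnail twice only records it once.
        self.thumbnails[name] = width
        return name


class ImageFactory:
    """Hypothetical factory handed to the templates."""

    def __init__(self):
        self.base_imgs: Dict[str, BaseImage] = {}

    def get(self, path: str, name: str, options: Optional[dict] = None) -> BaseImage:
        options = options or {}
        # Virtual path: image name plus CRC32 of its options, so the same
        # picture used with different gallery options gets separate entries.
        vpath = f"{name}-{crc_of(options):08x}"
        if vpath not in self.base_imgs:
            self.base_imgs[vpath] = BaseImage(os.path.join(path, name), options)
        return self.base_imgs[vpath]


factory = ImageFactory()
img = factory.get("gallery", "cat.jpg", {"quality": 80})
print(img.thumbnail(600))
```

Because templates always go through the factory, every thumbnail of a given original ends up in a single BaseImage, which is what allows the original to be opened only once.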
The thumbnails are kept in a dictionary whose keys are the thumbnail names, made out of the original name plus the thumbnail size and the CRC32 of the original image and the options that apply to it. This way, thumbnails are guaranteed to be unique even if requested multiple times by templates. The size is now read with imagesize.getsize() only once, when the ratio property or .copy() is called on the image, so that the performance impact is minimal. A notable change is that the resize option for images only accepts percentages for now. Another notable change is that .copy() also applies the quality setting, unlike the graphicsmagick implementation. Since multiprocessing.Pool.map splits iterables into pre-defined chunks which are then assigned to processes, best performance requires processes with more or less the same taskload, so that no process sits idle while another is still working at 100%. For that, original images whose thumbnails are all cached are removed from the list of images before that list is passed to multiprocessing.Pool.map. This has been tested against Pillow from 6.0.0 to 8.1.0 and Pillow-SIMD 7.0.0.post3. Signed-off-by: Quentin Schulz <foss@0leil.net>
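The load-balancing step can be sketched as a filter applied before Pool.map. images_to_process and its dict-of-lists input are hypothetical; default_chunksize mirrors how CPython's Pool.map picks a chunk size when none is given (an implementation detail that may change between versions), to show that chunks are fixed before any work starts.

```python
from typing import Dict, List, Set


def images_to_process(thumbnails_per_image: Dict[str, List[str]], cached: Set[str]) -> List[str]:
    # Keep only originals with at least one thumbnail missing from the cache,
    # so every chunk handed to a worker contains real work.
    return [
        image
        for image, thumbs in thumbnails_per_image.items()
        if not all(t in cached for t in thumbs)
    ]


def default_chunksize(n_items: int, processes: int) -> int:
    # Mirrors CPython's Pool.map: about 4 chunks per worker process.
    if n_items == 0:
        return 0
    chunksize, extra = divmod(n_items, processes * 4)
    return chunksize + 1 if extra else chunksize


todo = images_to_process({"a.jpg": ["a-600.jpg", "a-300.jpg"], "b.jpg": ["b-600.jpg"]},
                         cached={"a-600.jpg", "a-300.jpg"})
print(todo)
print(default_chunksize(100, 4))
```

If fully cached images stayed in the list, a chunk made mostly of them would finish almost instantly and its worker would go idle while other workers still process heavy chunks.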
Commit a46070f
travis: remove now useless graphicsmagick
Signed-off-by: Quentin Schulz <foss@0leil.net>
Commit 9566424
docs: update based on migration from GraphicsMagick to Pillow
Signed-off-by: Quentin Schulz <foss@0leil.net>
Commit 0de1e40
prosopopee: allow selecting number of threads used for thumbnail generation
Generating thumbnails is done in parallel via multiprocessing.Pool. By default, Pool starts as many worker processes as the number of CPUs reported by os.cpu_count(). Let's allow users to select the number of worker processes Pool can use. Signed-off-by: Quentin Schulz <foss@0leil.net>
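A minimal sketch of passing a user-selected worker count to Pool. The processes argument is the real multiprocessing.Pool parameter (None means os.cpu_count() workers); how prosopopee exposes the setting to users is not shown here, and generate_all and make_thumbnails are hypothetical names. The "fork" start method is assumed (POSIX) so the sketch runs without a __main__ guard.

```python
import multiprocessing as mp
from typing import List, Optional


def make_thumbnails(image: str) -> str:
    # Placeholder for the real work: open the original once and write all
    # of its thumbnails with Pillow.
    return f"{image}: done"


def generate_all(images: List[str], processes: Optional[int] = None) -> List[str]:
    # processes=None lets Pool fall back to os.cpu_count() worker processes.
    ctx = mp.get_context("fork")  # assumption: POSIX start method
    with ctx.Pool(processes=processes) as pool:
        return pool.map(make_thumbnails, images)


print(generate_all(["a.jpg", "b.jpg"], processes=2))
```

Capping the worker count is useful, for example, to keep a machine responsive during a build, at the cost of a longer generation time.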
Commit ba91bcb