Migration to Pillow and huge performance improvements #137

Open: wants to merge 11 commits into base: master

Commits on Oct 10, 2021

  1. move remove_superficial_options into utils

    This move makes sense if one wants to reuse remove_superficial_options,
    since it is not specific to cache.py.
    
    This prepares prosopopee for Pillow support.
    
    Signed-off-by: Quentin Schulz <foss@0leil.net>
    QSchulz committed Oct 10, 2021
    2c2895c
  2. prosopopee: do not dump cache when doing a dry run

    Dry runs (`prosopopee test`) shouldn't dump the cache: they only create
    the HTML files, so the cache is more or less meaningless in that case.
    
    Let's dump the cache only when doing a normal build run.
    
    Signed-off-by: Quentin Schulz <foss@0leil.net>
    QSchulz committed Oct 10, 2021
    330fcd4
  3. themes: remove unnecessary calls to copy()

    For images, a call to copy() is only needed when {{ image }} is used
    later in the template.
    
    Remove the other copy() calls, as they trigger creation of thumbnails
    that will never be used.
    
    Signed-off-by: Quentin Schulz <foss@0leil.net>
    QSchulz committed Oct 10, 2021
    ef3a5ee
  4. themes: exposure: index: fix incorrect logic for no_big_gallery_cover

    Big gallery covers should be used for lines where only one gallery
    cover appears.
    
    With the current logic, if there is a prime number of galleries (except
    2 and 3), the first one and all galleries whose index is prime (except
    the 2nd and 3rd) will have a big cover.
    
    In the end, all that matters is that if the galleries_line contains only
    one gallery, that gallery should have a big cover.
    
    Signed-off-by: Quentin Schulz <foss@0leil.net>
    QSchulz committed Oct 10, 2021
    4edfbfb
  5. prosopopee: do not get logs from third party modules

    Loggers are organized in a hierarchy. A logger that does not set its own
    loglevel inherits the effective loglevel of its nearest ancestor that
    has one. prosopopee changes the loglevel according to the --log-level
    argument.
    
    Since the root logger (obtained with logger = logging.getLogger()) is the
    parent of ALL loggers which could be declared in any third party module,
    prosopopee's loglevel also applies to those modules, which is usually not
    wanted, especially when prosopopee's default loglevel is the most verbose
    available.
    
    This is very annoying with Pillow since it's pretty verbose when saving
    files.
    
    Instead, let's declare a logger for prosopopee only. Unfortunately,
    since the package layout is unconventional (all *.py files in the same
    directory, instead of subdirs), the recommended
    logger = logging.getLogger(__name__) cannot be used: __name__ is
    __main__ in prosopopee.py, and in every other file it is the module name
    of that file (e.g. cache in cache.py). These names are unrelated in the
    eyes of the logging module, so prosopopee.py's loglevel would not apply
    to the other *.py files in the project.
    
    Instead, the value __name__ would have in a more conventional packaging
    layout is simulated by prepending "prosopopee." to __name__, except in
    prosopopee.py, which holds the parent logger, simply named prosopopee.
    
    Since prosopopee's logger is not the root logger anymore, the NOTSET
    loglevel cannot be used anymore: it basically means "use the parent
    logger's loglevel", and the root logger has a default loglevel of
    WARNING, so with NOTSET prosopopee would not print anything labelled as
    INFO or DEBUG.
    
    cf. https://stackoverflow.com/a/50755200
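
The naming scheme described in this commit message can be sketched as follows. This is a minimal illustration, not the code from the patch: the helper name `module_logger` and the `noisy_third_party` logger name are assumptions made for the example.

```python
import logging


def module_logger(name):
    """Return a logger living under the "prosopopee" parent (hypothetical helper)."""
    # prosopopee.py runs as __main__ and owns the parent logger itself.
    if name == "__main__":
        return logging.getLogger("prosopopee")
    # Other files (e.g. cache.py, where __name__ == "cache") become children.
    return logging.getLogger("prosopopee." + name)


parent = module_logger("__main__")
child = module_logger("cache")
# Stand-in for a logger declared inside a third-party module.
third_party = logging.getLogger("noisy_third_party")

# Setting the level on the parent only affects the "prosopopee" subtree.
parent.setLevel(logging.DEBUG)

# The child has no level of its own, so it inherits DEBUG from "prosopopee".
print(logging.getLevelName(child.getEffectiveLevel()))
# The third-party logger is unaffected and falls back to the root logger's level.
print(logging.getLevelName(third_party.getEffectiveLevel()))
```

With this layout, raising or lowering the level on `prosopopee` leaves loggers outside that subtree alone, which is the behaviour the commit is after.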
    
    Signed-off-by: Quentin Schulz <foss@0leil.net>
    QSchulz committed Oct 10, 2021
    37195a5
  6. prosopopee: cache: migrate Cache.cache to a Manager().dict

    In order to prepare for multiprocess support, migrate Cache.cache from a
    simple dict to a Manager().dict, which is one of the data types that can
    be modified safely from other processes.
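
As a rough sketch of why a Manager dict helps here, the example below lets several child processes write into one shared mapping. The `record` function and the entry layout are hypothetical and only serve the illustration; the actual Cache class is not shown in this commit message.

```python
from multiprocessing import Manager, Process


def record(cache, key, value):
    """Store one entry; works on a plain dict or on a Manager().dict() proxy."""
    cache[key] = value


if __name__ == "__main__":
    with Manager() as manager:
        shared = manager.dict()  # proxy object that can be handed to other processes
        workers = [
            Process(target=record, args=(shared, f"img{i}.jpg", {"size": i}))
            for i in range(3)
        ]
        for worker in workers:
            worker.start()
        for worker in workers:
            worker.join()
        # Entries written by the children are visible in the parent.
        print(dict(shared))
```

One caveat worth keeping in mind: changing a mutable value in place (for example appending to a list stored in the proxy) is not propagated, so the value has to be reassigned to its key.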
    
    Signed-off-by: Quentin Schulz <foss@0leil.net>
    QSchulz committed Oct 10, 2021
    466d55f
  7. prosopopee: cache: compare cached options and current options after c…

    …uration
    
    json.dumps(), which is used to write the cache dict to a file, serializes
    tuples as lists. With the current implementation, if a tuple is cached,
    the needs_to_be_generated method always returns True, because the list
    read back from the cache never equals the tuple it is compared to.
    
    In order to support tuples in cache entries, let's pass the options
    given as parameter to the method through json.loads(json.dumps()) so that
    cached options and to-be-compared options have the same format.
    
    This will be used in a later commit which adds a tuple (width, height) to
    the cache.
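
The round trip described above can be shown in a few lines. The option names and the `normalize` helper are made up for the example; only json.dumps() and json.loads() come from the commit message.

```python
import json


def normalize(options):
    """Round-trip through JSON so tuples become lists, as in the cache file."""
    return json.loads(json.dumps(options))


current = {"quality": 90, "size": (800, 600)}  # hypothetical options
cached = json.loads(json.dumps(current))       # what is read back from the cache file

print(cached == current)             # False: [800, 600] != (800, 600)
print(cached == normalize(current))  # True: same format on both sides
```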
    
    Signed-off-by: Quentin Schulz <foss@0leil.net>
    QSchulz committed Oct 10, 2021
    efb210d
  8. migrate from graphicsmagick to pillow for thumbnail generation

    Currently, thumbnail generation is done in a single thread while parsing
    the galleries, by calling graphicsmagick for every thumbnail to be
    generated. This is suboptimal even though graphicsmagick spreads its
    workload over all available CPU cores.
    
    After a quick and dirty benchmark, multiprocessed Pillow was found to
    generate thumbnails much more efficiently than graphicsmagick.
    
    This patch adds support for generating thumbnails with multiprocessed
    Pillow.
    
    Multiple processes have to be used and not multiple threads because
    Python still uses the Global Interpreter Lock (GIL) for threads, meaning
    they cannot execute Python code concurrently, which is what one wants for
    CPU intensive tasks such as thumbnail generation.
    
    Multiprocessing brings its own set of challenges because most data
    structures, such as the cache, cannot be shared between processes. All
    data modified by any of the processes should be of a type handled by
    multiprocessing.Manager data structures.
    
    In order to have the best performance, all thumbnails for an image
    should be generated at once, so that the original image is opened only
    once. This requires keeping track of images and attaching the thumbnails
    to be created to their original image. This can be done via a factory
    which is passed to the Jinja templates so that they can request
    thumbnails for given images without knowing more than the original path,
    name of the original image and the parameters of the thumbnails to
    create.
    
    The ImageFactory keeps all of those original images in a dictionary
    whose keys are virtual paths made from the original image name and a
    CRC32 of all the options that apply to its thumbnails. This gives
    prosopopee the ability to group thumbnails per options (e.g. if options
    are passed in gallery settings.yaml).
    
    The original image (or BaseImage) is returned by the ImageFactory and
    the templates can then request .copy() or .thumbnail() for it.
    
    The thumbnails are kept in a dictionary whose keys are the thumbnail
    names, each made out of the original name plus its size and the crc32 of
    the original image and the options that apply to it. This way,
    thumbnails are guaranteed to be unique even if requested multiple times
    by templates.
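
The bookkeeping described above might look roughly like the sketch below. It is not the patch's implementation: the method names `get` and `options_crc`, the key format, and the choice to hash the image path rather than its content are all assumptions made for the illustration. Only the names ImageFactory, BaseImage and .thumbnail() appear in the commit message.

```python
import json
import os
import zlib


def options_crc(options):
    """CRC32 of the options, serialized with sorted keys so the result is stable."""
    return zlib.crc32(json.dumps(options, sort_keys=True).encode("utf-8"))


class BaseImage:
    def __init__(self, path, options):
        self.path = path
        self.options = options
        self.thumbnails = {}  # thumbnail name -> requested (width, height)

    def thumbnail(self, size):
        """Register a thumbnail to generate later and return its file name."""
        stem, ext = os.path.splitext(self.path)
        # Assumption: the path stands in for "the original image" in the hash.
        crc = options_crc({"path": self.path, "options": self.options})
        name = f"{stem}-{size[0]}x{size[1]}-{crc:08x}{ext}"
        self.thumbnails[name] = size  # same request twice -> same key, one entry
        return name


class ImageFactory:
    def __init__(self):
        self.images = {}  # virtual path -> BaseImage

    def get(self, path, options):
        """Return the BaseImage for this path and options, creating it once."""
        key = f"{path}#{options_crc(options):08x}"
        if key not in self.images:
            self.images[key] = BaseImage(path, options)
        return self.images[key]


factory = ImageFactory()
img = factory.get("holiday.jpg", {"quality": 90})
img.thumbnail((800, 600))
img.thumbnail((800, 600))  # requested again by another template
same = factory.get("holiday.jpg", {"quality": 90})
other = factory.get("holiday.jpg", {"quality": 50})
print(len(img.thumbnails), same is img, other is img)
```

Because both dictionaries are keyed by names derived from the inputs, a template can request the same thumbnail any number of times without creating extra work.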
    
    The size is now read with imagesize.getsize() only once, when the ratio
    property is read or .copy() is called on the image, so that the
    performance impact is minimal.
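
One common way to get this "read once, on first use" behaviour is a cached property. The sketch below assumes a hypothetical `SizedImage` class and uses a fake reader in place of the real size lookup, so it says nothing about how the patch itself does it.

```python
from functools import cached_property


class SizedImage:
    """Hypothetical image wrapper; read_size stands in for the real size reader."""

    def __init__(self, path, read_size):
        self.path = path
        self._read_size = read_size

    @cached_property
    def size(self):
        # Runs on first access only; the result is then stored on the instance.
        return self._read_size(self.path)

    @property
    def ratio(self):
        width, height = self.size
        return width / height


calls = []


def fake_reader(path):
    """Pretend to read the image header and remember that it was called."""
    calls.append(path)
    return (1600, 1200)


img = SizedImage("holiday.jpg", fake_reader)
img.ratio
img.ratio
print(len(calls))  # 1: the size was read a single time
```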
    
    A notable change is that the resize option for images only accepts
    percentages for now.
    
    Another notable change is that the .copy() function actually also
    applies the quality setting, unlike the implementation with
    graphicsmagick.
    
    Since multiprocessing.Pool.map splits iterables into pre-defined chunks
    which are then assigned to processes, best performance requires
    processes with more or less the same taskload, so that no process sits
    idle while another one is working at 100%. For that, the original images
    whose thumbnails are all cached should be removed from the list of
    images to generate thumbnails from before the list is passed to
    multiprocessing.Pool.map.
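
The filtering step described above can be sketched like this. The image records, the cache contents and the `needs_work` and `generate` helpers are hypothetical; `generate` is a placeholder for the real thumbnail work.

```python
from multiprocessing import Pool


def needs_work(image, cache):
    """True if at least one thumbnail of this image is missing from the cache."""
    return any(name not in cache for name in image["thumbnails"])


def generate(image):
    # Placeholder for the real work (open the original once, write every thumbnail).
    return image["path"], len(image["thumbnails"])


images = [
    {"path": "a.jpg", "thumbnails": ["a-800.jpg", "a-400.jpg"]},
    {"path": "b.jpg", "thumbnails": ["b-800.jpg"]},
    {"path": "c.jpg", "thumbnails": ["c-800.jpg", "c-400.jpg"]},
]
cache = {"a-800.jpg", "a-400.jpg", "c-800.jpg"}  # hypothetical cache contents

# Drop fully cached images first so the chunks handed to workers hold real work.
todo = [image for image in images if needs_work(image, cache)]

if __name__ == "__main__":
    with Pool() as pool:
        print(pool.map(generate, todo))
```

Without the filter, a chunk made mostly of cached images would finish almost immediately and leave its process idle while the others are still busy.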
    
    This has been tested against Pillow from 6.0.0 to 8.1.0 and Pillow-SIMD
    7.0.0.post3.
    
    Signed-off-by: Quentin Schulz <foss@0leil.net>
    QSchulz committed Oct 10, 2021
    a46070f
  9. travis: remove now useless graphicsmagick

    Signed-off-by: Quentin Schulz <foss@0leil.net>
    QSchulz committed Oct 10, 2021
    9566424
  10. docs: update based on migration from GraphicsMagick to Pillow

    Signed-off-by: Quentin Schulz <foss@0leil.net>
    QSchulz committed Oct 10, 2021
    0de1e40
  11. prosopopee: allow selecting number of threads used for thumbnail gene…

    …ration
    
    Generating thumbnails is done in parallel processes via
    multiprocessing.Pool. By default, Pool schedules tasks on as many
    processes as there are cpu threads on the host machine.
    
    Let's allow users to select the number of processes Pool can use.
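
A minimal sketch of such an option is shown below. The `--jobs` flag name and the `square` task are assumptions made for the example; the commit message does not say what the actual option is called.

```python
import argparse
from multiprocessing import Pool


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # "--jobs" is a hypothetical flag name; the patch may use a different one.
    parser.add_argument("--jobs", type=int, default=None,
                        help="number of worker processes (default: chosen by Pool)")
    return parser.parse_args(argv)


def square(x):
    # Stand-in for the real per-image thumbnail task.
    return x * x


if __name__ == "__main__":
    args = parse_args()
    # processes=None lets Pool pick the worker count from the available CPUs.
    with Pool(processes=args.jobs) as pool:
        print(pool.map(square, range(8)))
```

Limiting the number of workers can be useful on shared machines or when memory, rather than CPU, is the bottleneck.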
    
    Signed-off-by: Quentin Schulz <foss@0leil.net>
    QSchulz committed Oct 10, 2021
    ba91bcb