Base image based on Alpine #99

@dragospopa420

Description

Which package is the feature request for? If unsure which one to select, leave blank

@crawlee/core

Feature

The base image is built on Debian, which has a much larger footprint than Alpine Linux.
So I was thinking the included Dockerfile could be based on Alpine Linux instead, for faster deployment and testing.
The apify/actor-node-puppeteer-chrome image is 2.53 GB; my Alpine-based version is 698 MB.
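To put the two numbers in proportion, here is a minimal sketch of the arithmetic. It assumes the sizes are the decimal units Docker reports (so 2.53 GB is taken as 2530 MB); the helper name `size_reduction_pct` is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: percentage by which an image shrinks,
# given the old and new sizes in MB.
size_reduction_pct() {
    awk -v old_mb="$1" -v new_mb="$2" \
        'BEGIN { printf "%.1f\n", (1 - new_mb / old_mb) * 100 }'
}

# Sizes quoted in this issue: 2.53 GB (taken as 2530 MB) and 698 MB.
size_reduction_pct 2530 698   # prints 72.4

# To read the sizes from a local Docker daemon instead (requires Docker):
#   docker image ls --format '{{.Repository}}:{{.Tag}}  {{.Size}}'
```

With those figures the Alpine-based image is roughly 72% smaller, which is the part of the pull and deploy time this proposal targets.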

Motivation

I'm building an infrastructure of spiders based on Crawlee and I want the fastest possible deployment time.

Ideal solution or implementation, and any additional constraints

FROM node:current-alpine

# Set workdir
WORKDIR /usr/src/app

# Copy just package.json and package-lock.json
# to speed up the build using Docker layer cache.
COPY package*.json ./

# Change rights for package-lock.json
RUN chmod 744 package-lock.json

# Install Chromium and its dependencies; nodejs is listed too, to make sure it is up to date
RUN apk add --no-cache \
      chromium \
      nss \
      freetype \
      harfbuzz \
      ca-certificates \
      ttf-freefont \
      nodejs \
      yarn

# Point Puppeteer at the system Chromium installed above, so it does not need its own download
ENV PUPPETEER_EXECUTABLE_PATH=/usr/bin/chromium-browser

# Install NPM packages, skip optional and development dependencies to
# keep the image small. Avoid logging too much and print the dependency
# tree for debugging
RUN npm --quiet set progress=false \
    && npm install --omit=dev --omit=optional \
    && echo "Installed NPM packages:" \
    && (npm list --omit=dev --all || true) \
    && echo "Node.js version:" \
    && node --version \
    && echo "NPM version:" \
    && npm --version

# Next, copy the remaining files and directories with the source code.
# Since this happens after the NPM install, rebuilds will be fast
# for most source file changes.
COPY . ./

# Required for Crawlee
ENV CRAWLEE_CHROME_EXECUTABLE_PATH=/usr/bin/chromium-browser
RUN chmod 744 /usr/bin/chromium-browser

# Run the image.
CMD ["npm", "start"]
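One way to sanity-check the proposal is to build the image and confirm that the path stored in `CRAWLEE_CHROME_EXECUTABLE_PATH` really exists inside the container. The sketch below is an illustration, not part of the original Dockerfile: it assumes Docker is installed and that the Dockerfile above sits in the current directory, and the tag `crawlee-alpine` and the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical image tag, chosen only for this example.
IMAGE_TAG="${IMAGE_TAG:-crawlee-alpine}"

# Build the image from the Dockerfile in the current directory.
build_image() {
    docker build -t "$IMAGE_TAG" .
}

# Start a throwaway container and check that the path stored in
# CRAWLEE_CHROME_EXECUTABLE_PATH is an executable file; print it if so.
# The single quotes keep the variable from expanding on the host,
# so it is resolved inside the container.
check_chromium() {
    docker run --rm "$IMAGE_TAG" sh -c \
        'test -x "$CRAWLEE_CHROME_EXECUTABLE_PATH" && echo "$CRAWLEE_CHROME_EXECUTABLE_PATH"'
}

# Usage:
#   build_image && check_chromium
```

If `check_chromium` exits non-zero, the Chromium package in the chosen Alpine release probably installs its launcher under a different path, and both `ENV` lines would need to be adjusted to match.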

Alternative solutions or implementations

No response

Other context

No response

Labels: t-tooling (issues with this label are in the ownership of the tooling team)
