Optimize Python dependencies in a Docker container

📅 2023-08-01

Our goal in optimizing the dependencies of a Python project in a Docker container is twofold: reduce the image size and lock the dependency versions.

A few things can help us achieve that:

  • Use a package-management tool like Python-Poetry
  • Use a bundling tool like PyInstaller
  • Use multi-stage builds in the Dockerfile

Python-Poetry

https://python-poetry.org/

When we use Python-Poetry in a Docker multi-stage build, we can have Poetry generate a requirements file with poetry export, then run pip install in the next stage, like this:

ARG PROJECT_PATH=/app

FROM docker.io/freeyeti/dev-in-docker:pyinstaller5.8.0-poetry1.4.0 AS poetry

# An ARG declared before the first FROM must be re-declared in every stage that uses it
ARG PROJECT_PATH
RUN mkdir -p ${PROJECT_PATH}
WORKDIR ${PROJECT_PATH}
COPY . .

RUN poetry export --output requirements.txt

FROM docker.io/freeyeti/dev-in-docker:python3.10-gdal3.4.1-libmagickwand AS django

ARG PROJECT_PATH
WORKDIR ${PROJECT_PATH}

# Only the exported requirements file is needed in this stage
COPY --from=poetry ${PROJECT_PATH}/requirements.txt .

RUN pip3 install --no-cache-dir -r requirements.txt
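
Note that poetry export reads the pinned versions from poetry.lock, so the generated requirements.txt locks exact versions, which covers the second half of our goal. If you have just edited pyproject.toml, refresh the lock file first; a quick sketch of the two commands, run locally outside of Docker:

poetry lock                               # re-resolve and update poetry.lock
poetry export --output requirements.txt   # export the locked versions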

Another big advantage of Python-Poetry is that you can skip entire groups of dependencies on export; all you need to do is add the --without $groupname option:

RUN poetry export --without dev --output requirements.txt
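
For --without dev to have any effect, the development-only packages must be declared in a Poetry dependency group. A minimal sketch of what that could look like in pyproject.toml (the packages and version constraints here are only examples):

[tool.poetry.dependencies]
python = "^3.10"
django = "^4.2"

# Everything in this group is skipped by `poetry export --without dev`
[tool.poetry.group.dev.dependencies]
pytest = "^7.4"
black = "^23.7"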

PyInstaller

https://pyinstaller.org/en/stable/

PyInstaller can help us reduce the final image size: it copies the application together with all of its dependencies (and the Python interpreter itself) into a single directory, so the result runs on a machine with no Python installation at all.

First we create an app.spec file that tells PyInstaller which dependencies to bundle:

import os
import sys

# Where pip installed the project's packages inside the builder image
local_lib = '/usr/local/lib/python3.10/dist-packages'

block_cipher = None

# Package directories and data files to copy into the bundle as-is
added_files = [
  (os.path.join(local_lib, 'webpack_loader'), 'webpack_loader'),
  (os.path.join(local_lib, 'rest_framework'), 'rest_framework'),
  (os.path.join(local_lib, 'uvicorn'), 'uvicorn'),
  (os.path.join(local_lib, 'rasterio'), 'rasterio'),
  (os.path.join(local_lib, 'certifi'), 'certifi'),
  (os.path.join(local_lib, 'h11'), 'h11'),
  ('geodata', 'geodata'),
]

# Imports that PyInstaller's static analysis cannot detect on its own
hide_imports = [
  "rest_framework",
  "webpack_loader",
  "uvicorn",
  "rasterio",
  "certifi",
  "h11",
  "GDAL",
  "geodata",
]

# Analyze the entry point and collect its dependency graph
app_a = Analysis(['app.py'],
             pathex=['/app'],
             binaries=[],
             datas=added_files,
             hiddenimports=hide_imports,
             hookspath=[],
             runtime_hooks=[],
             excludes=[],
             win_no_prefer_redirects=False,
             win_private_assemblies=False,
             cipher=block_cipher,
             noarchive=False)

MERGE((app_a, 'app', 'app'))

app_pyz = PYZ(app_a.pure, app_a.zipped_data,
             cipher=block_cipher)
app_exe = EXE(app_pyz,
          app_a.scripts,
          [],
          exclude_binaries=True,
          name='app',
          debug=False,
          bootloader_ignore_signals=False,
          strip=False,
          upx=True,
          console=True )

# Gather the executable, binaries, and data files into a single output folder
app_coll = COLLECT(app_exe,
               app_a.binaries,
               app_a.zipfiles,
               app_a.datas,
               strip=False,
               upx=True,
               upx_exclude=[],
               name=os.path.join('dist', 'app'))
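
Before wiring the spec into the image, you can sanity-check it locally; this assumes PyInstaller and the project's dependencies are already installed on your machine:

pyinstaller app.spec   # analyze app.py and collect everything into a one-folder bundle
ls dist/               # inspect the self-contained output directory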

Then we extend the Dockerfile with a builder stage that produces the bundle and a slim runner stage that only ships the result:

ARG PROJECT_PATH=/app

FROM docker.io/freeyeti/dev-in-docker:pyinstaller5.8.0-poetry1.4.0 AS poetry

ARG PROJECT_PATH
RUN mkdir -p ${PROJECT_PATH}
WORKDIR ${PROJECT_PATH}
COPY . .

RUN poetry export --output requirements.txt

FROM docker.io/freeyeti/dev-in-docker:python3.10-gdal3.4.1-libmagickwand AS builder

ARG PROJECT_PATH
RUN mkdir -p ${PROJECT_PATH}
WORKDIR ${PROJECT_PATH}

# Bring over the source tree together with the exported requirements.txt
COPY --from=poetry ${PROJECT_PATH} ${PROJECT_PATH}

RUN pip3 install --no-cache-dir -r requirements.txt
RUN pyinstaller app.spec

FROM ubuntu:latest AS runner

ARG PROJECT_PATH
RUN mkdir -p ${PROJECT_PATH}
WORKDIR ${PROJECT_PATH}

# The build stage is named "builder", so copy the PyInstaller output from it
COPY --from=builder ${PROJECT_PATH}/dist ${PROJECT_PATH}/dist
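
The runner stage above only copies the bundle; you still need an entrypoint to start it. A minimal sketch, assuming the bundled executable ends up at ./dist/app/app and the service listens on port 8000 (adjust both to match your spec file and server settings):

# hypothetical final lines of the runner stage
EXPOSE 8000
CMD ["./dist/app/app"]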