Optimize python dependencies in a Docker container
2023-08-01
Our goal when optimizing the dependencies of a Python project in a Docker container is to reduce the image size and to lock the dependency versions.
A few things can help us get there:
- Use a package management tool like Python-Poetry
- Use a bundling tool like PyInstaller
- Use multi-stage builds in the Dockerfile
When we use Python-Poetry in a Docker multi-stage build, we can generate a pinned requirements file in one stage with poetry export,
then perform the pip install in the next stage, like this:
ARG PROJECT_PATH=/app

# Stage 1: export pinned requirements with Poetry
FROM docker.io/freeyeti/dev-in-docker:pyinstaller5.8.0-poetry1.4.0 AS poetry
ARG PROJECT_PATH
WORKDIR ${PROJECT_PATH}
COPY . .
RUN poetry export --output requirements.txt

# Stage 2: install the pinned dependencies with pip
FROM docker.io/freeyeti/dev-in-docker:python3.10-gdal3.4.1-libmagickwand AS django
ARG PROJECT_PATH
WORKDIR ${PROJECT_PATH}
COPY --from=poetry ${PROJECT_PATH}/requirements.txt .
RUN pip3 install --no-cache-dir -r requirements.txt
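Note that poetry export is provided by the poetry-plugin-export plugin. It ships bundled with Poetry 1.4, the version baked into the image above, but on newer Poetry releases (2.0 and later) you may have to add it yourself:
poetry self add poetry-plugin-export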
Another big advantage of using Python-Poetry is that you can skip whole groups of dependencies: all you need to do is add a --without <groupname> argument:
RUN poetry export --without dev --output requirements.txt
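The group names come from the project's pyproject.toml. As a small sketch (pytest and black are just example packages, and dev is just an example group name), development-only tools registered in a dev group will then be left out of the exported requirements:
poetry add --group dev pytest black
poetry export --without dev --output requirements.txt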
PyInstaller (https://pyinstaller.org/en/stable/) can also help us reduce the final image size: it collects the application and all of its dependencies into a single directory, and that directory can run on a machine that has no Python interpreter installed.
First we need to create an app.spec file that tells PyInstaller which packages and data files to bundle.
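If you do not have a spec file yet, PyInstaller can generate a skeleton for you to edit (app.py is the project's entry point):
pyi-makespec app.py
The edited spec used in this project looks like this: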
# Analysis, PYZ, EXE, COLLECT and MERGE are injected by PyInstaller when it loads the spec
import os
import sys

# location of the installed site packages inside the build image
local_lib = '/usr/local/lib/python3.10/dist-packages'

block_cipher = None

# package data to copy into the bundle: (source path, destination inside the bundle)
added_files = [
    (os.path.join(local_lib, 'webpack_loader'), 'webpack_loader'),
    (os.path.join(local_lib, 'rest_framework'), 'rest_framework'),
    (os.path.join(local_lib, 'uvicorn'), 'uvicorn'),
    (os.path.join(local_lib, 'rasterio'), 'rasterio'),
    (os.path.join(local_lib, 'certifi'), 'certifi'),
    (os.path.join(local_lib, 'h11'), 'h11'),
    ('geodata', 'geodata'),
]

# modules PyInstaller cannot discover through static import analysis
hide_imports = [
    "rest_framework",
    "webpack_loader",
    "uvicorn",
    "rasterio",
    "certifi",
    "h11",
    "GDAL",
    "geodata",
]

app_a = Analysis(
    ['app.py'],
    pathex=['/app'],
    binaries=[],
    datas=added_files,
    hiddenimports=hide_imports,
    hookspath=[],
    runtime_hooks=[],
    excludes=[],
    win_no_prefer_redirects=False,
    win_private_assemblies=False,
    cipher=block_cipher,
    noarchive=False,
)

MERGE((app_a, 'app', 'app'))

# the archive of pure-Python modules
app_pyz = PYZ(app_a.pure, app_a.zipped_data, cipher=block_cipher)

# the bootstrap executable
app_exe = EXE(
    app_pyz,
    app_a.scripts,
    [],
    exclude_binaries=True,
    name='app',
    debug=False,
    bootloader_ignore_signals=False,
    strip=False,
    upx=True,
    console=True,
)

# collect the executable, binaries and data files into one output directory
app_coll = COLLECT(
    app_exe,
    app_a.binaries,
    app_a.zipfiles,
    app_a.datas,
    strip=False,
    upx=True,
    upx_exclude=[],
    name=os.path.join('dist', 'app'),
)
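Before wiring the spec into the image, it is worth a quick local check (PyInstaller 5.8.0 matches the build image used below; the exact output folder under dist/ follows from the COLLECT name in the spec):
pip install pyinstaller==5.8.0
pyinstaller app.spec
ls dist/
The result is a plain directory containing the app executable together with everything it needs, so it can be copied into a minimal runtime image.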
Then we add another stage to our Dockerfile to build the bundle:
ARG PROJECT_PATH=/app

# Stage 1: export pinned requirements with Poetry
FROM docker.io/freeyeti/dev-in-docker:pyinstaller5.8.0-poetry1.4.0 AS poetry
ARG PROJECT_PATH
WORKDIR ${PROJECT_PATH}
COPY . .
RUN poetry export --output requirements.txt

# Stage 2: install the pinned dependencies and build the PyInstaller bundle
FROM docker.io/freeyeti/dev-in-docker:python3.10-gdal3.4.1-libmagickwand AS builder
ARG PROJECT_PATH
WORKDIR ${PROJECT_PATH}
COPY --from=poetry ${PROJECT_PATH} ${PROJECT_PATH}
RUN pip3 install --no-cache-dir -r requirements.txt
RUN pyinstaller app.spec

# Stage 3: runtime image that only contains the bundle, no Python interpreter
FROM ubuntu:latest AS runner
ARG PROJECT_PATH
WORKDIR ${PROJECT_PATH}
COPY --from=builder ${PROJECT_PATH}/dist ${PROJECT_PATH}/dist
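A minimal build-and-run sketch (the image tag is arbitrary, and the path of the bundled executable inside dist/ depends on the COLLECT name in app.spec, so check it in the image):
docker build -t myapp .
docker run --rm -it myapp ls dist
Because the runner stage only copies the PyInstaller bundle, the final image carries neither the Python toolchain nor the build-time dependencies, which is where most of the size savings come from.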