Using Multistage Builds to Reduce the Size of Containers

Kacper Bąk
6 min read · Jan 27, 2023



Multistage builds are an efficient way to create small containers with specific applications inside. This article will explain what multistage builds are, why they are useful, and how to use them to reduce the size of your containers.

A multistage build is a Dockerfile with more than one FROM instruction. An early stage uses a full-featured base image to build the application, and a later stage copies only the resulting executable code into a smaller, lighter image. The final image is much smaller but still carries everything the application needs to run.

The base image for the build stage can be anything from a specific version of Ubuntu to a specific version of Java. Each FROM instruction starts a new stage, in which you can install dependencies, compile code, copy files, and package the application. Stages are numbered from 0 in the order they appear and can optionally be given a name with AS. Only the last stage becomes the final image; earlier stages contribute only the files copied out of them with COPY --from.
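To make the stage mechanics concrete, here is a minimal sketch that builds nothing real. The stage names (build, test) and the file artifact.txt are made up for illustration, and alpine:3.19 is just a convenient small base image.

```dockerfile
# Stage 0, also addressable by the name "build"
FROM alpine:3.19 AS build
WORKDIR /work
RUN echo "hello from the build stage" > artifact.txt

# Stage 1, "test": starts from the build stage's filesystem.
# Nothing from this stage is copied into the final image.
FROM build AS test
RUN grep -q hello /work/artifact.txt

# Final stage: only this stage's filesystem becomes the image
FROM alpine:3.19
COPY --from=build /work/artifact.txt /artifact.txt
CMD ["cat", "/artifact.txt"]
```

Building this with docker build and running the result should print the line written in the first stage. Passing --target test to docker build stops at the test stage instead. With BuildKit, stages that the requested target does not depend on can be skipped.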

Multistage builds are most useful when the tools needed to build an application are much larger than what it needs to run. With languages such as Python or Java, the build image carries compilers, package managers, and caches that the running application never uses, and trimming them by hand is difficult. Separating the build and deployment phases leaves all of that behind. Choosing lighter base images, such as slim or Alpine variants, reduces the size further.

In short, you build the application in a full-featured base image and then copy only the necessary files into a smaller one, so the final image contains the application and little else.

For example, for a Java application you could build with a full Java image and copy only the resulting JAR into a smaller runtime image. Similarly, for a Python application you could build with the full Python image and copy only what is needed into a slim one. If you deploy to AWS or another cloud platform, the provider's own optimized base images may be a further option.

For example, to create a Java application using multistage builds, you could use the following code:

FROM openjdk:8
# Install Maven
RUN apt-get update && apt-get install -y maven
# Build in a dedicated directory rather than the filesystem root
WORKDIR /app
# Copy source code
COPY . .
# Build the application
RUN mvn package
# Create the runtime image
FROM openjdk:8-jre-alpine
# Copy only the packaged JAR from the first stage (index 0)
COPY --from=0 /app/target/my-app.jar /opt/my-app.jar
# Run the application
CMD ["java", "-jar", "/opt/my-app.jar"]

This Dockerfile builds and packages a Java application in two stages.

The first stage installs Maven, compiles the application source code, and packages it into a JAR file. The second stage starts from a much smaller JRE image and copies in only that JAR.

Similarly, to create a Python application using multistage builds, you could use the following code:

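FROM python:3.7
WORKDIR /app
# Copy the dependency list first so it exists when pip runs
COPY requirements.txt .
# Install dependencies into a separate prefix that the next stage can copy
RUN pip install --prefix=/install -r requirements.txt
# Copy source code
COPY . .
# Create the runtime image
FROM python:3.7-slim
# Copy the installed dependencies and the script from the first stage (index 0)
COPY --from=0 /install /usr/local
COPY --from=0 /app/src/my-app.py /opt/my-app.py
# Run the application
CMD ["python", "/opt/my-app.py"]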

This Dockerfile defines a two-stage build of a Docker image. The first stage starts from the full Python 3.7 image, installs the dependencies with pip, and holds the source code. The final stage starts from the slim image, copies in what the application needs from the first stage, and runs it with the CMD instruction.

Several thoughts arose in my mind…

  1. Why is this code bad?
  2. What is the difference between CMD and ENTRYPOINT in a Dockerfile?
  3. When you run a container, it starts at the entry point. Could we also use WORKDIR to set the directory it starts in?

This code is not bad, but it could be improved. ENTRYPOINT sets the executable that runs whenever a container starts. CMD sets the default command or, when an ENTRYPOINT is present, the default arguments passed to it; anything given after the image name in docker run replaces CMD. WORKDIR sets the working directory for the instructions that follow it (RUN, COPY, CMD, ENTRYPOINT), and it is the directory the container's process starts in.
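As a rough illustration of how the three instructions interact, here is a small sketch. The script path matches the example above; the --help and --verbose flags and the image tag my-image are hypothetical and only stand in for whatever arguments your application accepts.

```dockerfile
FROM python:3.7-slim

# Relative paths below, and the process's starting directory, resolve against /opt
WORKDIR /opt

# Hypothetical script path, taken from the build context
COPY src/my-app.py my-app.py

# Always runs; arguments given to docker run are appended to it
ENTRYPOINT ["python", "my-app.py"]

# Default arguments, used only when docker run passes none
CMD ["--help"]
```

With this image, docker run my-image would start python my-app.py --help, while docker run my-image --verbose would start python my-app.py --verbose. The ENTRYPOINT itself is only replaced when docker run is given the --entrypoint flag.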

Then I refactored it…

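FROM python:3.7

# Set working directory
WORKDIR /opt

# Copy the dependency list so it exists when pip runs
COPY requirements.txt .

# Install dependencies
RUN pip install -r requirements.txt

# Copy source code
COPY . .

# Copy the script from the build context
# (with a single stage there is no earlier stage for --from=0 to refer to)
COPY src/my-app.py /opt/my-app.py

# Set entrypoint
ENTRYPOINT ["python", "/opt/my-app.py"]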

One more thought: name the stages. The build stage should get a name, for example builder, and the COPY line should copy from builder by that name. With several stages, numeric indexes are easy to mix up, so copy by name.

FROM golang:1.16 AS builder
WORKDIR /go/src/github.com/alexellis/href-counter/
RUN go get -d -v golang.org/x/net/html
COPY app.go ./
RUN CGO_ENABLED=0 go build -a -installsuffix cgo -o app .

FROM alpine:latest
RUN apk --no-cache add ca-certificates
WORKDIR /root/
COPY --from=builder /go/src/github.com/alexellis/href-counter/app ./
CMD ["./app"]

The snippet above is a good example of how a multistage Dockerfile should be structured, but I think I can improve it easily. It also does not build as written: Go 1.16 builds in module mode by default, so go build needs an initialized go.mod.

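FROM golang:1.16 AS builder
WORKDIR /go/src/github.com/alexellis/href-counter/
# Assumption: create a go.mod first, because Go 1.16 builds in module mode by default.
# Newer releases of golang.org/x/net may require a newer Go image than 1.16.
RUN go mod init github.com/alexellis/href-counter
RUN go get -d -v golang.org/x/net/html
COPY app.go ./
RUN CGO_ENABLED=0 go build -a -installsuffix cgo -o app .

FROM alpine:latest
RUN apk --no-cache add ca-certificates
WORKDIR /root/
COPY --from=builder /go/src/github.com/alexellis/href-counter/app ./
# ENTRYPOINT instead of CMD: arguments to docker run are passed to the binary
ENTRYPOINT ["./app"]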

I took this example, which could be improved, from the official Docker documentation: https://docs.docker.com/build/building/multi-stage/

Based on these differences, the choice of which instruction to use at the end of the Dockerfile depends largely on the application being built.

ENTRYPOINT is generally the better fit when the image should behave like a single executable. Anything passed after the image name in docker run is appended to it as arguments, and environment variables can still be set with -e.

CMD is simpler and suits images that only need a default command. It is not always executed, though: any arguments given to docker run replace it entirely.

From the conclusions reached, I analyzed the Python Dockerfile again and enhanced it.

FROM python:3.7
# Set working directory
WORKDIR /opt

# Copy the dependency list so it exists when pip runs
COPY requirements.txt .

# Install dependencies
RUN pip install -r requirements.txt

# Copy source code
COPY . .

# Copy the script from the build context
# (with a single stage there is no earlier stage for --from=0 to refer to)
COPY src/my-app.py /opt/my-app.py

# Set entrypoint
ENTRYPOINT ["python", "/opt/my-app.py"]

# Set environment variables
ENV PYTHONPATH=/opt/

# Expose port
EXPOSE 5000

This version improves on the previous one by setting an environment variable and exposing a port.

The ENV instruction sets environment variables that are available to the application when it is running in the container.

The EXPOSE instruction documents which port the application listens on. It does not publish the port by itself; to reach the application from outside, the port still has to be published with -p when the container is run.
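A small sketch of both instructions together, using Python's built-in http.server module as a stand-in for a real application. The variable name APP_PORT and the image tag my-image are assumptions made for this example.

```dockerfile
FROM python:3.7-slim

# APP_PORT is a hypothetical variable name; ENV makes it visible to the running process
ENV APP_PORT=5000

# Documents the intended port; it does not publish it on the host
EXPOSE 5000

# The shell expands $APP_PORT when the container starts
# (an exec-form array is not expanded by a shell on its own)
CMD ["sh", "-c", "exec python -m http.server \"$APP_PORT\""]
```

Running docker run -p 5000:5000 my-image should make the server reachable on port 5000 of the host; without -p it is not reachable from the host, even though the port is listed in EXPOSE. The variable can be overridden at run time, for example with -e APP_PORT=8000 -p 8000:8000.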

You may ask why ENTRYPOINT usually sits near the end.

It’s just an indication of what’s about to happen as soon as you fire up the container.

However, ENTRYPOINT doesn’t have to be at the end of the Dockerfile. It is not executed during the build; it only records what to run when a container starts, so it can appear anywhere in the final stage. By convention it goes near the end, where it reads as the last step.

This Dockerfile can also be improved with one of the most important things, namely stage naming.

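FROM python:3.7 AS builder
# Set working directory
WORKDIR /opt

# Copy the dependency list so it exists when pip runs
COPY requirements.txt .

# Install dependencies into a prefix that the runtime stage can copy
RUN pip install --prefix=/install -r requirements.txt

# Copy source code
COPY . .

# Copy the script from the build context (a stage cannot COPY --from itself)
COPY src/my-app.py /opt/my-app.py

FROM python:3.7-slim
# Copy the installed dependencies and the script from the builder stage, by name
COPY --from=builder /install /usr/local
COPY --from=builder /opt/my-app.py /opt/my-app.py

# Set entrypoint
ENTRYPOINT ["python", "/opt/my-app.py"]

# Set environment variables
ENV PYTHONPATH=/opt/

# Expose port
EXPOSE 5000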

Multistage builds are an efficient way to create smaller containers with specific applications inside. By separating the build and deployment phases, they produce a much smaller image that still carries everything the application needs to run. The Java and Python examples above follow the same pattern: build in a full base image, then copy only the result into a lighter one. Naming the stages and choosing slim or Alpine base images makes the result easier to maintain and smaller still.

References:
1. Source code: https://github.com/53jk1/Quickstart-for-GitHub-Actions
2. Documentation: https://docs.docker.com/build/building/multi-stage/
