ci: Better build caching for CI (#2742)

* ci: Cache builds by splitting into two jobs

For the cache to work properly, we need to derive the cache key from the build context (the files that affect the Dockerfile build) instead of having the key change with every commit SHA. We also need to keep a test suite failure from preventing the caching of a build, hence the split into separate jobs.

This first attempt used `upload-artifact` and `download-artifact` to transfer the built image, but that has quite a bit of overhead and prevented multi-platform builds (without complicating the workflow further).

* ci: Transfer to dependent job via cache only

While `download-artifact` + `docker load` is a little faster than rebuilding the image from cached layers, `upload-artifact` takes about 2 minutes to upload the AMD64 (330MB) tar image export (likely due to compression during upload?).

The `actions/cache` approach, however, does not incur that hit and is very quick (<10 secs) to complete its post-job upload work. The dependent job still gets a cache-hit, and the build job is able to properly support multi-platform builds.

Added additional notes about the timing and size impact of including ARM builds.

* ci: Move Dockerfile ARG to end of build

When the ARG changes due to the commit SHA, it invalidates all cache due to the LABEL layers at the start. Any RUN layers are then implicitly invalidated as well, even when the ARG is not used.

Introduced a basic multi-stage build, and relocated the container config / metadata to the end of the build. This avoids needlessly invalidating expensive cache layers (in both size and build time).
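The content-derived cache key described in the commit message can be tried outside of CI. The following is a minimal sketch, assuming GNU `find` and `sha256sum` are available; the `derive_key` helper and the throwaway files it hashes are hypothetical and exist only for illustration. It shows the property the workflow relies on: the key ignores file metadata such as modification time, but changes whenever file content changes.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print a digest derived only from file paths and contents.
# Mirrors the workflow step: checksum every file, sort the list, checksum the list.
derive_key() {
  local context_dir="$1"
  (
    cd "${context_dir}"
    find ./target -type f -exec sha256sum Dockerfile VERSION {} + \
      | sort \
      | sha256sum \
      | awk '{ print $1 }'
  )
}

# Throwaway build context, for demonstration only (not the real repository layout):
DEMO_DIR="$(mktemp -d)"
trap 'rm -rf "${DEMO_DIR}"' EXIT
mkdir -p "${DEMO_DIR}/target/scripts"
echo 'FROM scratch' > "${DEMO_DIR}/Dockerfile"
echo '1.0.0'        > "${DEMO_DIR}/VERSION"
echo 'echo hello'   > "${DEMO_DIR}/target/scripts/start.sh"

KEY_BEFORE="$(derive_key "${DEMO_DIR}")"

# A metadata-only change (mtime) should leave the key untouched:
touch "${DEMO_DIR}/target/scripts/start.sh"
KEY_TOUCHED="$(derive_key "${DEMO_DIR}")"

# A content change should produce a different key:
echo 'echo changed' > "${DEMO_DIR}/target/scripts/start.sh"
KEY_AFTER="$(derive_key "${DEMO_DIR}")"

echo "before:  ${KEY_BEFORE}"
echo "touched: ${KEY_TOUCHED}"
echo "after:   ${KEY_AFTER}"
```

Because commits that leave the build context alone produce the same key, they get an exact cache-hit instead of a fresh upload per commit SHA.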
2022-08-28 11:42:42 +12:00 · 2022-08-28 11:42:42 +12:00 · 21fbbfabe1
parent 672e9cf19a
commit 21fbbfabe1
2 changed files with 151 additions and 51 deletions
--- a/.github/workflows/test_merge_requests.yml
+++ b/.github/workflows/test_merge_requests.yml
@@ -14,60 +14,145 @@ on:
 permissions:
  contents: read

+# `actions/cache` does not upload a new cache until completing a job successfully.
+# To better cache image builds, tests are handled in a dependent job afterwards.
+# This way failing tests will not prevent caching of an image. Useful when the build context
+# is not changed by new commits.
 jobs:
-  build-and-test:
+  job-build-image:
    runs-on: ubuntu-20.04
+    outputs:
+      image-build-key: ${{ steps.derive-image-cache-key.outputs.digest }}
    steps:
-      - name: Checkout
+      - name: 'Checkout'
        uses: actions/checkout@v3
        with:
+          # Required for image to include `configomat.sh`:
          submodules: recursive

-      - name: Set up QEMU
-        uses: docker/setup-qemu-action@v2.0.0
+      # Can potentially be replaced by: `${{ hashFiles('target/**', 'Dockerfile', 'VERSION') }}`
+      # Must not be affected by file metadata changes and have a consistent sort order:
+      # https://docs.github.com/en/actions/learn-github-actions/expressions#hashfiles
+      # Keying by the relevant build context is more re-usable than a commit SHA.
+      - name: 'Derive Docker image cache key from content'
+        id: derive-image-cache-key
+        shell: bash
+        run: |
+          ADDITIONAL_FILES=(
+            'Dockerfile'
+            'VERSION'
+          )

-      - name: Set up Docker Buildx
-        uses: docker/setup-buildx-action@v2.0.0
-        id: buildx
+          # Recursively collect file paths from `target/` and pipe a list of
+          # checksums to be sorted (by hash value) and finally generate a checksum
+          # of that list, using `awk` to only return the hash value (digest):
+          IMAGE_CHECKSUM=$(\
+            find ./target -type f -exec sha256sum "${ADDITIONAL_FILES[@]}" {} + \
+              | sort \
+              | sha256sum \
+              | awk '{ print $1 }' \
+          )

-      - name: Cache Docker layers
+          echo "::set-output name=digest::${IMAGE_CHECKSUM}"
+
+      # Attempts to restore the build cache from a prior build run.
+      # If the exact key is not restored, then upon a successful job run
+      # the new cache is uploaded for this key containing the contents at `path`.
+      # Cache storage has a limit of 10GB, and uploads expire after 7 days.
+      # When full, the least accessed cache upload is evicted to free up storage.
+      # https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows
+      - name: 'Handle Docker build layer cache'
        uses: actions/cache@v3
        with:
          path: /tmp/.buildx-cache
-          key: ${{ runner.os }}-buildx-${{ github.sha }}
+          key: cache-buildx-${{ steps.derive-image-cache-key.outputs.digest }}
+          # If no exact cache-hit for key found, lookup caches with a `cache-buildx-` key prefix:
+          # This is safe due to cache layer invalidation via the image build context.
          restore-keys: |
-            ${{ runner.os }}-buildx-
+            cache-buildx-

-      - name: Build images locally
+      # Support ARM64 builds on AMD64 host:
+      - name: 'Set up QEMU'
+        uses: docker/setup-qemu-action@v2.0.0
+        with:
+          platforms: arm64
+
+      # Enables `buildx` support within `build-push-action`, improving cache and platform support:
+      - name: 'Set up Docker Buildx'
+        uses: docker/setup-buildx-action@v2.0.0
+
+      # NOTE: AMD64 can build within 2 minutes, ARM adds 13 minutes. 330MB each
+      # ARMv7 can build in parallel, adding no extra time (but does add 150MB cache size).
+      # Moving ARM build to a separate job would cut down time to start running tests.
+      - name: 'Build images'
        uses: docker/build-push-action@v3.1.1
        with:
-          builder: ${{ steps.buildx.outputs.name }}
          context: .
-          file: ./Dockerfile
          build-args: |
            VCS_REF=${{ github.sha }}
            VCS_VER=${{ github.ref }}
-          platforms: linux/amd64,linux/arm/v7,linux/arm64
          tags: mailserver-testing:ci
-          cache-to: type=local,dest=/tmp/.buildx-cache
+          # Build for AMD64 (runs against test suite) and ARM64 (only to verify building works):
+          platforms: linux/amd64,linux/arm64
+          # Paired with steps `actions/cache` and `Replace cache` (replace src with dest):
+          # NOTE: `mode=max` is only for `cache-to`, it configures exporting all image layers.
+          # https://github.com/docker/buildx/blob/master/docs/reference/buildx_build.md#cache-from
+          cache-from: type=local,src=/tmp/.buildx-cache
+          cache-to: type=local,dest=/tmp/.buildx-cache-new,mode=max
+          # This job just builds the image and stores to cache, no other exporting required:
+          # https://github.com/docker/build-push-action/issues/546#issuecomment-1122631106
+          outputs: type=cacheonly

-      - name: Build image for test suit
+      # WORKAROUND: The `cache-to: type=local` input for `build-push-action` persists old-unused cache.
+      # The workaround is to write the new build cache to a different location that replaces the
+      # original restored cache after build, reducing frequency of eviction due to cache storage limit (10GB).
+      # https://github.com/docker/build-push-action/blob/965c6a410d446a30e95d35052c67d6eded60dad6/docs/advanced/cache.md?plain=1#L193-L199
+      # NOTE: This does not affect `cache-hit == 'true'` (which skips upload on direct cache key hit)
+      - name: 'Replace cache'
+        run: |
+          rm -rf /tmp/.buildx-cache
+          mv /tmp/.buildx-cache-new /tmp/.buildx-cache
+
+  job-run-tests:
+    name: 'Run Test Suite'
+    needs: job-build-image
+    runs-on: ubuntu-20.04
+    steps:
+      - name: 'Checkout'
+        uses: actions/checkout@v3
+        with:
+          # Required to retrieve bats (core + extras):
+          submodules: recursive
+
+      # Get the cached build layers from the build job:
+      # This should always be a cache-hit, no new uploads should happen when this job finishes:
+      - name: 'Retrieve image build from build cache'
+        uses: actions/cache@v3
+        with:
+          path: /tmp/.buildx-cache
+          key: cache-buildx-${{ needs.job-build-image.outputs.image-build-key }}
+          restore-keys: |
+            cache-buildx-
+
+      # Importing from the cache should create the image within approx 30 seconds:
+      # buildx not needed as no exporting and only single AMD64 platform is loaded:
+      - name: 'Build AMD64 image from cache'
        uses: docker/build-push-action@v3.1.1
        with:
-          builder: ${{ steps.buildx.outputs.name }}
          context: .
-          file: ./Dockerfile
          build-args: |
            VCS_REF=${{ github.sha }}
            VCS_VER=${{ github.ref }}
-          platforms: linux/amd64
+          tags: mailserver-testing:ci
+          # Export the built image for the Docker host to use:
          load: true
-          tags: mailserver-testing:ci
+          # Rebuilds the AMD64 image from the cache:
+          platforms: linux/amd64
          cache-from: type=local,src=/tmp/.buildx-cache

-      - name: Run test suite
-        run: >
-          NAME=mailserver-testing:ci
-          bash -c 'make generate-accounts tests'
+      - name: 'Run tests'
        env:
          CI: true
+        run: |
+          NAME=mailserver-testing:ci
+          make generate-accounts tests
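The `Replace cache` step in the workflow above can be illustrated without Docker. This is a rough sketch under stated assumptions: the directories are temporary stand-ins for `/tmp/.buildx-cache` and `/tmp/.buildx-cache-new`, and the `layer-*` files are made-up placeholders rather than real BuildKit cache blobs. It only demonstrates why exporting to a second location and swapping it in drops entries that the latest build no longer references.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Temporary stand-ins for the two cache paths used by the workflow (hypothetical):
WORK_DIR="$(mktemp -d)"
trap 'rm -rf "${WORK_DIR}"' EXIT
CACHE_DIR="${WORK_DIR}/buildx-cache"
CACHE_NEW_DIR="${WORK_DIR}/buildx-cache-new"

# Restored cache from a prior run; one entry is no longer used by the build:
mkdir -p "${CACHE_DIR}"
echo 'base'  > "${CACHE_DIR}/layer-base"
echo 'stale' > "${CACHE_DIR}/layer-old"

# The build exports only what it currently needs to the new location:
mkdir -p "${CACHE_NEW_DIR}"
echo 'base'  > "${CACHE_NEW_DIR}/layer-base"
echo 'fresh' > "${CACHE_NEW_DIR}/layer-new"

# Swap: discard the restored cache and move the fresh export into its place,
# so the path that gets uploaded no longer carries the stale entry.
rm -rf "${CACHE_DIR}"
mv "${CACHE_NEW_DIR}" "${CACHE_DIR}"

REMAINING="$(ls "${CACHE_DIR}" | sort | tr '\n' ' ')"
echo "remaining entries: ${REMAINING}"
```

Writing the export straight into the restored directory would instead keep `layer-old` around, so the uploaded cache would grow with each new key and hit the storage limit sooner.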
--- a/Dockerfile
+++ b/Dockerfile
@@ -1,7 +1,12 @@
-FROM docker.io/debian:11-slim
+# This Dockerfile provides two stages: stage-base and stage-final
+# This is in preparation for more granular stages (eg ClamAV and Fail2Ban split into their own)
+
+#
+# Base stage provides all packages, config, and adds scripts
+#
+
+FROM docker.io/debian:11-slim AS stage-base

-ARG VCS_VER
-ARG VCS_REF
 ARG DEBIAN_FRONTEND=noninteractive

 ARG FAIL2BAN_DEB_URL=https://github.com/fail2ban/fail2ban/releases/download/0.11.2/fail2ban_0.11.2-1.upstream1_all.deb
@@ -10,27 +15,6 @@ ARG FAIL2BAN_GPG_PUBLIC_KEY_ID=0x683BF1BEBD0A882C
 ARG FAIL2BAN_GPG_PUBLIC_KEY_SERVER=hkps://keyserver.ubuntu.com
 ARG FAIL2BAN_GPG_FINGERPRINT="8738 559E 26F6 71DF 9E2C  6D9E 683B F1BE BD0A 882C"

-LABEL org.opencontainers.image.version=${VCS_VER}
-LABEL org.opencontainers.image.revision=${VCS_REF}
-LABEL org.opencontainers.image.title="docker-mailserver"
-LABEL org.opencontainers.image.vendor="The Docker Mailserver Organization"
-LABEL org.opencontainers.image.authors="The Docker Mailserver Organization on GitHub"
-LABEL org.opencontainers.image.licenses="MIT"
-LABEL org.opencontainers.image.description="A fullstack but simple mail server (SMTP, IMAP, LDAP, Antispam, Antivirus, etc.). Only configuration files, no SQL database."
-LABEL org.opencontainers.image.url="https://github.com/docker-mailserver"
-LABEL org.opencontainers.image.documentation="https://github.com/docker-mailserver/docker-mailserver/blob/master/README.md"
-LABEL org.opencontainers.image.source="https://github.com/docker-mailserver/docker-mailserver"
-
-# These ENVs are referenced in target/supervisor/conf.d/saslauth.conf
-# and must be present when supervisord starts.
-# If necessary, their values are adjusted by target/scripts/start-mailserver.sh on startup.
-ENV FETCHMAIL_POLL=300
-ENV POSTGREY_AUTO_WHITELIST_CLIENTS=5
-ENV POSTGREY_DELAY=300
-ENV POSTGREY_MAX_AGE=35
-ENV POSTGREY_TEXT="Delayed by Postgrey"
-ENV SASLAUTHD_MECH_OPTIONS=""
-
 SHELL ["/bin/bash", "-o", "pipefail", "-c"]

 # -----------------------------------------------
@@ -292,10 +276,41 @@ RUN chmod +x /usr/local/bin/*

 COPY ./target/scripts/helpers /usr/local/bin/helpers

+#
+# Final stage focuses only on image config
+#
+
+FROM stage-base AS stage-final
+ARG VCS_REF
+ARG VCS_VER
+
 WORKDIR /
-
 EXPOSE 25 587 143 465 993 110 995 4190
-
 ENTRYPOINT ["/usr/bin/dumb-init", "--"]
-
 CMD ["supervisord", "-c", "/etc/supervisor/supervisord.conf"]
+
+# These ENVs are referenced in target/supervisor/conf.d/saslauth.conf
+# and must be present when supervisord starts. Introduced by PR:
+# https://github.com/docker-mailserver/docker-mailserver/pull/676
+# These ENV are also configured with the same defaults at:
+# https://github.com/docker-mailserver/docker-mailserver/blob/672e9cf19a3bb1da309e8cea6ee728e58f905366/target/scripts/helpers/variables.sh
+ENV FETCHMAIL_POLL=300
+ENV POSTGREY_AUTO_WHITELIST_CLIENTS=5
+ENV POSTGREY_DELAY=300
+ENV POSTGREY_MAX_AGE=35
+ENV POSTGREY_TEXT="Delayed by Postgrey"
+ENV SASLAUTHD_MECH_OPTIONS=""
+
+# Add metadata to image:
+LABEL org.opencontainers.image.title="docker-mailserver"
+LABEL org.opencontainers.image.vendor="The Docker Mailserver Organization"
+LABEL org.opencontainers.image.authors="The Docker Mailserver Organization on GitHub"
+LABEL org.opencontainers.image.licenses="MIT"
+LABEL org.opencontainers.image.description="A fullstack but simple mail server (SMTP, IMAP, LDAP, Antispam, Antivirus, etc.). Only configuration files, no SQL database."
+LABEL org.opencontainers.image.url="https://github.com/docker-mailserver"
+LABEL org.opencontainers.image.documentation="https://github.com/docker-mailserver/docker-mailserver/blob/master/README.md"
+LABEL org.opencontainers.image.source="https://github.com/docker-mailserver/docker-mailserver"
+# ARG invalidates cache when it is used by a layer (implicitly affects RUN)
+# Thus to maximize cache, keep these lines last:
+LABEL org.opencontainers.image.revision=${VCS_REF}
+LABEL org.opencontainers.image.version=${VCS_VER}
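The effect of moving `ARG VCS_REF` / `ARG VCS_VER` to the end of the Dockerfile can be reasoned about with a toy model. The sketch below is an assumption-laden simplification and not how Docker or BuildKit actually computes cache keys: it treats a layer's key as a hash of its parent key, its instruction text, and the values of any ARG already declared, following the commit's note that an ARG implicitly affects later RUN layers. `layer_key`, `expensive_layer_key`, and the `RUN install-packages` instruction are hypothetical names for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Toy model (assumption): key = hash(parent key, instruction, ARG values in scope).
layer_key() {
  local parent_key="$1" instruction="$2" args_in_scope="$3"
  printf '%s|%s|%s' "${parent_key}" "${instruction}" "${args_in_scope}" \
    | sha256sum \
    | awk '{ print $1 }'
}

# Key of an expensive RUN layer, with `ARG VCS_REF` declared either
# before it ('first') or only after it ('last').
expensive_layer_key() {
  local arg_position="$1" vcs_ref="$2" base_key args_in_scope=''
  base_key="$(layer_key '' 'FROM docker.io/debian:11-slim' '')"
  if [ "${arg_position}" = 'first' ]; then
    # The ARG is in scope, so its value feeds into the layer key:
    args_in_scope="VCS_REF=${vcs_ref}"
  fi
  layer_key "${base_key}" 'RUN install-packages' "${args_in_scope}"
}

# Two builds that differ only by commit SHA:
FIRST_A="$(expensive_layer_key first 'commit-a')"
FIRST_B="$(expensive_layer_key first 'commit-b')"
LAST_A="$(expensive_layer_key last 'commit-a')"
LAST_B="$(expensive_layer_key last 'commit-b')"

echo "ARG first: ${FIRST_A} vs ${FIRST_B}"
echo "ARG last:  ${LAST_A} vs ${LAST_B}"
```

In this model the expensive layer is reusable across commits only when the ARG is declared after it, which is the ordering the `stage-final` stage adopts.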