Stop (ab)using allow failure

Jul 24, 2022 · 2 min read

We’ve all had our CI fail due to an issue that couldn’t be fixed right away. Usually it’s something like a security vulnerability in a dependency that can’t be updated yet. The build fails, and we can no longer merge or deploy. allow_failure seems like a great way to solve this, but it is not.

allow_failure sets us up for failure, because from that point on, every issue with the check is ignored. And even after the vulnerable dependency is updated, the allow_failure flag tends to stay behind, meaning the next time there is a vulnerability, you won’t get an error.
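For context, the tempting quick fix usually looks something like this (a minimal sketch; the job name and the audit command are only examples):

security:
  stage: test
  script:
    - composer audit # example scanner; any dependency audit works
  # The job still runs and still fails, but the pipeline turns
  # green anyway, and nothing reminds you to remove this flag later.
  allow_failure: true

GitLab will mark such a pipeline as “passed with warnings”, which is easy to overlook, and every future finding of this job is silently waved through.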

Instead, allow your MR to be merged even when the pipeline is red (in GitLab this is the “Pipelines must succeed” merge check in the project settings), and allow the deploy job to be executed even if previous jobs fail. This should be discussed with the team beforehand, as we don’t want to merge and deploy when a unit test is failing, or when another check is broken. Only do this when you simply can’t fix the issue right away.

In GitLab CI you can use the needs keyword to specify exactly which jobs have to succeed before the current job can run. So you can do something like the following, where the two build jobs have an empty needs list and the deploy jobs only need the build jobs. That way, even if the test or security jobs fail, you can still run these manual deploy jobs.

tests:
  stage: test
  # run the test suite

security:
  stage: test
  # run security checks

php-build-release:
  stage: build
  needs: []
  # build the PHP release artifact

nodejs-build-release:
  stage: build
  needs: []
  # build the Node.js release artifact

deploy-test-release:
  stage: deploy
  when: manual
  needs:
    - php-build-release
    - nodejs-build-release
  # deploy to the test environment

deploy-prod-release:
  stage: deploy
  when: manual
  needs:
    - php-build-release
    - nodejs-build-release
  # deploy to production

So stop using allow_failure, as it will keep you from seeing new errors pop up. Instead, allow your pipeline to be red for a short while. But do make sure your team can handle this responsibility.

Gert de Pagter