Alex Dixon

Gamedev, graphics, open source. Shuffling bytes and swizzling vectors since 2004...

Maintaining CI is a pain in the...

04 March 2025

An ongoing source of frustration is maintaining continuous integration in open source hobby projects. It’s really useful to have continuous builds, automated tests and package delivery, but it comes with maintenance. Time passes, and then the day comes when I want to tag a build in git and let all my lovely automated CI publish a package, or I pick up a project I haven’t touched for a while and want to run the tests. More often than not, or so it feels, the build fails for an unexpected reason.

The problem is that even if very little changes in the source code, the CI often fails for reasons out of your control. It takes a while to get back into the headspace of how the build is configured and start debugging a problem. It’s really annoying when you just want to spend time working on something new and fun and are now sweating on what was supposed to be a relaxing Saturday morning, trying to fix tests and areas of the code that you didn’t intend to look at. You end up with the “fix CI” commit history of death as you push changes and wait to see the results on a cloud hosted runner.

There are various reasons why this happens. I’ve just gone through a frustrating ordeal updating my iOS distribution certificates, which expired recently and prevented me from publishing a new build of my iOS app diig. The app beta expired, so I could no longer use it. This happens every 60 days, and since I haven’t had to make any changes to the app itself for a while, each time the 60 days run out I have to push a new build. I haven’t released the app on the App Store because it’s something I’m just using personally. The 60 day limit is annoying in itself, but having to do the yearly certificate and provisioning profile update is even more so. I always forget all of the things you need, so for my future self here is the rough rundown:

First you need a development certificate and a distribution certificate. You can create new certificates on the Apple Developer website in the Certificates, Identifiers & Profiles section. To do so you need a certificate signing request, which can be created through Keychain Access > Certificate Assistant > Request a Certificate from a Certificate Authority.

The certificates (.cer files) can be downloaded, imported into the keychain and then exported as .p12 files with a password. The password is stored in GitHub Actions as a secret. The .p12 files can be encoded as base64 with base64 -i dist.p12, and the console output is copied into another secret. Here I have something along the lines of DEV_P12 and DIST_P12.
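A rough sketch of the base64 round trip, for future reference. The function names are made up for illustration, and the decode half is only my assumption of what the CI side has to do with the secret; the real workflow may differ in the details.

```shell
# Sketch only: encode_p12 and decode_p12 are hypothetical helper names.

# Encode a .p12 file as a single line of base64, ready to paste into a secret.
encode_p12() {
    # Some base64 implementations wrap long output, so strip the newlines.
    base64 < "$1" | tr -d '\n'
}

# Turn the secret text back into a .p12 file on the build agent.
# $1: base64 text (e.g. the value of DIST_P12), $2: output path
decode_p12() {
    printf '%s' "$1" | base64 --decode > "$2"
}
```

Running encode_p12 dist.p12 prints the text to paste into the secret, and something like decode_p12 "$DIST_P12" dist.p12 would recreate the file in the workflow.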

Next, a provisioning profile is required for both development and distribution; these can be generated from the Certificates, Identifiers & Profiles section as well. I created an iOS development profile and selected the development certificate, and did the same for the iOS distribution profile.

The profiles are added to the repository (they should probably also be secret, but this was how the build was already set up) and copied into the ~/Library/MobileDevice/Provisioning Profiles folder on the build agent.
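The copy step can be sketched roughly like this. The function name and the idea of keeping the profiles in one folder of the repository are assumptions for illustration, not necessarily how the actions yml does it.

```shell
# Sketch only: install_profiles and its source folder argument are hypothetical.
# Copies every .mobileprovision file from a folder in the repository into the
# Provisioning Profiles directory, creating it first if necessary.
install_profiles() {
    src_dir="$1"
    # The path contains a space, so it must stay quoted everywhere.
    dest_dir="$HOME/Library/MobileDevice/Provisioning Profiles"
    mkdir -p "$dest_dir"
    cp "$src_dir"/*.mobileprovision "$dest_dir"/
}
```

The space in the folder name is the easy thing to trip over here; an unquoted path gets split into two arguments.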

Finally, everything should build, because the actions yml file does the file copying, the base64 decoding and all of that jazz. But I was wrong; the build was still failing. The error was that Xcode did not have a valid provisioning profile. OK, then maybe something was up with the certs or the profiles. I revoked them, generated them again and was extra careful, making sure the right cert was named the right thing and the pasted secrets didn’t have any extraneous characters or mistakes. Try building again, same error. Maybe just redo the certs and profiles again? Just to be sure. Still the same problem!

At this point, tagging builds (I burned 5 tags), pushing and waiting for the dreaded CI failure was getting annoying. So I decided to see what I could do locally on my machine to reproduce the issue more rapidly. The problem with this is that the keychain has working provisioning profiles that are managed by Xcode, so I am able to build locally, which is why I didn’t try this sooner. I needed to build on an external machine that has no such user account connected to Xcode.

I realised I was able to look in the ~/Library/MobileDevice/Provisioning Profiles folder and see the older stale profiles (from the last time I set this up). Ahh, I can delete those ones and see if I can reproduce the issue using the archive command line:

xcodebuild archive -workspace build/ios/diig_ios.xcworkspace -configuration Release -scheme diig -archivePath build/ios/diig_ios OTHER_CODE_SIGN_FLAGS="--keychain $KEYCHAIN_PATH" PROVISIONING_PROFILE="digiosdev" CODE_SIGN_STYLE="Manual" -verbose

Error: Xcode requires a valid provisioning profile.

But the profile digiosdev is clearly there in the folder so why does Xcode complain there is not a provisioning profile? Copilot was able to help me here and it suggested using PROVISIONING_PROFILE_SPECIFIER instead of PROVISIONING_PROFILE.
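For my future self, this is the same archive command as above with only the profile setting swapped. It is wrapped in a function (the name is made up) so that nothing runs until it is called, and it assumes KEYCHAIN_PATH is already set as before.

```shell
# Same arguments as the failing command; the only change is
# PROVISIONING_PROFILE -> PROVISIONING_PROFILE_SPECIFIER.
# archive_diig is a hypothetical wrapper name.
archive_diig() {
    xcodebuild archive \
        -workspace build/ios/diig_ios.xcworkspace \
        -configuration Release \
        -scheme diig \
        -archivePath build/ios/diig_ios \
        OTHER_CODE_SIGN_FLAGS="--keychain $KEYCHAIN_PATH" \
        PROVISIONING_PROFILE_SPECIFIER="digiosdev" \
        CODE_SIGN_STYLE="Manual" \
        -verbose
}
```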

Problem solved. This took me a few hours on a Saturday morning before leaving to meet friends, and then a further few hours the next day to fiddle around and get the build working again. I did all the certificate and provisioning profile stuff correctly the first time, and it’s annoying that, for some reason, since updating the profile PROVISIONING_PROFILE_SPECIFIER was necessary for it to be picked up. Maybe it was due to an Xcode update? Apple has a tendency to change things a lot: deprecating APIs, changing signing and distribution. It’s painful to keep up at times.

But herein lies the crux of it all: even if you don’t change a thing yourself, the world around you can change, and that can cause build systems to suddenly fail.

This has happened to me countless times. Python environment setup has changed multiple times on different platforms and for different projects over the years, and each change causes my pip setup to fail, so I hack around to find the invocation that works. Could it be pip3, or python3 -m pip, or py -3 -m pip? Maybe use brew to install Python instead? I don’t know, just hack until it works again.
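One way to make the guessing a bit less manual is to probe for a launcher and always go through -m pip. This is only a sketch: find_python is a made-up helper, the candidate order is my assumption, and a plain python can still turn out to be Python 2 on older systems.

```shell
# Sketch only: find_python is a hypothetical helper.
# Prints the first Python launcher found on PATH, or fails if there is none.
find_python() {
    for candidate in python3 python py; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    echo "no Python launcher found on PATH" >&2
    return 1
}
```

A build script could then run "$(find_python)" -m pip install -r requirements.txt instead of hard-coding one of the variants.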

Android builds on Linux have been another immense pain in my android-studio project. Android also has a tendency to change a lot, and a lot goes into it: SDK, NDK, Gradle, Kotlin, Java, CMake, Ninja and even more build systems. All of these changing over time causes headaches, especially if you haven’t touched the thing for a year or so and somebody comes along with a small PR and all of a sudden the CI is broken. At one time I had to forcibly downgrade the Java version on the Actions runner because it caused a known crash in the Android Studio licensing agreement. This fixed it for a while, but then the Java version I needed became unavailable on GitHub Actions and I had to upgrade and find other fixes… thanks also to PR contributors for helping to maintain the CI on that project.

Another frustrating session of CI fixing came in my Rust graphics engine hotline. A Rust compiler update in conjunction with a particular bevy_ecs version started to cause a hard-to-diagnose crash in my tests. It only happened in the tests; I couldn’t reproduce it in a standalone build and also couldn’t reproduce it in a single test. It was only when all tests ran (single threaded) that eventually one would crash. I spent weeks on this, only half an hour or so after work at a time, but chipping away at it, trying to debug it and make sense of what was going on. I had particular difficulty because I had no symbols or callstack. I rolled back my code to a known working version, where the tests had passed and which was published to crates.io, and it was still crashing. In the end the fix was to update bevy_ecs, which sounds straightforward, but it took me a while to attribute the crash to bevy_ecs, and updating required me to fix API breaking changes in my code; it was not simply a case of a version bump. It was frustrating to spend a few weeks fixing these tests for a reason unrelated to what I wanted them for: helping me implement new features without breaking existing functionality.

Another perplexing issue with a Rust project was when it began to fail compilation even though no changes to the code were made. The reason was that an external dependency had an updated version. This particular crate had been patched, and patching only applies to a specific version of a crate. Since the version changed, the patch was not applied and the unpatched version did not compile. This is where I learned about explicit versioning in cargo and how, even with a full version specifier, cargo may try to change or update a dependency version to make the best fit within the cargo tree. In this case the solution was to commit the Cargo.lock file and use cargo build --frozen to make the CI more stable. Easy fix, but unexpected symptoms always cause alarm at first.
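A minimal sketch of what a lock-file-respecting CI step could look like. The ci_build name is made up, and note that it uses --locked rather than the --frozen mentioned above: --frozen is equivalent to --locked plus --offline, so it also needs the dependencies to be downloaded already, whereas --locked still allows fetching them.

```shell
# Sketch only: ci_build is a hypothetical helper, run from the crate root.
ci_build() {
    # Without a committed lock file cargo is free to resolve newer versions.
    if [ ! -f Cargo.lock ]; then
        echo "Cargo.lock not found; commit it so CI uses the same versions" >&2
        return 1
    fi
    # --locked makes cargo fail instead of changing Cargo.lock.
    cargo build --locked && cargo test --locked
}
```

If cargo would need to pick a different dependency version than the one recorded in Cargo.lock, the step fails immediately with an error about the lock file, which is a lot easier to diagnose than a crate that mysteriously stops compiling.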

Some conclusions I can draw from these scenarios: I could run the CI periodically so issues get caught sooner and not just when I am making changes. But that would cause the same kind of frustration, and might even be worse for a hobby project: I would be alerted that CI is broken, I would know I have to fix it at some point, and it might detract from other projects. Using custom Docker images would help to lock down the versions of software running the builds; I don’t know much about that though, so it’s something to look into. Cargo.lock proved a good solution to enforce stable versioning in Rust projects. At least there are some solutions to help improve reliability, but they don’t help with an issue such as PROVISIONING_PROFILE_SPECIFIER. That threw me off totally; I was so close to getting it right the first time and this completely screwed me. macOS is constantly updating, forcing you to update Xcode and forcing you to face and fix these problems head on.

Maintaining CI is a pain in the proverbial. In a production environment, for a job and with a team, it’s a necessity and you generally have better coverage; for a solo project it’s great to have but a burden to maintain. Even if nothing changes on your side the world changes around us, and sometimes you just gotta suck it up and fix it.