Hakyll CI builds in one fifth of the time

As I've mentioned before, one of the big pain points relating to this blogging business is deployment time: Haskell is slow to compile and Hakyll has multiple large dependencies, so the builds would initially take up to an hour. Yeah, you read that right. 60 minutes 😱. Something goes wrong towards the end of the build? Sucks to be you.

Thanks to Saksham Sharma and their post on speeding up Haskell CI builds, however, I have been able to bring it down to 7-8 minutes in GitLab's CI/CD systems (excluding time spent waiting for runners to spin up etc.). That said, it wasn't quite as easy as I'd hoped it would be (when is it ever?): due to how Stack and Nix interact, the site build would crash whenever it ran into UTF-8-encoded characters. Not cool.

Let's fix it.

Step 1: using an image with Hakyll pre-built

In Sharma's post, they mention that they've created an image that you can use for your build systems. The simplest version would look a little something like this (loosely adapted from their minimal configuration example):

image: sakshamsharma/docker-hakyll:v3

pages:
  script:
    - stack build
    - stack exec site build
  # ... rest of stage omitted

An important thing to note is that your Stack config's resolver must match the one used in the Docker image. Otherwise, the build system will have to recompile Hakyll and its dependencies for your version, taking us right back to the hour-long builds.

For v3, the resolver is lts-12.21, so make sure your project's stack.yaml contains the following line:

resolver: lts-12.21
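If you'd rather have the pipeline fail fast than quietly fall back to a full rebuild, a guard along these lines could run before stack build. This is a minimal sketch, not something from Sharma's post: check_resolver is a hypothetical helper name, and it assumes the resolver sits on a single top-level `resolver:` line in stack.yaml.

```shell
# Hypothetical helper: succeeds only if the given stack.yaml pins the expected resolver.
# Usage: check_resolver <expected-resolver> [path-to-stack.yaml]
check_resolver() {
  expected="$1"
  file="${2:-stack.yaml}"
  # Look for a line whose first field is "resolver:" and whose second field
  # is exactly the expected snapshot name; exit non-zero if none is found.
  awk -v want="$expected" '
    $1 == "resolver:" && $2 == want { found = 1 }
    END { exit !found }
  ' "$file"
}
```

Calling `check_resolver lts-12.21` as the first script step would then stop the job early if the project and the v3 image have drifted apart.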

If this works for you and is all you need: great! If it doesn't and you get errors talking about invalid byte sequences like the one below: don't panic. I'll sort you out.

Compiling
  [ERROR] ./about.rst: hGetContents: invalid argument (invalid byte sequence)

Step 2: This one weird trick

As described in this GitHub issue, a fix for the above error has landed in Stack's master branch and will ship with Stack v2.1 (the release candidate for which came out while I was writing this post).

From the release notes for the release candidate: "Use en_US.UTF-8 locale by default in pure Nix mode so programs won't crash because of Unicode in their output".

So if you're using Stack v2.1 or later, the steps outlined in this section should not be necessary.

As evidenced by a fair few GitHub issues^[1], this is something that a number of users run into, and it can be difficult to troubleshoot. What it boils down to is this: when you run Stack in Nix mode, it defaults to building in pure mode. This isolates the build environment by removing environment variables and other things on your system that could influence the build and lead to a lack of reproducibility. This is usually a good thing, but it also unsets the LANG variable, which the programs Stack runs rely on to know how they should handle encodings.

Ok. So all we gotta do is re-set that variable, then? Yes. But how to do that might not be immediately apparent. You might be used to running shell commands like this:

MY_VAR="my-value" ls -lah

But this won't work with Stack, because in pure mode it still strips the environment before running your program. What you can do, however, is use the --no-nix-pure option. This tells Stack not to isolate the build environment, so you'll still be able to access external variables. Here's an extract from my current build file that does just that:

image: sakshamsharma/docker-hakyll:v3
script:
  - stack build
  - stack exec --no-nix-pure site build
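To see why simply prefixing the command with a variable isn't enough, here's a small shell sketch of the effect. It is an analogy only: `env -i` (which starts a command with an empty environment) stands in for the isolation that pure Nix mode applies, and is not what Stack runs internally.

```shell
# Analogy only: `env -i` runs a command with an empty environment. It stands in
# for the isolation of pure Nix mode; it is not what Stack does under the hood.
export LANG=en_US.UTF-8

# A child process started normally inherits LANG.
inherited="$(/bin/sh -c 'echo "${LANG:-unset}"')"

# A child process started in an emptied environment does not.
isolated="$(env -i /bin/sh -c 'echo "${LANG:-unset}"')"

echo "inherited: $inherited"
echo "isolated:  $isolated"
```

The second child never sees LANG, no matter how it was set in the parent shell, which is the situation the site executable finds itself in under pure mode.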

The --no-nix-pure setup works perfectly on GitLab's CI runners, but if it still doesn't solve your issue, you might want to check what the locale is actually set to by running the locale shell command. The output should look something like this:

$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

If the output doesn't show a UTF-8 locale, that seems like a good place to start (I'd try export LANG=en_US.UTF-8 before running the Stack commands), but now we're wading out past the scope of this post, so you're gonna have to go it on your own, I'm afraid. Sorry, kiddo.
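For what it's worth, that suggestion could be wrapped up as a small shell helper to run before the Stack commands. This is a rough sketch under a few assumptions: ensure_utf8_lang is a hypothetical name, the en_US.UTF-8 locale is assumed to exist in your image, and the check only inspects the variables themselves rather than asking the system which locales are installed.

```shell
# Hypothetical helper: export a UTF-8 LANG unless the environment already asks for UTF-8.
ensure_utf8_lang() {
  # For character handling, LC_ALL overrides LC_CTYPE, which overrides LANG.
  effective="${LC_ALL:-${LC_CTYPE:-${LANG:-}}}"
  case "$effective" in
    *UTF-8*|*utf-8*|*UTF8*|*utf8*)
      ;;  # already a UTF-8 locale: leave everything untouched
    *)
      # Assumption: en_US.UTF-8 is available in the build image.
      # Note: a non-UTF-8 LC_ALL or LC_CTYPE would still override this value.
      export LANG=en_US.UTF-8
      ;;
  esac
}
```

Since the helper leaves an existing UTF-8 locale alone, it should be safe to call unconditionally at the top of a build script.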

Wrapping up

And that's it! Simple, but not immediately obvious. It's likely that a similar approach (the prepared Nix container) would work for other Haskell projects as well, though I can't say for certain one way or the other.

Footnotes

^[1] A selection of GitHub issues relating to the Unicode problem: