Don't waste your time on assets compilation on Heroku

— Published originally on blog.arkency.com

At some point, you may want or be forced to use a CDN to serve the assets of your Rails app. When your app is globally available, you may want to serve the assets from strategically located servers around the world to provide the best possible experience for the end user. Serving static assets via Puma is not the best idea since it'll be slow. The only viable option on Heroku is to use a CDN. I'll show you how to do it the smart way, save time and get faster deployments.

Difference between Push and Pull CDN

There are two types of CDNs: Push and Pull. A Push CDN basically acts as the origin server: assets are requested by the client directly from it. The only downside is that we need to deliver the assets to the CDN on our own, but it's not as hard as it sounds. If we used a Pull CDN, it would do that for us, but the initial request for each asset would be sluggish and rewriting URLs is a no-go. By the way, Amazon found that every 100ms of latency cost them 1% in sales. That's big money left on the table.

Existing solutions for pushing assets to CDN

There are quite a few solutions available; the most popular is probably the asset_sync gem. Basically, it hooks into assets:precompile and syncs assets with a given S3 bucket (or another provider). I don't like implicit hooks. It also runs during deployment, adding more time to it. On Heroku, all the assets and their sources, like the "beloved" node_modules, contribute to the slug size. It's easy to get close to Heroku's soft limit (300 MB), which contributes to slower deployments because of longer compression time.

Our way

What if I told you that assets can be compiled on CI, in parallel with the test suite, and pushed to the CDN so that they're instantly available as soon as the app is released?

How it started: >8 minutes from push to master to release

How it's going: ~2 minutes from push to master to release

The process

  • assets are precompiled on CI using a pretty modern stack, in parallel while the tests are running,
  • CI uploads them to the CDN's bucket along with a manifest file,
  • a custom Heroku buildpack downloads the manifest,
  • during the build phase, asset precompilation is skipped since the manifest is in place,
  • the app is released and links assets from the CDN,
  • build time and slug size are saved.

bin/rails assets:precompile in a modern way

Our current stack is Sprockets with esbuild, cssbundling-rails and Tailwind, along with PostCSS and cssnano.

One day we'll switch to Propshaft; all the preceding steps bring us closer to it.

We went with CloudFront, so producing gzipped versions of assets is obsolete since the CDN can do it on our behalf. It'll even pick the best compression algorithm for the client's browser, like Brotli instead of good ol' gzip.

# config/environments/production.rb
config.asset_host = ENV.fetch("ASSET_HOST") # serve assets from the CDN host
config.assets.compile = false               # never compile assets live in production
config.assets.gzip = false                  # skip .gz variants, CloudFront compresses on the fly

We build the assets as a separate workflow on GitHub Actions:

# .github/workflows/assets.yml
name: CDN assets

on:
  workflow_dispatch:
  push:
    branches:
      - master

env:
  RAILS_ENV: test
  RAILS_MASTER_KEY: ${{ secrets.RAILS_TEST_MASTER_KEY }}

jobs:
  assets:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v2

      - name: Setup Ruby
        uses: ruby/setup-ruby@v1
        with:
          ruby-version: 3.1.0
          bundler-cache: true

      - name: Setup Node
        uses: actions/setup-node@v2
        with:
          node-version: 16.x
          cache: "npm"

      - name: Install dependencies
        run: |
          npm install --no-fund --no-audit

      - name: Build assets
        env:
          ASSET_ENV: production
          GIT_COMMIT: ${{ github.sha }}
        run: |
          bin/rails assets:precompile

As you can see, we use a limited number of ENV variables here; we replaced RAILS_ENV=production with ASSET_ENV=production for assets:precompile. It required simple tweaks in esbuild's and postcss's configs. Since we don't do minification with Sprockets, we don't need the full Rails production environment here. The whole precompilation now happens within seconds.
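
To reproduce the CI build locally, the same invocation works from a shell. This is just a convenience sketch; the variables mirror the ones set in the workflow above:

RAILS_ENV=test ASSET_ENV=production GIT_COMMIT=$(git rev-parse --verify HEAD) bin/rails assets:precompile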

Uploading the files

# lib/cdn_assets.rb
require "aws-sdk-s3"
require "concurrent"
require "mime/types"

# Uploads locally precompiled assets and the Sprockets manifest
# to the S3 bucket backing the CDN.
class CdnAssets
  def initialize(
    pool: Concurrent::FixedThreadPool.new(10),
    root: Rails.root,
    client: Aws::S3::Client.new(
      {
        access_key_id: ENV["AWS_ACCESS_KEY_ID"],
        secret_access_key: ENV["AWS_SECRET_ACCESS_KEY"],
        region: ENV["AWS_REGION"],
      },
    ),
    mk_manifest_path: -> { Sprockets::Railtie.build_manifest(Rails.application).path },
    mk_commit_sha: -> { ENV.fetch("GIT_COMMIT") { `git rev-parse --verify HEAD`.strip } }
  )
    @pool = pool
    @root = root
    @client = client
    @mk_manifest_path = mk_manifest_path
    @mk_commit_sha = mk_commit_sha
  end

  def upload(commit_sha = mk_commit_sha.())
    files_to_upload = synced_files

    puts "Uploading #{files_to_upload.size} missing files"

    files_to_upload.each do |path, _digest|
      pool.post do
        content_type = detect_content_type(path)
        params = { bucket: bucket, key: path, body: body(path), acl: acl }
        params[:content_type] = content_type if content_type
        client.put_object(params)

        puts path
      end
    end

    puts "Uploading manifest files"

    [manifest_of(commit_sha), latest_manifest].each do |destination_manifest_path|
      client.put_object(
        bucket: bucket,
        key: destination_manifest_path,
        body: File.read(mk_manifest_path.()),
        acl: acl,
        content_type: "application/json",
      )
      puts destination_manifest_path
    end

    pool.shutdown
    pool.wait_for_termination
  end

  private

  attr_reader :pool, :root, :client, :mk_manifest_path, :mk_commit_sha

  def manifest_of(commit_sha)
    "assets/manifest-#{commit_sha}.json"
  end

  def latest_manifest
    "assets/manifest-latest.json"
  end

  def detect_content_type(path)
    MIME::Types.type_for(path).first&.content_type
  end

  def body(path)
    # read relative to the app root so the task doesn't depend on the working directory
    root.join(prefix, path).read
  end

  def synced_files
    # [path, digest] pairs present locally but missing or changed in the bucket
    local_files - remote_files
  end

  def local_files
    Dir
      .chdir(root.join(prefix)) { Dir.glob("**/*").reject { |path| File.directory?(path) } }
      .map { |relative_path| [relative_path, digest_for(relative_path)] }
  end

  def digest_for(relative_path)
    Digest::MD5.hexdigest(body(relative_path))
  end

  def remote_files
    client
      .list_objects(bucket: bucket)
      .flat_map { |response| response.contents.map { |file| [file.key, normalize_etag(file.etag)] } }
  end

  def normalize_etag(etag)
    etag.delete_prefix("\"").delete_suffix("\"")
  end

  def bucket
    ENV["AWS_BUCKET"]
  end

  def acl
    "public-read"
  end

  def prefix
    "public"
  end
end

We used a FixedThreadPool to upload the files in parallel. Concurrent Ruby is a great library for this, and it's surely already present in your codebase since it's a dependency of ActiveSupport, dry-rb and the one and only RailsEventStore.

An important optimisation is listing the files present in the bucket along with their ETags; we can compare those with the ones to be sent and only upload the files whose name or content differs. Comparing more than just the name is especially important for non-digested files. We upload everything from the Rails public directory, e.g. 422.html. There's no digest in that name, so the file could change yet be omitted during upload if we relied only on its path (or key, in S3 vocabulary). S3 can produce the ETag in a few ways; check which applies to your scenario in the documentation. In our case it's the Digest::MD5.hexdigest of the file content.
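
If you want to confirm which ETag flavour your bucket produces, a quick sanity check from a shell (assuming the fancy-bucket name used in the workflow step below) could look like this:

# MD5 of the local file...
md5sum public/422.html
# ...should match the ETag S3 reports for single-part, unencrypted uploads
aws s3api head-object --bucket fancy-bucket --key 422.html --query ETag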

Telling S3 the Content-Type of the uploaded files is a must. If it's not provided, S3 will do a best guess. However, the browser won't run application.js served with Content-Type: binary/octet-stream. Guessing content types is unfortunately not where S3 shines.

Rails expects that .sprockets-manifest-totallyrandomdigest.json will be present in public/assets when the app starts. Yep, the digest included in the manifest filename is totally random, and Rails detects the file based on its path and a regex matching the name. We use the same mechanism to find the desired file: Sprockets::Railtie.build_manifest(Rails.application).path. After that we're able to upload it under a known and predictable name: manifest-$COMMIT_SHA.json. We also produce manifest-latest.json as a fallback in case something goes wrong and we haven't delivered a manifest referencing the released commit.

A Rake task for ease of use:

# lib/tasks/cdn_assets.rake
require_relative "../cdn_assets"

namespace :cdn_assets do
  desc "Distribute public/assets to CDN"
  task :upload => :environment do
    CdnAssets.new.upload
  end
end

Adding the missing step to .github/workflows/assets.yml:

      - name: Push assets to AWS S3
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          AWS_REGION: eu-central-1
          AWS_BUCKET: fancy-bucket
          GIT_COMMIT: ${{ github.sha }}
        run: |
          bin/rails cdn_assets:upload

Download manifest on Heroku

Having a predictable Sprockets manifest name allows us to download it on Heroku with a carefully crafted buildpack. What it does is download manifest-$COMMIT_SHA.json, or the fallback one, to $ASSET_MANIFEST_PATH, which can be something like public/assets/.sprockets-manifest-5ad1cd2a52740dfb575f43c74d6f3b0e.json. That path doesn't need to change over time and its name doesn't reference the content; it just has to match the Sprockets lookup pattern.
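
A Heroku buildpack boils down to a bin/compile script. The actual buildpack isn't listed in this post, so treat the following as a minimal sketch. It assumes ASSET_HOST and ASSET_MANIFEST_PATH are set as Heroku config vars, and relies on SOURCE_VERSION, which Heroku sets to the commit sha during builds:

#!/usr/bin/env bash
# bin/compile <build-dir> <cache-dir> <env-dir>
set -euo pipefail

BUILD_DIR=$1
ENV_DIR=$3

# config vars are exposed to buildpacks as files in ENV_DIR
ASSET_HOST=$(cat "$ENV_DIR/ASSET_HOST")
ASSET_MANIFEST_PATH=$(cat "$ENV_DIR/ASSET_MANIFEST_PATH")

mkdir -p "$BUILD_DIR/public/assets"

echo "-----> Downloading asset manifest for $SOURCE_VERSION"
if ! curl --fail --silent --show-error \
     "$ASSET_HOST/assets/manifest-$SOURCE_VERSION.json" \
     --output "$BUILD_DIR/$ASSET_MANIFEST_PATH"; then
  echo "-----> Manifest not found, falling back to manifest-latest.json"
  curl --fail --silent --show-error \
       "$ASSET_HOST/assets/manifest-latest.json" \
       --output "$BUILD_DIR/$ASSET_MANIFEST_PATH"
fi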

Save even more time and slug size

You want to run the CDN manifest buildpack before the default heroku/ruby buildpack. Rails will then skip assets:precompile because the manifest file is already in place. You earn some time here, and you can further limit your slug size and build time by skipping the Node installation and the yarn or npm run, by creating a .slugignore file:

package.json
package-lock.json
yarn.lock
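
The buildpack order can be managed with the Heroku CLI. For illustration, with a hypothetical buildpack URL:

# the manifest buildpack must run before the default Ruby one
heroku buildpacks:add --index 1 https://github.com/example/heroku-buildpack-cdn-manifest
heroku buildpacks:add --index 2 heroku/ruby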

Szymon Fiedler

I solve problems. This is the place where I share my thoughts on Software Engineering.