Simple scheduled workloads using Google Cloud


🔔 This article was originally posted on my site, [MihaiBojin.com](https://MihaiBojin.com/projects/golang/scheduled-jobs-in-gcp?utm%5Fsource=Hashnode&utm%5Fmedium=organic&utm%5Fcampaign=top-promo "MihaiBojin.com"). 🔔


Let's say you need to run a job periodically. How would you do it?

The traditional way is to provision a VM (somewhere), run crontab, and call whatever script/binary you want.

This is simple, and it works, but it has some challenges...

  • What if your machine breaks down?

  • What if it restarts and your job doesn't come back up?

  • What if someone gets access to it?

I could probably come up with ten more such questions. The point is that this approach is risky.

You just don't know when it's going to break on you. And sooner or later, it does, usually at the most inopportune time!

So now you have another problem: monitoring and alerting. That's really two more problems. The complexity keeps expanding.

Sure, these problems have been solved many times before, but why bother? Time is money, as they say! In today's world, that sentence is more accurately phrased as "time has an opportunity cost!"


Enter Google Cloud and the task at hand...

For my current pet project, I need to periodically process new documents. It's irrelevant what the task is. The point is more about scheduling and running these workloads.

After a bit of digging, I came up with the following stack, broken down into these steps:

  • Push the code to GitHub and let CI/CD take over (I won't detail this part; I spoke a bit about it in my previous article)

  • Build the code

  • Package the binary as a Docker container

  • Push the container to Google Artifact Registry

  • Run the container as a Google Cloud Run Job

  • Periodically run the job to process new work

Nothing crazy here, but I figure writing all this stuff down might someday help someone; I think it's always easier to start from a working example!

Before you start, install and configure the GCloud CLI!

Build a Golang app

Golang is a great choice nowadays for most workloads. Its ecosystem is thriving; you'll find a library for pretty much everything; it's simple to learn and code in; and it is excellent at producing statically linked binaries, resulting in tiny containers (e.g., ~13MB instead of 120MB+ for a typical Java app).
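The actual app doesn't matter for this article, but for completeness, here's a minimal sketch of what the job's entry point might look like; processNewDocuments is a hypothetical stand-in for your real logic. The only detail that matters for Cloud Run Jobs is that the process does its work and exits, using a non-zero exit code to signal failure:

```go
package main

import (
	"context"
	"log"
	"os"
	"time"
)

// processNewDocuments is a hypothetical stand-in for the real work:
// fetch whatever is new since the last run, process it, and return.
func processNewDocuments(ctx context.Context) error {
	// ... your actual processing logic goes here ...
	return nil
}

func main() {
	// Give the run a hard deadline so a hung job can't run forever.
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Minute)
	defer cancel()

	if err := processNewDocuments(ctx); err != nil {
		log.Printf("job failed: %v", err)
		os.Exit(1) // a non-zero exit code marks the job execution as failed
	}
	log.Println("job completed successfully")
}
```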

Dockerize it

The best way to make your containers as small as possible is to use Docker's multi-stage builds.

Roughly, this means building the app on a standard image and then copying only the resulting artifacts to "scratch" (Docker's reserved name for an empty base image: your statically linked binary runs directly on the host OS's kernel, with nothing else in the container).

Here's what the Dockerfile looks like:


```dockerfile
# syntax = docker/dockerfile:1.3
FROM golang:1.18-buster as builder

# Install and update root SSL certificates
RUN apt-get update \
    && export DEBIAN_FRONTEND=noninteractive \
    && apt-get -y install --no-install-recommends \
        ca-certificates \
    && apt-get clean -y \
    && update-ca-certificates

# Cache go dependencies to avoid downloading them on every rebuild
WORKDIR /app
ENV CGO_ENABLED=0
COPY go.* ./
RUN go mod download

# Copy local code to the container image.
COPY . .

# Build the binary using buildkit's cache to speed up rebuilds
# https://github.com/moby/buildkit/blob/master/frontend/dockerfile/docs/syntax.md#example-cache-go-packages
ARG TARGETOS
ARG TARGETARCH
RUN --mount=type=cache,target=/root/.cache/go-build GOOS=${TARGETOS} GOARCH=${TARGETARCH} go build -ldflags "-s -w" -v -o app .

# Make the smallest possible container
# https://docs.docker.com/develop/develop-images/multistage-build/#use-multi-stage-builds
FROM scratch
COPY --from=builder /app/app /app/app
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
CMD ["/app/app"]
```

A few notes about this step:

  • You could define a much simpler Dockerfile; it really depends on how many times you expect to rebuild it. Since this is meant to run in Google Cloud, you can assume a remote build process will rebuild and redeploy on every commit, most likely without a pre-existing cache, making all of this caching unnecessary; locally, though, it makes a big difference, which is why I implemented it this way

  • My code needs to connect to other services over TLS, and the "scratch" image lacks root CA certificates; the definition above adds them by installing the ca-certificates package and ensuring the most recent certs are included (update-ca-certificates), then copies them into the final image via COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ (see the sketch after this list)

  • If you're building on an M1 Mac, remember that arm64 != x86_64; as such, your docker build command should target the correct architecture for your intended runtime platform (in GCP that's x86_64), i.e.: docker build --platform linux/x86_64 ...
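To make the certificate requirement concrete, here's a minimal Go sketch (the URL is just a placeholder). Run from a scratch image without ca-certificates.crt, the request fails; with the COPY line above, it succeeds:

```go
package main

import (
	"fmt"
	"net/http"
)

func main() {
	// On a scratch image without root CA certificates, this call fails with:
	// "x509: certificate signed by unknown authority"
	resp, err := http.Get("https://www.googleapis.com/")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("TLS handshake OK, status:", resp.Status)
}
```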

Push the container to Artifact Registry

First, create a repo; I use Terraform for this; IaC FTW! 😉


variable "gcloud_region" {

default = "europe-west3" # your region of choice

}

variable "gcloud_project" {

default = "..." # your GCP project ID

}

# GCloud Run Jobs are not fully GA yet, and you need to use the beta provider for now

provider "google-beta" {

project = var.gcloud_project

region = var.gcloud_region

zone = format("%s-c", var.gcloud_region) # desired AZ, usually something like: 'europe-west3-c'

}

resource "google_artifact_registry_repository" "containers" {

provider = google-beta

location = var.gcloud_region

repository_id = "containers" # the repo's name

format = "DOCKER"

lifecycle {

prevent_destroy = true # avoids accidentally deleting all published images

}

}

Given an existing repository (in my case REPO=containers), pushing your image is dead-easy:

  • tag the image: docker tag "${IMAGE}" "${REGION}-docker.pkg.dev/${GCP_PROJECT_ID}/${REPO}/${IMAGE}" (obviously, with the above variables correctly set)

  • configure Docker to push to GCP: gcloud auth configure-docker --project ${GCP_PROJECT_ID} --quiet ${REGION}-docker.pkg.dev

  • push the image: docker push "${REGION}-docker.pkg.dev/${GCP_PROJECT_ID}/${REPO}/${IMAGE}"

  • and optionally, check that it was correctly uploaded: gcloud artifacts docker images list "${REGION}-docker.pkg.dev/${GCP_PROJECT_ID}/${REPO}/${IMAGE}"

Run the container as a Google Cloud Run Job

I won't get into a lot of details here; it should be pretty self-explanatory: run one gcloud command to define the job.

Unfortunately, the GCP Terraform Provider doesn't yet support creating jobs.


```bash
# --tasks 1: only run one task
# --max-retries 0: do not retry
# --set-env-vars: set any env vars required by the running code
gcloud beta run jobs create ${JOB_NAME} \
    --region "${REGION}" \
    --image "${REGION}-docker.pkg.dev/${GCP_PROJECT_ID}/containers/roles.tech/${APP_NAME}:latest" \
    --tasks 1 \
    --max-retries 0 \
    --set-env-vars "$KEY1=$VALUE1,$KEY2=$VALUE2"
```

Small tip here: if you define the job via the UI, it will use whichever version you select; use ...:latest to always run the latest Docker image, and avoid having to update the job's definition on every code rebuild!
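Since --set-env-vars is how configuration reaches the container, the Go app simply reads it from the environment at startup. A minimal sketch, where KEY1 is a hypothetical variable name:

```go
package main

import (
	"log"
	"os"
)

func main() {
	// KEY1 is a hypothetical variable passed via --set-env-vars above.
	value, ok := os.LookupEnv("KEY1")
	if !ok {
		log.Fatal("required environment variable KEY1 is not set")
	}
	log.Printf("KEY1=%s", value)
}
```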

Trigger periodic executions using a CronJob

Once more, Terraform to the rescue:


```hcl
# Retrieves the default service account.
# This is an anti-pattern, and you're probably better off creating a dedicated service account.
data "google_compute_default_service_account" "default" {
}

variable "job_name" {
  default = "..." # the name of your job ($JOB_NAME above)
}

resource "google_cloud_scheduler_job" "cronjob-name" {
  name             = "cronjob-name"
  schedule         = "0 * * * *" # run every hour on the dot
  time_zone        = "Etc/UTC"   # UTC timezone
  attempt_deadline = "15s"

  retry_config {
    retry_count = 0 # do not retry if the job fails
  }

  # trigger the Cloud Run Job by calling its handler via HTTPS, with an empty body
  http_target {
    http_method = "POST"
    # note the argument order: the namespace is the project ID, followed by the job name
    uri  = format("https://%s-run.googleapis.com/apis/run.googleapis.com/v1/namespaces/%s/jobs/%s:run", var.gcloud_region, var.gcloud_project, var.job_name)
    body = ""

    # authenticate via OAuth using the specified service account
    oauth_token {
      service_account_email = data.google_compute_default_service_account.default.email
    }
  }
}
```

And that's it! If you need a simple and reliable mechanism for periodically executing code, look no further than Google Cloud Run with a side of Terraform.

Opinions and suggestions are always welcome (find me on Twitter).

Thank you!


If you liked this article and want to read more like it, [please subscribe to my newsletter](motivated-founder-807.ck.page/db1cf284bf "newsletter link"); I send one out every few weeks!
