I’ve always stressed that the question should never be Kubernetes or serverless, but rather: what is the right tool for the job? Both technologies are great, and both carry their own baggage.
This week, while working on a design and thinking through its implementation, I had to decide whether to add a Kubernetes CronJob or make it a Lambda function. Sometimes the decision is obvious, but other times it isn’t so cut and dried. Here is a bit of the framework I use to decide which technology to use:
Does it require access to Kubernetes resources?
Your first reaction might be: well, if it requires Kubernetes resources, it’s a no-brainer that it should be a Kubernetes CronJob! But I think this is more nuanced.
First of all, if it requires access to an internal service that is not exposed through a load balancer, it’s hard to argue that it should be a Lambda, unless you want to go to the trouble of exposing that service. I’ve found we often don’t want to, so it’s easier to just use a CronJob. Because its pod runs inside the cluster, it can reach all of those internal services.
As an example, we have a job that looks for new files uploaded to an S3 bucket. If it finds any, it calls one of our Kubernetes services to do the processing. (It could be argued that this processing should never have been in Kubernetes in the first place, but alas, here we are.)
However, suppose I want a task that responds as soon as a new file is uploaded to an S3 bucket, and the task itself doesn’t need the internal Kubernetes endpoint. A Lambda makes a lot more sense here because the upload event triggers it and it responds right away, instead of waiting for the next scheduled run. In addition, as I teach in my Kubernetes class, a Lambda can still call the Kubernetes API. So if we package the processing as a Kubernetes Job, the Lambda can create that Job on the cluster, and the Job, running inside the cluster, has access to the internal endpoint.
So just because a task needs access to Kubernetes internals doesn’t mean it has to be a CronJob. Lambdas can still work great in this case.
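To make the Lambda-to-Job pattern a bit more concrete, here is a minimal Python sketch of a Lambda handler. It turns an S3 upload notification into a Kubernetes Job manifest. Several details are hypothetical placeholders: the image name, the `SOURCE_BUCKET`/`SOURCE_KEY` environment variables, and the retry and cleanup values. The sketch also assumes the Lambda already has credentials for the cluster's API server (for example, an EKS cluster). Actually submitting the Job is left as a comment.

```python
from urllib.parse import unquote_plus

# Hypothetical image containing the processing code; replace with your own.
PROCESSOR_IMAGE = "registry.example.com/file-processor:latest"


def job_manifest_for_upload(bucket: str, key: str) -> dict:
    """Build a batch/v1 Job manifest that processes one uploaded S3 object."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        # generateName lets Kubernetes append a unique suffix per upload.
        "metadata": {"generateName": "process-upload-"},
        "spec": {
            "backoffLimit": 2,                # retry a failed pod at most twice (assumed value)
            "ttlSecondsAfterFinished": 3600,  # delete the finished Job after an hour (assumed value)
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "processor",
                            "image": PROCESSOR_IMAGE,
                            # Hypothetical env vars the processor reads to find its input.
                            "env": [
                                {"name": "SOURCE_BUCKET", "value": bucket},
                                {"name": "SOURCE_KEY", "value": key},
                            ],
                        }
                    ],
                }
            },
        },
    }


def handler(event, context):
    """Lambda entry point for S3 ObjectCreated event notifications."""
    manifests = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        manifests.append(job_manifest_for_upload(bucket, key))
    # Submission is not shown. With the official `kubernetes` Python client it
    # would be BatchV1Api().create_namespaced_job(namespace, manifest) for each
    # manifest, after loading credentials for the cluster.
    return manifests
```

The Lambda only reacts to the event. The Job does the actual work from inside the cluster, so it can reach the internal service even though the Lambda never does.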
How long will the job run?
Some of the jobs I’ve been building take about 45 minutes to run. That is a strong case for a Kubernetes CronJob, and there isn’t much of an argument for Lambda. A single Lambda invocation is capped at 15 minutes, so a 45-minute job simply won’t fit. Lambdas aren’t meant for long-running tasks like this.
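One quick sanity check is to compare a job's expected runtime against Lambda's hard 15-minute timeout. The sketch below is one way to do that. Its `headroom` multiplier of 2 is an assumed safety margin, not an AWS setting, and it is there so that an unusually slow run doesn't bump into the limit:

```python
# AWS Lambda's maximum function timeout is 15 minutes (900 seconds).
LAMBDA_MAX_TIMEOUT_S = 15 * 60


def fits_in_lambda(expected_runtime_s: float, headroom: float = 2.0) -> bool:
    """Return True if the job should finish comfortably inside Lambda's cap.

    `headroom` is an assumed safety multiplier applied to the expected runtime.
    """
    if expected_runtime_s < 0 or headroom < 1:
        raise ValueError("runtime must be >= 0 and headroom must be >= 1")
    return expected_runtime_s * headroom <= LAMBDA_MAX_TIMEOUT_S


print(fits_in_lambda(60))       # a quick one-minute task: True
print(fits_in_lambda(45 * 60))  # a 45-minute ingestion job: False
```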
Are you storing data?
If you are downloading gobs of data, the job is most likely going to take a long time, so see the previous question. But even if it’s short-lived, a Lambda can only write to its /tmp directory (512 MB by default, configurable up to 10 GB), and that ephemeral scratch space isn’t meant for this sort of thing. With Kubernetes CronJobs we have more flexibility here, because we can mount as much storage as we need into the pod. I’ve never had to do this with a Lambda, but I suppose you could push the data into an S3 bucket. But again, I find that processing large data files usually takes longer than a quick one-minute run, and that’s about all I’m comfortable with when it comes to running Lambda tasks.
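For comparison, here is a minimal sketch of mounting storage into a CronJob's pod. It is written as a Python dict in the shape of a `batch/v1` CronJob manifest. Everything specific in it is hypothetical: the function name, the image, the schedule, the deadline, and the `ingest-scratch` PersistentVolumeClaim, which would need to exist already in the namespace.

```python
def cronjob_with_scratch_volume(
    name: str,
    image: str,
    schedule: str,
    claim_name: str,
    mount_path: str = "/data",
    deadline_s: int = 2 * 60 * 60,
) -> dict:
    """Build a batch/v1 CronJob manifest whose pod mounts an existing PVC."""
    pod_spec = {
        "restartPolicy": "OnFailure",
        "containers": [
            {
                "name": name,
                "image": image,
                # Mount the claimed volume where the job writes its downloads.
                "volumeMounts": [{"name": "scratch", "mountPath": mount_path}],
            }
        ],
        "volumes": [
            {"name": "scratch", "persistentVolumeClaim": {"claimName": claim_name}}
        ],
    }
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": name},
        "spec": {
            "schedule": schedule,
            "concurrencyPolicy": "Forbid",  # never start a run while the last one is still going
            "jobTemplate": {
                "spec": {
                    "activeDeadlineSeconds": deadline_s,  # stop runs that exceed the deadline
                    "template": {"spec": pod_spec},
                }
            },
        },
    }


manifest = cronjob_with_scratch_volume(
    name="nightly-ingest",
    image="registry.example.com/ingest:latest",
    schedule="0 2 * * *",  # 02:00 every day
    claim_name="ingest-scratch",
)
```

`concurrencyPolicy: Forbid` keeps a slow run from overlapping the next scheduled one, and `activeDeadlineSeconds` puts an upper bound on how long a run may take. Both values are assumptions to tune for your own jobs.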
This week I’ve been building quite a few jobs in our ingestion pipeline, and it’s been fun because some of them are Lambdas and others are CronJobs. Each one is chosen to fit its purpose, and these three questions are what help me most in deciding which to use.
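As a rough illustration, the three questions can be folded into a small decision helper. This is only one possible encoding, not a rule from the post. The one-minute runtime threshold comes from the comfort level mentioned above. The 512 MB scratch threshold (Lambda's default /tmp size) is an assumption, and so is the choice to default to a Lambda when none of the questions points to a CronJob.

```python
from dataclasses import dataclass

# From the post: about one minute is as long as I'm comfortable running a Lambda.
COMFORTABLE_LAMBDA_RUNTIME_S = 60
# Assumed threshold: Lambda's default /tmp size is 512 MB.
COMFORTABLE_LAMBDA_SCRATCH_MB = 512


@dataclass
class Task:
    needs_internal_access: bool  # must reach a cluster service not behind a load balancer
    event_driven: bool           # should react right away, e.g. to an S3 upload
    expected_runtime_s: float
    scratch_data_mb: float


def choose_runtime(task: Task) -> str:
    # Question 1: internal access alone doesn't force a CronJob. An
    # event-driven task can use a Lambda that creates a Kubernetes Job.
    if task.needs_internal_access:
        return "lambda-creates-job" if task.event_driven else "cronjob"
    # Question 2: long-running work belongs in a CronJob.
    if task.expected_runtime_s > COMFORTABLE_LAMBDA_RUNTIME_S:
        return "cronjob"
    # Question 3: data-heavy work needs storage mounted into a pod.
    if task.scratch_data_mb > COMFORTABLE_LAMBDA_SCRATCH_MB:
        return "cronjob"
    # Assumed default when none of the questions points to a CronJob.
    return "lambda"
```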