readyz and livez: Getting started with Kubernetes health endpoints using NodeJS/Express

Klaus Hofrichter
Jan 10, 2022

It’s easy to add some Kubernetes lifecycle management to a NodeJS/Express application. Let’s do it in practice.

Photo by Hush Naidoo Jade Photography on Unsplash

This article discusses the health endpoints /readyz and /livez as part of a series of articles around a complete local K3D cluster installation. You will end up with a system on your local machine that invites you to try things on your own.

You may have heard about these endpoints before, and there is Kubernetes documentation here and a discussion about pod lifecycle there. But before going any further, what's the deal with the "z"? After some research, it appears that these words were chosen simply because they are not in normal use, so routes with these names are unlikely to collide with existing routes. Ok, now you know, and we can move on.

We are using the existing system to show how such endpoints can be used in Kubernetes because we recently explored the scaling of a NodeJS workload in that environment. There is a lot of launching and destroying of Pods going on, and so it makes sense to have a better understanding of how Kubernetes handles the lifecycle. All sources are in this repository: https://github.com/klaushofrichter/keda-scaling, and you need to follow the instructions in this article to install everything. The setup comes with Grafana, Prometheus, Alertmanager, Fluent-bit, Influxdb, and a few other things, but the focus is now on the NodeJS application.

Update February 13, 2022: There is a newer version of the software here, with several updates.

TL;DR: The super-short instruction is this: if you are using Windows, install Docker Desktop, use WSL to create a Linux environment, clone the repository, run setup.sh and start.sh on the Linux side and that’s it.

NodeJS / Express Application

The application server.js does not do much: it offers two APIs, one that provides information about the application itself, and one that returns a random number. That could probably be done in 10 lines of code, but a lot of instrumentation and support is built around this simple functionality:
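The embedded source is not reproduced here; as a minimal sketch of the skeleton, assuming an Express app and a pino-style logger (names and details are illustrative, not the exact repository code):

const express = require("express");
const pino = require("pino");

const log = pino({ level: "debug" });  // the loglevel discussed below
const app = express();

// One of the two worker APIs: information about the application itself.
app.get("/service/info", (req, res) => {
  log.info("GET /service/info");
  res.status(200).json({ app: "myapp", version: "1.0.0" });
});

// The second worker API, /service/random, is sketched in the Blue Zone section below.

app.listen(8080, () => log.info("…started listening on port 8080"));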

This is certainly not a "production application": it is intended to demonstrate concepts and to make it easy to modify things and see what happens. To see more logs, we need to change the loglevel and re-deploy the application.

You should now go ahead and launch the cluster with ./start.sh. Then please open server.js with your favorite editor, find line 11 where the loglevel is set, and change debug to trace.

var debuglevel="trace";

Then save, and redeploy:

$ ./app-deploy.sh

This script builds a new image, removes the existing application deployment, and creates a new deployment. This is a bit brute-force and not how Kubernetes should normally be used (use app-update.sh instead for a rolling update), but by using app-deploy.sh you can be sure that possible other changes are also applied, e.g. manifest details in app.yaml.template.

Either way, after doing this, you can go to this Application Information Grafana Dashboard and see the logs that are produced. When the application launches, it invents a name for itself, because that is easier to remember than the machine-generated pod identifier, and it emits a series of log messages. You should see entries for all loglevels on the dashboard. The screenshot below shows the Dashboard after a fresh start with ./start.sh based on the original repository, and a subsequent ./app-update.sh after setting the loglevel to trace. We are going to have a closer look at the three marked areas.

Grafana Dashboard “myapp Application Information” after initial start and update with loglevel=trace

There is already a lot to see… If you don't see it like this, you can try to change the time interval in the top right; depending on how fast you re-deployed, some logs may already have scrolled out of view to the left.

The next sections go into detail about the three areas marked above.

The Red Zone: Initial Launch

Assuming you see a screen similar to the picture above and can identify the Red Zone: use the mouse to select the elements of the Red Zone. This sets the time interval for the Dashboard, and all queries are redone for that interval, including the panels above and below the time-series panel in the middle. You may sometimes lose the application information on top if you select a small time interval. Ignore that; ideally, it now looks like this:

Dashboard after selecting a small time interval

This shows logs on the timeline as well as text in the table below. We are launching two replicas; in this case, they called themselves "Tillie the Xerus" and "Charlie the Ostrich". Let's focus on Charlie by clicking on the Pod Name filter symbol and picking Charlie only. Then scroll the table to the beginning (i.e. all the way down). You should see this:

Initial Logs for the specific Pod “Charlie the Ostrich”

This is where the application outputs all log levels. This does NOT include “trace”, because this is the initial version with the loglevel="debug" setting.

In the source you can see the code fragment:

server.js — log test output
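The fragment itself is shown as an embedded image above; as a sketch, such a log test could simply emit one message per level, assuming the pino-style logger named log from the earlier sketch:

log.error("error level test message");
log.warn("warn level test message");
log.info("info level test message");
log.debug("debug level test message");
log.trace("trace level test message");  // filtered out while the loglevel is "debug"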

trace does not show in the log output, because the loglevel is set to debug and trace messages are filtered out; see the setting marked up in the code.

The log output "…started listening" appears when the app starts listening on its port (see line 120). Now, scrolling the logs table all the way up, you can see something like this:

At the bottom, there is another output related to listening for connections, but then we see /readyz in action. The log shows a call to /readyz every four seconds, indicating that the application is not ready, until it eventually says "ready". How does that happen?

Some explanation of readyz: Kubernetes uses that API to find out if the pod is able to receive calls. As long as the application says "no" through a status code like 500, the pod will not receive calls to the worker APIs /info or /random. Once the app says "yes" by returning 200, the pod will receive API calls. Here is the related specification in app.yaml.template:

Fragment of app.yaml.template

We are using the httpGet method for the readinessProbe (there are other methods). We define the path /service/readyz, tell Kubernetes to check every four seconds, and say that it's OK to fail once.

There is also the livez specification, which basically does the same, but with an added initialDelaySeconds: 10 spec. That means there is no livez check until 10 seconds after pod launch.
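Based on the values described in the text, the probe section could look roughly like this (a sketch; the container port and the exact livez path are assumptions, and the failureThreshold value in the repository may differ):

readinessProbe:
  httpGet:
    path: /service/readyz
    port: 8080              # assumed container port
  periodSeconds: 4          # check every four seconds
  failureThreshold: 1       # "OK to fail once", see above
livenessProbe:
  httpGet:
    path: /service/livez    # assumed path, analogous to readyz
    port: 8080
  initialDelaySeconds: 10   # no livez check during the first 10 seconds
  periodSeconds: 30         # relatively infrequent, see the Orange Zone section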

Quick side-note about the /service prefix: We are using a single port, 8080 per default, for multiple applications. The ingress-nginx controller helps to sort this out. The prefix /service is reserved for our NodeJS app, /prom for Prometheus, and /alert for Alertmanager. Grafana is on the root.
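As an illustration of this routing, an ingress of roughly this shape maps the prefixes to the different services (the service names and ports below are placeholders, not the repository's actual manifest):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-ingress
spec:
  ingressClassName: nginx
  rules:
    - http:
        paths:
          - path: /service            # the NodeJS app
            pathType: Prefix
            backend:
              service:
                name: myapp-service   # placeholder name
                port:
                  number: 8080
          - path: /                   # Grafana on the root; /prom and /alert are analogous
            pathType: Prefix
            backend:
              service:
                name: grafana         # placeholder name
                port:
                  number: 3000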

Going back to the NodeJS application: we artificially delay the "ready" signal by 10 seconds through a timer. In real life, this could take more or less time, and the app would use it to connect to a database, perform authentication, or do other time-consuming work. Once the preparation is done, the app signals Kubernetes that it is ready to answer calls by returning a 200 code on /readyz. In our case, a timeout triggers this function:
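The function itself was embedded as a gist; a minimal sketch of the idea, using a state flag and the 10-second delay described above (names are assumptions, not the exact repository code):

const state = { isReady: false };

setTimeout(() => {
  state.isReady = true;
  log.info("App is ready");  // the "App is ready" entry visible in the logs
}, 10000);                   // artificial 10-second startup delay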

You can see the output “App is ready” in the logs. The express-route code is simple, either returning 200 or 500, depending on the ready flag:
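A sketch of such a route, assuming the state flag from the previous sketch (not the exact repository code):

app.get("/service/readyz", (req, res) => {
  log.trace("GET /service/readyz");  // visible only with loglevel=trace
  if (state.isReady) {
    res.status(200).send("OK");
  } else {
    res.status(500).send("not ready");
  }
});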

This is in effect the /readyz handler. We will verify later how traffic is directed to the pod once we get to “ready”.

The Blue Zone: API Calls

To see the logs related to the blue area in the earlier screenshot, you will need to zoom out a few times until you can see the logs in the time-series, and then mark the area.

It might look like this:

These are four calls to the API, coincidentally two /service/info calls to one of the pods ("Charlie") and two /service/random calls to the other ("Tillie"). This happened as part of the ./start.sh script once everything was deployed. The /service/random calls generate two log entries each, one to record the route and one on debug level to show the random number that is generated. The NodeJS application looks like this:
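The embedded fragment is not reproduced here; a sketch of the random endpoint with its two log entries could look like this (illustrative, not the exact repository code):

app.get("/service/random", (req, res) => {
  log.info("GET /service/random");                 // first entry: the route itself
  const random = Math.random();
  log.debug(`generated random number: ${random}`); // second entry, on debug level
  res.status(200).json({ random: random });
});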

The Orange Zone: Rolling Update

You may want to zoom out again and select the area marked in orange earlier:

Here we are looking at the logs for the app-update.sh call, after we changed the loglevel to trace. If you used app-deploy.sh instead of app-update.sh, you will see a different sequence: the existing pods Charlie and Tillie are removed, and two new replicas come in. In the case of app-update.sh we see an overlap, i.e. there is no downtime.

Let's look at Charlie by filtering for the Pod Name as shown in the screenshot above. In the selected time interval we have a total of four pods. Sadly, the table filter does not apply to the timeline… but we can see that there are two info logs, and we have the timestamp. You can filter out everything other than info logs by clicking on info in the timeline legend. The two Charlie logs are marked in the picture below:

You can see that there is a log "Got SIGTERM" and another ten seconds later "Process Exit". Kubernetes is in the process of removing the initial pods and launching new ones because we changed the image. A pod that is supposed to go away receives a SIGTERM and is given time to actually shut down; for example, data could be saved, connections closed, etc. If the pod does not exit on its own, SIGKILL ends the process after 30 seconds by default. It is possible to change that value in the manifest by setting terminationGracePeriodSeconds in the pod spec. In our case, we handle SIGTERM ourselves before SIGKILL happens.

Once we get a SIGTERM, a timer starts which calls exit() after 10 seconds. That is what you see in the logs. In the real world, this time would be used to do something useful, like closing connections and finishing other work. But importantly, we are setting the internal state.isReady flag to false. That, in turn, causes the /readyz API to indicate that the pod is not ready anymore. In practice, however, Kubernetes does not call /readyz or /livez after a SIGTERM anymore. The regular /readyz calls happen every four seconds; we would not see them in this log anyway, as they are trace-level output and the Charlie pod is on loglevel debug. Later we have a closer look at this with loglevel trace.
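A sketch of this SIGTERM handling as described (names and timings taken from the text, not the exact repository code):

process.on("SIGTERM", () => {
  log.info("Got SIGTERM");
  state.isReady = false;       // /readyz now returns 500
  setTimeout(() => {
    // in a real app: close connections, flush data, etc.
    log.info("Process Exit");
    process.exit(0);
  }, 10000);                   // exit well before the default 30-second SIGKILL
});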

We have been able to exit on our own before Kubernetes uses SIGKILL, which is certainly good behavior. The pod goes through some phases from Kubernetes' point of view, like "Terminating", which you can see in tools like Lens or with kubectl describe, but these are not easily visible from the NodeJS point of view.

Looking at the other “old” pod, Tillie, we can see that this one is exiting before Charlie.

Two new pods were created before Charlie and Tillie stopped answering API calls: these new pods call themselves Winnie and Stanley. Winnie launches first, and we can see the colorful markup of the logs, now including the trace level. Winnie goes through the same start procedure.

Filtering for Winnie, we can now see that the /livez endpoint is called. This happens after the launch, every 30 seconds per the specification in app.yaml.template. The checks are relatively infrequent to reduce the load a bit, as we always return 200 for this call.
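For completeness, the liveness handler can be as simple as this (a sketch, assuming the /service/livez path):

app.get("/service/livez", (req, res) => {
  log.trace("GET /service/livez");  // visible only with loglevel=trace
  res.status(200).send("OK");
});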

If the app does not return 200, the container will be killed and restarted once the failureThreshold is reached. However, we don't demonstrate this here.

That concludes this scenario, which does not include load, other than the few calls in the Blue Zone.

Generating Load and managing Traffic

We'll take another look at our setup: we use the same configuration, but remove Keda to avoid automatic scaling, since we will scale manually. We will go down to zero pods and generate some traffic with app-traffic.sh:

$ ./keda-undeploy.sh    # prevent autoscaling
$ ./app-scale.sh 0      # remove all existing replicas
$ ./app-traffic.sh 0 1  # call the API continuously every second

In a separate terminal window, we scale up to one replica, and replace that replica with a new pod to see what happens in detail. Depending on your timing, we may receive errors from app-traffic.sh because we have no pods, but this can be ignored.

$ ./app-scale.sh 1      # launch one replica
$ ./app-update.sh       # replace the one existing replica

On the Dashboard, you should be able to see something like this with three distinct zones:

The Pink Zone: Scale to zero

Zooming into the Pink Zone will show that there are just two log entries, showing that the two previous pods Winnie and Stanley exited…

The Green Zone: Launching a Pod while getting calls

The Green Zone shows what happens to a single pod while the cluster receives traffic. We launched the pod that named itself Eleanor. Eleanor goes through the same motions as discussed before; however, this time there is load coming to the cluster.

The call to app-traffic.sh 0 1 generates an infinite sequence of calls with one second between them. The terminal shows that we receive a 503 Temporarily Unavailable while there is no pod handling the traffic. At some point, the calls are answered with a 200. We will try to find that moment in the logs:

We can expect this to happen about 10 seconds after pod launch, because of the ready signal 10 seconds after launch.

Here we marked the first time a call was answered by Eleanor, and it happened right after the app declared itself ready for service. That shows that Kubernetes honors the /readyz signal, as the /random or /info APIs could have been reached much earlier otherwise.

The Yellow Zone: Exchanging Pods

We now look at the rolling update: Eleanor is going to be replaced with Randall. We find the moment where Eleanor receives the SIGTERM, and starts the 10-second timeout until exit.

But something strange happens: after SIGTERM, Eleanor still receives and answers API calls. There are no more /readyz calls, as Kubernetes knows the service is going to disappear and will not become available again. But it still routes traffic to that pod?

So, back to the drawing board? After some experimentation with a longer timeout between SIGTERM and SIGKILL, and with various settings for periodSeconds (i.e. the frequency of the /readyz and /livez probes), it turns out that in our system there is a period of up to four seconds in which the app still receives API calls after SIGTERM. There are no calls to /readyz or /livez right after SIGTERM. So it looks like the load balancer removes the pod from rotation only after a delay of a few seconds.

In our application, this does not matter much, as the app can still answer the calls. But in the real world, e.g. when files or connections are being closed, this may lead to failed calls, because the app may not be able to respond once some resources are gone. A solution would be a two-stage shutdown timeout: one stage with about four or five seconds of continued service, followed by a second stage for the actual cleanup.

However, that sounds pretty ugly, and it is concerning. Despite looking for details about this, I don't have an explanation or documentation yet. Perhaps a reader may have insights to share? There might be some problem with my interpretation of the specification… The question is: after a SIGTERM coming from Kubernetes, is the API supposed to be called at all? If so, for how long?

Also of note, we did not use /healthz in this context, as it is deprecated. We also did not explore PreStop hooks, as these are different concepts; it should be possible to manage this with the probes alone. Opinions?

Update January 10, 2022: Here is a Medium article saying "Ingress controllers get an updated list of endpoints with some delay, and new traffic continues to be sent to the pod." That confirms what we saw in our logs. The quoted article refers to another article, saying "The pod should largely ignore [SIGTERM], keep running, and keep serving new connections."

What's the best practice to deal with this? Different from the quoted article, which uses Nginx, we have full control over our application. The two-stage shutdown suggested above is easy to implement but feels non-deterministic, as the time until no more calls come through probably depends heavily on the load balancer behavior.

Update January 13, 2022: It turns out that you can indeed expect API calls to a pod after SIGTERM, just as we observed, because the load balancer takes a little while to remove the pod from the rotation. The issue with this is that you want to respond to these calls to avoid a client getting a 500 response. Right now there seems to be no obvious way to see, from the pod's point of view, when the load balancer has been updated. So one possible solution is to start a pre-shutdown timer when SIGTERM occurs, in our case perhaps eight or ten seconds, to have a good margin beyond the observed four seconds until calls stop coming through. After those eight to ten seconds the real graceful shutdown would start: returning 500 on the APIs /random and /info, closing any database connections or files (if your app needs that), and then eventually calling process.exit() before SIGKILL comes through, which is expected 30 seconds after SIGTERM. If 30 seconds are not enough, you can extend this time with the attribute terminationGracePeriodSeconds in the pod spec. Note that these changes are not implemented in the repository, but they would be part of the repositories of future articles.
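To make the suggestion concrete, here is a sketch of such a two-stage shutdown (not implemented in the repository; timings, names, and the cleanup steps are assumptions):

process.on("SIGTERM", () => {
  log.info("Got SIGTERM");
  // Stage 1: keep serving normally for a while, because the load balancer may still
  // route traffic to this pod (we observed up to four seconds after SIGTERM).
  setTimeout(() => {
    // Stage 2: refuse new work and clean up before SIGKILL arrives
    // (30 seconds after SIGTERM by default, or terminationGracePeriodSeconds).
    state.isReady = false;   // the /readyz, /random and /info handlers would check this flag and return 500
    // ...close database connections or files here if your app needs it...
    log.info("Process Exit");
    process.exit(0);
  }, 10000);                 // eight to ten seconds of continued service
});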

Update January 19, 2022: the updated software with a better implementation of a graceful shutdown is here.

Where to go from here?

It turns out, very concretely, that this setup can be used for learning and investigation. An important question came up that ought to be well understood, but for which no simple answer has surfaced yet. This section will be updated once there is a better understanding.

Other than that, you can try more experiments with /livez here, e.g. not answering the calls or not returning 200. Depending on the failureThreshold, the container should be restarted automatically.

Also, you could experiment with PreStop and PostStart hooks to explore even more of the Kubernetes pod lifecycle features.
