Improve renderer liveness check
Problem: When the renderer receives many requests at the same time, all of its workers stay busy for an extended period, so the Kubernetes liveness check times out. This leads to the pod being killed and the requests currently being processed being lost.
As a quick fix, we could increase failureThreshold or periodSeconds.
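A rough sketch of what the quick fix might look like in the container spec. This assumes the current liveness probe is an HTTP check; the path, port, and chosen values are hypothetical and only there for illustration:

```yaml
# Sketch only: path, port, and values are assumptions, not the renderer's real config.
livenessProbe:
  httpGet:
    path: /health        # hypothetical endpoint that needs a free worker to answer
    port: 8080           # hypothetical renderer port
  periodSeconds: 30      # probe every 30 s (Kubernetes default: 10)
  failureThreshold: 6    # restart only after 6 consecutive failures (default: 3)
```

With values like these, the container would tolerate roughly 6 × 30 s = 3 minutes of failing probes before being restarted. That only buys time: a sufficiently long burst of requests would still trigger a restart.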
For a proper fix, we could change the liveness probe to use something that doesn't need a worker, e.g. a tcpSocket probe that just checks the port (does this work without a worker?) or an exec probe that runs a command in the container to check liveness (open port, running process).
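A minimal sketch of the tcpSocket variant, assuming the renderer listens on a single port; the port number is hypothetical:

```yaml
# Sketch only: the port is an assumption, not the renderer's real config.
livenessProbe:
  tcpSocket:
    port: 8080           # hypothetical renderer port; probe passes if a TCP connection can be opened
  periodSeconds: 10
  failureThreshold: 3
```

Whether this passes while all workers are busy is the open question above. A TCP connection to a listening socket is normally completed by the kernel before the application accepts it, so the probe should not need a free worker, but a full accept backlog could still make it fail. This would need to be verified against the renderer's server under load.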
Note that the current behavior is correct for the readiness check (no OK response should lead to this pod not receiving more requests).