auto-scaling of ViewServer renderer pods based on request queue length
problem statement
we currently run 5 customer-facing (plus additional test) instances of ViewServer on our cluster for AGRI/EO-WIDGET. each instance has a fixed, static number of renderer pod replicas configured, i.e. resources are claimed and reserved on the cluster even when unused
goal
claim only a minimum amount of resources by default and scale up when needed (note: while scale to zero, i.e. 0..n renderer pods, would be even nicer, I'm happy with a solution offering 1..n renderer pods with 1 as the default)
there is an out-of-the-box concept in kubernetes called the Horizontal Pod Autoscaler (HPA) https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/ which scales based on resource metrics, but by default only on memory and cpu consumption
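for reference, a minimal HPA manifest using the default CPU resource metric might look like the sketch below; the deployment name `viewserver-renderer` and the replica/utilization numbers are placeholder assumptions, not our actual config:

```yaml
# Sketch of a plain resource-metric HPA (CPU-based).
# Deployment name and thresholds are assumptions.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: renderer-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: viewserver-renderer   # assumed deployment name
  minReplicas: 1                # minimal footprint per the goal above
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
```

this only illustrates the default behavior; as noted below, CPU is not the metric we actually want to scale on.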
in our case memory and cpu of the renderer pod are not the relevant metrics; a good indicator to trigger autoscaling would be the http request queue length of the http server (gunicorn or similar)
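as a sketch of what exposing such a metric could look like: a minimal Prometheus text-format endpoint serving a queue-length gauge, using only the python standard library. the metric name `gunicorn_request_queue_length` and the `get_queue_length()` source are assumptions; in practice the value would have to be read from gunicorn itself (e.g. from the listen socket backlog), which this sketch stubs out:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def get_queue_length() -> int:
    """Placeholder: really this should read gunicorn's pending-request backlog."""
    return 0


def render_metrics(queue_length: int) -> str:
    """Render the gauge in Prometheus text exposition format."""
    return (
        "# HELP gunicorn_request_queue_length Pending HTTP requests.\n"
        "# TYPE gunicorn_request_queue_length gauge\n"
        f"gunicorn_request_queue_length {queue_length}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(get_queue_length()).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


# To serve the endpoint (blocking):
# HTTPServer(("", 9100), MetricsHandler).serve_forever()
```

a prometheus scrape config (or a ServiceMonitor) would then pick this endpoint up like any other target.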
there is another kubernetes project, KEDA https://keda.sh/, which provides queue-based metrics, so hopefully it can be leveraged for http request queues too (in addition to queues from brokers like RabbitMQ, Kafka, ...). if this is not feasible, I propose to expose the request queue length from gunicorn as a renderer metric on a Prometheus endpoint and try to connect the HPA to this custom metric
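if KEDA turns out to be usable, the wiring could look roughly like the ScaledObject sketch below, using KEDA's Prometheus scaler; the server address, query, metric name, and threshold are all assumptions to be verified during the investigation:

```yaml
# Sketch of a KEDA ScaledObject scaling on a Prometheus query.
# Deployment name, Prometheus address, metric name, and threshold are assumptions.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: renderer-scaledobject
spec:
  scaleTargetRef:
    name: viewserver-renderer                         # assumed deployment name
  minReplicaCount: 1                                  # 1..n per the goal above
  maxReplicaCount: 5
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090  # assumed address
        query: avg(gunicorn_request_queue_length)         # assumed metric name
        threshold: "10"
```

note that KEDA also maintains a separate HTTP add-on for scaling directly on HTTP traffic, which may be worth checking as part of this investigation.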
to sum up: we should investigate in the direction of these 2 components and check whether they work alongside each other, or whether one of them can be extended properly, to optimize our cluster utilization
@stephan.meissl @fabian.schindler @mallingerb