auto-scaling of ViewServer renderer pods based on request queue length
problem statement
we currently run 5 customer-facing (plus additional test) instances of ViewServer on our cluster for AGRI/EO-WIDGET. each instance has a fixed, static number of renderer pod replicas configured, i.e. resources are claimed and reserved on the cluster even when unused
goal
claim only a minimum amount of resources by default and scale up when needed (note: while scale to zero, i.e. 0..n renderer pods, would be even nicer, I'm happy with a solution offering 1..n renderer pods with 1 as the default)
there is an out-of-the-box concept in kubernetes called the Horizontal Pod Autoscaler (HPA) https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/ which scales based on resource metrics, but by default only on memory and cpu consumption
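for reference, a minimal HPA manifest using the default CPU resource metric might look like the sketch below; the deployment name `viewserver-renderer` and the replica/utilization numbers are placeholder assumptions, not our actual config:

```yaml
# Sketch of a plain resource-metric HPA (CPU-based).
# Deployment name and thresholds are assumptions.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: renderer-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: viewserver-renderer   # assumed deployment name
  minReplicas: 1                # minimal footprint per the goal above
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
```

this only illustrates the default behavior; as noted below, CPU is not the metric we actually want to scale on.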
in our case memory and cpu of the renderer pod are not the relevant metrics; a good indicator to trigger autoscaling would be the http request queue length of the http server (gunicorn or similar)
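as a sketch of what exposing such a metric could look like: a minimal Prometheus text-format endpoint serving a queue-length gauge, using only the python standard library. the metric name `gunicorn_request_queue_length` and the `get_queue_length()` source are assumptions; in practice the value would have to be read from gunicorn itself (e.g. from the listen socket backlog), which this sketch stubs out:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def get_queue_length() -> int:
    """Placeholder: really this should read gunicorn's pending-request backlog."""
    return 0


def render_metrics(queue_length: int) -> str:
    """Render the gauge in Prometheus text exposition format."""
    return (
        "# HELP gunicorn_request_queue_length Pending HTTP requests.\n"
        "# TYPE gunicorn_request_queue_length gauge\n"
        f"gunicorn_request_queue_length {queue_length}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(get_queue_length()).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


# To serve the endpoint (blocking):
# HTTPServer(("", 9100), MetricsHandler).serve_forever()
```

a prometheus scrape config (or a ServiceMonitor) would then pick this endpoint up like any other target.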
there is another kubernetes project, KEDA https://keda.sh/, which provides queue-based metrics, so hopefully it can be leveraged for http request queues too (in addition to queues from brokers like RabbitMQ, Kafka, ...). if this is not feasible, I propose to expose the request queue length from gunicorn as a renderer metric on a Prometheus endpoint and try to connect the HPA to this custom metric
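if KEDA turns out to be usable, the wiring could look roughly like the ScaledObject sketch below, using KEDA's Prometheus scaler; the server address, query, metric name, and threshold are all assumptions to be verified during the investigation:

```yaml
# Sketch of a KEDA ScaledObject scaling on a Prometheus query.
# Deployment name, Prometheus address, metric name, and threshold are assumptions.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: renderer-scaledobject
spec:
  scaleTargetRef:
    name: viewserver-renderer                         # assumed deployment name
  minReplicaCount: 1                                  # 1..n per the goal above
  maxReplicaCount: 5
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090  # assumed address
        query: avg(gunicorn_request_queue_length)         # assumed metric name
        threshold: "10"
```

note that KEDA also maintains a separate HTTP add-on for scaling directly on HTTP traffic, which may be worth checking as part of this investigation.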
to sum up: we should investigate in the direction of these 2 components and check whether they work alongside each other, or whether one of them can be extended properly, to optimize our cluster utilization
@stephan.meissl @fabian.schindler @mallingerb