VS issues
https://gitlab.eox.at/esa/prism/vs/-/issues
2021-11-22T17:45:36+01:00

https://gitlab.eox.at/esa/prism/vs/-/issues/128
Using common data exchange format between components
2021-11-22T17:45:36+01:00
Fabian Schindler

# Introduction
This is a collection of ideas and concepts with their respective advantages and drawbacks in their use.
The basic idea is to specify a common data exchange format encoding for most of the communications between the components. The intention is to better decouple the systems and allow for better composability of the available components and potentially future ones.
It is hereby proposed to use STAC Items as a data exchange format between the components. The STAC Items are transient, in the sense that they are only put into the queues and not stored on volumes/buckets. [`VSQ`](https://gitlab.eox.at/esa/prism/vsq) allows embedding the STAC Items into the JSON message structure.
General advantages are:
- it is possible to encode the footprint of the product directly in the JSON (but it can be set to `null` if not immediately available)
- there are several Python libraries available to digest, create or transform items (e.g: [PySTAC](https://github.com/stac-utils/pystac) and [stac.py](https://github.com/brazil-data-cube/stac.py)) but they are optional, as it is sometimes easier to simply work with the raw Python objects.
- it combines the data/metadata assets with readily available metadata values.
- referenced assets are not required to be on the same storage, allowing more flexibility
- the transient nature eliminates the requirement to create sidecar files to store metadata from one component to the next (such as GSC files generated in the preprocessor for the registrar)
Disadvantages:
- some concepts are harder to represent with STAC, such as data directories (object storage prefixes)
- it is not automatically clear how to deal with missing metadata. e.g: the `geometry` could be `null`, but how would the components handle that?
- verbosity. As the whole STAC Item is put into the queues, it may not be handy anymore to directly inspect the queues without additional tools.
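
To make the points above more concrete, here is a minimal sketch of a transient STAC Item as a raw Python object, serialized the way it might be before being put on a queue. The item id, asset href and datetime are made-up values, the helper `footprint_or_none` is hypothetical, and the actual VSQ message envelope is not shown because it is not described here. It only illustrates one possible way a component could react to a `null` geometry.

```python
import json

# Hypothetical product; id, href and datetime are illustrative values only.
item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-product",
    "geometry": None,  # footprint not (yet) available
    "properties": {"datetime": "2021-01-29T09:31:00Z"},
    "links": [],
    "assets": {
        "data": {
            "href": "s3://example-bucket/example-product.tif",
            "type": "image/tiff",
        },
    },
}


def footprint_or_none(stac_item):
    """Return the GeoJSON geometry, or None when it is missing or null."""
    return stac_item.get("geometry")


payload = json.dumps(item)      # what would be written to a queue
received = json.loads(payload)  # what the next component reads back

# One possible policy (an assumption): defer spatial handling if no footprint.
if footprint_or_none(received) is None:
    status = "footprint missing, defer spatial handling"
else:
    status = "footprint available"
```

Working with plain dictionaries keeps the libraries optional, as noted above. Whether a component should skip, defer or reject an item without a footprint is still an open question.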
## Components involved with registration/ingestion
This listing details what each component inputs/outputs and an assessment of how the new format could be of use.
### *Ingestor*
- Input: Browse Report XML files
- Outputs: custom JSON format (basically translation of XML -> JSON) which currently only the preprocessor is able to handle properly
- Assessment: The custom JSON format could easily be replaced with the STAC Item format, which would standardize it, and allow for an easier integration with other components.
### *Preprocessor*
- Input: Object storage prefix or custom JSON format
- Output: Object storage prefix
- Assessment: Arguably, this component would benefit the most from a switch to STAC Items. Using the `assets`, it is easy to distinguish which assets are of interest. Also, metadata of the input STAC Item could simply be passed through, without the preprocessor being required to understand it. In essence, only the asset links would have to be replaced or enriched with the processed items.
### *Registrar*
- Input: Object storage prefix
- Output: none
- Assessment: The current approach is not very stable. Several "schemes" are tried in turn to check whether they can be applied for registration. Unifying this to STAC Items would greatly reduce the number of code paths. Metadata from the STAC Item could easily be handed through and mapped to the internal metadata model. It could be interesting to allow forwarding the registered item to the next queue, so that the registrar is not necessarily the "dead end" of the whole ingestion queue (e.g. to start seeding the registered product).
### *Seeder*
- Input: ???
- Output: none
- Assessment: This component is currently not implemented in the new VS. In theory, it could retrieve seeding requests in the form of STAC Items to get the region and time of interest to seed.
### *Harvester*
- Input: custom JSON or raw values
- Output: tbd
- Assessment: currently there is no data format defined, STAC Items would be a "natural" fit as STAC API is actually one of the intended backends. Some backends may be more tricky though: e.g: object storage listings are not easily translatable into STAC Items without actually reading metadata files at that location. Some OADS outputs (`.index` files, basically just CSV) could actually map quite nicely into STAC Items.
## Usage example
### Harvester -> Preprocessor -> Registrar -> Seeder
In this example scenario, the Harvester queries an external catalogue and either passes through the STAC Items or transforms them to that format. The items are written to the queue and the harvester is oblivious of which component is the next in the chain.
The preprocessor has an immediate list of files (`assets`) to work with. There is usually no need to retrieve additional metadata, but if necessary a referenced metadata file can be opened and read. It processes selected files from the assets, creates a copy of the input STAC Item and adds the preprocessed files as new assets. All other metadata is kept for other components to digest. This new STAC Item is sent to the next queue.
The registrar receives the STAC Item and, based on its contents and the configuration, starts the registration into its backends. If successful, the STAC Item is passed on to the next component without modification.
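
The preprocessor step could be sketched roughly as follows, using raw Python objects. The function name `add_preprocessed_assets`, the asset keys and the hrefs are assumptions made for illustration, not the preprocessor's actual interface.

```python
import copy


def add_preprocessed_assets(item, new_assets):
    """Return a copy of the STAC Item with extra assets added.

    The input item is left untouched and all other fields
    (properties, geometry, links, ...) are passed through as-is.
    """
    result = copy.deepcopy(item)
    result.setdefault("assets", {}).update(new_assets)
    return result


# Hypothetical input item as it might arrive from the previous queue.
incoming = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-product",
    "geometry": None,
    "properties": {"datetime": "2021-01-29T09:31:00Z", "custom:field": 42},
    "links": [],
    "assets": {"raw": {"href": "s3://example-bucket/raw/example.tif"}},
}

outgoing = add_preprocessed_assets(
    incoming,
    {"preprocessed": {"href": "s3://example-bucket/preprocessed/example.tif"}},
)
```

Copying instead of modifying in place keeps the incoming message intact. That may help if the original needs to be re-queued after a failure.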
The seeder uses the stored spatiotemporal information in the STAC Item to start the seeding process.

ViewServer 2.0

https://gitlab.eox.at/esa/prism/vs/-/issues/160
auto-scaling of ViewServer renderer pod based on request queue length
2022-01-24T13:21:42+01:00
Stefan Achtsnit

### problem statement
we currently have already 5 customer-facing (and additional test) instances of ViewServer running on our cluster for AGRI/EO-WIDGET, each instance has a fixed static number of replicas for the renderer pods configured, i.e. resources are claimed and reserved on cluster even if not used
### goal
allow to claim only minimum number of resources per default and scale up if needed (note: while scale to zero for 0..n renderer pods would be even nicer I'm happy with solutions with 1..n renderer pods with 1 as default)
---
there is an out of the box concept in kubernetes called HPA https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/ which scales up based on resource metrics, but per default only based on memory and cpu consumption
in our case memory and cpu of the renderer pod are **not** the relevant metrics, a good indicator would be the http request queue length of the http server (gunicorn or similar) to trigger autoscaling behavior
there is another kubernetes project https://keda.sh/ which provides queue based metrics so hope it can be leveraged for http request queues too (in addition to queues from brokers like rabbitmq, kafka,...), if this is not a feasible way I propose to expose the request queue length from gunicorn as a renderer metric on prometheus endpoint and try to connect HPA to this custom metric
to sum up: we should investigate in the direction of these 2 components and check if they work alongside or one of them can be extended properly to optimize our cluster utilization
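
For orientation, the scaling rule documented for the HPA is `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`. The sketch below applies it to a hypothetical "queued requests per renderer pod" metric. The metric itself, the target value and the replica bounds are assumptions, and the HPA's tolerance and stabilization behaviour are not modelled.

```python
import math


def desired_replicas(current_replicas, current_value, target_value,
                     min_replicas=1, max_replicas=10):
    """Apply the documented HPA formula and clamp to the replica bounds.

    current_value is the observed metric (here assumed to be the average
    number of queued requests per pod), target_value the desired one.
    """
    raw = math.ceil(current_replicas * current_value / target_value)
    return max(min_replicas, min(max_replicas, raw))


# One pod with 30 queued requests and a target of 10 per pod: scale up.
scale_up = desired_replicas(1, 30, 10)

# Three mostly idle pods: scale back down, but never below min_replicas.
scale_down = desired_replicas(3, 2, 10)
```

With `min_replicas=1` this corresponds to the 1..n setup described in the goal. Scaling to zero would need a different mechanism, since the formula cannot scale up from zero replicas.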
@stephan.meissl @fabian.schindler @mallingerb

2022-02-04

https://gitlab.eox.at/esa/prism/vs/-/issues/157
Update User Guide and Design Document
2021-11-16T21:36:20+01:00
Stephan Meißl stephan.meissl@eox.at

The [attached RIDs](/uploads/f4e02dd481f8a2f229dda90982fa75ca/View_Server_Documents_RIDs.xlsx) should be addressed.

Lubomir Doležal
2021-11-12

https://gitlab.eox.at/esa/prism/vs/-/issues/156
legend for the dem processed layers
2021-11-10T22:17:49+01:00
Mussab Abdalla

Dem processed layers (hillshade, aspect, slope and contour) should have a legend that reflects the color scale values vs band values.

https://gitlab.eox.at/esa/prism/vs/-/issues/155
setting parameters for dem processed rendering
2021-11-10T22:20:24+01:00
Mussab Abdalla

the renderer should support custom wms parameters for setting additional rendering parameters for hillshade, aspect, slope and contour layers

https://gitlab.eox.at/esa/prism/vs/-/issues/154
add navigation/zoom button to 3D viewer
2021-11-10T15:15:43+01:00
Mussab Abdalla

there should be a button to control/reset the view orientation and zoom in 3D, most of the solution includes plugins to Cesium widget (e.g https://github.com/alberto-acevedo/cesium-navigation)

https://gitlab.eox.at/esa/prism/vs/-/issues/153
provide a GetCapability button per collection
2021-11-10T15:15:21+01:00
Mussab Abdalla

add a button that links to -for example- https://dem.pdas.prism.eox.at/ows?service=WMS&request=GetCapabilities

https://gitlab.eox.at/esa/prism/vs/-/issues/150
Add minimum resources for the client to User Guide
2021-10-16T12:11:14+02:00
Stephan Meißl stephan.meissl@eox.at

Mussab Abdalla

https://gitlab.eox.at/esa/prism/vs/-/issues/149
Registrar swift token does not refresh sometimes after expiring
2021-11-21T16:24:12+01:00
Lubomir Doležal

- registrar sometimes fails to find a file after running for a long time without restart (probably auth token expired and is not refreshed for some reason)
- restart of registrar fixes this, Logs: `...file.tif not recognized as a supported file format.`
The probable cause is that the auth token expires and, in some cases, is not refreshed.
full trace:
```
Sep 16, 2021 @ 07:11:37.234 RuntimeError: `/vsiswift/emg-data/data26/0000571902/PH1A_PHR_FUS__3_20210129T093100_20210129T093102_TOU_1234_a568.DIMA.tar/IMG_PHR1A_PMS_202101290931008_ORT_5928860101_R1C1.tif' not recognized as a supported file format. /csea-emg-pdas_registrar.1.wsydxbtuqxzex8ejroynlzb2z
Sep 16, 2021 @ 07:11:37.234 return _gdal.Open(*args) /csea-emg-pdas_registrar.1.wsydxbtuqxzex8ejroynlzb2z
Sep 16, 2021 @ 07:11:37.234 File "/usr/lib/python3/dist-packages/osgeo/gdal.py", line 2978, in Open /csea-emg-pdas_registrar.1.wsydxbtuqxzex8ejroynlzb2z
Sep 16, 2021 @ 07:11:37.233 ds = gdal.Open(vsi_path) /csea-emg-pdas_registrar.1.wsydxbtuqxzex8ejroynlzb2z
Sep 16, 2021 @ 07:11:37.233 File "/usr/local/lib/python3.8/dist-packages/registrar-1.4.7-py3.8.egg/registrar/backend.py", line 215, in _register_with_registrator /csea-emg-pdas_registrar.1.wsydxbtuqxzex8ejroynlzb2z
Sep 16, 2021 @ 07:11:37.233 product = self._register_with_registrator(source, item, replace, storage, mapping) /csea-emg-pdas_registrar.1.wsydxbtuqxzex8ejroynlzb2z
Sep 16, 2021 @ 07:11:37.233 File "/usr/local/lib/python3.8/dist-packages/registrar-1.4.7-py3.8.egg/registrar/backend.py", line 377, in register /csea-emg-pdas_registrar.1.wsydxbtuqxzex8ejroynlzb2z
Sep 16, 2021 @ 07:11:37.233 return func(*args, **kwds) /csea-emg-pdas_registrar.1.wsydxbtuqxzex8ejroynlzb2z
Sep 16, 2021 @ 07:11:37.233 File "/usr/lib/python3.8/contextlib.py", line 75, in inner /csea-emg-pdas_registrar.1.wsydxbtuqxzex8ejroynlzb2z
Sep 16, 2021 @ 07:11:37.233 backend.register(source, context, replace=False) /csea-emg-pdas_registrar.1.wsydxbtuqxzex8ejroynlzb2z
Sep 16, 2021 @ 07:11:37.233 File "/usr/local/lib/python3.8/dist-packages/registrar-1.4.7-py3.8.egg/registrar/registrar.py", line 59, in register_file /csea-emg-pdas_registrar.1.wsydxbtuqxzex8ejroynlzb2z
Sep 16, 2021 @ 07:11:37.233 register_file(config, value, replace) /csea-emg-pdas_registrar.1.wsydxbtuqxzex8ejroynlzb2z
Sep 16, 2021 @ 07:11:37.233 File "/usr/local/lib/python3.8/dist-packages/registrar-1.4.7-py3.8.egg/registrar/daemon.py", line 56, in run_daemon /csea-emg-pdas_registrar.1.wsydxbtuqxzex8ejroynlzb2z
Sep 16, 2021 @ 07:11:37.233 Traceback (most recent call last): /csea-emg-pdas_registrar.1.wsydxbtuqxzex8ejroynlzb2z
Sep 16, 2021 @ 07:11:37.233 ERROR registrar.daemon: `/vsiswift/emg-data/data26/0000571902/PH1A_PHR_FUS__3_20210129T093100_20210129T093102_TOU_1234_a568.DIMA.tar/IMG_PHR1A_PMS_202101290931008_ORT_5928860101_R1C1.tif' not recognized as a supported file format.
```

https://gitlab.eox.at/esa/prism/vs/-/issues/148
PRISM Backlog ingestion errors
2021-09-29T09:09:29+02:00
Lubomir Doležal

This issue is to track ongoing backlog ingestion campaign on PASS started on 15.9.2021:
- ~~vhr18 registration failing because storage_auth created for original vhr18 is different from the new one expected by new registrar - models mismatch `ERROR registrar.daemon: duplicate key value violates unique constraint "backends_storage_name_key"`, fix the model and register the failed products manually~~
- ~~registrar sometimes fails to find a file after running for a long time without restart (probably auth token expired and is not refreshed for some reason) - restart of registrar fixes this, register failed products manually. Error to search for: `...file.tif not recognized as a supported file format.`~~ -> moved to https://gitlab.eox.at/esa/prism/vs/-/issues/149

Lubomir Doležal

https://gitlab.eox.at/esa/prism/vs/-/issues/147
3D client performance
2021-10-16T12:11:02+02:00
Mussab Abdalla

https://gitlab.eox.at/esa/prism/vs/-/issues/146
Tagged releases do not create images
2021-09-10T18:36:04+02:00
Lubomir Doležal

Currently, because our tagged releases start with the word `release`, they are not caught by the TAG_REGEX in ci_image_build.sh
https://gitlab.eox.at/esa/prism/vs/-/blob/staging/ci_image_build.sh#L11-15
Solution: there is a CI env variable `CI_COMMIT_TAG` which can be used to know if we are doing this for a tag or not.
https://docs.gitlab.com/ee/ci/variables/#list-all-environment-variables

Nikola Jankovic

https://gitlab.eox.at/esa/prism/vs/-/issues/143
Investigate current STAC translators
2021-11-09T12:14:30+01:00
Nikola Jankovic

It might be necessary to investigate properly current STAC translators, before deciding to write a separate one.

Nikola Jankovic

https://gitlab.eox.at/esa/prism/vs/-/issues/142
Setup liveness probe for services
2021-09-02T17:43:30+02:00
Nikola Jankovic

Add and update with following
> can you please make the livenessProbe in viewserver helm chart configurable, currently it is not possible to override default values
> - periodSeconds: 10
> - successThreshold: 1
> - timeoutSeconds: 1
>
> I propose to increase timeoutSeconds (e.g. to 5 sec) and also to configure failureThreshold (default 3) to 10 - it seems that the viewserver is often busy with requests and therefore the individual livenessProbe requests won't go through and the pod is considered unhealthy (and gets restarted)
>
> if we can't improve the situation then we should talk about possibilities to either prioritize livenessProbe requests (don't know if/how this can be done in python with gunicorn or similar) or to use different means to indicate liveness (implementation effort)

Nikola Jankovic

https://gitlab.eox.at/esa/prism/vs/-/issues/135
Update to latest dind image and test run
2021-08-25T13:39:59+02:00
Nikola Jankovic

Nikola Jankovic

https://gitlab.eox.at/esa/prism/vs/-/issues/134
Parallelize CI/CD
2021-08-25T13:39:59+02:00
Nikola Jankovic

Try by splitting building of services as separate tasks.

Nikola Jankovic

https://gitlab.eox.at/esa/prism/vs/-/issues/133
Errors during preprocessing and registration state on 4.8.2021
2021-08-13T15:57:04+02:00
Lubomir Doležal

Some of the following errors were tracked in the operational system of PASS. They need further investigation:
registration of what should be an existing file fails without other errors, needs inspection, happens on multiple collections (everywhere where we gdal.Open the file to get number of bands)
```
ERROR registrar.daemon: `/vsiswift/emg-data/data26/0000433307/EW01_WV6_PAN_SO_20210726T144857_20210726T144858_DGI_77083_9ECF.0000.tar/21JUL26144857-P2AS-014234494010_02_P001.tif' not recognized as a supported file format.
```
registration error happening sometimes (not always) on demF collection - needs further investigation
```
INFO registrar.backend: Registering coverage [['dem-data', 'data24/0000448124/DEM1_SAR_DTE_90_20130502T113059_20140612T115113_ADS_000000_0460.DEM.tar/Copernicus_DSM_30_S83_00_E123_00_DEM.tif']] as int16_grayscale
ERROR registrar.daemon: get() returned more than one Grid -- it returned 2!
```
tried to fix this by adding config https://github.com/openshift/origin-aggregated-logging/blob/1a5481f25b91c38cf3a8d2b3522cffecb84bfa66/fluentd/configs.d/openshift/output-es-config.conf#L30 will see if that helps
```
failed to flush the buffer. retry_time=0 next_retry_seconds=2021-08-04 09:29:32 +0000 chunk="5c8b8701a194e5fb71ae5c51c2282ca2" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch\", :port=>9200, :scheme=>\"http\"}): read timeout reached"
```

Lubomir Doležal

https://gitlab.eox.at/esa/prism/vs/-/issues/132
Add pdf version of Operator Guide
2021-08-10T12:10:45+02:00
Stephan Meißl stephan.meissl@eox.at

Copy Makefile and build from User Guide and also add a similar link to the pdf.

Nikola Jankovic

https://gitlab.eox.at/esa/prism/vs/-/issues/131
GetEOCoverageSet requests
2021-09-14T18:37:55+02:00
Mussab Abdalla

GetEOCoverageSet requests return an exception: `Missing required parameter 'format'`, even though `format` is not a mandatory parameter according to [OGC Standards](https://docs.opengeospatial.org/is/10-140r2/10-140r2.html#geteocoverageset_operation).
When `format` is provided, the response states that the format is not supported (e.g: `'Format 'image/tiff' is not supported.'`), even if the format is supported (listed in the `wcs:formatSupported` element of the WCS `GetCapabilities` response).

https://gitlab.eox.at/esa/prism/vs/-/issues/130
contour rendering
2021-11-22T16:04:32+01:00
Mussab Abdalla

the contours visualization does not render properly as seen in the image below:
![contour_error](/uploads/743e3ec883effefd55fd241ac41cc73d/contour_error.png)
I did a brief investigation, and it seems that the EOxServer does render the wms requests fine. Below is a wms request for the same product above (`http://127.0.0.1:81/ows?SERVICE=WMS&VERSION=1.1.0&REQUEST=GetMap&FORMAT=image%2Fpng&TRANSPARENT=true&LAYERS=urn:eop:DLR:CDEM30:Copernicus_DSM_10_N41_00_E047_00:V9577__contours&STYLES=blackbody&WIDTH=800&HEIGHT=800&SRS=EPSG%3A4326&BBOX=47.0%2C41.0%2C48.0%2C42.0`):
![wms_contour](/uploads/9e994584e8ee4e8ca25227064ea9c566/wms_contour.png)
Could it be an issue with the cache?