# Introduction

This repository holds the configuration of the PRISM View Server (PVS). This README covers the architecture, conventions, relevant configuration, and installation instructions, as well as canonical references.

# Architecture

The PRISM View Server (PVS) uses various Docker images. The `core`, `cache`, `client`, `ingestor`, `fluentd`, and `preprocessor` images are built from this repository; the others are pulled from Docker Hub.

## Prerequisites

### Object Storage (OBS)

Access keys to store preprocessed items and caches, used by all services. An access key to the input items, used by the preprocessor.

## Networks

One internal and one external network per stack.

## Volumes

In the base stack:

* traefik-data

In the logging stack:

* logging_es-data

Per collection:

* db-data - used by database
* redis-data - used by redis
* instance-data - used by registrar and renderer
* report-data - sftp output of the reporting interface
* from-fepd - sftp input to **ingestor**

## Services

The following services are defined via docker compose files.

### reverse-proxy

* based on the external traefik image
* data stored in a local volume on the swarm master
* reads swarm changes from /var/run/docker.sock on the swarm master
* provides the endpoint for external access
* configured via docker labels

### shibauth

* based on the external unicon/shibboleth-sp:3.0.4 Apache + Shibboleth SP3 image
* provides authentication and authorization via SAML2
* docker configuration files set access control rules
* traefik labels determine which services are protected via Shib

### database

* based on the external postgis:10 image
* DB stored in a local volume on the swarm master
* provides the database to all other services

### redis

* based on the external redis image
* data stored in a local volume on the swarm master
* holds these keys:
  * preprocessing
    * preprocess-md_queue
      * holds metadata in JSON, including the object path of the image to be preprocessed
      * `lpush` by ingestor or manually
      * `brpop` by preprocessor
    * preprocess_queue
      * holds items (tar object path) to be preprocessed
      * `lpush` by ingestor or manually
      * `brpop` by preprocessor
    * preprocessing_set
      * holds ids of items currently being preprocessed
      * `sadd` by preprocessor
    * preprocess-success_set
      * holds ids of successfully preprocessed items
      * `sadd` by preprocessor
    * preprocess-failure_set
      * holds ids of items that failed preprocessing
      * `sadd` by preprocessor
  * registration
    * register_queue
      * holds items (metadata and data objects prefix - same as the tar object path above) to be registered
      * `lpush` by preprocessor or manually
      * `brpop` by registrar
    * registering_set
      * holds ids of items currently being registered
      * `sadd` by registrar
    * register-success_set
      * holds ids of successfully registered items
      * `sadd` by registrar
    * register-failure_set
      * holds ids of items that failed registration
      * `sadd` by registrar
  * seeding
    * seed_queue
      * time intervals to pre-seed
      * `lpush` by registrar or manually
      * `brpop` by seeder
    * seed-success_set
    * seed-failure_set
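The queue protocol above can also be exercised or inspected by hand, e.g. to trigger preprocessing of a single item or to check the state of a run. The following is a minimal sketch using `redis-cli` inside the redis container of a running stack; the payload fields and the identifier are illustrative only, not the exact schema expected by the services:

```bash
# enqueue a preprocessing job by hand (payload fields are illustrative only)
redis-cli LPUSH preprocess-md_queue '{"identifier": "example_product", "data": "bucket/path/to/item.tar"}'

# what the preprocessor does: blocking-pop the item and track it in the progress set
redis-cli BRPOP preprocess-md_queue 0
redis-cli SADD preprocessing_set example_product

# on success it hands the item over to registration and updates the result sets
redis-cli LPUSH register_queue example_product
redis-cli SADD preprocess-success_set example_product
redis-cli SREM preprocessing_set example_product

# inspect queue and set contents at any time
redis-cli LRANGE register_queue 0 -1
redis-cli SMEMBERS register-success_set
```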
### ingestor

* based on the ingestor image
* by default a Flask app listening on the `/` endpoint for `POST` requests with reports
* can alternatively be overridden to act as an inotify watcher on a configured folder, picking up newly appearing reports
* accepts browse reports with references to images on Swift
* extracts the browse metadata (id, time, footprint, image reference)
* `lpush` metadata into the `preprocess-md_queue`

### TODO: seeder

* based on the cache image
* connects to DB
* `brpop` time interval from seed_queue
* for each seed time and extent from DB
* pre-seed using renderer

### preprocessor

* based on the preprocessor image (GDAL 3.1)
* connects to OBS
* `brpop` item from preprocess_queue or preprocess-md_queue
* `sadd` to preprocessing_set
* downloads image or package from OBS
* translates to COG
* translates to GSC if needed
* uploads COG & GSC to OBS
* adds item (metadata and data object paths) to register_queue
* `sadd` to preprocess-{success|failure}_set
* `srem` from preprocessing_set

### registrar

* based on the core image
* connects to OBS & database
* uses the instance-data volume
* `brpop` item from register_queue
* `sadd` ...
* register in DB
* (optional) store time:start/time:end in seed_queue
* `sadd/srem` ...

### cache

* based on the cache image
* connects to OBS & database
* provides the external service for WMS & WMTS
* either serves WMTS/WMS requests from the cache or retrieves them on demand from the renderer, stores them in the cache, and serves them

### renderer

* based on the core image
* connects to OBS & database
* provides the external service for OpenSearch, WMS, & WCS
* renders WMS requests received from cache or seeder

### logging stack

* uses the external elasticsearch:7.9 & kibana:7.9 images
* the fluentd image is built and published to the registry because of additional plugins
* ES data stored in a local volume on the swarm master
* external access to kibana allowed through traefik
* log parsing enabled for cache and core

### sftp

* uses the external atmoz/sftp image
* provides sftp access to two volumes for report exchange: registration result XMLs and ingest requirement XMLs
* accessible on the swarm master on ports 2222-22xx
* credentials supplied via config

# Usage

## Test locally using docker swarm

Initialize swarm & stack:

```bash
docker swarm init # initialize swarm
```

Build images:

Note that we use the **dev** tag for local development, so the images need to be built locally:

```
docker build core/ -t registry.gitlab.eox.at/esa/prism/vs/pvs_core:dev
docker build cache/ -t registry.gitlab.eox.at/esa/prism/vs/pvs_cache:dev
docker build preprocessor/ -t registry.gitlab.eox.at/esa/prism/vs/pvs_preprocessor:dev
docker build client/ -t registry.gitlab.eox.at/esa/prism/vs/pvs_client:dev
docker build fluentd/ -t registry.gitlab.eox.at/esa/prism/vs/fluentd:dev
docker build ingestor/ -t registry.gitlab.eox.at/esa/prism/vs/pvs_ingestor:dev
```

For production deployment, as the registry is open to the public, this part is handled by the later step `Deploy the stack in production`, which pulls the necessary images automatically.

Create the external networks for the stacks to run:

```
docker network create -d overlay vhr18-extnet
docker network create -d overlay emg-extnet
docker network create -d overlay dem-extnet
```

Add the following .env files with credentials to the /env folder of the cloned copy of the repository: `vhr18_db.env`, `vhr18_obs.env`, `vhr18_django.env`.
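The exact variable names expected in these files are defined by the respective services and are not listed here. As a purely hypothetical sketch of how such a file could be created (placeholder keys, not the real ones):

```bash
# Hypothetical example only - the actual variable names are defined by the
# core/database images and are not part of this README.
mkdir -p env
cat > env/vhr18_db.env <<'EOF'
POSTGRES_USER=vhr18_user
POSTGRES_DB=vhr18_db
EOF
```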
Create docker secrets:

Sensitive environment variables are not included in the .env files and must be provided as docker secrets. All stacks currently share these secret names, so the names must stay the same across stacks. The same applies to the sftp configuration values. To create the docker secrets and configs, run:

```bash
# secret creation
# replace "<variable>" with the value of the secret
printf "<OS_PASSWORD_DOWNLOAD>" | docker secret create OS_PASSWORD_DOWNLOAD -
printf "<DJANGO_PASSWORD>" | docker secret create DJANGO_PASSWORD -
printf "<OS_PASSWORD>" | docker secret create OS_PASSWORD -
# config creation
printf "<user>:<password>:<UID>:<GID>" | docker config create sftp_users_<name> -
# for production base stack deployment, an additional basic authentication credentials list needs to be created
# the format of such a list used by traefik is username:hashedpassword (MD5, SHA1, BCrypt)
sudo apt-get install apache2-utils
htpasswd -nb <username> <password> >> auth_list.txt
docker secret create BASIC_AUTH_USERS_AUTH auth_list.txt
docker secret create BASIC_AUTH_USERS_APIAUTH auth_list_api.txt
```

If the **shibauth** service is to be used in a production deployment, two more secrets need to be created for each stack where **shibauth** is deployed. These ensure that the SP is recognized and its identity confirmed by the IDP. They are configured as **stack-name-capitalized_SHIB_KEY** and **stack-name-capitalized_SHIB_CERT**. To create them, use the attached **keygen.sh** command-line tool in the */config* folder:

```bash
SPURL="emg.pass.copernicus.eu" # hostname of the initial service access point made accessible by traefik
./config/keygen.sh -h $SPURL -y 20 -e https://$SPURL/shibboleth -n sp-signing -f
docker secret create EMG_SHIB_CERT sp-signing-cert.pem
docker secret create EMG_SHIB_KEY sp-signing-key.pem
```

Additionally, a docker config `idp_metadata` containing the metadata of the used IDP needs to be added:

```bash
docker config create idp_metadata idp-metadata-received.xml
```

Deploy a stack in the dev environment:

```
docker stack deploy -c docker-compose.vhr18.yml -c docker-compose.vhr18.dev.yml -c docker-compose.logging.yml -c docker-compose.logging.dev.yml vhr18-pvs # start VHR_IMAGE_2018 stack in dev mode, for example to use local sources
docker stack deploy -c docker-compose.emg.yml -c docker-compose.emg.dev.yml -c docker-compose.logging.yml -c docker-compose.logging.dev.yml emg-pvs # start Emergency stack in dev mode, for example to use local sources
```

Deploy the base & logging stacks in the production environment:

```
docker stack deploy -c docker-compose.base.ops.yml base-pvs
docker stack deploy -c docker-compose.logging.yml -c docker-compose.logging.ops.yml logging
```

Deploy a stack in the production environment:

Please note that in order to reuse existing database volumes, <stack-name> needs to stay the same. Here we use `vhr18-pvs`, but in the operational service `vhr18-pdas` is used.

```
docker stack deploy -c docker-compose.vhr18.yml -c docker-compose.vhr18.ops.yml vhr18-pvs
```

First steps:

```
# To register the first data, use the following command inside the registrar container:
UPLOAD_CONTAINER=<product_bucket_name> && python3 registrar.py --objects-prefix <product_object_storage_item_prefix>
# To see the catalog opensearch response in the attached web client, a browser CORS extension needs to be turned on.
```
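The registration command in the first-steps block above is executed inside the registrar container. A minimal sketch for getting a shell there, assuming the stack is named `vhr18-pvs` and the command is run on the swarm node hosting the service:

```bash
# open a shell in the registrar container of the vhr18-pvs stack on the node where it runs
docker exec -it $(docker ps -q -f name=vhr18-pvs_registrar) bash
```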
Tear down a stack including its data:

```bash
docker stack rm vhr18-pvs # stop stack
docker volume rm vhr18-pvs_db-data # delete volumes
docker volume rm vhr18-pvs_redis-data
docker volume rm vhr18-pvs_traefik-data
docker volume rm vhr18-pvs_instance-data
```

### Setup logging

To access the logs, navigate to http://localhost:5601. Ignore all of the fancy enterprise capabilities and select Kibana > Discover in the hamburger menu.

On the first run, you need to define an index pattern to select the data source for Kibana in Elasticsearch. Since we only have fluentd, you can just use `*` as the index pattern. Select `@timestamp` as the time field ([see also](https://www.elastic.co/guide/en/kibana/current/tutorial-define-index.html)).

Example of a Kibana query to discover the logs of a single service:

```
https://<kibana-url>/app/discover#/?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-15m,to:now))&_a=(columns:!(path,size,code,log),filters:!(),index:<index-id>,interval:auto,query:(language:kuery,query:'%20container_name:%20"<service-name>"'),sort:!())
```

Development service stacks keep their logging on stdout/stderr unless the `logging` dev stack is used.

On the production machine, `fluentd` is set as the logging driver for the docker daemon by modifying `/etc/docker/daemon.json` to:

```
{
    "log-driver": "fluentd",
    "log-opts": {
        "fluentd-sub-second-precision": "true"
    }
}
```

### Setup sftp

The `SFTP` image allows remote access into the two logging folders. You can define (edit/add) users, passwords, and UID/GID using the `docker config create` command mentioned above. In the example below the username is `eox`. Once the stack is deployed, you can sftp into the logging folders through port 2222 for ``vhr18`` (``emg`` and ``dem`` use 2223 and 2224 respectively) if you are running the dev stack on localhost:

```bash
sftp -P 2222 eox@127.0.0.1
```

You will be logged into the `/home/eox/data` directory, which contains the two logging directories: `to/panda` and `from/fepd`.

**NOTE:** The mounted directory that you are directed into is *`/home/user`*, where `user` is the username. Hence, when setting or editing the username in the configs, the `sftp` mounted volume paths in `docker-compose.<collection>.yml` must be changed accordingly.

# Documentation

## Installation

```bash
python3 -m pip install sphinx recommonmark sphinx-autobuild
```

## Generate html and synchronize with client/html/user-guide

```bash
make html

# For watched html automatic building
make html-watch

# For pdf output and sync it to client/html/
make latexpdf

# To shrink size of pdf
gs -sDEVICE=pdfwrite -dPDFSETTINGS=/ebook -dPrinted=false -q -o View-Server_-_User-Guide_small.pdf View-Server_-_User-Guide.pdf

# make latexpdf and make html combined
make build
```

The documentation is generated in the respective *_build/html* directory.

# Create software releases

## Release a new vs version

We use [bump2version](https://github.com/c4urself/bump2version) to increment the versions of individual docker images and to create git tags. Pushed tags trigger the CI `docker push` action for the versioned images. bump2version also updates the image versions used in the `.ops` docker compose files. Pushing to the `master` branch updates the `latest` images, while pushing to the `staging` branch updates the `staging` images.

For **versions** in general, we use semantic versioning with the format {major}.{minor}.{patch}-{release}.{build}.

First check the deployed staging version on the staging platform (TBD); if no problems are found, proceed. The following operations should be done on the `staging` or `master` branch:

```
bump2version <major/minor/patch/release/build>
git push
git push --tags
```

If this was done on the `staging` branch, it should then be merged to `master`, unless only a patch to a previous major version is made. A hotfix to production is developed in a branch initiated from `master`, then merged to `staging` for verification. It is then merged to `master` for release.
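Since a bump touches several files and creates a tag, it can be useful to preview it first. A minimal sketch using bump2version's dry-run mode (the part name `patch` is just an example):

```bash
# preview the version bump without modifying files, committing, or tagging
bump2version --dry-run --verbose --list patch
```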
## Source code release

Create a TAR from the source code:

```bash
git archive --prefix release-1.0.0/ -o release-1.0.0.tar.gz -9 master
```

Save Docker images:

```bash
docker save -o pvs_core.tar registry.gitlab.eox.at/esa/prism/vs/pvs_core
docker save -o pvs_cache.tar registry.gitlab.eox.at/esa/prism/vs/pvs_cache
docker save -o pvs_preprocessor.tar registry.gitlab.eox.at/esa/prism/vs/pvs_preprocessor
docker save -o pvs_client.tar registry.gitlab.eox.at/esa/prism/vs/pvs_client
docker save -o pvs_ingestor.tar registry.gitlab.eox.at/esa/prism/vs/pvs_ingestor
docker save -o fluentd.tar registry.gitlab.eox.at/esa/prism/vs/fluentd
```
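On a target host, the saved archives can be loaded back into the local Docker engine with `docker load`, for example:

```bash
# load the saved images on the target host
docker load -i pvs_core.tar
docker load -i pvs_cache.tar
docker load -i pvs_preprocessor.tar
docker load -i pvs_client.tar
docker load -i pvs_ingestor.tar
docker load -i fluentd.tar
```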