Introduction

This repository holds the configuration of the PRISM View Server (PVS).

This README.md describes the architecture, conventions, relevant configuration, and installation instructions, as well as canonical references.

Architecture

The PRISM View Server (PVS) uses various Docker images. The core, cache, client, ingestor, fluentd, and preprocessor images are built from this repository, while the others are pulled from Docker Hub.

Prerequisites

Object Storage (OBS)

Access keys to store preprocessed items and caches, used by all services.

Access key to access the input items, used by the preprocessor.

Networks

One internal and one external network per stack.

Volumes

In base stack

  • traefik-data

In logging stack

  • logging_es-data

Per collection

  • db-data used by database
  • redis-data used by redis
  • instance-data used by registrar and renderer
  • report-data used for sftp output of the reporting interface
  • from-fepd used for sftp input to the ingestor

Services

The following services are defined via docker compose files.

reverse-proxy

  • based on the external traefik image
  • data stored in local volume on swarm master
  • reads swarm changes from /var/run/docker.sock on swarm master
  • provides the endpoint for external access
  • configured via docker labels

shibauth

  • based on the external unicon/shibboleth-sp:3.0.4 Apache + Shibboleth SP3 image
  • provides authentication and authorization via SAML2
  • docker configuration files set access control rules
  • traefik labels determine which services are protected via Shib

database

  • based on external postgis:10 image
  • DB stored in local volume on swarm master
  • provides database to all other services

redis

  • based on external redis image
  • data stored in local volume on swarm master
  • holds these keys (see the redis-cli sketch after this list)
    • preprocessing
      • preprocess-md_queue
        • holds metadata in json including object path for image to be preprocessed
        • lpush by ingestor or manually
        • brpop by preprocessor
      • preprocess_queue
        • holds items (tar object path) to be preprocessed
        • lpush by ingestor or manually
        • brpop by preprocessor
      • preprocessing_set
        • holds ids for currently preprocessed items
        • sadd by preprocessor
      • preprocess-success_set
        • holds ids for successfully preprocessed items
        • sadd by preprocessor
      • preprocess-failure_set
        • holds ids for failed preprocessed items
        • sadd by preprocessor
    • registration
      • register_queue
        • holds items (metadata and data objects prefix - same as tar object path above) to be registered
        • lpush by preprocessor or manually
        • brpop by registrar
      • registering_set
        • holds ids for currently registered items
        • sadd by registrar
      • register-success_set
        • holds ids for successfully registered items
        • sadd by registrar
      • register-failure_set
        • holds ids for failed registered items
        • sadd by registrar
    • seeding
      • seed_queue
        • time intervals to pre-seed
        • lpush by registrar or manually
        • brpop by seeder
      • seed-success_set
      • seed-failure_set
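
As a minimal sketch, these queues and sets can be enqueued or inspected manually with redis-cli from inside the running redis container. The container name filter and the object path below are placeholders, not real values:

# open redis-cli inside the redis container of a stack (name filter is an assumption)
docker exec -it $(docker ps -q -f name=vhr18-pvs_redis) redis-cli

# inside redis-cli: enqueue a tar object path for preprocessing (path is a placeholder)
LPUSH preprocess_queue "<bucket>/<path-to-item>.tar"
# inspect the queue length and the result sets
LLEN preprocess_queue
SMEMBERS preprocess-success_set
SMEMBERS preprocess-failure_set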

ingestor

  • based on ingestor image
  • by default a flask app listening on the / endpoint for POST requests with reports (see the curl sketch after this list)
  • or can be overridden to act as an inotify watcher on a configured folder for newly appearing reports
  • accepts browse reports with references to images on Swift
  • extracts the browse metadata (id, time, footprint, image reference)
  • lpush metadata into preprocess-md_queue
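
As a hedged sketch, a browse report can be submitted to the flask endpoint with a plain POST; the host, port, and report file name are assumptions and depend on how the ingestor is exposed in the stack:

# post a browse report to the ingestor endpoint (host and port are assumptions)
curl -X POST --data-binary @browse_report.xml http://<ingestor-host>:<port>/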

TODO: seeder

  • based on cache image
  • connects to DB
  • brpop time interval from seed_queue (see the redis-cli sketch after this list)
  • for each seed time and extent from DB
    • pre-seed using renderer
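
A minimal sketch of manually pushing a time interval to seed_queue with redis-cli; the exact interval encoding is an assumption, not taken from the registrar code:

# push a time interval to pre-seed (interval format is an assumption)
docker exec -it $(docker ps -q -f name=vhr18-pvs_redis) redis-cli \
    LPUSH seed_queue "2020-01-01T00:00:00Z/2020-01-02T00:00:00Z"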

preprocessor

  • based on preprocessor image (GDAL 3.1)
  • connects to OBS
  • brpop item from preprocess_queue or preprocess-md_queue
    • sadd to preprocessing_set
    • downloads image or package from OBS
    • translates to COG (see the GDAL sketch after this list)
    • translates to GSC if needed
    • uploads COG & GSC to OBS
    • adds item (metadata and data object paths) to register_queue
    • sadd to preprocess-{success|failure}_set
    • srem from preprocessing_set
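
The COG translation step can be reproduced manually with GDAL 3.1; the creation options below are an illustrative assumption, not necessarily the ones used by the preprocessor:

# translate a downloaded image to a Cloud Optimized GeoTIFF (creation options are assumptions)
gdal_translate -of COG -co COMPRESS=DEFLATE -co BLOCKSIZE=512 input.tif output_cog.tif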

registrar

  • based on core image
  • connects to OBS & database
  • uses instance-data volume
  • brpop item from register_queue
    • sadd ...
    • register in DB
    • (optional) store time:start/time:end in seed_queue
    • sadd/srem ...

cache

  • based on cache image
  • connects to OBS & database
  • provides external service for WMS & WMTS
  • either serves WMTS/WMS requests directly from the cache or retrieves them on demand from the renderer to store in the cache and serve (see the example request below)
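
For illustration, a WMS GetCapabilities request against the external cache endpoint; the host and path are assumptions and depend on the traefik routing of the respective stack:

# WMS GetCapabilities via the external cache endpoint (host and path are assumptions)
curl "https://<stack-host>/<cache-path>?service=WMS&request=GetCapabilities"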

renderer

  • based on core image
  • connects to OBS & database
  • provides external service for OpenSearch, WMS, & WCS
  • renders WMS requests received from cache or seeder

logging stack

  • uses elasticsearch:7.9 & kibana:7.9 external images
  • the fluentd image is built and published to the registry because of additional plugins
  • ES data stored in local volume on swarm master
  • external access allowed to kibana through traefik
  • log parsing enabled for cache and core

sftp

  • uses external atmoz/sftp image
  • provides sftp access to two volumes for report exchange: registration result XMLs and ingest requirement XMLs
  • accessible on the swarm master on ports 2222-22xx
  • credentials supplied via config

Usage

Test locally using docker swarm

Initialize swarm & stack:

docker swarm init                               # initialize swarm

Build images. Note that the dev tag is used for local development, so images need to be built locally:

docker build core/ -t registry.gitlab.eox.at/esa/prism/vs/pvs_core:dev
docker build cache/ -t registry.gitlab.eox.at/esa/prism/vs/pvs_cache:dev
docker build preprocessor/ -t registry.gitlab.eox.at/esa/prism/vs/pvs_preprocessor:dev
docker build client/ -t registry.gitlab.eox.at/esa/prism/vs/pvs_client:dev
docker build fluentd/ -t registry.gitlab.eox.at/esa/prism/vs/fluentd:dev
docker build ingestor/ -t registry.gitlab.eox.at/esa/prism/vs/pvs_ingestor:dev
docker build sftp/ -t registry.gitlab.eox.at/esa/prism/vs/pvs_sftp:dev

For production deployment this step is not needed: since the registry is open to the public, the later step Deploy the stack in production pulls the necessary images automatically.

Create external network for stack to run:

docker network create -d overlay vhr18-extnet
docker network create -d overlay emg-extnet
docker network create -d overlay dem-extnet

Add the following .env files with credentials to the /env folder of the cloned copy of the repository: vhr18_db.env, vhr18_obs.env, vhr18_django.env.

Create docker secrets:

Sensitive environment variables are not included in the .env files and must be created as docker secrets. All stacks currently share these secret names, therefore they must stay the same for all stacks. The same applies to the sftp configuration values. To create the docker secrets and configs, run:

# secret creation
# replace the "<variable>" with the value of the secret
printf "<OS_PASSWORD_DOWNLOAD>" | docker secret create OS_PASSWORD_DOWNLOAD -
printf "<DJANGO_PASSWORD>" | docker secret create DJANGO_PASSWORD -
printf "<OS_PASSWORD>" | docker secret create OS_PASSWORD -

# configs creation
printf "<user>:<password>:<UID>:<GID>" | docker config create sftp_users_<name> -
# for production base stack deployment, an additional basic authentication credentials list needs to be created
# the format of such a list used by traefik is username:hashedpassword (MD5, SHA1, BCrypt)
sudo apt-get install apache2-utils
htpasswd -nb <username> <password> >> auth_list.txt
docker secret create BASIC_AUTH_USERS_AUTH auth_list.txt
docker secret create BASIC_AUTH_USERS_APIAUTH auth_list_api.txt

If the shibauth service will be used for production deployment, two more secrets need to be created for each stack where shibauth is deployed. These ensure that the SP is recognized and its identity is confirmed by the IdP. They are configured as <stack-name-capitalized>_SHIB_KEY and <stack-name-capitalized>_SHIB_CERT. To create them, use the attached keygen.sh command-line tool in the /config folder.

SPURL="https://emg.pass.copernicus.eu" # service initial access point made accessible by traefik
./config/keygen.sh -h $SPURL -y 20 -e $SPURL/shibboleth -n sp-signing -f
docker secret create EMG_SHIB_CERT sp-signing-cert.pem 
docker secret create EMG_SHIB_KEY sp-signing-key.pem 

Additionally, a docker config idp_metadata containing the metadata of the used IdP needs to be added:

docker config create idp_metadata idp-metadata-received.xml 

Deploy the stack in dev environment:

docker stack deploy -c docker-compose.vhr18.yml -c docker-compose.vhr18.dev.yml -c docker-compose.logging.yml -c docker-compose.logging.dev.yml vhr18-pvs  # start VHR_IMAGE_2018 stack in dev mode, for example to use local sources
docker stack deploy -c docker-compose.emg.yml -c docker-compose.emg.dev.yml -c docker-compose.logging.yml -c docker-compose.logging.dev.yml emg-pvs # start Emergency stack in dev mode, for example to use local sources

Deploy base & logging stack in production environment:

docker stack deploy -c docker-compose.base.ops.yml base-pvs
docker stack deploy -c docker-compose.logging.yml -c docker-compose.logging.ops.yml logging

Deploy the stack in production environment: Please note that in order to reuse existing database volumes, the stack name needs to be the same. Here we use vhr18-pvs, but in the operational service vhr18-pdas is used.

docker stack deploy -c docker-compose.vhr18.yml -c docker-compose.vhr18.ops.yml vhr18-pvs

First steps:

# To register first data, use the following command inside the registrar container:
UPLOAD_CONTAINER=<product_bucket_name> && python3 registrar.py --objects-prefix <product_object_storage_item_prefix>
# To see the catalog opensearch response in the attached web client, a browser CORS extension needs to be turned on.

Tear down stack including data:

docker stack rm vhr18-pvs                      # stop stack
docker volume rm vhr18-pvs_db-data                        # delete volumes
docker volume rm vhr18-pvs_redis-data
docker volume rm vhr18-pvs_traefik-data
docker volume rm vhr18-pvs_instance-data

Setup logging

To access the logs, navigate to http://localhost:5601. Ignore all of the fancy enterprise capabilities and select Kibana > Discover in the hamburger menu.

On first run, you need to define an index pattern to select the data source for Kibana in Elasticsearch. Since we only have fluentd, you can just use * as the index pattern. Select @timestamp as the time field. An example of a Kibana query to discover the logs of a single service:

https://<kibana-url>/app/discover#/?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-15m,to:now))&_a=(columns:!(path,size,code,log),filters:!(),index:<index-id>,interval:auto,query:(language:kuery,query:'%20container_name:%20"<service-name>"'),sort:!())

Development service stacks keep their logging to stdout/stderr unless the logging dev stack is used. On the production machine, fluentd is set as the logging driver for the docker daemon by modifying /etc/docker/daemon.json to

{
    "log-driver": "fluentd",
    "log-opts": {
        "fluentd-sub-second-precision": "true"
    }
}
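
After changing daemon.json, the docker daemon has to be restarted for the logging driver change to take effect (assuming a systemd based host):

sudo systemctl restart docker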

Setup sftp

The SFTP image allows remote access to 2 logging folders. You can define (edit/add) users, passwords, and UID/GID using the docker config create command mentioned above.

In the example below the username is eox. Once the stack is deployed, you can sftp into the logging folders through port 2222 for vhr18 (emg and dem use ports 2223 and 2224 respectively) if you are running the dev stack on localhost:

sftp -P 2222 eox@127.0.0.1

You will log into the /home/eox/data directory, which contains the 2 logging directories: to/panda and from/fepd

NOTE: The mounted directory that you are directed into is /home/user, where user is the username, hence when setting/editing the username in the configs, the sftp mounted volume paths in docker-compose.<collection>.yml must be changed accordingly.

Documentation

Installation

python3 -m pip install sphinx recommonmark sphinx-autobuild

Generate html and synchronize with client/html/user-guide

make html

# For watched html automatic building
make html-watch

# For pdf output and sync it to client/html/
make latexpdf
# To shrink size of pdf
gs -sDEVICE=pdfwrite -dPDFSETTINGS=/ebook -dPrinted=false -q -o View-Server_-_User-Guide_small.pdf View-Server_-_User-Guide.pdf
# make latexpdf and make html combined
make build

The documentation is generated in the respective _build/html directory.

Create software releases

Release a new vs version

We use bump2version to increment the versions of individual docker images and create git tags. Tags, once pushed, trigger the CI docker push action for the versioned images. bump2version also updates the image versions used in the .ops docker compose files.

Pushing to the master branch updates the latest images, while pushing to the staging branch updates the staging images. For versions in general, we use semantic versioning with the format {major}.{minor}.{patch}-{release}.{build}. First check the deployed staging version on the staging platform (TBD); if no problems are found, proceed. The following operations should be done on the staging or master branch.

bump2version <major/minor/patch/release/build>
git push
git push --tags

If this was done on the staging branch, it should then be merged to master, unless only a patch to a previous major version is being made. A hotfix to production is developed in a branch initiated from master, then merged to staging for verification. It is then merged to master for release.

Source code release

Create a TAR from source code:

git archive --prefix release-1.0.0/ -o release-1.0.0.tar.gz -9 master

Save Docker images:

docker save -o pvs_core.tar registry.gitlab.eox.at/esa/prism/vs/pvs_core
docker save -o pvs_cache.tar registry.gitlab.eox.at/esa/prism/vs/pvs_cache
docker save -o pvs_preprocessor.tar registry.gitlab.eox.at/esa/prism/vs/pvs_preprocessor
docker save -o pvs_client.tar registry.gitlab.eox.at/esa/prism/vs/pvs_client
docker save -o pvs_ingestor.tar registry.gitlab.eox.at/esa/prism/vs/pvs_ingestor
docker save -o fluentd.tar registry.gitlab.eox.at/esa/prism/vs/fluentd