Introduction
This repository holds the configuration of the PRISM View Server (PVS).
The present README.md holds the architecture, conventions, relevant configuration, installation instructions, as well as canonical references.
Architecture
The PRISM View Server (PVS) uses various Docker images, of which core, cache, client, ingestor, fluentd and preprocessor are built from this repository, while the others are pulled from Docker Hub.
Prerequisites
Object Storage (OBS)
Access keys to store preprocessed items and caches used by all services.
Access key to input items used by preprocessor.
Networks
One internal and one external network per stack.
Volumes
In base stack
- traefik-data
In logging stack
- logging_es-data
Per collection
- db-data used by database
- redis-data used by redis
- instance-data used by registrar and renderer
- report-data sftp output of reporting interface
- from-fepd - sftp input to ingestor
Services
The following services are defined via docker compose files.
reverse-proxy
- based on the external traefik image
- data stored in local volume on swarm master
- reads swarm changes from /var/run/docker.sock on swarm master
- provides the endpoint for external access
- configured via docker labels
shibauth
- based on the external unicon/shibboleth-sp:3.0.4 Apache + Shibboleth SP3 image
- provides authentication and authorization via SAML2
- docker configuration files set access control rules
- traefik labels determine which services are protected via Shib
database
- based on external postgis:10 image
- DB stored in local volume on swarm master
- provides database to all other services
redis
- based on external redis image
- data stored in local volume on swarm master
- holds these keys
  - preprocessing
    - preprocess-md_queue
      - holds metadata in json including object path for image to be preprocessed
      - lpush by ingestor or manually
      - brpop by preprocessor
    - preprocess_queue
      - holds items (tar object path) to be preprocessed
      - lpush by ingestor or manually
      - brpop by preprocessor
    - preprocessing_set
      - holds ids for currently preprocessed items
      - sadd by preprocessor
    - preprocess-success_set
      - holds ids for successfully preprocessed items
      - sadd by preprocessor
    - preprocess-failure_set
      - holds ids for failed preprocessed items
      - sadd by preprocessor
  - registration
    - register_queue
      - holds items (metadata and data objects prefix - same as tar object path above) to be registered
      - lpush by preprocessor or manually
      - brpop by registrar
    - registering_set
      - holds ids for currently registered items
      - sadd by registrar
    - register-success_set
      - holds ids for successfully registered items
      - sadd by registrar
    - register-failure_set
      - holds ids for failed registered items
      - sadd by registrar
  - seeding
    - seed_queue
      - time intervals to pre-seed
      - lpush by registrar or manually
      - brpop by seeder
    - seed-success_set
    - seed-failure_set
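The key names above follow a regular pattern: a queue is named after its stage with the suffix _queue, and result sets are named after the stage, the result and the suffix _set. As a minimal sketch of a manual lpush and of inspecting a result set, the hypothetical helpers below only print the matching redis-cli command; they do not contact redis. The function names and the example item path are assumptions and not part of this repository.

```shell
#!/bin/sh
# Hypothetical dry-run helpers: they print redis-cli commands, they do not run them.

# Print the command that queues ITEM ($2) for STAGE ($1).
# STAGE is one of: preprocess, preprocess-md, register, seed.
queue_cmd() {
    case "$1" in
        preprocess|preprocess-md|register|seed)
            printf 'redis-cli lpush %s_queue "%s"\n' "$1" "$2" ;;
        *)
            echo "unknown stage: $1" >&2
            return 1 ;;
    esac
}

# Print the command that lists the ids in a result set.
# STAGE ($1) is one of: preprocess, register, seed; RESULT ($2) is success or failure.
result_cmd() {
    case "$1:$2" in
        preprocess:success|preprocess:failure|register:success|register:failure|seed:success|seed:failure)
            printf 'redis-cli smembers %s-%s_set\n' "$1" "$2" ;;
        *)
            echo "unknown stage or result: $1 $2" >&2
            return 1 ;;
    esac
}

# Example with a placeholder object path.
queue_cmd preprocess "path/to/item.tar"
result_cmd register failure
```

A printed command can then be run inside the redis container of the stack, for example through docker exec.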
ingestor
- based on ingestor image
- by default a flask app listening on the / endpoint for POST requests with reports
  - or can be overridden to be used as inotify watcher on a configured folder for new appearance of reports
- accepts browse reports with references to images on Swift
- extracts the browse metadata (id, time, footprint, image reference)
- lpush metadata into a preprocess-md_queue
TODO: seeder
- based on cache image
- connects to DB
- brpop time interval from seed_queue
- for each seed time and extent from DB
  - pre-seed using renderer
preprocessor
- based on preprocessor image (GDAL 3.1)
- connects to OBS
- brpop item from preprocess_queue or preprocess-md_queue
  - sadd to preprocessing_set
  - downloads image or package from OBS
  - translates to COG
  - translates to GSC if needed
  - uploads COG & GSC to OBS
  - adds item (metadata and data object paths) to register_queue
  - sadd to preprocess-{success|failure}_set
  - srem from preprocessing_set
registrar
- based on core image
- connects to OBS & database
- uses instance-data volume
- brpop item from register_queue
  - sadd ...
  - register in DB
  - (optional) store time:start/time:end in seed_queue
  - sadd/srem ...
cache
- based on cache image
- connects to OBS & database
- provides external service for WMS & WMTS
- either serves WMTS/WMS requests from cache or retrieves on-demand from renderer to store in cache and serve
renderer
- based on core image
- connects to OBS & database
- provides external service for OpenSearch, WMS, & WCS
- renders WMS requests received from cache or seeder
logging stack
- uses elasticsearch:7.9 & kibana:7.9 external images
- fluentd image is built and published to registry because of additional plugins
- ES data stored in local volume on swarm master
- external access allowed to kibana through traefik
- log parsing enabled for cache and core
sftp
- uses external atmoz/sftp image
- provides sftp access to two volumes for report exchange on registration result xmls and ingest requirement xmls
- accessible on swarm master on port 2222-22xx
- credentials supplied via config
Usage
Test locally using docker swarm
Initialize swarm & stack:
docker swarm init # initialize swarm
Build images: Note we use dev tag for local development, so images need to be built locally
docker build core/ -t registry.gitlab.eox.at/esa/prism/vs/pvs_core:dev
docker build cache/ -t registry.gitlab.eox.at/esa/prism/vs/pvs_cache:dev
docker build preprocessor/ -t registry.gitlab.eox.at/esa/prism/vs/pvs_preprocessor:dev
docker build client/ -t registry.gitlab.eox.at/esa/prism/vs/pvs_client:dev
docker build fluentd/ -t registry.gitlab.eox.at/esa/prism/vs/fluentd:dev
docker build ingestor/ -t registry.gitlab.eox.at/esa/prism/vs/pvs_ingestor:dev
docker build sftp/ -t registry.gitlab.eox.at/esa/prism/vs/pvs_sftp:dev
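The build commands above differ only in the directory and the image name, so they can be generated in a loop. The sketch below prints the commands instead of running them; the function name image_ref is an assumption, and the naming rule (fluentd keeps its plain name, all other images get the pvs_ prefix) is read off the commands above.

```shell
#!/bin/sh
# Sketch: derive the dev image reference for each build directory.
REGISTRY="registry.gitlab.eox.at/esa/prism/vs"

# Print the image reference for a build directory ($1).
# fluentd keeps its plain name; every other image is prefixed with pvs_.
image_ref() {
    case "$1" in
        fluentd) printf '%s/fluentd:dev\n' "$REGISTRY" ;;
        *)       printf '%s/pvs_%s:dev\n' "$REGISTRY" "$1" ;;
    esac
}

# Print (not run) one docker build command per directory.
for dir in core cache preprocessor client fluentd ingestor sftp; do
    printf 'docker build %s/ -t %s\n' "$dir" "$(image_ref "$dir")"
done
```

Piping the output to sh from the repository root would run the builds one after another.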
For production deployment this step is not needed: the registry is publicly accessible, so the later step Deploy the stack in production environment pulls the necessary images automatically.
Create external network for stack to run:
docker network create -d overlay vhr18-extnet
docker network create -d overlay emg-extnet
docker network create -d overlay dem-extnet
Add the following .env files with credentials to the /env folder of the cloned copy of the repository: vhr18_db.env, vhr18_obs.env, vhr18_django.env.
Create docker secrets:
Sensitive environment variables are not included in the .env files and must be created as docker secrets. All stacks currently share these secret names, so the names must stay the same for all stacks. The same goes for the sftp configuration values. To create the docker secrets and configs, run:
# secret creation
# replace the "<variable>" with the value of the secret
printf "<OS_PASSWORD_DOWNLOAD>" | docker secret create OS_PASSWORD_DOWNLOAD -
printf "<DJANGO_PASSWORD>" | docker secret create DJANGO_PASSWORD -
printf "<OS_PASSWORD>" | docker secret create OS_PASSWORD -
# configs creation
printf "<user>:<password>:<UID>:<GID>" | docker config create sftp_users_<name> -
# for production base stack deployment, an additional list of basic authentication credentials needs to be created
# the format of such a list used by traefik is username:hashedpassword (MD5, SHA1, BCrypt)
sudo apt-get install apache2-utils
htpasswd -nb <username> <password> >> auth_list.txt
docker secret create BASIC_AUTH_USERS_AUTH auth_list.txt
docker secret create BASIC_AUTH_USERS_APIAUTH auth_list_api.txt
If the shibauth service is used in a production deployment, two more secrets need to be created for each stack where shibauth is deployed. These ensure that the SP is recognized and its identity confirmed by the IDP. They are configured as stack-name-capitalized_SHIB_KEY and stack-name-capitalized_SHIB_CERT. To create them, use the attached keygen.sh command-line tool in the /config folder.
SPURL="emg.pass.copernicus.eu" # host name of the service initial access point made accessible by traefik
./config/keygen.sh -h $SPURL -y 20 -e https://$SPURL/shibboleth -n sp-signing -f
docker secret create EMG_SHIB_CERT sp-signing-cert.pem
docker secret create EMG_SHIB_KEY sp-signing-key.pem
Additionally, a docker config idp-metadata containing the metadata of the used IDP needs to be added:
docker config create idp_metadata idp-metadata-received.xml
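The two shibauth secret names depend only on the stack they belong to. As a small sketch, the hypothetical helper below derives them from a collection short name; it assumes that "capitalized" means fully upper-cased and that the short name (emg, not emg-pvs) is used, as the EMG_SHIB_CERT example suggests.

```shell
#!/bin/sh
# Print the two shibauth secret names for a collection short name ($1), e.g. emg.
# Assumption: "capitalized" means fully upper-cased, as in EMG_SHIB_CERT.
shib_secret_names() {
    prefix=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    printf '%s_SHIB_CERT\n%s_SHIB_KEY\n' "$prefix" "$prefix"
}

shib_secret_names vhr18
```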
Deploy the stack in dev environment:
docker stack deploy -c docker-compose.vhr18.yml -c docker-compose.vhr18.dev.yml -c docker-compose.logging.yml -c docker-compose.logging.dev.yml vhr18-pvs # start VHR_IMAGE_2018 stack in dev mode, for example to use local sources
docker stack deploy -c docker-compose.emg.yml -c docker-compose.emg.dev.yml -c docker-compose.logging.yml -c docker-compose.logging.dev.yml emg-pvs # start Emergency stack in dev mode, for example to use local sources
Deploy base & logging stack in production environment:
docker stack deploy -c docker-compose.base.ops.yml base-pvs
docker stack deploy -c docker-compose.logging.yml -c docker-compose.logging.ops.yml logging
Deploy the stack in production environment:
Please note that in order to reuse existing database volumes, the stack name needs to be the same. Here we use vhr18-pvs, but in operational service vhr18-pdas is used.
docker stack deploy -c docker-compose.vhr18.yml -c docker-compose.vhr18.ops.yml vhr18-pvs
First steps:
# To register first data, use the following command inside the registrar container:
UPLOAD_CONTAINER=<product_bucket_name> && python3 registrar.py --objects-prefix <product_object_storage_item_prefix>
# To see the catalog opensearch response in the attached web client, a browser CORS extension needs to be turned on.
Tear down stack including data:
docker stack rm vhr18-pvs # stop stack
docker volume rm vhr18-pvs_db-data # delete volumes
docker volume rm vhr18-pvs_redis-data
docker volume rm vhr18-pvs_traefik-data
docker volume rm vhr18-pvs_instance-data
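Docker prefixes each volume with the stack name, which is why the volumes above all start with vhr18-pvs_. The sketch below lists the same four volume names for an arbitrary stack name; the function name stack_volumes is an assumption, and the list simply mirrors the commands above.

```shell
#!/bin/sh
# Print the volume names removed above, for a given stack name ($1).
stack_volumes() {
    for name in db-data redis-data traefik-data instance-data; do
        printf '%s_%s\n' "$1" "$name"
    done
}

stack_volumes vhr18-pvs
```

Once the stack has been removed, the output can be handed to docker volume rm, for example with docker volume rm $(stack_volumes vhr18-pvs).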
Setup logging
To access the logs, navigate to http://localhost:5601 . Ignore all of the fancy enterprise capabilities and select Kibana > Discover in the hamburger menu.
On first run, you need to define an index pattern to select the data source for kibana in elastic search.
Since we only have fluentd, you can just use * as index pattern.
Select @timestamp as time field.
Example of a kibana query to discover logs of a single service:
https://<kibana-url>/app/discover#/?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-15m,to:now))&_a=(columns:!(path,size,code,log),filters:!(),index:<index-id>,interval:auto,query:(language:kuery,query:'%20container_name:%20"<service-name>"'),sort:!())
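The three placeholders in the query above can be filled in by a small helper. This is only a sketch that reproduces the URL template from this section; the function name and the example values (kibana URL, index id, service name) are assumptions.

```shell
#!/bin/sh
# Build the Kibana Discover URL shown above.
# $1: kibana base URL, $2: index pattern id, $3: service (container) name.
kibana_discover_url() {
    printf '%s/app/discover#/?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-15m,to:now))' "$1"
    printf '&_a=(columns:!(path,size,code,log),filters:!(),index:%s,interval:auto,' "$2"
    # \047 is the octal escape for a single quote, %%20 prints a literal %20.
    printf 'query:(language:kuery,query:\047%%20container_name:%%20"%s"\047),sort:!())\n' "$3"
}

# Example with placeholder values.
kibana_discover_url "https://kibana.example.org" "my-index-id" "renderer"
```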
Development service stacks keep their logging to stdout/stderr unless the logging dev stack is used.
On a production machine, fluentd is set as the logging driver for the docker daemon by modifying /etc/docker/daemon.json to
{
  "log-driver": "fluentd",
  "log-opts": {
    "fluentd-sub-second-precision": "true"
  }
}
Setup sftp
The SFTP image allows remote access to two logging folders. You can define (edit/add) users, passwords and UID/GID using the docker config create command mentioned above.
In the example below the username is eox. Once the stack is deployed, you can sftp into the logging folders through port 2222 for vhr18 (emg and dem use 2223 and 2224 respectively). If you are running the dev stack on localhost:
sftp -P 2222 eox@127.0.0.1
You will be logged in to the /home/eox/data directory, which contains the two logging directories: to/panda and from/fepd
NOTE: The mounted directory that you are directed into is /home/user, where user is the username. Hence, when setting or editing the username in the configs, the path of the sftp mounted volumes in docker-compose.<collection>.yml must change accordingly.
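The port and login directory described above can be summarized in two small lookups. This is a sketch only: the function names are assumptions, and the values are those stated in this section (2222 for vhr18, 2223 for emg, 2224 for dem, login directory /home/<user>/data).

```shell
#!/bin/sh
# Print the sftp port published for a collection ($1), as listed above.
sftp_port() {
    case "$1" in
        vhr18) echo 2222 ;;
        emg)   echo 2223 ;;
        dem)   echo 2224 ;;
        *)     echo "unknown collection: $1" >&2; return 1 ;;
    esac
}

# Print the directory a user ($1) lands in after login.
sftp_home() {
    printf '/home/%s/data\n' "$1"
}

# Print (not run) the sftp command for user eox on a local emg dev stack.
printf 'sftp -P %s eox@127.0.0.1\n' "$(sftp_port emg)"
```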
Documentation
Installation
python3 -m pip install sphinx recommonmark sphinx-autobuild
Generate html and synchronize with client/html/user-guide
make html
# For watched html automatic building
make html-watch
# For pdf output and sync it to client/html/
make latexpdf
# To shrink size of pdf
gs -sDEVICE=pdfwrite -dPDFSETTINGS=/ebook -dPrinted=false -q -o View-Server_-_User-Guide_small.pdf View-Server_-_User-Guide.pdf
# make latexpdf and make html combined
make build
The documentation is generated in the respective _build/html directory.
Create software releases
Release a new vs version
We use bump2version to increment versions of individual docker images and create git tags. Pushed tags trigger the CI docker push action for versioned images. bump2version also updates the image versions used in the .ops docker compose files.
Pushing to the master branch updates the latest images, while pushing to the staging branch updates the staging images.
For versions in general, we use semantic versioning with format {major}.{minor}.{patch}-{release}.{build}.
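To illustrate the version format, the sketch below splits a version string into its five parts. The function name and the example version 1.2.3-rc.4 are assumptions, as is the rule that the release part is alphanumeric while all other parts are numeric.

```shell
#!/bin/sh
# Split a version of the form {major}.{minor}.{patch}-{release}.{build}.
# Assumption: release is alphanumeric, all other parts are numeric.
# Prints the parts on one line, or returns 1 if the string does not match.
parse_version() {
    parsed=$(printf '%s\n' "$1" | sed -n -E \
        's/^([0-9]+)\.([0-9]+)\.([0-9]+)-([A-Za-z0-9]+)\.([0-9]+)$/major=\1 minor=\2 patch=\3 release=\4 build=\5/p')
    [ -n "$parsed" ] || return 1
    printf '%s\n' "$parsed"
}

parse_version "1.2.3-rc.4"
```

The names printed on the left match the part names accepted by the bump2version command in this section.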
First check deployed staging version on staging platform (TBD), then if no problems are found, proceed.
The following operation should be done on the staging or master branch.
bump2version <major/minor/patch/release/build>
git push
git push --tags
If it was done on the staging branch, then it should be merged to master, unless only a patch to previous major versions is made.
A hotfix to production is developed in a branch initiated from master, then merged to staging for verification. It is then merged to master for release.
Source code release
Create a TAR from source code:
git archive --prefix release-1.0.0/ -o release-1.0.0.tar.gz -9 master
Save Docker images:
docker save -o pvs_core.tar registry.gitlab.eox.at/esa/prism/vs/pvs_core
docker save -o pvs_cache.tar registry.gitlab.eox.at/esa/prism/vs/pvs_cache
docker save -o pvs_preprocessor.tar registry.gitlab.eox.at/esa/prism/vs/pvs_preprocessor
docker save -o pvs_client.tar registry.gitlab.eox.at/esa/prism/vs/pvs_client
docker save -o pvs_ingestor.tar registry.gitlab.eox.at/esa/prism/vs/pvs_ingestor
docker save -o fluentd.tar registry.gitlab.eox.at/esa/prism/vs/fluentd