# Introduction
This repository holds the configuration of the PRISM View Server (PVS).

This README describes the architecture, conventions, relevant
configuration, and installation instructions, as well as canonical references.

# Architecture

The PRISM View Server (PVS) uses various Docker images. The `core`,
`cache`, `client`, `ingestor`, `fluentd`, and `preprocessor` images are built from this repository, while
the others are pulled from Docker Hub.

## Prerequisites

### Object Storage (OBS)

Access keys to store preprocessed items and caches, used by all services.

An access key to the input items, used by the preprocessor.

## Networks

One internal and one external network per stack.

## Volumes

In the base stack:

* traefik-data

In the logging stack:

* logging_es-data

Per collection:

* db-data - used by the database
* redis-data - used by redis
* instance-data - used by the registrar and renderer
* report-data - sftp output of the reporting interface
* from-fepd - sftp input to the **ingestor**
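
A quick way to list the volumes a deployed stack has created, assuming the stack name `vhr18-pvs` used in the examples below:

```bash
# Volumes are prefixed with the stack name, e.g. vhr18-pvs_db-data.
docker volume ls --filter name=vhr18-pvs
```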

## Services

The following services are defined via docker compose files.

### reverse-proxy

* based on the external traefik image
* data stored in local volume on swarm master
* reads swarm changes from /var/run/docker.sock on swarm master
* provides the endpoint for external access
* configured via docker labels (see the inspection sketch below)
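
A minimal sketch for checking which labels (and thus which traefik rules) a deployed service carries; the service name `<stack>_renderer` is an illustrative placeholder:

```bash
# Print the labels attached to a deployed swarm service as JSON.
docker service inspect <stack>_renderer --format '{{ json .Spec.Labels }}'
```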

### shibauth

* based on the external unicon/shibboleth-sp:3.0.4 Apache + Shibboleth SP3 image
* provides authentication and authorization via SAML2
* docker configuration files set access control rules
* traefik labels determine which services are protected via Shib

### database

* based on external postgis:10 image
* DB stored in local volume on swarm master
* provides database to all other services

### redis

* based on external redis image
* data stored in local volume on swarm master
* holds these keys (a manual-enqueue sketch follows this list)
    * preprocessing
        * preprocess-md_queue
            * holds metadata in json including object path for image to be preprocessed
            * `lpush` by ingestor or manually
            * `brpop` by preprocessor
        * preprocess_queue
            * holds items (tar object path) to be preprocessed
            * `lpush` by ingestor or manually
            * `brpop` by preprocessor
        * preprocessing_set
            * holds ids of items currently being preprocessed
            * `sadd` by preprocessor
        * preprocess-success_set
            * holds ids for successfully preprocessed items
            * `sadd` by preprocessor
        * preprocess-failure_set
            * holds ids of items for which preprocessing failed
            * `sadd` by preprocessor
    * registration
        * register_queue
            * holds items (metadata and data objects prefix - same as tar object path above) to be registered
            * `lpush` by preprocessor or manually
            * `brpop` by registrar
        * registering_set
            * holds ids of items currently being registered
            * `sadd` by registrar
        * register-success_set
            * holds ids for successfully registered items
            * `sadd` by registrar
        * register-failure_set
            * holds ids of items for which registration failed
            * `sadd` by registrar
    * seeding
        * seed_queue
            * time intervals to pre-seed
            * `lpush` by registrar or manually
            * `brpop` by seeder
        * seed-success_set
        * seed-failure_set
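
A minimal sketch of interacting with these keys manually from inside the running redis container; the stack name and object path are illustrative placeholders:

```bash
# Enqueue a tar object path for preprocessing and inspect the queue state.
REDIS=$(docker ps -q --filter name=<stack>_redis)
docker exec "$REDIS" redis-cli lpush preprocess_queue "<tar-object-path>"
docker exec "$REDIS" redis-cli llen preprocess_queue            # pending items
docker exec "$REDIS" redis-cli smembers preprocess-failure_set  # failed ids
```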

### ingestor
* based on ingestor image
* by default a flask app listening on the `/` endpoint for `POST` requests with reports (see the curl sketch after this list)
* can alternatively be overridden to act as an inotify watcher on a configured folder for newly appearing reports
* accepts browse reports with references to images on Swift
* extracts the browse metadata (id, time, footprint, image reference)
* `lpush` metadata into a `preprocess-md_queue`
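
A minimal sketch of posting a report to the flask endpoint; host, port, and report filename are illustrative placeholders, not confirmed defaults:

```bash
# POST a browse report to the ingestor's "/" endpoint.
curl -X POST --data-binary @browse_report.xml "http://<ingestor-host>:<port>/"
```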

### TODO: seeder

* based on cache image
* connects to DB
* `brpop` time interval from seed_queue
* for each seed time and extent from DB
    * pre-seed using renderer

### preprocessor

* based on preprocessor image (GDAL 3.1)
* connects to OBS
* `brpop` item from preprocess_queue or preprocess-md_queue
    * `sadd` to preprocessing_set
    * downloads image or package from OBS
    * translates to COG (see the GDAL sketch after this list)
    * translates to GSC if needed
    * uploads COG & GSC to OBS
    * adds item (metadata and data object paths) to register_queue
    * `sadd` to preprocess-{success|failure}\_set
    * `srem` from preprocessing_set
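
A minimal sketch of the COG translation step, assuming GDAL 3.1 or later (which provides the COG driver); filenames and creation options are illustrative:

```bash
# Translate a downloaded image to a Cloud Optimized GeoTIFF.
gdal_translate -of COG -co COMPRESS=DEFLATE input.tif output_cog.tif
```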

### registrar

* based on core image
* connects to OBS & database
* uses instance-data volume
* `brpop` item from register_queue (a manual-enqueue sketch follows this list)
    * `sadd` to registering_set
    * register in DB
    * (optional) store time:start/time:end in seed_queue
    * `sadd` to register-{success|failure}\_set, `srem` from registering_set
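
A minimal sketch of enqueueing an item for registration manually; the prefix value is an illustrative placeholder for the metadata and data objects prefix mentioned above:

```bash
# Push an objects prefix onto the registration queue.
docker exec $(docker ps -q --filter name=<stack>_redis) \
    redis-cli lpush register_queue "<objects-prefix>"
```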

### cache

* based on cache image
* connects to OBS & database
* provides external service for WMS & WMTS
* either serves WMTS/WMS requests from the cache or retrieves them on demand from
  the renderer to store in the cache and serve (see the request sketch below)
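
A minimal sketch of querying the cache service once a stack is deployed; the hostname and path are illustrative placeholders for the endpoint exposed via traefik:

```bash
# Request the WMS capabilities document from the cache service.
curl "https://<stack-endpoint>/ows?service=WMS&request=GetCapabilities"
```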

### renderer

* based on core image
* connects to OBS & database
* provides external service for OpenSearch, WMS, & WCS
* renders WMS requests received from cache or seeder

### logging stack
* uses elasticsearch:7.9 & kibana:7.9 external images
* the fluentd image is built and published to the registry because of additional plugins
* ES data stored in local volume on swarm master
* external access allowed to kibana through traefik
* log parsing enabled for cache and core

### sftp
* uses the external atmoz/sftp image
* provides sftp access to two volumes for report exchange: registration result XMLs and ingest requirement XMLs
* accessible on the swarm master on ports 2222-22xx
* credentials supplied via config

# Usage

## Test locally using docker swarm

Initialize swarm & stack:

```bash
docker swarm init                               # initialize swarm
```

Note that we use the **dev** tag for local development, so the images need to be built locally:

```bash
docker build core/ -t registry.gitlab.eox.at/esa/prism/vs/pvs_core:dev
docker build cache/ -t registry.gitlab.eox.at/esa/prism/vs/pvs_cache:dev
docker build preprocessor/ -t registry.gitlab.eox.at/esa/prism/vs/pvs_preprocessor:dev
docker build client/ -t registry.gitlab.eox.at/esa/prism/vs/pvs_client:dev
docker build fluentd/ -t registry.gitlab.eox.at/esa/prism/vs/fluentd:dev
docker build ingestor/ -t registry.gitlab.eox.at/esa/prism/vs/pvs_ingestor:dev
```

For production deployment, since the registry is open to the public, this part is handled by the later step `Deploy the stack in production`, which pulls the necessary images automatically.

Create the external networks for the stacks to run:

```bash
docker network create -d overlay vhr18-extnet
docker network create -d overlay emg-extnet
docker network create -d overlay dem-extnet
```

Add the following .env files with credentials to the /env folder of the cloned copy of the repository: `vhr18_db.env`, `vhr18_obs.env`, `vhr18_django.env`.

Create docker secrets:

Sensitive environment variables are not included in the .env files and must be created as docker secrets. All stacks currently share these secret names, so the names must stay the same for all stacks. The same goes for the sftp configuration values. To create the docker secrets and configs, run:
```bash
# secret creation
# replace the "<variable>" with the value of the secret
printf "<OS_PASSWORD_DOWNLOAD>" | docker secret create OS_PASSWORD_DOWNLOAD -
printf "<DJANGO_PASSWORD>" | docker secret create DJANGO_PASSWORD -
printf "<OS_PASSWORD>" | docker secret create OS_PASSWORD -

# configs creation
printf "<user>:<password>:<UID>:<GID>" | docker config create sftp_users_<name> -
# for production base stack deployment, an additional basic-authentication credentials list needs to be created
# the format of such a list used by traefik is username:hashedpassword (MD5, SHA1, BCrypt)
sudo apt-get install apache2-utils
htpasswd -nb <username> <password> >> auth_list.txt
docker secret create BASIC_AUTH_USERS_AUTH auth_list.txt
docker secret create BASIC_AUTH_USERS_APIAUTH auth_list_api.txt
```

If the **shibauth** service is to be used for production deployment, two more secrets need to be created for each stack where **shibauth** is deployed. These ensure that the SP is recognized and its identity confirmed by the IDP. They are configured as **stack-name-capitalized_SHIB_KEY** and **stack-name-capitalized_SHIB_CERT**. To create them, use the attached **keygen.sh** command-line tool in the */config* folder.
```bash
SPURL="https://emg.pass.copernicus.eu" # service initial access point made accessible by traefik
./config/keygen.sh -h $SPURL -y 20 -e $SPURL/shibboleth -n sp-signing -f
docker secret create EMG_SHIB_CERT sp-signing-cert.pem
docker secret create EMG_SHIB_KEY sp-signing-key.pem
```
Additionally, a docker config `idp_metadata` containing the metadata of the used IDP needs to be added:
```bash
docker config create idp_metadata idp-metadata-received.xml
```

Deploy the stack in the dev environment:

```bash
docker stack deploy -c docker-compose.vhr18.yml -c docker-compose.vhr18.dev.yml -c docker-compose.logging.yml -c docker-compose.logging.dev.yml vhr18-pvs  # start VHR_IMAGE_2018 stack in dev mode, for example to use local sources
docker stack deploy -c docker-compose.emg.yml -c docker-compose.emg.dev.yml -c docker-compose.logging.yml -c docker-compose.logging.dev.yml emg-pvs # start Emergency stack in dev mode, for example to use local sources
```
Deploy the base & logging stacks in the production environment:

```bash
docker stack deploy -c docker-compose.base.ops.yml base-pvs
docker stack deploy -c docker-compose.logging.yml -c docker-compose.logging.ops.yml logging
```

Deploy the stack in the production environment. Please note that in order to reuse existing database volumes, <stack-name> needs to be the same. Here we use `vhr18-pvs`, but in the operational service `vhr18-pdas` is used.

```bash
docker stack deploy -c docker-compose.vhr18.yml -c docker-compose.vhr18.ops.yml vhr18-pvs
```
First steps:

```bash
# To register first data, use the following command inside the registrar container:
UPLOAD_CONTAINER=<product_bucket_name> && python3 registrar.py --objects-prefix <product_object_storage_item_prefix>
# To see the catalog opensearch response in the attached web client, a browser CORS extension needs to be turned on.
```
Tear down stack including data:

```bash
docker stack rm vhr18-pvs                      # stop stack
docker volume rm vhr18-pvs_db-data             # delete volumes
docker volume rm vhr18-pvs_redis-data
docker volume rm vhr18-pvs_traefik-data
docker volume rm vhr18-pvs_instance-data
```

### Setup logging
To access the logs, navigate to http://localhost:5601. Ignore all of the fancy enterprise capabilities and select Kibana > Discover in the hamburger menu.

On the first run, you need to define an index pattern to select the data source for Kibana in elasticsearch.
Since we only have fluentd, you can just use `*` as the index pattern.
Select `@timestamp` as the time field
([see also](https://www.elastic.co/guide/en/kibana/current/tutorial-define-index.html)).
Example of a kibana query to discover the logs of a single service:
```
https://<kibana-url>/app/discover#/?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-15m,to:now))&_a=(columns:!(path,size,code,log),filters:!(),index:<index-id>,interval:auto,query:(language:kuery,query:'%20container_name:%20"<service-name>"'),sort:!())
```
Development service stacks keep their logging to stdout/stderr unless the `logging` dev stack is used.
On the production machine, `fluentd` is set as the logging driver for the docker daemon by modifying `/etc/docker/daemon.json` to:
```
{
    "log-driver": "fluentd",
    "log-opts": {
        "fluentd-sub-second-precision": "true"
    }
}
```
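
For the logging-driver change to take effect, the docker daemon has to be restarted, for example on a systemd-based host:

```bash
sudo systemctl restart docker
```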
### Setup sftp

The `sftp` image allows remote access into the two logging folders. You can define (edit/add) users, passwords, and UID/GID using the `docker config create` command mentioned above.

In the example below the username is `eox`. Once the stack is deployed, you can sftp into the logging folders through port 2222 for ``vhr18`` (``emg`` and ``dem`` use ports 2223 and 2224 respectively) if you are running the dev stack on localhost:

```bash
sftp -P 2222 eox@127.0.0.1
```

You will be logged into the `/home/eox/data` directory, which contains the two logging directories: `to/panda` and `from/fepd`.

**NOTE:** The mounted directory that you are directed into is *`/home/user`*, where `user` is the username; hence, when setting or editing the username in the configs, the `sftp` mounted volume paths in `docker-compose.<collection>.yml` must be changed accordingly.

# Documentation

## Installation

```bash
python3 -m pip install sphinx recommonmark sphinx-autobuild
```

## Generate html and synchronize with client/html/user-guide

```bash
make html

# For watched html automatic building
make html-watch

# For pdf output and sync it to client/html/
make latexpdf

# To shrink size of pdf
gs -sDEVICE=pdfwrite -dPDFSETTINGS=/ebook -dPrinted=false -q -o View-Server_-_User-Guide_small.pdf View-Server_-_User-Guide.pdf

# make latexpdf and make html combined
make build
```

The documentation is generated in the respective *_build/html* directory.

# Create software releases

## Release a new vs version

We use [bump2version](https://github.com/c4urself/bump2version) to increment the versions of individual docker images and create git tags. Pushed tags trigger the CI `docker push` action for the versioned images. It also updates the image versions used in the `.ops` docker compose files.

Pushing to the `master` branch updates the `latest` images, while pushing to the `staging` branch updates the `staging` images.

For **versions** in general, we use semantic versioning with the format {major}.{minor}.{patch}-{release}.{build}.
First check the deployed staging version on the staging platform (TBD); if no problems are found, proceed.

The following operations should be done on the `staging` or `master` branch:
```bash
bump2version <major/minor/patch/release/build>
git push
git push --tags
```
If the release was done on the `staging` branch, it should then be merged to `master`, unless it is only a patch to a previous major version.
A hotfix to production is developed in a branch initiated from `master`, then merged to `staging` for verification, and finally merged to `master` for release.

## Source code release

Create a TAR from source code:

```bash
git archive --prefix release-1.0.0/ -o release-1.0.0.tar.gz -9 master
```

Save Docker images:

```bash
docker save -o pvs_core.tar registry.gitlab.eox.at/esa/prism/vs/pvs_core
docker save -o pvs_cache.tar registry.gitlab.eox.at/esa/prism/vs/pvs_cache
docker save -o pvs_preprocessor.tar registry.gitlab.eox.at/esa/prism/vs/pvs_preprocessor
docker save -o pvs_client.tar registry.gitlab.eox.at/esa/prism/vs/pvs_client
docker save -o pvs_ingestor.tar registry.gitlab.eox.at/esa/prism/vs/pvs_ingestor
docker save -o fluentd.tar registry.gitlab.eox.at/esa/prism/vs/fluentd
```