Restructure initialization (!5) · Merge requests · View Server / harvester

Fabian Schindler-Strauss requested to merge restructure-initialization into main Dec 07, 2021

In this MR I'd like to propose an alternative to the current way of running a harvest. At the moment, it is possible to either:

- pass in a full configuration for a harvesting resource, which is interpreted and run without any additional conditions, or
- pass in the name of an already configured harvesting resource. In this case, none of the configured parameters can be changed (apart from dynamic field values like !now).

I think the real value of a harvester lies in a mixture of the two approaches: configure the basic resource once, and pass in additional values via the CLI or the queue.

The following (pseudo-config) example declares a STAC API service:

  - name: S2L2A_Element84
    endpoint:
      url: https://earth-search.aws.element84.com/v0/
      type: STACAPI

Note the lack of any filters/query.

A harvesting request would then define those filters and the query:

{
  "name": "S2L2A_Element84",
  "values": {
    "time": {
      "begin": "2021-08-01",
      "end": "2021-08-31",
      "property": "datetime"
    },
    "bbox": "14.9,47.7,16.4,48.7"
  }
}

The matching harvester is identified by the name property, and the values are deep-merged into that harvester's configuration.
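A minimal sketch of this lookup-and-merge step, assuming the configured harvesters are available as a list of dicts (as after loading the YAML), that the request values are merged at the top level of the harvester entry, and that "deep merge" means nested dicts are merged recursively while any other request value replaces the configured one. The names deep_merge and resolve_request are hypothetical and not taken from the harvester code base.

```python
import copy
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict: nested dicts are merged recursively,
    all other values from `override` replace those in `base`."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


def resolve_request(
    request: dict[str, Any], harvesters: list[dict[str, Any]]
) -> dict[str, Any]:
    """Look up the configured harvester by name and merge the request values into it."""
    by_name = {h["name"]: h for h in harvesters}
    try:
        config = by_name[request["name"]]
    except KeyError:
        raise ValueError(
            f"No harvester configured with name {request.get('name')!r}"
        ) from None
    return deep_merge(config, request.get("values", {}))


# Configuration and request taken from the example above.
HARVESTERS = [
    {
        "name": "S2L2A_Element84",
        "endpoint": {
            "url": "https://earth-search.aws.element84.com/v0/",
            "type": "STACAPI",
        },
    },
]

request = {
    "name": "S2L2A_Element84",
    "values": {
        "time": {"begin": "2021-08-01", "end": "2021-08-31", "property": "datetime"},
        "bbox": "14.9,47.7,16.4,48.7",
    },
}

resolved = resolve_request(request, HARVESTERS)
print(resolved["endpoint"]["type"], resolved["bbox"])
```

Because the merge copies the configured entry instead of mutating it, repeated requests against the same harvester never leak values into each other. With a recursive merge, a request could also override a single nested key (for instance only endpoint.url) while keeping the rest of the configured endpoint.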

Another example is harvesting a STAC Catalog stored on object storage. The configuration would look like this:

  - name: MyS3STACCatalogHarvester
    type: STACCatalog
    source:
      type: S3
      bucket: mybucket
      secret_access_key: xxx
      access_key_id: xxx
      endpoint_url: myendpoint.storage.com
      validate_bucket_name: False
      region_name: RegionA
      public: False

Each STAC Catalog to be harvested is then passed in as this simple struct:

{
  "name": "MyS3STACCatalogHarvester",
  "values": {
    "root_path": "path/to/catalog.json"
  }
}
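Since each catalog only needs its root path, a producer (for example, a script that fills the queue) could emit one such struct per catalog. A minimal sketch, assuming requests travel as JSON strings; make_catalog_requests and the second catalog path are hypothetical.

```python
import json


def make_catalog_requests(harvester_name: str, root_paths: list[str]) -> list[str]:
    """Build one JSON-encoded harvest request per STAC Catalog root path."""
    return [
        json.dumps({"name": harvester_name, "values": {"root_path": path}})
        for path in root_paths
    ]


messages = make_catalog_requests(
    "MyS3STACCatalogHarvester",
    # The second path is illustrative only.
    ["path/to/catalog.json", "path/to/other/catalog.json"],
)
for message in messages:
    print(message)
```

Note that the S3 credentials stay in the harvester configuration and never travel with the request; only the per-catalog root_path does.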

Edited Dec 09, 2021 by Fabian Schindler-Strauss