EOX GitLab Instance

Restructure initialization

Fabian Schindler-Strauss requested to merge restructure-initialization into main

In this MR I'd like to suggest an alternative to how a harvesting run is currently started. The current situation is as follows:

It is possible to either:

  • pass in a full configuration for a harvesting resource. This will be interpreted and run without any additional conditions.
  • pass in the name of an already configured harvesting resource. Here, it is not possible to change any of the configured parameters (apart from dynamic field values like !now).

I think the real value of a harvester would be in a mixture of the two current approaches: have the basic resource configured, and pass in some values via the CLI or the queue.

The following (pseudoconfig) example declares a STAC API service:

  - name: S2L2A_Element84
    endpoint:
      url: https://earth-search.aws.element84.com/v0/
      type: STACAPI

Note the lack of any filters/query.

Now a harvesting request would define those filters/query:

{
  "name": "S2L2A_Element84",
  "values": {
    "time": {
      "begin": "2021-08-01",
      "end": "2021-08-31",
      "property": "datetime"
    },
    "bbox": "14.9,47.7,16.4,48.7"
  }
}

The right harvester is then identified by the name property, and the values are deep-merged into the harvester's configuration.
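
To make the deep-merge step concrete, here is a minimal Python sketch. It is not the harvester's actual implementation: the `deep_merge` helper is hypothetical, and it assumes that request values win over configured ones and that the merged keys land at the top level of the resource configuration (the pseudoconfig does not say where filters end up).

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict with `override` merged recursively into `base`.

    Nested dicts are merged key by key; any other value in `override`
    replaces the value in `base`. Neither input is modified.
    """
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Configured resource, mirroring the pseudoconfig above.
configured = {
    "name": "S2L2A_Element84",
    "endpoint": {
        "url": "https://earth-search.aws.element84.com/v0/",
        "type": "STACAPI",
    },
}

# Harvesting request, as it might arrive via the CLI or the queue.
request = {
    "name": "S2L2A_Element84",
    "values": {
        "time": {
            "begin": "2021-08-01",
            "end": "2021-08-31",
            "property": "datetime",
        },
        "bbox": "14.9,47.7,16.4,48.7",
    },
}

# Effective configuration for this run: endpoint from the config,
# time and bbox from the request.
effective = deep_merge(configured, request["values"])
```

Because the merge returns a copy, the configured resource stays unchanged and can be reused for the next request with different values.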

Another example is harvesting from a STAC Catalog on object storage. The config would look like this:

  - name: MyS3STACCatalogHarvester
    type: STACCatalog
    source:
      type: S3
      bucket: mybucket
      secret_access_key: xxx
      access_key_id: xxx
      endpoint_url: myendpoint.storage.com
      validate_bucket_name: False
      region_name: RegionA
      public: False

Each STAC Catalog to be harvested is then passed in as this simple struct:

{
  "name": "MyS3STACCatalogHarvester",
  "values": {
    "root_path": "path/to/catalog.json"
  }
}
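
As a rough sketch of how such a request could be resolved against the configured resources, the following Python snippet looks up the harvester by name and applies the values. The `resolve` function and the `CONFIGURED` list are hypothetical, and the config is abbreviated. Since `root_path` is a new top-level key, a plain dict union gives the same result as a deep merge in this case.

```python
import json

# Hypothetical in-memory form of the configured resources (abbreviated).
CONFIGURED = [
    {
        "name": "MyS3STACCatalogHarvester",
        "type": "STACCatalog",
        "source": {"type": "S3", "bucket": "mybucket", "public": False},
    },
]


def resolve(raw_request: str, configured: list) -> dict:
    """Parse a JSON request and return the configuration to run with."""
    request = json.loads(raw_request)
    by_name = {resource["name"]: resource for resource in configured}

    name = request.get("name")
    if name not in by_name:
        raise ValueError(f"no configured harvester named {name!r}")

    # Shallow union: request values override top-level config keys.
    # Nested overrides would need a recursive merge instead.
    return {**by_name[name], **request.get("values", {})}


raw = """
{
  "name": "MyS3STACCatalogHarvester",
  "values": {
    "root_path": "path/to/catalog.json"
  }
}
"""
resolved = resolve(raw, CONFIGURED)
```

A request naming an unconfigured harvester raises a `ValueError` here; how the real harvester should report that case is left open.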
Edited by Fabian Schindler-Strauss
