Restructure initialization
In this MR I'd like to suggest an alternative to the way a harvesting run is currently started.
At the moment it is possible to either:
- pass in a full configuration for a harvesting resource. This will be interpreted and run without any additional condition
- pass in the name of an already configured harvesting resource. Here, it is not possible to change any of the configured parameters (apart from dynamic field values like !now)
I think the real value of a harvester lies in a mixture of the two current approaches: have the basic resource configured, and pass in some values via the CLI or the queue.
The following (pseudoconfig) example declares a STAC API service:
- name: S2L2A_Element84
  endpoint:
    url: https://earth-search.aws.element84.com/v0/
    type: STACAPI
Note the lack of any filters/query.
Now a harvesting request would define those filters/query:
{
  "name": "S2L2A_Element84",
  "values": {
    "time": {
      "begin": "2021-08-01",
      "end": "2021-08-31",
      "property": "datetime"
    },
    "bbox": "14.9,47.7,16.4,48.7"
  }
}
The right harvester is identified by the "name" property, and the "values" are deep-merged with the harvester's configuration.
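As a rough illustration of the lookup and deep merge described above, here is a minimal Python sketch. The function names `deep_merge` and `resolve_harvester` are hypothetical and not taken from the code base, and the sketch assumes that "values" is merged at the top level of the harvester entry and that nested dictionaries are merged key by key while any other value is simply overridden.

```python
import copy


def deep_merge(base, override):
    """Return a new dict with `override` recursively merged into `base`."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars, lists and new keys: the request value wins.
            merged[key] = copy.deepcopy(value)
    return merged


def resolve_harvester(configured, request):
    """Find the configured harvester by name and apply the request values."""
    by_name = {harvester["name"]: harvester for harvester in configured}
    try:
        base = by_name[request["name"]]
    except KeyError:
        raise ValueError(f"unknown harvester: {request.get('name')!r}") from None
    return deep_merge(base, request.get("values", {}))


# Static configuration, as it would be loaded from the config file.
configured = [
    {
        "name": "S2L2A_Element84",
        "endpoint": {
            "url": "https://earth-search.aws.element84.com/v0/",
            "type": "STACAPI",
        },
    }
]

# Harvesting request, as it would arrive via the CLI or the queue.
request = {
    "name": "S2L2A_Element84",
    "values": {
        "time": {
            "begin": "2021-08-01",
            "end": "2021-08-31",
            "property": "datetime",
        },
        "bbox": "14.9,47.7,16.4,48.7",
    },
}

resolved = resolve_harvester(configured, request)
```

The merge works on a copy, so the configured harvester stays untouched and can be reused for the next request with different values.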
Another example is harvesting from a STAC Catalog on an object storage. The config would look like this:
- name: MyS3STACCatalogHarvester
  type: STACCatalog
  source:
    type: S3
    bucket: mybucket
    secret_access_key: xxx
    access_key_id: xxx
    endpoint_url: myendpoint.storage.com
    validate_bucket_name: False
    region_name: RegionA
    public: False
Each STAC Catalog to be harvested is then passed in as this simple struct:
{
  "name": "MyS3STACCatalogHarvester",
  "values": {
    "root_path": "path/to/catalog.json"
  }
}
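Since such a request would arrive as a JSON string from the CLI or the queue, it probably makes sense to check its shape before looking up the harvester. The following is only a sketch: `parse_request` is a hypothetical helper, and treating "values" as optional (defaulting to an empty object) is an assumption, not something this MR prescribes.

```python
import json


def parse_request(raw):
    """Parse a raw JSON harvesting request into a (name, values) pair."""
    request = json.loads(raw)
    if not isinstance(request, dict):
        raise ValueError("request must be a JSON object")

    name = request.get("name")
    if not isinstance(name, str) or not name:
        raise ValueError("request needs a non-empty string 'name'")

    # Assumption: a request without "values" runs the harvester as configured.
    values = request.get("values", {})
    if not isinstance(values, dict):
        raise ValueError("'values' must be a JSON object")

    return name, values


raw = (
    '{"name": "MyS3STACCatalogHarvester", '
    '"values": {"root_path": "path/to/catalog.json"}}'
)
name, values = parse_request(raw)
```

Rejecting malformed requests at this point keeps the merge step simple, as it can then rely on "name" being a string and "values" being a mapping.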