harvester merge requests
https://gitlab.eox.at/vs/harvester/-/merge_requests
Updated: 2023-05-30T10:37:17+02:00

Cross dateline footprints
https://gitlab.eox.at/vs/harvester/-/merge_requests/21
Updated: 2023-05-30T10:37:17+02:00
Author: Lubomir Doležal

updated harvesting to do recursive searches
https://gitlab.eox.at/vs/harvester/-/merge_requests/20
Updated: 2023-05-30T10:37:17+02:00
Author: Nikola Jankovic
updated harvesting to do recursive searches in subdirectories
updated schema
Assignee: Nikola Jankovic

Filematcher update
https://gitlab.eox.at/vs/harvester/-/merge_requests/19
Updated: 2023-05-30T10:37:18+02:00
Author: Nikola Jankovic
- removed transient dependencies (handled by `vs-common` not sure if good idea though)
- improved filematcher to better match multiple files by walking the filesystem
- improved tests
Assignee: Nikola Jankovic

Requests backoff and fix postprocessing
https://gitlab.eox.at/vs/harvester/-/merge_requests/18
Updated: 2023-05-30T10:37:18+02:00
Author: Lubomir Doležal

move env vars to Dockerfile
https://gitlab.eox.at/vs/harvester/-/merge_requests/17
Updated: 2023-05-30T10:37:18+02:00
Author: Lubomir Doležal

Fixed wrong index
https://gitlab.eox.at/vs/harvester/-/merge_requests/16
Updated: 2023-05-30T10:37:18+02:00
Author: Anna Romanova
Assignee: Lubomir Doležal

Adding documentation
https://gitlab.eox.at/vs/harvester/-/merge_requests/15
Updated: 2023-05-30T10:37:18+02:00
Author: Anna Romanova
Assignee: Lubomir Doležal

refactors and updates
https://gitlab.eox.at/vs/harvester/-/merge_requests/14
Updated: 2022-12-06T12:59:10+01:00
Author: Nikola Jankovic
- replaced `source` package with `fsspec`
- added config model
- updated postprocess to be generic function
- added `vs-common` as dependency
- removed config handling in favor of `vs-common`
- added creation functions to further separate object initialization
- redid model to support dicts rather than lists using `name`
- CLI now handles dot notation config substitution `config.value.a1 = 42` as a means of config overriding
- added different output handling options
- redid config schema
- refactored stac catalog logic with `pystac`
- added integration test for staccatalog
- updated test folder structure
- refactored oads unit tests
closes #6 #8
Assignee: Nikola Jankovic

Adding filter context
https://gitlab.eox.at/vs/harvester/-/merge_requests/13
Updated: 2023-05-30T10:37:17+02:00
Author: Fabian Schindler
Adding "last_run" variable always updated to the last run of the harvester
Adding "context" function to access context variables
Improved logging

Added unitests
https://gitlab.eox.at/vs/harvester/-/merge_requests/12
Updated: 2023-05-30T10:37:18+02:00
Author: Anna Romanova
Assignee: Nikola Jankovic

general fixes
https://gitlab.eox.at/vs/harvester/-/merge_requests/11
Updated: 2023-05-30T10:37:17+02:00
Author: Nikola Jankovic
- updated stacapi params
- moved temporal handling from opensearch to more generic function
- added filter function now for pygeofiltering
- removed one instance of abc for postprocessor
- added back error handling from other branch to match recent updates
closes #2 #3 and !8
Assignee: Nikola Jankovic

OADS Harvesting
https://gitlab.eox.at/vs/harvester/-/merge_requests/10
Updated: 2023-05-30T10:37:17+02:00
Author: Fabian Schindler
Adding implementation to harvest from OADS systems and generate STAC Items.

adding custom assets
https://gitlab.eox.at/vs/harvester/-/merge_requests/9
Updated: 2022-03-29T12:30:23+02:00
Author: Nikola Jankovic
- adding custom assets
- updated fileharvester with some additional config
- added structured logging
- updated running daemon
Assignee: Fabian Schindler

Added proper exception handling to daemon and postprocessing
https://gitlab.eox.at/vs/harvester/-/merge_requests/8
Updated: 2022-04-27T09:10:52+02:00
Author: Fabian Schindler

fixed s3 default endpoint url
https://gitlab.eox.at/vs/harvester/-/merge_requests/7
Updated: 2022-01-13T14:54:36+01:00
Author: Nikola Jankovic
Assignee: Fabian Schindler

Postprocessing
https://gitlab.eox.at/vs/harvester/-/merge_requests/6
Updated: 2021-12-09T12:43:21+01:00
Author: Fabian Schindler
Allow configuring a postprocessor for a given harvester.
A `Postprocessor`'s `postprocess` method receives a harvested item and can perform any postprocessing action upon it.
This is used e.g. in EOEPCA, where the harvested OpenSearch records are looked up and enriched with metadata.
There is currently no default preprocessor, but maybe the CREODIAS one can be put here (or somehow be registered in the registry of preprocessors).

Restructure initialization
https://gitlab.eox.at/vs/harvester/-/merge_requests/5
Updated: 2021-12-09T16:49:30+01:00
Author: Fabian Schindler
In this MR I'd like to suggest an alternative to the current harvesting run. The current situation is as follows:
It is possible to either:
* pass in a full configuration for a harvesting resource. This will be interpreted and run without any additional condition
* pass in the name of an already configured harvesting resource. Here, it is not possible to change any of the configured parameters (apart from dynamic field values like `!now`)
I think the real value of a harvester would be in a mixture of the two current approaches: have the basic resource configured, and pass in some values via CLI or queue.
The following (pseudoconfig) example declares a STAC API service:
```yaml
- name: S2L2A_Element84
  endpoint:
    url: https://earth-search.aws.element84.com/v0/
    type: STACAPI
```
Note the lack of any filters/query.
Now a harvesting request would define those filters/query:
```json
{
  "name": "S2L2A_Element84",
  "values": {
    "time": {
      "begin": "2021-08-01",
      "end": "2021-08-31",
      "property": "datetime"
    },
    "bbox": "14.9,47.7,16.4,48.7"
  }
}
```
Now the right harvester is identified by the `name` property, and the `values` are deep-merged with the harvester's configuration.
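
The deep merge described above could look roughly like the following sketch. The function name `deep_merge` and the `configured` lookup keyed by harvester name are assumptions made for illustration; they are not taken from the harvester's code.

```python
from copy import deepcopy
from typing import Any, Mapping


def deep_merge(base: Mapping[str, Any], override: Mapping[str, Any]) -> dict:
    """Return a new dict with `override` merged into `base`.

    Nested mappings are merged recursively; any other value in
    `override` replaces the value in `base`. Inputs are not mutated.
    """
    merged = deepcopy(dict(base))
    for key, value in override.items():
        if isinstance(value, Mapping) and isinstance(merged.get(key), Mapping):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical stored configuration, keyed by harvester name.
configured = {
    "S2L2A_Element84": {
        "endpoint": {
            "url": "https://earth-search.aws.element84.com/v0/",
            "type": "STACAPI",
        }
    }
}

# Harvesting request as shown in the JSON example above.
request = {
    "name": "S2L2A_Element84",
    "values": {
        "time": {"begin": "2021-08-01", "end": "2021-08-31", "property": "datetime"},
        "bbox": "14.9,47.7,16.4,48.7",
    },
}

# Look up the harvester by name, then overlay the request values.
effective = deep_merge(configured[request["name"]], request["values"])
```

Returning a new dict rather than updating in place keeps the stored configuration reusable across requests.
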
Another example is harvesting from a STAC Catalog on an object storage. The config would look like this:
```yaml
- name: MyS3STACCatalogHarvester
  type: STACCatalog
  source:
    type: S3
    bucket: mybucket
    secret_access_key: xxx
    access_key_id: xxx
    endpoint_url: myendpoint.storage.com
    validate_bucket_name: False
    region_name: RegionA
    public: False
```
Each STAC Catalog to be harvested is now passed in as this simple struct:
```json
{
  "name": "MyS3STACCatalogHarvester",
  "values": {
    "root_path": "path/to/catalog.json"
  }
}
```

Reorganizing Resources
https://gitlab.eox.at/vs/harvester/-/merge_requests/4
Updated: 2021-12-10T10:02:18+01:00
Author: Fabian Schindler
With this MR I'd like to introduce a reorganization of the current resources to be harvested. The proposals are as follows:
1. remove "Endpoint" as a base class, as it did not really add any benefit.
2. make `Source` a pure file system abstraction, with no harvesting capabilities of its own
3. add a new kind of class (I called them `FileSchemes` but I'm not too happy about that name) that _are_ resources and use `Sources` to create registerable items. Currently there's the "single file" thingy, which uses the filename to generate an identifier, and the STAC catalog harvester that recursively registers all items from a catalog stored on a `Source`.
This MR shall be used as a discussion basis, and will remain a draft until we have a clear idea of where we want to go.

set optional to alpha
https://gitlab.eox.at/vs/harvester/-/merge_requests/3
Updated: 2021-12-01T12:18:42+01:00
Author: Nikola Jankovic
ignore_missing_imports global

Streaming data
https://gitlab.eox.at/vs/harvester/-/merge_requests/2
Updated: 2021-11-30T19:23:21+01:00
Author: Nikola Jankovic
updated handling of lists to generators
closes #1
Assignee: Fabian Schindler
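
The move from lists to generators mentioned in the Streaming data merge request can be illustrated with a small sketch. The names `harvest_pages`, `fetch_page`, and `fake_fetch` are hypothetical and not taken from the harvester's code; the point is only that items are yielded as each page arrives instead of being collected into one list first.

```python
from typing import Callable, Iterator, List


def harvest_pages(fetch_page: Callable[[int], List[dict]]) -> Iterator[dict]:
    """Yield items page by page until an empty page is returned."""
    page = 0
    while True:
        items = fetch_page(page)
        if not items:
            return
        # Hand items to the consumer before the next page is requested.
        yield from items
        page += 1


# Hypothetical in-memory stand-in for a paged remote source.
PAGES = [[{"id": "a"}, {"id": "b"}], [{"id": "c"}]]


def fake_fetch(page: int) -> List[dict]:
    return PAGES[page] if page < len(PAGES) else []


ids = [item["id"] for item in harvest_pages(fake_fetch)]
```

Because the function is a generator, a consumer can start processing the first item before later pages have been fetched.
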