Generator approach in harvesting
For larger harvests, it makes sense to push items to the queue iteratively instead of waiting for the whole result set. This can be achieved quite easily by changing the endpoint.harvest method to yield items instead of returning them. The cql_filter function must be adapted in the same way, so that it consumes and produces items lazily. The stringify function could look like this:
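    import json
    from typing import Iterator, Optional


    def stringify(
        result: Iterator[dict], mode: str = "item", extract_property: Optional[str] = None
    ):
        # Encode items lazily, one at a time, instead of building a full list.
        if mode == "item":
            # default=str serializes values json cannot handle natively (e.g. datetimes).
            yield from (json.dumps(item, default=str) for item in result)
        elif mode == "property":
            yield from (item["properties"][extract_property] for item in result)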
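The endpoint side of the same idea can be sketched as follows. This is only an illustration under assumptions: the real endpoint.harvest and cql_filter signatures are not shown here, so fetch_page, filter_items and predicate are hypothetical names standing in for the actual request and filter logic.

```python
from typing import Callable, Iterable, Iterator, List


def harvest(fetch_page: Callable[[int], List[dict]]) -> Iterator[dict]:
    """Yield items page by page instead of returning one big list.

    fetch_page is a hypothetical callable standing in for the real endpoint
    request; it returns the items of one page, or an empty list when done.
    """
    page = 0
    while True:
        items = fetch_page(page)
        if not items:
            return
        # Hand each item to the consumer before the next page is requested.
        yield from items
        page += 1


def filter_items(
    items: Iterable[dict], predicate: Callable[[dict], bool]
) -> Iterator[dict]:
    """Hypothetical stand-in for an adapted cql_filter: filter lazily."""
    return (item for item in items if predicate(item))
```

Because both functions are lazy, no page is fetched until a consumer such as stringify starts iterating over the result.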
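The harvester's main stays largely unchanged. Only the client.lpush(harvest_config["queue"], *encoded) call has to change: because encoded is now a generator, the push must happen inside a for loop so that each item is sent as soon as it is yielded.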
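A minimal sketch of that loop, assuming client is a Redis-style client whose lpush method accepts a queue name followed by one or more values. The helper name push_items is hypothetical and not part of the harvester.

```python
from typing import Any, Iterable


def push_items(client: Any, queue: str, encoded: Iterable[str]) -> int:
    """Push each encoded item as soon as the generator yields it.

    Returns the number of items pushed. client is assumed to expose
    lpush(name, *values), as Redis clients typically do.
    """
    pushed = 0
    for value in encoded:
        client.lpush(queue, value)
        pushed += 1
    return pushed
```

In the harvester this would be called as push_items(client, harvest_config["queue"], encoded). Pushing one item per call keeps memory use flat at the cost of one round trip per item. Sending small batches per call would be a possible refinement.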