View Server 2 / harvester
Commit a9d70139, authored Dec 10, 2021 by Fabian Schindler

Moving Postprocessor ABC to abc
Adding docs to ABCs

Parent: f77fd158
Pipeline #19542 passed with stages in 2 minutes
harvester/abc.py
@@ -12,30 +12,76 @@ class Resource(ABC):
     @abstractmethod
     def harvest(self) -> Iterator[dict]:
         """
         Starts the harvesting of the resource, returning an iterator of the
         harvested items.
         """
         ...


 @dataclass
 class Stat:
     """
     Rudimentary file metadata
     """
     mtime: datetime
     size: int


 class Source(ABC):
     """
     Remote file-system abstraction to list, open and
     get metadata for files.
     """

     @abstractmethod
     def open(self, path: str) -> IO[AnyStr]:
         """
         Open a file on the source identified by a path.
         Returns a Python file-like object to directly read from.
         """
         ...

     @abstractmethod
     def listdir(self, path: str) -> List[str]:
         """
         Lists the filenames of the storage for the given base path (or prefix).
         The returned paths include the given base path and are not relative to it.
         """
         ...

     @abstractmethod
-    def listdir(self, path) -> List[str]:
+    def stat(self, path: str) -> Stat:
         """
         Returns the file metadata for the specified file
         """
         ...


 class Endpoint(Resource):
     """
     Endpoints are resources that use a search protocol (or something similar)
     to harvest items. Thus, they are always associated with a specific URL.
     """

     def __init__(self, url: str):
         self.url = url


 class FileScheme(Resource):
     """
     FileSchemes are resources that operate on a file basis on a given file source.
     """

     def __init__(self, source: Source):
         self.source = source


+class Postprocessor(ABC):
+    """
+    Postprocessors can alter all harvested items.
+    """
+
+    def __init__(self, **kwargs):
+        ...
+
+    @abstractmethod
+    def postprocess(self, item: dict) -> dict:
+        ...
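
The `Source` docstrings above describe a small contract: `open` returns a readable file object, `listdir` returns paths that still include the base path, and `stat` returns a `Stat`. A minimal sketch of one possible implementation follows. `LocalFileSource` is a hypothetical name that does not appear in this commit, and the ABCs are repeated in trimmed form only so the snippet runs on its own.

```python
import os
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import datetime
from typing import IO, AnyStr, List


@dataclass
class Stat:
    mtime: datetime
    size: int


class Source(ABC):
    # Trimmed copy of the ABC from the diff, repeated so the sketch runs alone.
    @abstractmethod
    def open(self, path: str) -> IO[AnyStr]: ...

    @abstractmethod
    def listdir(self, path: str) -> List[str]: ...

    @abstractmethod
    def stat(self, path: str) -> Stat: ...


class LocalFileSource(Source):
    """Hypothetical Source backed by the local file system (an assumption)."""

    def open(self, path: str) -> IO[AnyStr]:
        # Inside a method body, `open` still resolves to the builtin.
        return open(path, "rb")

    def listdir(self, path: str) -> List[str]:
        # Returned paths include the base path, as the docstring requires.
        return sorted(os.path.join(path, name) for name in os.listdir(path))

    def stat(self, path: str) -> Stat:
        info = os.stat(path)
        return Stat(mtime=datetime.fromtimestamp(info.st_mtime), size=info.st_size)
```

Because all three methods are abstract, a subclass that omits any of them cannot be instantiated, which is presumably the reason for declaring `stat` with `@abstractmethod` as well.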
harvester/postprocess.py
-from abc import ABC, abstractmethod
 from typing import Dict, Type

+from .abc import Postprocessor
 from .utils import import_by_path


-class Postprocessor(ABC):
-    def __init__(self, **kwargs):
-        ...
-
-    @abstractmethod
-    def postprocess(self, item: dict) -> dict:
-        pass
-
-
 POSTPROCESSORS: Dict[str, Type[Postprocessor]] = {}
 ...
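
To illustrate how the relocated `Postprocessor` ABC and the `POSTPROCESSORS` mapping might fit together, here is a rough sketch. The rest of `postprocess.py` is elided in this diff, so how the registry is actually populated and consulted is not shown. `StaticFieldPostprocessor`, the key `"static_field"`, and the sample item are all invented for illustration.

```python
from abc import ABC, abstractmethod
from typing import Dict, Type


class Postprocessor(ABC):
    # Trimmed copy of the ABC from harvester/abc.py so the sketch runs alone.
    def __init__(self, **kwargs):
        ...

    @abstractmethod
    def postprocess(self, item: dict) -> dict:
        ...


class StaticFieldPostprocessor(Postprocessor):
    """Hypothetical postprocessor that sets one fixed field on every item."""

    def __init__(self, key: str, value: str, **kwargs):
        super().__init__(**kwargs)
        self.key = key
        self.value = value

    def postprocess(self, item: dict) -> dict:
        # Return a copy so the harvested item is not mutated in place.
        return {**item, self.key: self.value}


# Same type as the registry in the diff; the entry itself is an assumption.
POSTPROCESSORS: Dict[str, Type[Postprocessor]] = {
    "static_field": StaticFieldPostprocessor,
}

# Look up a class by name and build it from keyword arguments.
processor = POSTPROCESSORS["static_field"](key="collection", value="demo")
result = processor.postprocess({"id": "item-1"})
```

Accepting `**kwargs` in the base `__init__` lets every subclass be constructed from the same kind of keyword mapping, for example one read from configuration.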