ResourceSync
module: rspub.core.rs
Publish resources under the ResourceSync Framework
The class ResourceSync is the main entrance to the rspub-core library. It is in essence a one-method class; its main method is execute(). This method takes as argument filenames: an iterable of files and/or directories to process. (A list and, for instance, a Selector are both iterables.)
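The sketch below makes the notion of "an iterable of files and/or directories" concrete by expanding such an argument into individual files. The helper expand_filenames is hypothetical and not part of rspub-core. It assumes that directories are walked recursively and that names which do not exist are skipped.

```python
import os
from typing import Iterable, Iterator


def expand_filenames(filenames: Iterable[str]) -> Iterator[str]:
    """Yield every file named in filenames, walking directories recursively.

    Hypothetical helper for illustration only; not part of rspub-core.
    """
    for name in filenames:
        path = os.path.abspath(name)
        if os.path.isdir(path):
            # a directory: descend into it and yield every file found
            for root, _dirs, files in os.walk(path):
                for file in sorted(files):
                    yield os.path.join(root, file)
        elif os.path.isfile(path):
            # a plain file: yield it as-is
            yield path
        # anything else (missing paths) is silently skipped in this sketch
```

The helper iterates over its argument only once. A list, a tuple or a generator can therefore all be passed as filenames.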
Upon execution, ResourceSync will call the appropriate Executor. That executor walks all the files and directories named in filenames. It creates the right type of sitemap (resourcelist, changelist, etc.) and completes the corresponding sitemaps, such as the capabilitylist and the description.
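The sketch below builds a small resourcelist document, for orientation only. It follows the sitemap format defined by the ResourceSync specification: a urlset with an rs:md element carrying capability="resourcelist", and one url element per resource. It uses only the Python standard library. The function resourcelist_xml and its tuple input are hypothetical; this is not how rspub-core writes its sitemaps. Capabilitylist and description documents use the same urlset structure with a different capability value.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Namespaces from the Sitemap protocol and the ResourceSync specification
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
RS_NS = "http://www.openarchives.org/rs/terms/"
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("rs", RS_NS)


def resourcelist_xml(resources, at=None):
    """Return a resourcelist document as a string.

    resources: iterable of (url, lastmod, md5_hex, length) tuples,
    a hypothetical input format chosen for this sketch.
    """
    if at is None:
        at = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    # document-level metadata: which capability this sitemap describes
    ET.SubElement(urlset, f"{{{RS_NS}}}md", capability="resourcelist", at=at)
    for url, lastmod, md5_hex, length in resources:
        url_el = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url_el, f"{{{SITEMAP_NS}}}loc").text = url
        ET.SubElement(url_el, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
        # resource-level metadata: fixity (hash) and size in bytes
        ET.SubElement(url_el, f"{{{RS_NS}}}md", hash=f"md5:{md5_hex}", length=str(length))
    return ET.tostring(urlset, encoding="unicode")
```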
Before you call execute() on ResourceSync, it may be advisable to set the proper parameters for your synchronization. ResourceSync is a subclass of RsParameters. The description of the parameters in that class is a good starting point for learning about their type, meaning and function. Here we highlight some of these parameters and discuss some of their aspects.
Selecting resources
The algorithm for selecting resources can be shaped by you, the user of this library. If the default algorithm suits you, so much the better: you don’t have to do anything and can safely skip this paragraph.
The default algorithm is implemented by the GateBuilder class ResourceGateBuilder. This default class builds a gate() that allows any file encountered in the list of files and directories of the filenames argument. It will, however, exclude the following files:
- any file that is not in resource_dir() or one of its subdirectories;
- hidden files;
- files from the directories metadata_dir(), description_dir() and plugin_dir(), in case any of these directories are situated on the search paths described in filenames.
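The default selection rule can be thought of as a single predicate on a filename. The following sketch expresses the rules described above using the standard library. make_default_gate is a hypothetical stand-in for the gate built by ResourceGateBuilder. Treating names that start with a dot as hidden is an assumption.

```python
import os


def make_default_gate(resource_dir, excluded_dirs=()):
    """Return a predicate that says yes or no to a filename.

    Hypothetical stand-in for the default gate: accept files inside
    resource_dir, reject hidden files (assumed: names starting with '.')
    and files inside any of excluded_dirs (e.g. metadata, description
    and plugin directories).
    """
    root = os.path.abspath(resource_dir)
    excluded = [os.path.abspath(d) for d in excluded_dirs if d]

    def is_inside(path, directory):
        # compare whole path components, not raw string prefixes
        return os.path.commonpath([path, directory]) == directory

    def gate(filename):
        path = os.path.abspath(filename)
        if not is_inside(path, root):
            return False
        if os.path.basename(path).startswith("."):
            return False
        return not any(is_inside(path, d) for d in excluded)

    return gate
```

The sketch uses os.path.commonpath instead of a plain string prefix test. Otherwise /data/resources would be mistaken for a subdirectory of /data/res.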
You can implement your own resource gate() by supplying a class named ResourceGateBuilder in a directory you specify with the plugin_dir() parameter. Your ResourceGateBuilder should subclass ResourceGateBuilder, or at least implement the methods build_includes() and build_excludes().
A detailed description of how to create your own ResourceGateBuilder can be found in rspub.pluggable.gate.
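A plugin along these lines might look as follows. This is a sketch under two assumptions:
- build_includes() and build_excludes() each receive a list of predicates and return the extended list;
- the resulting gate accepts a filename when at least one include predicate says yes and no exclude predicate does.

A stand-in base class and gate() function are defined inline so the sketch runs without rspub-core installed. In a real plugin you would subclass the ResourceGateBuilder documented in rspub.pluggable.gate and check its actual signatures there.

```python
import os


class ResourceGateBuilderStandIn:
    """Stand-in for the library's ResourceGateBuilder (assumed interface)."""

    def build_includes(self, includes: list) -> list:
        return includes

    def build_excludes(self, excludes: list) -> list:
        return excludes


def gate(includes, excludes):
    """Assumed semantics: accept x if some include says yes and no exclude does."""
    return lambda x: any(inc(x) for inc in includes) and not any(exc(x) for exc in excludes)


class ResourceGateBuilder(ResourceGateBuilderStandIn):
    """Plugin class, named ResourceGateBuilder as the text requires."""

    def build_includes(self, includes: list) -> list:
        # example policy: accept only .xml and .txt files
        includes.append(lambda f: f.endswith((".xml", ".txt")))
        return includes

    def build_excludes(self, excludes: list) -> list:
        # example policy: reject anything inside a "drafts" directory
        excludes.append(lambda f: "drafts" in f.split(os.sep))
        return excludes


builder = ResourceGateBuilder()
resource_gate = gate(builder.build_includes([]), builder.build_excludes([]))
```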
By shaping your own selection algorithm you could for instance say “include all the files from directory x but exclude the subdirectory y and from directory z choose only those files whose filenames start with ‘abc’ and from directory z/b choose only xml-files where the x-path expression //such/and/so yields ‘foo’ or ‘bar’.” Anything goes, as long as you can express it as a predicate, that is, say ‘yes’ or ‘no’ to a resource, given the filename of the resource.
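As an illustration, the quoted example can be written as one predicate. The directory names x, y, z and z/b, the helper names and the base directory parameter are all hypothetical. ElementTree supports only a subset of XPath, so the relative expression .//such/and/so stands in for //such/and/so. Unlike the original, it does not match when the document element itself is such.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def is_inside(path: Path, directory: Path) -> bool:
    # True if directory's components form a prefix of path's components
    return path.parts[: len(directory.parts)] == directory.parts


def xpath_yields(path: Path, expression: str, wanted: set) -> bool:
    """True if some element matched by expression has its text in wanted."""
    try:
        root = ET.parse(path).getroot()
    except (ET.ParseError, OSError):
        return False
    return any((el.text or "").strip() in wanted for el in root.iterfind(expression))


def make_selector(base: str):
    """Build the example predicate relative to a hypothetical base directory."""
    base_path = Path(base)
    x, y = base_path / "x", base_path / "x" / "y"
    z, zb = base_path / "z", base_path / "z" / "b"

    def select(filename: str) -> bool:
        path = Path(filename)
        if is_inside(path, x):
            # all of x, except its subdirectory y
            return not is_inside(path, y)
        if is_inside(path, zb):
            # z/b: only xml files whose such/and/so text is 'foo' or 'bar'
            return path.suffix == ".xml" and xpath_yields(path, ".//such/and/so", {"foo", "bar"})
        if is_inside(path, z):
            # rest of z: only names starting with 'abc'
            return path.name.startswith("abc")
        return False

    return select
```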
Strategies and executors
The Strategy tells ResourceSync how you want your resources processed. More precisely, ResourceSync will choose the Executor that fits your chosen strategy. Do you want new resourcelists every time you call ResourceSync.execute(), new changelists, or perhaps an incremental changelist? There are slots for other strategies in rspub-core, such as resourcedump and changedump, but these strategies are not yet implemented.
If your strategy is new changelist or incremental changelist and there is no resourcelist.xml yet in your metadata_dir(), ResourceSync will create a resourcelist.xml the first time you call execute().
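The choice of executor could be sketched like this, including the fallback to a resourcelist when the metadata directory holds no resourcelist yet. The Strategy enum below is a local stand-in whose member names are taken from the text. The executor names returned are hypothetical labels, not rspub-core classes. The file pattern resourcelist_*.xml follows the description of execute().

```python
import glob
import os
from enum import Enum


class Strategy(Enum):
    """Local stand-in; member names follow the text, values are arbitrary."""
    resourcelist = "resourcelist"
    new_changelist = "new_changelist"
    inc_changelist = "inc_changelist"
    resourcedump = "resourcedump"
    changedump = "changedump"


# hypothetical executor labels for the implemented strategies
EXECUTORS = {
    Strategy.resourcelist: "ResourceListExecutor",
    Strategy.new_changelist: "NewChangeListExecutor",
    Strategy.inc_changelist: "IncrementalChangeListExecutor",
}


def choose_executor(strategy: Strategy, metadata_dir: str) -> str:
    if strategy not in EXECUTORS:
        # slots exist for these strategies, but they are not implemented
        raise NotImplementedError(f"strategy {strategy.name} is not implemented")
    # without an existing resourcelist there is no baseline to compare against
    if not glob.glob(os.path.join(metadata_dir, "resourcelist_*.xml")):
        return EXECUTORS[Strategy.resourcelist]
    return EXECUTORS[strategy]
```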
The Strategy resourcelist does not require many system resources. Resources are processed one after the other, and sitemap documents are written to disk as soon as they are complete. Each sitemap holds at most 50000 records. The strategies new_changelist and inc_changelist compare the previous and present state of all your selected resources. To do so, they collect metadata from all the present resources in your selection. They then compare it to the previous state as recorded in resourcelists and subsequent changelists.
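The comparison at the heart of the changelist strategies can be sketched with plain dictionaries and sets, together with the 50000-record limit per sitemap. The state format (a URL mapped to a hash and a length) and the function names are assumptions for illustration only.

```python
from typing import Dict, Iterable, List, Set, Tuple

# hypothetical, simplified state record: url -> (md5 hash, length)
State = Dict[str, Tuple[str, int]]


def compare_states(previous: State, present: State) -> Dict[str, Set[str]]:
    """Classify URLs as created, updated or deleted between two states."""
    created = set(present.keys() - previous.keys())
    deleted = set(previous.keys() - present.keys())
    # a URL present in both states counts as updated if its metadata differs
    updated = {url for url in present.keys() & previous.keys() if present[url] != previous[url]}
    return {"created": created, "updated": updated, "deleted": deleted}


def chunks(items: Iterable, size: int = 50000) -> List[list]:
    """Split records into groups of at most size, one group per sitemap."""
    items = list(items)
    return [items[i:i + size] for i in range(0, len(items), size)]
```

Both states are held in memory at once. This is why the approach becomes expensive for very large collections.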
This will be perfectly OK in most situations. If the number of resources is very large, however, this comparison might become infeasible. In any case, large amounts of resources will probably be managed by some kind of repository system that can be queried for the requested data. It is perfectly alright to write your own Executor that handles the synchronisation of resources in your repository system, and you are invited to share these executors. A suitable plugin mechanism to accommodate such extraterrestrial executors may be added in a future version of rspub-core.
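An executor for a repository system could skip the file-by-file comparison altogether and ask the repository what changed. The sketch below assumes a hypothetical repository interface with a changes_since() method. The Change record, the executor class and the in-memory repository are illustrations only. They are not part of rspub-core or of any existing plugin mechanism.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Protocol


@dataclass
class Change:
    url: str
    change: str  # "created", "updated" or "deleted"
    timestamp: datetime


class Repository(Protocol):
    """Hypothetical interface a repository system might offer."""

    def changes_since(self, since: datetime) -> Iterable[Change]: ...


class RepositoryChangeListExecutor:
    """Hypothetical executor: asks the repository instead of diffing files."""

    def __init__(self, repository: Repository):
        self.repository = repository

    def execute(self, since: datetime) -> List[Change]:
        # the repository already knows what changed; no full comparison needed
        return sorted(self.repository.changes_since(since), key=lambda c: c.timestamp)


class InMemoryRepository:
    """Toy repository used only to exercise the sketch."""

    def __init__(self, changes: Iterable[Change]):
        self._changes = list(changes)

    def changes_since(self, since: datetime) -> List[Change]:
        return [c for c in self._changes if c.timestamp > since]
```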
Multiple collections
ResourceSync is a subclass of RsParameters, so the parameters set on ResourceSync can be saved and restored later on. Configurations has methods for listing and removing previously saved configurations. Multiple collections of resources can thus be synchronized, each collection with its own configuration. Synchronizing the collection ‘spam’ could go along these lines:
```python
from rspub.core.config import Configurations
from rspub.core.rs import ResourceSync

# list previously saved configurations
for name in Configurations.list_configurations():
    print(name)
# rspub_core
# spam_config
# eggs_config

# prepare for synchronization of the collection 'all about spam'
resourcesync = ResourceSync(config_name="spam_config")
# spam resources are in two directories
filenames = ["resources/green_spam", "resources/blue_spam"]
# do the synchronization
resourcesync.execute(filenames)
```
Observe execution
ResourceSync is a subclass of Observable. The executor to which the execution is delegated inherits all observers registered with ResourceSync. ResourceSync itself does not fire events.
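The hand-over of observers can be sketched as follows. Observable here is a minimal local stand-in, and its register() and inform() methods are assumptions; the real rspub.util.observe.Observable may use different names. The event names and the Executor and Publisher classes are likewise hypothetical.

```python
class Observable:
    """Minimal stand-in; rspub.util.observe.Observable may differ."""

    def __init__(self):
        self.observers = []

    def register(self, *observers):
        for observer in observers:
            if observer not in self.observers:
                self.observers.append(observer)

    def inform(self, event, **details):
        # call every registered observer with the event and its details
        for observer in self.observers:
            observer(event, **details)


class Executor(Observable):
    """Hypothetical executor: the only party that fires events."""

    def run(self, filenames):
        self.inform("execution_start", count=len(filenames))
        for name in filenames:
            self.inform("resource_processed", filename=name)
        self.inform("execution_end")


class Publisher(Observable):
    """Plays the role of ResourceSync: it fires no events itself."""

    def execute(self, filenames):
        executor = Executor()
        # hand all observers registered here over to the executor
        executor.register(*self.observers)
        executor.run(filenames)
```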
class rspub.core.rs.ResourceSync(**kwargs)
Bases: rspub.util.observe.Observable, rspub.core.rs_paras.RsParameters
Main class for ResourceSync publishing
__init__(**kwargs)
Initialization
Parameters:
- config_name (str) – the name of the configuration to read. If given, sets the current configuration.
- kwargs – see rspub.core.rs_paras.RsParameters.__init__()
execute(filenames: iter = None, start_new=False)
Publish ResourceSync documents under the conditions of the current parameters.
Call the appropriate executor and publish sitemap documents on the resources found in filenames.
If no file or files ‘resourcelist_*.xml’ are found in the metadata directory, this method will always dispatch to the strategy (new) resourcelist.
If the parameter is_saving_sitemaps() is False, it will do a dry run: no existing sitemaps will be changed and no new sitemaps will be written to disk.
Parameters:
- filenames – filenames and/or directories to scan
- start_new – erase metadata directory and create new resourcelists
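The sketch below shows the effect of the two switches described above, using the standard library. A dry run happens when is_saving_sitemaps() is False, and start_new erases the metadata directory. The function publish and its dict-of-documents input are hypothetical. The sketch only mirrors the documented behavior and is not the implementation of execute().

```python
import os
import shutil


def publish(sitemaps, metadata_dir, is_saving_sitemaps=True, start_new=False):
    """Hypothetical sketch of the dry-run and start_new switches.

    sitemaps: dict mapping file name -> XML text.
    Returns the paths that were written, or would be written in a dry run.
    """
    names = sorted(sitemaps)
    targets = [os.path.join(metadata_dir, name) for name in names]
    if not is_saving_sitemaps:
        # dry run: nothing is erased and nothing is written
        return targets
    if start_new and os.path.isdir(metadata_dir):
        # erase previous metadata before creating new documents
        shutil.rmtree(metadata_dir)
    os.makedirs(metadata_dir, exist_ok=True)
    for name, path in zip(names, targets):
        with open(path, "w", encoding="utf-8") as f:
            f.write(sitemaps[name])
    return targets
```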