ResourceSync

module: rspub.core.rs

Publish resources under the ResourceSync Framework

The class ResourceSync is the main entry point to the rspub-core library. It is in essence a one-method class; its main method is execute(). The filenames argument of this method is an iterable of files and/or directories to process. (List and, for example, Selector are iterables.) Upon execution ResourceSync calls the Executor that fits the chosen strategy. This executor walks all the files and directories named in filenames, creates the right type of sitemap (resourcelist, changelist, etc.) and completes the corresponding capabilitylist and description sitemaps.

Before you call execute() on ResourceSync it may be advisable to set the proper parameters for your synchronization. ResourceSync is a subclass of RsParameters and the description of parameters in that class is a good starting point to learn about the type, meaning and function of these parameters. Here we highlight some of these parameters and discuss aspects of them.

Selecting resources

The algorithm for selecting resources can be shaped by you, the user of this library. If the default algorithm suits you, so much the better: you don’t have to do anything and you can safely skip this section.

The default algorithm is implemented by the GateBuilder class ResourceGateBuilder. This default class builds a gate() that allows any file that is encountered in the list of files and directories of the filenames argument. It will, however, exclude hidden files, any file that is not in resource_dir() or one of its subdirectories, and files from the directories metadata_dir(), description_dir() and plugin_dir(), in case any of these directories are situated on the search paths described in filenames.
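The default rules described above can be sketched as a single predicate. This is a minimal illustration and not the code of ResourceGateBuilder: the function default_gate and its arguments are hypothetical, and it assumes that 'hidden' means a file name starting with a dot.

```python
from pathlib import Path


def default_gate(resource_dir, excluded_dirs):
    """Return a predicate mimicking the default selection rules (hypothetical helper)."""
    resource_dir = Path(resource_dir).resolve()
    excluded = [Path(d).resolve() for d in excluded_dirs]

    def is_inside(path, directory):
        # True if path is the directory itself or lies somewhere below it
        return path == directory or directory in path.parents

    def gate(filename):
        path = Path(filename).resolve()
        if not is_inside(path, resource_dir):
            return False  # not in resource_dir or one of its subdirectories
        if path.name.startswith("."):
            return False  # hidden file (assumption: dot-file convention)
        if any(is_inside(path, d) for d in excluded):
            return False  # metadata, description or plugin directory
        return True

    return gate
```

Calling default_gate("/data/res", ["/data/res/metadata"]) returns a function that says 'yes' to "/data/res/a/b.txt" and 'no' to "/data/res/.hidden" or "/data/res/metadata/resourcelist.xml".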

You can implement your own resource gate() by supplying a class named ResourceGateBuilder in a directory you specify under the plugin_dir() parameter. Your ResourceGateBuilder should subclass the default ResourceGateBuilder or at least implement the methods build_includes() and build_excludes(). A detailed description of how to create your own ResourceGateBuilder can be found in rspub.pluggable.gate.

By shaping your own selection algorithm you could for instance say “include all the files from directory x but exclude the subdirectory y and from directory z choose only those files whose filenames start with ‘abc’ and from directory z/b choose only xml-files where the XPath expression //such/and/so yields ‘foo’ or ‘bar’.” Anything goes, as long as you can express it as a predicate, that is, a function that says ‘yes’ or ‘no’ to a resource, given the filename of the resource.
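The first part of the rule quoted above can be written as a handful of composable predicates. The sketch below is plain Python with hypothetical helper names (in_directory, name_starts_with, both, build_gate); it is not the plugin interface, which is described in rspub.pluggable.gate. It assumes a file passes when at least one include predicate accepts it and no exclude predicate does.

```python
import os


def in_directory(directory):
    """Predicate: the file lies in `directory` or one of its subdirectories."""
    prefix = os.path.normpath(directory) + os.sep
    return lambda filename: os.path.normpath(filename).startswith(prefix)


def name_starts_with(text):
    """Predicate: the base name of the file starts with `text`."""
    return lambda filename: os.path.basename(filename).startswith(text)


def both(first, second):
    """Predicate: both given predicates say 'yes'."""
    return lambda filename: first(filename) and second(filename)


def build_gate(includes, excludes):
    """A file passes if any include says 'yes' and no exclude says 'yes'."""
    def gate(filename):
        return (any(p(filename) for p in includes)
                and not any(p(filename) for p in excludes))
    return gate


# "include all files from x but exclude subdirectory y,
#  and from z choose only files whose names start with 'abc'"
gate = build_gate(
    includes=[in_directory("x"), both(in_directory("z"), name_starts_with("abc"))],
    excludes=[in_directory("x/y")],
)
```

A content-based rule, such as the XPath condition, would be one more predicate of the same shape that opens the file and inspects it.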

Strategies and executors

The Strategy tells ResourceSync in what way you want your resources processed. Or better: ResourceSync will choose the Executor that fits your chosen strategy. Do you want new resourcelists every time you call ResourceSync.execute()? Do you want new changelists, or perhaps an incremental changelist? There are slots for other strategies in rspub-core, such as resourcedump and changedump, but these strategies are not yet implemented.

If new changelist or incremental changelist is your strategy and there is no resourcelist.xml yet in your metadata_dir() then ResourceSync will create a resourcelist.xml the first time you call execute().
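The fallback described above amounts to a small decision rule. The following sketch uses a stand-in Strategy enumeration and a hypothetical function choose_strategy; neither is taken from rspub-core. The file pattern resourcelist_*.xml is the one mentioned in the documentation of execute().

```python
from enum import Enum
from fnmatch import fnmatch


class Strategy(Enum):
    # Stand-in for illustration; not the enumeration defined in rspub-core
    RESOURCELIST = "resourcelist"
    NEW_CHANGELIST = "new_changelist"
    INC_CHANGELIST = "inc_changelist"


def choose_strategy(requested, metadata_files, start_new=False):
    """Return the strategy that would actually run (hypothetical helper)."""
    has_resourcelist = any(
        fnmatch(name, "resourcelist_*.xml") for name in metadata_files
    )
    if start_new or not has_resourcelist:
        # Without a resourcelist there is no previous state to compare against
        return Strategy.RESOURCELIST
    return requested
```

With an empty metadata directory, choose_strategy(Strategy.INC_CHANGELIST, []) yields Strategy.RESOURCELIST. Once a resourcelist file is present, the requested strategy is returned unchanged.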

The strategy resourcelist does not require many system resources. Resources are processed one after the other, sitemap documents are written to disk once they are processed, and each of these sitemaps holds at most 50000 records. The strategies new_changelist and inc_changelist compare the previous and present state of all your selected resources. In order to do so they collect metadata from all the resources presently in your selection and compare it to the previous state as recorded in resourcelists and subsequent changelists. This will be perfectly OK in most situations; however, if the number of resources is very large, this comparison might not be feasible. In any case, large numbers of resources will probably be managed by some kind of repository system that can be queried for the requested data. It is perfectly all right to write your own Executor that handles the synchronisation of resources in your repository system, and you are invited to share such executors. A suitable plugin mechanism to accommodate such external executors could be added in a future version of rspub-core.
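To make the cost of the changelist strategies concrete, here is a minimal sketch of a state comparison and of splitting records over sitemaps of at most 50000 entries. The functions compare_states and chunk are hypothetical, and the states are assumed to be plain dictionaries that map a path to a checksum; the actual executors collect richer metadata.

```python
def compare_states(previous, present):
    """Classify changes between two {path: checksum} mappings."""
    created = sorted(p for p in present if p not in previous)
    deleted = sorted(p for p in previous if p not in present)
    updated = sorted(
        p for p in present if p in previous and present[p] != previous[p]
    )
    return {"created": created, "updated": updated, "deleted": deleted}


def chunk(records, max_items=50000):
    """Yield consecutive slices of at most max_items records."""
    for start in range(0, len(records), max_items):
        yield records[start:start + max_items]


previous = {"a.txt": "1", "b.txt": "2", "c.txt": "3"}
present = {"a.txt": "1", "b.txt": "9", "d.txt": "4"}
changes = compare_states(previous, present)
```

Both mappings are held in memory at the same time, which is why this approach stops being practical for very large collections.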

Multiple collections

ResourceSync is a subclass of RsParameters and so the parameters set on ResourceSync can be saved and restored later on. The class Configurations has methods for listing and removing previously saved configurations. Multiple collections of resources can be synchronized, each collection with its own configuration. Synchronizing the collection ‘spam’ could go along these lines:

# Configurations is assumed to be importable from rspub.core.config
from rspub.core.config import Configurations
from rspub.core.rs import ResourceSync

# get a list of previously saved configurations
for name in Configurations.list_configurations():
    print(name)
# rspub_core
# spam_config
# eggs_config

# prepare for synchronization of collection 'all about spam'
resourcesync = ResourceSync(config_name="spam_config")
# spam resources are in two directories
filenames = ["resources/green_spam", "resources/blue_spam"]
# do the synchronization
resourcesync.execute(filenames)

Observe execution

ResourceSync is a subclass of Observable. The executor to which the execution is delegated inherits all observers registered with ResourceSync. ResourceSync itself does not fire events.
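The delegation of observers can be pictured with a small stand-alone model. The classes below are simplified stand-ins written for this illustration; they are not the classes in rspub.util.observe, and the event names are made up.

```python
class Observable:
    """Minimal stand-in for an observable; not the rspub.util.observe class."""

    def __init__(self):
        self.observers = []

    def register(self, observer):
        self.observers.append(observer)

    def fire(self, event, **kwargs):
        for observer in self.observers:
            observer(event, **kwargs)


class Executor(Observable):
    """Does the work and fires events while doing so."""

    def execute(self, filenames):
        self.fire("execution_start", count=len(filenames))
        for name in filenames:
            self.fire("resource_processed", filename=name)
        self.fire("execution_end")


class Publisher(Observable):
    """Hands its observers to the executor and fires nothing itself."""

    def execute(self, filenames):
        executor = Executor()
        for observer in self.observers:
            executor.register(observer)
        executor.execute(filenames)


events = []
publisher = Publisher()
publisher.register(lambda event, **kwargs: events.append(event))
publisher.execute(["a.txt", "b.txt"])
```

An observer registered on the publisher thus receives every event fired by the executor, although the publisher never calls fire().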

class rspub.core.rs.ResourceSync(**kwargs)

Bases: rspub.util.observe.Observable, rspub.core.rs_paras.RsParameters

Main class for ResourceSync publishing

__init__(**kwargs)

Initialization

Parameters:

execute(filenames: iter = None, start_new=False)

Publish ResourceSync documents under conditions of current parameters

Call appropriate executor and publish sitemap documents on the resources found in filenames.

If no files matching ‘resourcelist_*.xml’ are found in the metadata directory, execution is always dispatched to the strategy (new) resourcelist.

If the parameter is_saving_sitemaps() is False, a dry run is performed: no existing sitemaps are changed and no new sitemaps are written to disk.

Parameters:
  • filenames – filenames and/or directories to scan
  • start_new – erase metadata directory and create new resourcelists
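The dry run described for execute() can be pictured as a write step guarded by a flag. The function publish below is a hypothetical stand-in, not part of rspub-core. It assumes the sitemaps are available as a mapping of file name to XML text.

```python
import os


def publish(sitemaps, metadata_dir, is_saving_sitemaps=True):
    """Write {filename: xml_text} to metadata_dir, or only report paths on a dry run."""
    planned = []
    for name, text in sitemaps.items():
        path = os.path.join(metadata_dir, name)
        planned.append(path)
        if is_saving_sitemaps:
            # Only touch the disk when saving is switched on
            with open(path, "w", encoding="utf-8") as handle:
                handle.write(text)
    return planned
```

With is_saving_sitemaps=False the function still returns the paths it would have written, which makes it possible to inspect the outcome without changing anything on disk.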

class rspub.core.rs.ExecutionHistory(history_dir)

Bases: rspub.util.observe.EventObserver

Execution report creator

Currently not in use.

__init__(history_dir)
pass_inform(*args, **kwargs)
inform_execution_start(*args, **kwargs)