Parameters

module: rspub.core.rs_paras

Parameters for ResourceSync publishing

The class RsParameters validates parameters for ResourceSync publishing that are used throughout the application. RsParameters can be persisted as configuration.

Multiple sets of parameters can be saved and reused as named configurations. This enables configuring rspub-core to publish metadata on different sets of resources. Each configuration can have its own selection mechanism, metadata directory, strategy, etc. Each set of resources can then be published in its own capability list.

The class RsParameters in this module and the class rspub.core.config.Configurations are important assets in this endeavour. RsParameters can be associated with a saved rspub.core.selector.Selector.

class rspub.core.rs_paras.RsParameters(config_name=None, resource_dir=None, metadata_dir=None, description_dir=None, url_prefix=None, strategy=None, selector_file=None, simple_select_file=None, select_mode=None, plugin_dir=None, history_dir=None, max_items_in_list=None, zero_fill_filename=None, is_saving_pretty_xml=None, is_saving_sitemaps=None, has_wellknown_at_root=None, exp_scp_server=None, exp_scp_port=None, exp_scp_user=None, exp_scp_document_root=None, zip_filename=None, imp_scp_server=None, imp_scp_port=None, imp_scp_user=None, imp_scp_remote_path=None, imp_scp_local_path=None, **kwargs)[source]

Bases: object

Class capturing the core parameters for ResourceSync publishing

Parameters can be set in the __init__() method of this class and as properties. Each parameter is checked for validity, and a ValueError is raised if it is not valid. Parameters can be saved collectively as a configuration. Multiple named configurations can be stored by using the method save_configuration_as(). Named configurations can be restored by giving the config_name at initialisation:

# paras is an instance of RsParameters with configuration adequately set for collection 1
# it is saved as 'collection_1_config':
paras.save_configuration_as("collection_1_config")

# ...
# Later on it is restored...
paras = RsParameters(config_name="collection_1_config")

Note that the class rspub.core.config.Configurations has a method for listing saved configurations by name.

RsParameters can be cloned:

# paras1 is an instance of RsParameters
paras2 = RsParameters(**paras1.__dict__)
paras1 == paras2    # False
paras1.__dict__ == paras2.__dict__  # True

Besides parameters the RsParameters class also has methods for derived properties.

__init__(config_name=None, resource_dir=None, metadata_dir=None, description_dir=None, url_prefix=None, strategy=None, selector_file=None, simple_select_file=None, select_mode=None, plugin_dir=None, history_dir=None, max_items_in_list=None, zero_fill_filename=None, is_saving_pretty_xml=None, is_saving_sitemaps=None, has_wellknown_at_root=None, exp_scp_server=None, exp_scp_port=None, exp_scp_user=None, exp_scp_document_root=None, zip_filename=None, imp_scp_server=None, imp_scp_port=None, imp_scp_user=None, imp_scp_remote_path=None, imp_scp_local_path=None, **kwargs)[source]

Construct an instance of RsParameters

All parameters will get their value from

  1. the _named argument in **kwargs (this is for cloning instances of RsParameters). If not available:
  2. the named argument. If not available:
  3. the parameter as saved in the current configuration. If not available:
  4. the default configuration value.
Raises: ValueError if a parameter is not valid or if the configuration with the given config_name is not found
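The order of precedence listed above can be sketched as follows. This is a stand-in written for illustration, not the rspub implementation: the function resolve(), the DEFAULTS dictionary and the use of None for "not available" are all assumptions.

```python
# Hypothetical defaults, standing in for the default configuration values.
DEFAULTS = {"metadata_dir": "metadata", "max_items_in_list": 50000}


def resolve(name, argument, saved_config, kwargs):
    """Return the first available value for `name`, in order of precedence."""
    candidates = (
        kwargs.get("_" + name),   # 1. the _named argument in **kwargs (cloning)
        argument,                 # 2. the named argument
        saved_config.get(name),   # 3. the value saved in the current configuration
    )
    for value in candidates:
        if value is not None:
            return value
    return DEFAULTS[name]         # 4. the default configuration value


saved = {"metadata_dir": "md_saved"}
from_clone = resolve("metadata_dir", "md_arg", saved, {"_metadata_dir": "md_clone"})
from_argument = resolve("metadata_dir", "md_arg", saved, {})
from_saved = resolve("metadata_dir", None, saved, {})
from_default = resolve("max_items_in_list", None, saved, {})
```

The underscore-prefixed keys are what an instance's __dict__ would supply when cloning, which is why they take precedence over the plain named arguments in this sketch.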

resource_dir

parameter The local root directory for ResourceSync publishing (str)

The given value should point to an existing directory. A relative path will be made absolute, calculated from the current working directory (os.getcwd()).

The resource_dir acts as the root of the resources to be published. The urls to the resources are calculated relative to the resource_dir. Example:

resource_dir:   /abs/path/to/resource_dir
resource:       /abs/path/to/resource_dir/sub/path/to/resource
url:                        url_prefix + /sub/path/to/resource

default: user home directory

See also: url_prefix()
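A minimal sketch of the url calculation described above, assuming the url is simply url_prefix joined to the path of the resource relative to resource_dir. The function url_for() is hypothetical and not part of rspub.

```python
import os.path


def url_for(resource_dir, url_prefix, resource_path):
    """Map a file under resource_dir to url_prefix + its relative path."""
    relative = os.path.relpath(resource_path, resource_dir)
    # Use forward slashes in the url regardless of the local path separator.
    relative = relative.replace(os.sep, "/")
    return url_prefix.rstrip("/") + "/" + relative


url = url_for(
    "/abs/path/to/resource_dir",
    "http://www.example.com",
    "/abs/path/to/resource_dir/sub/path/to/resource",
)
```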

metadata_dir

parameter The directory for ResourceSync documents (str)

The metadata_dir is the directory where sitemap documents will be saved. Names and relative path names are allowed. An absolute path will raise a ValueError.

The metadata directory will be calculated relative to the resource_dir().

If the metadata directory does not exist it will be created during execution of a synchronization.

default: ‘metadata’

See also: abs_metadata_dir()
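The rules for metadata_dir could be approximated like this. Both functions are hypothetical helpers written for illustration; the actual validation in rspub may differ in detail.

```python
import os.path


def check_metadata_dir(metadata_dir):
    """Reject absolute paths; names and relative paths pass through."""
    if os.path.isabs(metadata_dir):
        raise ValueError("metadata_dir must be a relative path: " + metadata_dir)
    return metadata_dir


def abs_metadata_dir_sketch(resource_dir, metadata_dir):
    """Resolve the metadata directory relative to the resource directory."""
    return os.path.join(resource_dir, check_metadata_dir(metadata_dir))


resolved = abs_metadata_dir_sketch("/abs/path/to/resource_dir", "metadata")

try:
    # os.path.abspath() yields an absolute path on any platform.
    check_metadata_dir(os.path.abspath("elsewhere"))
    rejected = False
except ValueError:
    rejected = True
```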

description_dir

parameter Directory where a version of the description document is kept (str)

The description document, also known as .well-known/resourcesync, keeps links to the capability list(s) at the site. A local copy of the description document (or the real description document if synchronization takes place at the server) will be updated with newly created capability lists. The description_dir should point to a directory where the .well-known/resourcesync document can be found.

If description_dir is None the abs_metadata_dir() will be taken as description_dir.

If the document {description_dir}/.well-known/resourcesync does not exist it will be created.

default: None

See also: abs_description_path()
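The fallback to the metadata directory might look like the sketch below. The function name and the example directories are assumptions made for illustration only.

```python
import os.path


def abs_description_path_sketch(abs_metadata_dir, description_dir=None):
    """Locate the (local copy of the) description document."""
    # Fall back to the metadata directory when no description_dir is given.
    base_dir = abs_metadata_dir if description_dir is None else description_dir
    return os.path.join(base_dir, ".well-known", "resourcesync")


fallback = abs_description_path_sketch("/data/resources/metadata")
explicit = abs_description_path_sketch("/data/resources/metadata", "/var/www/html")
```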

url_prefix

parameter The URL-prefix for ResourceSync publishing (str)

The url_prefix substitutes resource_dir() when calculating urls to resources. The url_prefix should be the host name of the server or host name + path that points to the root directory of the resources. url_prefix + relative/path/to/resource should yield a valid url.

Example. Paths to resources are relative to the server host:

path to resource:           {resource_dir}/path/to/resource
url_prefix:         http://www.example.com
url to resource:    http://www.example.com/path/to/resource

Example. Paths to resources are relative to some directory on the server:

path to resource:                        {resource_dir}/path/to/resource
url_prefix:         http://www.example.com/my/resources
url to resource:    http://www.example.com/my/resources/path/to/resource

default: http://www.example.com

See also: resource_dir()

strategy

parameter Strategy for ResourceSync publishing (str | int | Strategy)

The strategy determines what will be done by ResourceSync upon execution. At the moment valid values for strategy are:

  • 0 resourcelist - new resourcelist: create new resourcelist(s)
  • 1 new_changelist - new changelist: create a new changelist on every execution
  • 2 inc_changelist - incremental changelist: add changes to an existing changelist

If the strategy new changelist or incremental changelist is chosen and no previous resourcelist is found in the metadata directory, the strategy resourcelist will be executed instead.

default: rspub.core.rs_enum.Strategy.resourcelist
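The accepted value types and the fallback to resourcelist can be illustrated with a local stand-in enum. The Strategy class below only mirrors the names and numbers listed above; as_strategy() and effective_strategy() are hypothetical helpers, not rspub functions.

```python
from enum import Enum


class Strategy(Enum):
    """Local stand-in mirroring the values listed above."""
    resourcelist = 0
    new_changelist = 1
    inc_changelist = 2


def as_strategy(value):
    """Accept a Strategy, its int value, or its name (str | int | Strategy)."""
    if isinstance(value, Strategy):
        return value
    if isinstance(value, int):
        return Strategy(value)
    return Strategy[value]


def effective_strategy(requested, has_previous_resourcelist):
    """Fall back to resourcelist when a changelist has no baseline."""
    if requested is not Strategy.resourcelist and not has_previous_resourcelist:
        return Strategy.resourcelist
    return requested


first_run = effective_strategy(as_strategy("inc_changelist"), False)
later_run = effective_strategy(as_strategy(2), True)
```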

selector_file

parameter Location of file to construct a Selector (str)

A rspub.core.selector.Selector can be used as input for the execute methods. The selector_file specifies the location of the selector file.

default: None

simple_select_file
select_mode
history_dir

parameter Directory for storing reports on executed synchronisations (str)

Currently not in use.

plugin_dir

parameter Directory where plugins can be found (str)

The given value should point to an existing directory. A relative path will be made absolute, calculated from the current working directory (os.getcwd()).

At the moment plugins for ResourceGateBuilder can be provided.

default: None

See also: rspub.util.gates

max_items_in_list

parameter The maximum number of records in a sitemap (int, 1 - 50000)

The ‘community defined’ maximum number of records in a sitemap document is 50000. If the maximum is reached during execution, new sitemaps of the same category will be created with the remaining records.

default: 50000
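Splitting records over several sitemaps amounts to chunking a list. The generator below is an illustrative sketch with a hypothetical name; it does not show how rspub writes the sitemaps themselves.

```python
def split_records(records, max_items_in_list=50000):
    """Yield successive lists holding at most max_items_in_list records."""
    if not 1 <= max_items_in_list <= 50000:
        raise ValueError("max_items_in_list must be between 1 and 50000")
    for start in range(0, len(records), max_items_in_list):
        yield records[start:start + max_items_in_list]


# Seven records with a maximum of three per list give three lists.
chunks = list(split_records(list(range(7)), max_items_in_list=3))
sizes = [len(chunk) for chunk in chunks]
```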

zero_fill_filename

parameter The number of digits in a sitemap filename (int, 1 - 10)

Filenames of resourcelists, changelists etc. are numbered: each name is suffixed with its number, padded with leading zeros to a width of zero_fill_filename digits. Examples of filenames with zero_fill_filename set at 4:

changelist_0002.xml
changelist_0003.xml

default: 4
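The numbering scheme in the examples above can be reproduced with str.zfill(). The helper numbered_filename() is hypothetical and only mimics the pattern shown.

```python
def numbered_filename(prefix, ordinal, zero_fill_filename=4):
    """Build names such as changelist_0002.xml."""
    return "{}_{}.xml".format(prefix, str(ordinal).zfill(zero_fill_filename))


names = [numbered_filename("changelist", n) for n in (2, 3)]
```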

is_saving_pretty_xml

parameter Determines appearance of sitemap xml (bool)

If no humans need to read or inspect sitemaps there is no need for linebreaks etc.

default: True, with linebreaks

is_saving_sitemaps

parameter Determines if sitemaps will be written to disk (bool)

An execution can be a dry-run. With this parameter set to False sitemaps will be generated, but not written to disk.

default: True, write sitemaps to disk

has_wellknown_at_root

parameter Whether the description document .well-known/resourcesync is at the root of the server (bool)

The description document is the main entry point for third parties trying to discover resources at a source. Capability lists point toward this document in their rel:up attribute. If for some reason the .well-known/resourcesync cannot be at the root of the server, the rel:up link in capability lists will point to .well-known/resourcesync relative to abs_metadata_dir().

default: True, the .well-known/resourcesync is at the root of the server

exp_scp_server
exp_scp_port
exp_scp_user
exp_scp_document_root

parameter The directory from which the web server will serve files (str)

Example. Paths to resources are relative to the server host:

url_prefix:         http://www.example.com
url to resource:    http://www.example.com/path/to/resource
scp_document_root:           /var/www/html/
scp_document_path:
path on server:              /var/www/html/path/to/resource

Example. Paths to resources are relative to some directory on the server:

url_prefix:         http://www.example.com/my/resources
url to resource:    http://www.example.com/my/resources/path/to/resource
scp_document_root:           /var/www/html/
scp_document_path:                         my/resources
path on server:              /var/www/html/my/resources/path/to/resource

default: ‘/var/www/html/’
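The two examples above can be reproduced by joining the document root, the path part of url_prefix and the relative path of the resource. This is a sketch under that assumption; path_on_server() is a hypothetical helper and not the way rspub necessarily computes remote paths.

```python
import posixpath
from urllib.parse import urlsplit


def path_on_server(url_prefix, document_root, relative_resource):
    """Join document root, the path part of url_prefix, and the resource path."""
    # The document path is '' for a bare host, 'my/resources' for a sub-directory.
    document_path = urlsplit(url_prefix).path.strip("/")
    return posixpath.join(document_root, document_path, relative_resource)


at_host = path_on_server(
    "http://www.example.com", "/var/www/html/", "path/to/resource"
)
in_subdir = path_on_server(
    "http://www.example.com/my/resources", "/var/www/html/", "path/to/resource"
)
```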

zip_filename
imp_scp_server
imp_scp_port
imp_scp_user
imp_scp_remote_path

parameter The directory at the remote server from which to import files (str)

default: ‘~’

imp_scp_local_path
save_configuration(on_disk=True)[source]

function Save current configuration

Save the current values of parameters to configuration. If on_disk is True (the default) persist the configuration to disk under the current configuration name.

Parameters: on_disk – True if configuration should be saved to disk, False otherwise

See also: current_configuration_name()

save_configuration_as(name: str)[source]

function Save current configuration under name

Save the current configuration under the given name. If a configuration under the given name already exists it will be overwritten without warning.

Parameters: name (str) – the name under which the configuration will be saved

See also: load_configuration()

reset()[source]
abs_metadata_dir() → str[source]

derived The absolute path to metadata directory

Returns: absolute path to metadata directory

abs_metadata_path(filename)[source]

derived The absolute path to file in the metadata directory

Parameters: filename (str) – the filename to position relative to the abs_metadata_dir()
Returns: absolute path to file in the metadata directory

abs_description_path()[source]

derived The absolute path to (the local copy of) the file .well-known/resourcesync

Returns: absolute path to (the local copy of) the file .well-known/resourcesync

server_root()[source]

derived The server root (of the web server) as derived from url_prefix

Returns: server root

server_path()[source]

derived The server path as derived from url_prefix

Returns: server path

description_url()[source]

derived The current description url

The current description url either points to {server root}/.well-known/resourcesync or to a file in the metadata directory.

Returns: current description url

See also: has_wellknown_at_root()
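The choice between the two locations can be sketched as follows. Both functions are hypothetical; in particular, the assumption that the metadata directory sits directly under url_prefix is made for illustration and should be checked against the actual layout.

```python
from urllib.parse import urlsplit


def server_root_sketch(url_prefix):
    """Scheme and host of url_prefix, without any path."""
    parts = urlsplit(url_prefix)
    return "{}://{}".format(parts.scheme, parts.netloc)


def description_url_sketch(url_prefix, metadata_dir, has_wellknown_at_root=True):
    """Pick the location that capability lists would reference."""
    if has_wellknown_at_root:
        base = server_root_sketch(url_prefix)
    else:
        # Assumed layout: metadata directory directly under url_prefix.
        base = url_prefix.rstrip("/") + "/" + metadata_dir.strip("/")
    return base + "/.well-known/resourcesync"


prefix = "http://www.example.com/my/resources"
at_root = description_url_sketch(prefix, "metadata")
in_metadata = description_url_sketch(prefix, "metadata", has_wellknown_at_root=False)
```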

capabilitylist_url() → str[source]

derived The current capabilitylist url

The current capabilitylist url points to ‘capabilitylist.xml’ in the metadata directory.

Returns: current capabilitylist url

uri_from_path(path)[source]

derived Calculate the url of a path relative to resource_dir

Parameters: path (str) – the path to calculate the url from
Returns: the url of the path relative to resource_dir

abs_history_dir()[source]

derived The absolute path to directory for reports on synchronizations

Currently not in use.

Returns: absolute path to directory for reports

static configuration_name()[source]

function Current configuration name

Returns: current configuration name

example_filename(ordinal)[source]
describe(as_string=False, fill=23)[source]

function List parameters and derived values

List parameters, values and derived values as a list of tuples. Each tuple contains:

n    field  contents
0    bool   True for parameter, False for derived value
1    name   The name of the parameter or derived value
2    value  The value of the parameter or derived value
3..  ...    Anything else
Parameters:
  • as_string – return contents as a printable string
  • fill – if as_string: fill column ‘name’ with fill spaces
Returns: list[list] or str
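To illustrate the tuple layout and the role of fill, the sketch below renders such tuples as text. The output format and the function format_description() are invented for this example; the string actually returned by describe(as_string=True) may look different.

```python
def format_description(rows, fill=23):
    """Render description tuples one per line, padding the name column."""
    lines = []
    for row in rows:
        # Only the first three fields are documented; ignore anything else.
        is_parameter, name, value = row[:3]
        marker = "parameter" if is_parameter else "derived"
        lines.append("{} {} {}".format(marker.ljust(9), name.ljust(fill), value))
    return "\n".join(lines)


rows = [
    (True, "strategy", "resourcelist"),
    (False, "abs_metadata_dir", "/tmp/resources/metadata", "extra"),
]
text = format_description(rows, fill=20)
```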