undr.json_index_tasks#

Implementation of operations based on index files (recursive download, recursive action dispatch…).

Overview#

Classes#

DirectoryScanned

Reports information on a local directory.

Doi

Message dispatched when a DOI is found in the index.

Index

Downloads an index file (-index.json).

IndexLoaded

Message indicating that the given index file has been loaded.

IndexProgress

Represents download or process progress.

InstallFilesRecursive

Downloads (and possibly decompresses) a directory’s files recursively.

ProcessFile

Generic task for file processing.

ProcessFilesRecursive

Spawns a processing task for each file in the given directory.

Selector

Delegate called to pick an action for each file.

UncompressedDecodeProgress

Dummy task that reports progress on “decompression” for uncompressed resources.

Attributes#

ProcessFileType

Generic parameter representing the file type.

Module Contents#

class undr.json_index_tasks.DirectoryScanned#

Reports information on a local directory.

download_bytes: IndexProgress#

Total size of the compressed files in this directory, in bytes.

This size does not include -index.json.

final_count: int#

Total number of files in this directory (“files” and “other_files”).

This count does not include -index.json.

index_bytes: IndexProgress#

Size of the index file (-index.json) in bytes.

initial_download_count: int#

Number of files already downloaded when the action started (“files” and “other_files”).

This count does not include -index.json.

initial_process_count: int#

Number of files already processed when the action started (“files” and “other_files”).

This count does not include -index.json.

path_id: pathlib.PurePosixPath#

Path ID of the directory.

process_bytes: IndexProgress#

Total size of the files in this directory, in bytes.

This size does not include -index.json.
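As a rough illustration of how a progress tracker might consume these messages, the sketch below sums the documented fields over several directories. The dataclasses are local stand-ins that only mirror the field names listed here; they are not imported from undr, and the totals helper is hypothetical.

```python
import dataclasses
import pathlib


@dataclasses.dataclass
class IndexProgress:
    # Stand-in mirroring the documented fields, not the library class.
    initial: int
    final: int


@dataclasses.dataclass
class DirectoryScanned:
    # Stand-in mirroring the documented fields, not the library class.
    path_id: pathlib.PurePosixPath
    initial_download_count: int
    initial_process_count: int
    final_count: int
    index_bytes: IndexProgress
    download_bytes: IndexProgress
    process_bytes: IndexProgress


def totals(messages):
    """Hypothetical helper: sum file counts and byte sizes over directories."""
    return {
        "files": sum(m.final_count for m in messages),
        "download_bytes": sum(m.download_bytes.final for m in messages),
        "process_bytes": sum(m.process_bytes.final for m in messages),
    }


# Two made-up directories; none of the counts include -index.json.
messages = [
    DirectoryScanned(
        path_id=pathlib.PurePosixPath("dataset/a"),
        initial_download_count=1,
        initial_process_count=0,
        final_count=3,
        index_bytes=IndexProgress(initial=0, final=200),
        download_bytes=IndexProgress(initial=100, final=1000),
        process_bytes=IndexProgress(initial=0, final=4000),
    ),
    DirectoryScanned(
        path_id=pathlib.PurePosixPath("dataset/b"),
        initial_download_count=0,
        initial_process_count=0,
        final_count=2,
        index_bytes=IndexProgress(initial=0, final=150),
        download_bytes=IndexProgress(initial=0, final=500),
        process_bytes=IndexProgress(initial=0, final=2500),
    ),
]
summary = totals(messages)
```

Keeping the compressed size (download_bytes) separate from the decompressed size (process_bytes) lets a tracker display download and processing progress independently.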

class undr.json_index_tasks.Doi#

Message dispatched when a DOI is found in the index.

path_id: pathlib.PurePosixPath#

Path ID of the associated resource.

value: str#

Digital object identifier (DOI) string starting with “10.”.
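A consumer of these messages may want a quick sanity check on the value before storing it. The sketch below tests only the prefix stated above; the Doi dataclass is a local stand-in rather than the library class, and has_doi_prefix is a hypothetical helper.

```python
import dataclasses
import pathlib


@dataclasses.dataclass
class Doi:
    # Stand-in mirroring the documented fields, not the library class.
    path_id: pathlib.PurePosixPath
    value: str


def has_doi_prefix(message: Doi) -> bool:
    """Hypothetical check: DOI values are documented to start with '10.'."""
    return message.value.startswith("10.")


good = Doi(path_id=pathlib.PurePosixPath("dataset/a"), value="10.1000/182")
bad = Doi(path_id=pathlib.PurePosixPath("dataset/b"), value="doi:10.1000/182")
```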

class undr.json_index_tasks.Index(path_root: pathlib.Path, path_id: pathlib.PurePosixPath, server: undr.remote.Server, selector: Selector, priority: int, force: bool, directory_doi: bool)#

Bases: undr.remote.DownloadFile

Downloads an index file (-index.json).

Parameters:
  • path_root (pathlib.Path) – The root path used to generate local file paths.

  • path_id (pathlib.PurePosixPath) – The path ID of the directory that will be searched recursively.

  • server (remote.Server) – The remote server used to download resources.

  • selector (Selector) – A selector that defines the files to process.

  • priority (int) – Priority of this task (tasks with lower priorities are scheduled first).

  • force (bool) – Download the index file even if it is already present locally.

  • directory_doi (bool) – Whether to dispatch Doi messages while reading the index.

run(session: requests.Session, manager: undr.task.Manager)#

class undr.json_index_tasks.IndexLoaded#

Message indicating that the given index file has been loaded.

children: int#

Number of subdirectories that will subsequently be loaded.

path_id: pathlib.PurePosixPath#

Path ID of the directory whose index has been loaded.

class undr.json_index_tasks.IndexProgress#

Represents download or process progress.

final: int#

Total number of bytes to download or process.

initial: int#

Number of bytes already downloaded or processed when the action started.
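The two fields are enough to derive the amount of work left when a job resumes. The following is a minimal sketch under that reading; IndexProgress is redefined locally as a stand-in, and both helper functions are hypothetical.

```python
import dataclasses


@dataclasses.dataclass
class IndexProgress:
    # Stand-in mirroring the documented fields, not the library class.
    initial: int
    final: int


def remaining_bytes(progress: IndexProgress) -> int:
    """Bytes still to download or process when the action starts."""
    return max(progress.final - progress.initial, 0)


def initial_fraction(progress: IndexProgress) -> float:
    """Fraction already done at start; an empty workload counts as complete."""
    if progress.final == 0:
        return 1.0
    return progress.initial / progress.final


resumed = IndexProgress(initial=250, final=1000)
```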

class undr.json_index_tasks.InstallFilesRecursive(path_root: pathlib.Path, path_id: pathlib.PurePosixPath, server: undr.remote.Server, selector: Selector, priority: int, force: bool)#

Bases: undr.task.Task

Downloads (and possibly decompresses) a directory’s files recursively.

The actual action is controlled by the selector and may be different for different files. Child directories are installed recursively.

Parameters:
  • path_root (pathlib.Path) – The root path used to generate local file paths.

  • path_id (pathlib.PurePosixPath) – The path ID of the directory that will be searched recursively.

  • server (remote.Server) – The remote server used to download resources.

  • selector (Selector) – A selector that defines the files to process.

  • priority (int) – Priority of this task (tasks with lower priorities are scheduled first).

  • force (bool) – Download files even if they are already present locally.

run(session: requests.Session, manager: undr.task.Manager)#

class undr.json_index_tasks.ProcessFile(file: undr.path.File)#

Bases: undr.task.Task

Generic task for file processing.

Parameters:

file (path.File) – The file (remote or local) to process.

undr.json_index_tasks.ProcessFileType#

Generic parameter representing the file type.

Used by ProcessFilesRecursive.

class undr.json_index_tasks.ProcessFilesRecursive(path_root: pathlib.Path, path_id: pathlib.PurePosixPath, server: undr.remote.Server, selector: Selector, process_file_class: Type[ProcessFileType], process_file_args: Iterable[Any], process_file_kwargs: Mapping[str, Any], priority: int)#

Bases: undr.task.Task

Spawns a processing task for each file in the given directory.

Subdirectories are recursively searched as well.

Parameters:
  • path_root (pathlib.Path) – The root path used to generate local file paths.

  • path_id (pathlib.PurePosixPath) – The path ID of the directory that will be scanned recursively.

  • server (remote.Server) – The remote server used to download resources.

  • selector (Selector) – A selector that defines the files to process.

  • process_file_class (Type[ProcessFileType]) – The class of the task to run on each selected file. Must be a subclass of ProcessFile.

  • process_file_args (Iterable[Any]) – Positional arguments passed to the constructor of process_file_class.

  • process_file_kwargs (Mapping[str, Any]) – Keyword arguments passed to the constructor of process_file_class. The keyword argument file is automatically added by ProcessFilesRecursive after the positional arguments and before other keyword arguments.

  • priority (int) – Priority of this task and all recursively created tasks (tasks with lower priorities are scheduled first).
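The argument ordering described for process_file_kwargs can be pictured as a single constructor call. The sketch below is an assumption about how that ordering translates to Python, not the library's code: ProcessFile is a local stand-in, while Checksum and spawn are hypothetical names.

```python
from typing import Any, Iterable, Mapping


class ProcessFile:
    # Stand-in for the documented base class; it only stores the file.
    def __init__(self, file: Any):
        self.file = file


class Checksum(ProcessFile):
    # Hypothetical subclass taking one positional and one keyword argument.
    def __init__(self, algorithm: str, file: Any, chunk_size: int = 65536):
        super().__init__(file=file)
        self.algorithm = algorithm
        self.chunk_size = chunk_size


def spawn(
    process_file_class: type,
    process_file_args: Iterable[Any],
    process_file_kwargs: Mapping[str, Any],
    file: Any,
) -> ProcessFile:
    """Positional arguments first, then file=..., then the other keywords."""
    return process_file_class(*process_file_args, file=file, **process_file_kwargs)


task = spawn(Checksum, ("sha3_224",), {"chunk_size": 1024}, file="recording.es")
```

Under this reading, a ProcessFile subclass should accept file as a keyword argument, and process_file_kwargs should not contain a file key of its own, since Python rejects a keyword passed twice.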

run(session: requests.Session, manager: undr.task.Manager) → None#

class undr.json_index_tasks.Selector#

Delegate called to pick an action for each file.

Selectors are used during the indexing phase to calculate the number of bytes to download and/or process, and during the processing phase to choose the action to perform.

class Action(*args, **kwds)#

Bases: enum.Enum

Specifies the operation to perform for a given file.

The action also determines whether the file’s bytes should be accounted for during the indexing phase. This makes it possible to report non-zero progress after resuming a job while skipping the actual processing.

DECOMPRESS = 5#

Downloads, decompresses, and reports.

DOI = 1#

Skips this file and does not report it, but publishes its own DOIs.

DOWNLOAD = 4#

Downloads and reports.

DOWNLOAD_SKIP = 3#

Skips operations on this file but reports it as downloaded.

IGNORE = 0#

Skips this file and does not report it.

PROCESS = 6#

Downloads, decompresses, processes, and reports.

SKIP = 2#

Skips this file but reports it as downloaded and processed.

INSTALL_IGNORE_ACTIONS#

The set of actions that ignore the file for reporting purposes.

REPORT_DOWNLOAD_ACTIONS#

The set of actions that (at least) download the file.

REPORT_PROCESS_ACTIONS#

The set of actions that download and process the file.

SKIP_ACTIONS#

The set of actions that skip all operations on the file.

action(file: undr.path.File) → Selector.Action#

Returns the action to apply to the given file.

Called by Index, InstallFilesRecursive and ProcessFilesRecursive. The default implementation returns Selector.Action.PROCESS.
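Customizing the behavior amounts to overriding action() in a subclass. The sketch below illustrates the pattern with local stand-ins so that it runs without the library: Selector and its Action enum copy the values documented above, File is a hypothetical placeholder that carries only a path_id, and SuffixSelector is an invented example.

```python
import dataclasses
import enum
import pathlib


class Selector:
    # Stand-in for the documented class, not imported from undr.
    class Action(enum.Enum):
        # Values copied from the documentation above.
        IGNORE = 0
        DOI = 1
        SKIP = 2
        DOWNLOAD_SKIP = 3
        DOWNLOAD = 4
        DECOMPRESS = 5
        PROCESS = 6

    def action(self, file):
        # Documented default: process every file.
        return Selector.Action.PROCESS


@dataclasses.dataclass
class File:
    # Hypothetical placeholder for undr.path.File.
    path_id: pathlib.PurePosixPath


class SuffixSelector(Selector):
    """Hypothetical selector: process files with one suffix, ignore the rest."""

    def __init__(self, suffix: str):
        self.suffix = suffix

    def action(self, file):
        if file.path_id.suffix == self.suffix:
            return Selector.Action.PROCESS
        return Selector.Action.IGNORE


selector = SuffixSelector(".es")
event_file = File(pathlib.PurePosixPath("dataset/a/recording.es"))
text_file = File(pathlib.PurePosixPath("dataset/a/notes.txt"))
```

Because the same selector is consulted during both the indexing and the processing phases, action() should return the same value for a given file each time it is called, otherwise the reported totals and the work actually performed may disagree.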

scan_filesystem(directory: undr.path_directory.Directory) → bool#

Whether to scan the filesystem.

Called by Index to decide whether it needs to scan the file system. This function may return False if action() returns one of the following for every file in the directory:

class undr.json_index_tasks.UncompressedDecodeProgress(path_id: pathlib.PurePosixPath, size: int)#

Bases: undr.task.Task

Dummy task that reports progress on “decompression” for uncompressed resources.

Resources that are not compressed are downloaded directly in raw format. The conversion from “local” to “raw” (decompression) requires no further action for such resources. This task dispatches decompression progress as if the resources were compressed, to simplify the architecture of progress trackers.

run(session: requests.Session, manager: undr.task.Manager)#